txt.sour.is falsifian@www.falsifian.org (#b242aea)

www.falsifian.org

@prologic@twtxt.net The headline is interesting and sent me down a rabbit hole understanding what the paper (https://aclanthology.org/2024.acl-long.279/) actually says.

The result is interesting, but the Neuroscience News headline greatly overstates it. If I’ve understood right, the authors argue (with strong evidence) that simply making neural nets bigger and bigger isn’t quite as magically effective as people say, at least when scaling is used on its own. In particular, they evaluate LLMs without two common enhancements, in-context learning and instruction tuning. In-context learning puts a few examples of the particular task into the prompt, and instruction tuning fine-tunes the model on examples of tasks phrased as instructions. The authors turn both off because they are not part of what is called “emergence”: “an ability to solve a task which is absent in smaller models, but present in LLMs”.
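To make the zero-shot vs. in-context distinction concrete, here is a minimal sketch of how the two kinds of prompt differ. The `build_prompt` helper and its "Input:/Output:" layout are hypothetical, chosen only for illustration; real evaluations use their own prompt formats.

```python
def build_prompt(query, examples=()):
    """Build a text prompt for a language model (hypothetical helper).

    With no examples this is a zero-shot prompt. Prepending a few
    (input, output) pairs turns it into an in-context (few-shot) prompt.
    The "Input:/Output:" layout is an illustrative assumption.
    """
    # One block per worked example, then the query with its answer left blank.
    parts = [f"Input: {x}\nOutput: {y}" for x, y in examples]
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


zero_shot = build_prompt("2 + 3")
few_shot = build_prompt("2 + 3", examples=[("1 + 1", "2"), ("4 + 2", "6")])
print(zero_shot)
print("---")
print(few_shot)
```

The model weights are identical in both cases. Only the prompt changes, which is why in-context learning can be switched off simply by sending the zero-shot version.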

They show that these restricted LLMs outperform smaller models (i.e. demonstrate emergence) only on certain tasks, and then (end of Section 4.1) discuss the nature of the few tasks that did show emergence.
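As a rough illustration of the quoted definition, here is a toy check for whether a task looks “emergent” across model sizes. This is not the paper’s procedure: the `shows_emergence` function, the `margin` threshold, and the accuracy numbers below are all invented for illustration.

```python
def shows_emergence(accuracy_by_size, baseline, margin=0.05):
    """Toy check of "absent in smaller models, but present in LLMs".

    accuracy_by_size maps a model size (e.g. parameter count) to task accuracy.
    Here a task counts as emergent if the largest model beats the random
    baseline by more than `margin` while no smaller model does.
    The margin rule is a hypothetical choice, not the paper's criterion.
    """
    if len(accuracy_by_size) < 2:
        raise ValueError("need at least two model sizes to compare")

    def above_chance(size):
        return accuracy_by_size[size] > baseline + margin

    *smaller, largest = sorted(accuracy_by_size)
    return above_chance(largest) and not any(above_chance(s) for s in smaller)


# Invented numbers for a 4-way multiple-choice task (chance = 0.25).
emergent_task = {10**8: 0.26, 10**9: 0.27, 10**10: 0.25, 10**11: 0.60}
gradual_task = {10**8: 0.35, 10**9: 0.45, 10**10: 0.55, 10**11: 0.65}
print(shows_emergence(emergent_task, baseline=0.25))  # True
print(shows_emergence(gradual_task, baseline=0.25))   # False
```

The second task improves with scale but is not “emergent” under this definition, because smaller models already beat chance.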

I’d love to hear from someone more familiar with this stuff. (I’ve done research that touches on ML, but neural nets and especially LLMs aren’t my area at all.) In particular, how compelling is this finding that zero-shot learning (i.e. without in-context learning or instruction tuning) remains hard as model size grows?
