I need to find evidence for/against the claim that there was a training run of GPT-2 that maximized negative log-loss. I’ve heard it a couple of times on the internet and already spread the meme myself, but I haven’t seen it in a paper or blog post.
