niplav

niplav.github.io


Recent twts from niplav

And last but not least: contests such as the Underhanded C Contest show that it is quite easy, in sufficiently expressive formal systems, to create malicious outputs with high plausible deniability. Since training ML systems is an even more expressive process, the existence of such successful contests seems worrying.

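As a loose illustration of the kind of plausibly-deniable flaw such contests reward (my own toy example in Python, not an actual contest entry), consider a validation function whose only sin is a subtle regex-anchoring slip:

```python
import re

def is_safe_filename(name: str) -> bool:
    # Looks like a strict whitelist, but in Python regexes `$` also matches
    # just before a trailing newline, so "report.txt\n" slips through.
    # re.fullmatch(r"[A-Za-z0-9_.-]+", name) would close the hole.
    return re.match(r"[A-Za-z0-9_.-]+$", name) is not None

print(is_safe_filename("report.txt"))    # True
print(is_safe_filename("report.txt\n"))  # also True: the deniable flaw
```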

What is the output format of AI systems doing alignment research (formal proof, set of heuristics, set of algorithms…)? AI systems will align their successor systems, repeatedly. Unless this process has 100% fidelity, errors in the alignment process will compound over time, similar to numerical instability. By analogy with numerical analysis, could we make a useful statement about the “condition number” of this repeated alignment process?

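As a toy model of the compounding worry (my own sketch with made-up numbers, not anything from the post): treat each alignment hand-off as multiplying fidelity by a factor f ≤ 1, analogous to error growth governed by a condition number. Even f very close to 1 decays substantially over many rounds:

```python
def remaining_fidelity(f: float, rounds: int) -> float:
    """Fidelity left after `rounds` successive alignment hand-offs,
    assuming each hand-off preserves a fraction f of it."""
    return f ** rounds

for f in (1.0, 0.999, 0.99, 0.9):
    print(f"f = {f}: after 100 hand-offs -> {remaining_fidelity(f, 100):.3g}")
# f = 1.0:   1
# f = 0.999: ~0.905
# f = 0.99:  ~0.366
# f = 0.9:   ~2.7e-05
```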

Are there any other industries in which we apply safety standards similar to those we apply to the training of large neural networks? The closest that comes to my mind is animal husbandry, but I think we understand the genetics involved there better than we understand neural networks, and we don’t apply ever-increasing optimization power to it. (Although, to be fair, when humans began husbandry they understood it far less.)


von Neumann: I came up with this new system that generalizes probability theory to consider convex sets instead of point estimates. I think that I could use this to prove regret bounds…

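For readers unfamiliar with the reference, here is a loose sketch (my framing, not the post’s) of what “convex sets instead of point estimates” can look like: keep a set of candidate distributions and report lower/upper expectations over its convex hull.

```python
def expectation(probs, values):
    return sum(p * v for p, v in zip(probs, values))

def lower_upper_expectation(vertices, values):
    """Lower and upper expectation over the convex hull of `vertices`
    (a credal set). Expectation is linear in the probabilities, so the
    extremes are attained at the vertices."""
    exps = [expectation(p, values) for p in vertices]
    return min(exps), max(exps)

# Payoffs for three outcomes, and two extreme beliefs about their probabilities.
values = [0.0, 1.0, 3.0]
vertices = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
print(lower_upper_expectation(vertices, values))  # (0.6, 1.4)
```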

You might be thinking: “aha! So I should vote in elections, since even though under do()-calculus the decision has a minuscule impact, there are many agents that are logically correlated with me, which means my influence is much higher!” A tiny problem is that the number of agents that are logically correlated with you because they base their decisions on logical correlation is, ah, not that big…

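A back-of-the-envelope version of that objection, with entirely hypothetical numbers (mine, not the post’s): the correlated-agents multiplier only matters if the relevant pool is large.

```python
p_pivotal = 1e-7      # hypothetical chance that one extra vote is decisive
outcome_value = 1e9   # hypothetical value of the better outcome, arbitrary units

causal_impact = p_pivotal * outcome_value  # what do()-calculus credits you with

# Voters who choose *because* they reason via logical correlation -- the post's
# point is that this pool is small, so the multiplier is modest.
correlated_voters = 500
correlated_impact = correlated_voters * causal_impact

print(causal_impact)      # 100.0
print(correlated_impact)  # 50000.0 -- bigger, but nowhere near electorate-sized
```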