use known inconsistencies of human preferences as value-learning trip-wires: if the value learning algorithm hasn’t learned them yet, it’s operating at the wrong level of abstraction.
use known inconsistencies of human preferences as value-learning trip-wires: if the value learning algorithm hasn’t learned them yet, it’s operating at the wrong level of abstraction.