What is the output format of AI systems doing alignment research (a formal proof, a set of heuristics, a set of algorithms…)? AI systems will align their successor systems, repeatedly. Unless each step of this process has 100% fidelity, errors in the alignment will compound over time, much like numerical instability. Borrowing from numerical analysis, could we make a useful statement about the “condition number” of this repeated alignment process?
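A minimal sketch of the analogy, assuming a toy error model (the symbols $\kappa$, $\delta$, and $\epsilon_n$ are illustrative and not defined in the original): suppose each alignment step amplifies the inherited alignment error by at most a factor $\kappa$ and introduces fresh error $\delta$. Unrolling the recurrence gives

$$\epsilon_n \le \kappa\,\epsilon_{n-1} + \delta \quad\Longrightarrow\quad \epsilon_n \le \kappa^n \epsilon_0 + \delta\,\frac{\kappa^n - 1}{\kappa - 1} \qquad (\kappa \neq 1),$$

where $\epsilon_0$ is the initial alignment error. As in numerical analysis, the process is stable only if $\kappa \le 1$; for any $\kappa > 1$ the inherited error grows geometrically, so even near-perfect per-step fidelity does not prevent eventual divergence. On this toy model, a useful statement about the “condition number” of repeated alignment would amount to a bound on $\kappa$.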
