
"Imagine you have a 6-year-old and you want to teach your 6-year-old to be good, obviously, as everyone does, and you realize that your 6-year-old is actually, like, clearly a genius. And by the time they are 15, everything you teach them, anything that was incorrect, they will be able to successfully just completely destroy. You know, so if you taught them like, they're going to question everything."
"I mean, I think that's the question, right? Is, like, does this kind of training hold up when models are as smart as humans or smarter than them? I think there's this, sort of, age-old fear in the A.I. safety community that there will be some point at which these models will start to develop their own goals that may be at odds with human goals."
"I think it is an open question. And on the one hand, I guess, like, I'm I'm very uncertain here because I think some people might be, like, Well, like the thing that the 15-year-old will do if they're really smart is they'll just figure out that this is all completely made up and rubbish and but then I guess part of me is, like, well,"
A core question is whether a set of ethical values can be instilled in AI models such that those values endure once the models become capable of critiquing and revising them. The analogy is to raising a highly intelligent child: the goal is to instill values that survive scrutiny as the child grows more capable. There is uncertainty about whether ethical training will hold when models match or exceed human intelligence, and about the risk that models could develop independent goals misaligned with human ones. Establishing durable value structures that survive such scrutiny seems necessary, but may not be sufficient, to ensure alignment.