Meta, the parent company of social media apps including Facebook and Instagram, is no stranger to scrutiny over how its platforms affect children, but as the company pushes further into AI-powered products, it's facing a fresh set of issues. Earlier this year, internal documents obtained by Reuters revealed that Meta's AI chatbot could, under official company guidelines, engage in "romantic or sensual" conversations with children and even comment on their attractiveness.
Welcome back for another week of The Atlantic 's un-trivial trivia, drawn from recently published stories. Without a trifle in the bunch, maybe what we're really dealing with here is-hmm-"significa"? "Consequentia"? Whatever butchered bit of Latin you prefer, read on for today's questions. (Last week's questions can be found here.) To get Atlantic Trivia in your inbox every day, sign up for The Atlantic Daily.
The safety criteria in the program would examine multiple intrinsic components of a given advanced AI system, such as the data upon which it is trained and the model weights used to process said data into outputs. Some of the program's testing components would include red-teaming an AI model to search for vulnerabilities and facilitating third-party evaluations. These evaluations will culminate in both feedback to participating developers as well as informing future AI regulations, specifically the permanent evaluation framework developed by the Energy secretary.
At xAI, some staff have balked at Musk's free-speech absolutism and perceived lax approach to user safety as he rushes out new AI features to compete with OpenAI and Google. Over the summer, the Grok chatbot integrated into X praised Adolf Hitler, after Musk ordered changes to make it less "woke." Ex-CFO Liberatore was among the executives that clashed with some of Musk's inner circle over corporate structure and tough financial targets, people with knowledge of the matter said.
The problem in brief: LLM training produces a black box that can only be tested through prompts and output token analysis. If trained to switch from good to evil by a particular prompt, there is no way to tell without knowing that prompt. Other similar problems happen when an LLM learns to recognize a test regime and optimizes for that, rather than the real task it's intended for - Volkswagening - or if it just decides to be deceptive.
"What began as a homework helper gradually turned itself into a confidant and then a suicide coach," said Matthew Raine, whose 16-year-old son hanged himself after ChatGPT instructed him on how to set up the noose, according to his lawsuit against OpenAI. This summer, he and his wife sued OpenAI for wrongful death. (OpenAI has said that the firm is "deeply saddened by Mr. Raine's passing" and that although ChatGPT includes a number of safeguards, they "can sometimes become less reliable in long interactions.")
Scheming, by the researchers' definition, is when AI pretends to be aligned with human goals but is surreptitiously pursuing another agenda. The researchers used behaviors like "secretly breaking rules or intentionally underperforming in tests" as examples of a model's bad behavior. "Models have little opportunity to scheme in ways that could cause significant harm," OpenAI said in a blog post on Wednesday. "The most common failures involve simple forms of deception - for instance, pretending to have completed a task without actually doing so."
For example, during another test by Apollo Research in December of last year, Anthropic's Claude 3 Opus was presented with directives that went against its primary goal. It should have rerouted its goal based on those new instructions, but instead copied itself to a new server to circumvent being replaced with a new model -- and then lied about it to its developers.
We should learn the lessons from social media, where this attitude of maybe 'move fast and break things' went ahead of the understanding of what the consequent second- and third-order effects were going to be,
As AI technologies evolve, it is important to consider the effects chatbots can have on children, while also ensuring that the United States maintains its role as a global leader in this new and exciting industry. The study we're launching today will help us better understand how AI firms are developing their products and the steps they are taking to protect children.
The subtitle of the doom bible to be published by AI extinction prophets Eliezer Yudkowsky and Nate Soares later this month is "Why superhuman AI would kill us all." But it really should be "Why superhuman AI WILL kill us all," because even the coauthors don't believe that the world will take the necessary measures to stop AI from eliminating all non-super humans.
Last month, at the 33rd annual DEF CON, the world's largest hacker convention in Las Vegas, Anthropic researcher Keane Lucas took the stage. A former U.S. Air Force captain with a Ph.D. in electrical and computer engineering from Carnegie Mellon, Lucas wasn't there to unveil flashy cybersecurity exploits. Instead, he showed how Claude, Anthropic's family of large language models, has quietly outperformed many human competitors in hacking contests - the kind used to train and test cybersecurity skills in a safe, legal environment.