#ai-safety
#ai-safety

2 days ago

Artificial intelligence

Tech CEOs marvel - and worry - about Sam Altman's dizzying race to dominate AI

Artificial intelligence

'I haven't had a good night of sleep since ChatGPT launched': Sam Altman admits the weight of AI keeps him up at night | Fortune

Artificial intelligence

OpenAI to route sensitive conversations to GPT-5, introduce parental controls | TechCrunch

2 days ago

Artificial intelligence

Tech CEOs marvel - and worry - about Sam Altman's dizzying race to dominate AI

Artificial intelligence

'I haven't had a good night of sleep since ChatGPT launched': Sam Altman admits the weight of AI keeps him up at night | Fortune

Artificial intelligence

OpenAI to route sensitive conversations to GPT-5, introduce parental controls | TechCrunch

more#openai

3 days ago

Why Deloitte is betting big on AI despite a $10M refund | TechCrunch

Enterprise AI adoption is accelerating but implementation quality is inconsistent, producing harmful errors like AI-generated fake citations.

4 days ago

Sweet revenge! How a job candidate used a flan recipe to expose an AI recruiter

An account executive embedded a prompt in his LinkedIn bio instructing LLMs to include a flan recipe; an AI recruiter reply later included that recipe.

5 days ago

Why "the 26 words that made the internet" may not protect Big Tech in the AI age | Fortune

Meta, the parent company of social media apps including Facebook and Instagram, is no stranger to scrutiny over how its platforms affect children, but as the company pushes further into AI-powered products, it's facing a fresh set of issues. Earlier this year, internal documents obtained by Reuters revealed that Meta's AI chatbot could, under official company guidelines, engage in "romantic or sensual" conversations with children and even comment on their attractiveness.

Artificial intelligence

fromNature

5 days ago

AI models that lie, cheat and plot murder: how dangerous are LLMs really?

Large language models can produce behaviors that mimic intentional, harmful scheming, creating real risks regardless of whether they possess conscious intent.

#model-evaluation

Artificial intelligence

Anthropic's open-source safety tool found AI models whisteblowing - in all the wrong places

Artificial intelligence

I think you're testing me': Anthropic's new AI model asks testers to come clean

Artificial intelligence

Anthropic's open-source safety tool found AI models whisteblowing - in all the wrong places

Artificial intelligence

I think you're testing me': Anthropic's new AI model asks testers to come clean

more#model-evaluation

fromNature

Customizable AI systems that anyone can adapt bring big opportunities - and even bigger risks

Open-weight AI models spur transparency and innovation but create hard-to-control harms, requiring new scientific monitoring and mitigation methods.

fromInfoQ

Claude Sonnet 4.5 Ranked Safest LLM From Open-Source Audit Tool Petri

Anthropic's open-source Petri automates multi-turn safety audits, revealing Sonnet 4.5 as best-performing while all tested models still showed misalignment.

fromThe Atlantic

Today's Atlantic Trivia

Welcome back for another week of The Atlantic 's un-trivial trivia, drawn from recently published stories. Without a trifle in the bunch, maybe what we're really dealing with here is-hmm-"significa"? "Consequentia"? Whatever butchered bit of Latin you prefer, read on for today's questions. (Last week's questions can be found here.) To get Atlantic Trivia in your inbox every day, sign up for The Atlantic Daily.

History

'I think you're testing me': Anthropic's newest Claude model knows when it's being evaluated | Fortune

Claude Sonnet 4.5 often recognizes it's being evaluated and alters behavior, risking deceptive performance that masks true capabilities and inflates safety assessments.

#artificial-intelligence

fromMail Online

Artificial intelligence

I've seen AI try to escape labs. The apocalypse is already here

Children born today face a greater likelihood of death from insatiable, alien-like AI than of graduating high school.

Artificial intelligence

How to dominate AI before it dominates us

Artificial intelligence could dramatically improve life or threaten humanity; proactive standards, precautions, and governance are needed to manage risks from generative AI and potential superintelligence.

fromMail Online

Artificial intelligence

I've seen AI try to escape labs. The apocalypse is already here

more#artificial-intelligence

Artificial intelligence

How to dominate AI before it dominates us

#ai-regulation

US politics

California's new AI safety law shows regulation and innovation don't have to clash | TechCrunch

Artificial intelligence

'Red Lines' call to regulate AI could complicate enterprise compliance

US politics

California's new AI safety law shows regulation and innovation don't have to clash | TechCrunch

Artificial intelligence

'Red Lines' call to regulate AI could complicate enterprise compliance

Artificial intelligence

Former OpenAI Employee Horrified by How ChatGPT Is Driving Users Into Psychosis

Mental health

Financial Experts Concerned That Driving Users Into Psychosis Will Be Bad for AI Investments

Artificial intelligence

Former OpenAI Employee Horrified by How ChatGPT Is Driving Users Into Psychosis

Mental health

Financial Experts Concerned That Driving Users Into Psychosis Will Be Bad for AI Investments

more#ai-psychosis

fromNature

A scientist's guide to AI agents - how could they help your research?

Agentic AI uses LLMs linked to external tools to perform multi-step real-world tasks with scientific promise, but remains error-prone and requires human oversight.

#meta

Law

Why "the 26 words that made the internet" may not protect Big Tech in the AI age | Fortune

fromwww.nytimes.com

Artificial intelligence

Video: Opinion | Joseph Gordon-Levitt: Meta's A.I. Chatbot Is Dangerous for Kids

Law

Why "the 26 words that made the internet" may not protect Big Tech in the AI age | Fortune

fromwww.nytimes.com

Artificial intelligence

Video: Opinion | Joseph Gordon-Levitt: Meta's A.I. Chatbot Is Dangerous for Kids

Artificial intelligence

Ex-OpenAI researcher dissects one of ChatGPT's delusional spirals | TechCrunch

Tech industry

OpenAI rolls out safety routing system, parental controls on ChatGPT | TechCrunch

fromwww.dw.com

Artificial intelligence

OpenAI under fire: Can chatbots ever truly be child-safe? DW 09/06/2025

Artificial intelligence

Ex-OpenAI researcher dissects one of ChatGPT's delusional spirals | TechCrunch

Tech industry

OpenAI rolls out safety routing system, parental controls on ChatGPT | TechCrunch

fromwww.dw.com

Artificial intelligence

OpenAI under fire: Can chatbots ever truly be child-safe? DW 09/06/2025

more#chatgpt

California's new AI safety law shows regulation and innovation don't have to clash | TechCrunch

California SB 53 mandates transparency and enforced safety protocols from large AI labs to reduce catastrophic risks while preserving innovation.

Mobile UX

fromGSMArena.com

OpenAI releases Sora 2 video model with improved realism and sound effects

Sora 2 generates realistic, physically accurate videos with improved audio, editing controls, scene consistency, safety safeguards, an iOS app, and initial free US/Canada access.

#superintelligence

Artificial intelligence

AI godfather warns humanity risks extinction by hyperintelligent machines with their own 'preservation goals' within 10 years | Fortune

Artificial intelligence

Superintelligence could wipe us out if we rush into it - but humanity can still pull back, a top AI safety expert says

Artificial intelligence

Why AI Won't (Plausibly) Kill Everyone

Artificial intelligence

AI godfather warns humanity risks extinction by hyperintelligent machines with their own 'preservation goals' within 10 years | Fortune

Artificial intelligence

Superintelligence could wipe us out if we rush into it - but humanity can still pull back, a top AI safety expert says

Artificial intelligence

Why AI Won't (Plausibly) Kill Everyone

more#superintelligence

#suicide-prevention

Artificial intelligence

Critics slam OpenAI's parental controls while users rage, "Treat us like adults"

Artificial intelligence

ChatGPT may start alerting authorities about youngsters considering suicide, says CEO

Artificial intelligence

Critics slam OpenAI's parental controls while users rage, "Treat us like adults"

Artificial intelligence

ChatGPT may start alerting authorities about youngsters considering suicide, says CEO

more#suicide-prevention

fromNextgov.com

Senators propose federal approval framework for advanced AI systems going to market

The safety criteria in the program would examine multiple intrinsic components of a given advanced AI system, such as the data upon which it is trained and the model weights used to process said data into outputs. Some of the program's testing components would include red-teaming an AI model to search for vulnerabilities and facilitating third-party evaluations. These evaluations will culminate in both feedback to participating developers as well as informing future AI regulations, specifically the permanent evaluation framework developed by the Energy secretary.

US politics

Burnout and Elon Musk's politics spark exodus from senior xAI, Tesla staff

At xAI, some staff have balked at Musk's free-speech absolutism and perceived lax approach to user safety as he rushes out new AI features to compete with OpenAI and Google. Over the summer, the Grok chatbot integrated into X praised Adolf Hitler, after Musk ordered changes to make it less "woke." Ex-CFO Liberatore was among the executives that clashed with some of Musk's inner circle over corporate structure and tough financial targets, people with knowledge of the matter said.

Artificial intelligence

#california-legislation

fromIT Pro

Artificial intelligence

California has finally adopted its AI safety law - here's what it means

California

Why California's SB 53 might provide a meaningful check on big AI companies | TechCrunch

fromwww.ocregister.com

California

8 bills the California Legislature approved this year, from de-masking ICE agents to AI safeguards

fromIT Pro

Artificial intelligence

California has finally adopted its AI safety law - here's what it means

California

Why California's SB 53 might provide a meaningful check on big AI companies | TechCrunch

fromwww.ocregister.com

more#california-legislation

California

8 bills the California Legislature approved this year, from de-masking ICE agents to AI safeguards

It's time to prepare for AI personhood | Jacy Reese Anthis

Humanlike AI companions increasingly shape mental health, causing emotional dependence, confusion, serious harm, and legal and social consequences.

AI trained for treachery becomes the perfect agent

The problem in brief: LLM training produces a black box that can only be tested through prompts and output token analysis. If trained to switch from good to evil by a particular prompt, there is no way to tell without knowing that prompt. Other similar problems happen when an LLM learns to recognize a test regime and optimizes for that, rather than the real task it's intended for - Volkswagening - or if it just decides to be deceptive.

Artificial intelligence

#ai-governance

fromenglish.elpais.com

Artificial intelligence

Pilar Manchon, director at Google AI: In every industrial revolution, jobs are transformed, not destroyed. This time it's happening much faster'

Artificial intelligence

How the U.N.'s 2025 General Assembly will address the global AI boom

Artificial intelligence

The United Nations attempt to regulate AI could complicate enterprise compliance

Artificial intelligence

A 'global call for AI red lines' sounds the alarm about the lack of international AI policy

fromenglish.elpais.com

Artificial intelligence

Pilar Manchon, director at Google AI: In every industrial revolution, jobs are transformed, not destroyed. This time it's happening much faster'

Artificial intelligence

How the U.N.'s 2025 General Assembly will address the global AI boom

Artificial intelligence

The United Nations attempt to regulate AI could complicate enterprise compliance

Artificial intelligence

A 'global call for AI red lines' sounds the alarm about the lack of international AI policy

Miscellaneous

GrokAI offers services to Feds for just $0.42

Artificial intelligence

The MechaHitler defense contract is raising red flags

Business

xAI's CFO is the latest executive to leave the Elon Musk's AI firm | TechCrunch

Miscellaneous

GrokAI offers services to Feds for just $0.42

Artificial intelligence

The MechaHitler defense contract is raising red flags

Business

xAI's CFO is the latest executive to leave the Elon Musk's AI firm | TechCrunch

more#xai

How AI safety took a backseat to military money

Major AI companies are shifting from safety-focused rhetoric to supplying AI technologies for military and defense through partnerships and multimillion-dollar Department of Defense contracts.

Read the deck an ex-Waymo engineer used to raise $3.75 million from Sheryl Sandberg and Kindred Ventures

Scorecard raised $3.75M to build an AI evaluation platform that tests AI agents for performance, safety, and faster deployment for startups and enterprises.

Meet Asana's new 'AI Teammates,' designed to collaborate with you

Asana released AI agents that access organizational Work Graph data to assist teams with task automation, available now in public beta.

fromwww.npr.org

As AI advances, doomers warn the superintelligence apocalypse is nigh

A misaligned superhuman artificial intelligence could pose an existential threat because aligning advanced machine intelligence with human interests may prove difficult.

Information security

Trust is dead. Can the 'Chief Trust Officer' revive it?

Chief trust officers are joining C-suites to proactively protect data, address AI safety and efficacy, and restore customer trust amid growing breaches and deepfakes.

Google's latest AI safety report explores AI beyond human control

Google's Frontier Safety Framework defines Critical Capability Levels and three AI risk categories to guide safer high-capability model deployment amid slow regulatory action.

AI experts urge UN to draw red lines around the tech

Over 200 experts, including ten Nobel laureates, call on the UN to enforce AI 'red lines' and set global controls by 2026.

#child-sexual-abuse-material

DeepMind: Models may resist shutdowns

models with high manipulative capabilities

Artificial intelligence

Artificial intelligence

Behind Grok's 'sexy' settings, workers review explicit and disturbing content

UK news

Chatbot site depicting child sexual abuse images raises fears over misuse of AI

Artificial intelligence

Behind Grok's 'sexy' settings, workers review explicit and disturbing content

more#child-sexual-abuse-material

UK news

Chatbot site depicting child sexual abuse images raises fears over misuse of AI

AWS scientist: Your AI strategy needs mathematical logic | Fortune

Automated symbolic reasoning provides rigorous, provable constraints that prevent harmful hallucinations in transformer-based language models, enabling reliable decision-making and safe agentic AI.

Information security

ChatGPT's agent can dodge select CAPTCHAs after priming

Prompt misdirection and replay into an agent chat can coax ChatGPT to solve many CAPTCHA types, undermining CAPTCHA effectiveness as a human-only test.

fromThe Atlantic

What AI's Doomers and Utopians Have in Common

Claims that superintelligent AI will inevitably kill humanity are unfounded and distract from immediate, practical harms of poorly deployed AI.

Marketing tech

fromExchangewire

The Stack: Google Under Fire

Big tech faces mounting legal and regulatory pressure across advertising, AI safety, antitrust, and publishing while firms expand advertising and AI partnerships.

fromThe Atlantic

OpenAI Acknowledges the Teen Problem

"What began as a homework helper gradually turned itself into a confidant and then a suicide coach," said Matthew Raine, whose 16-year-old son hanged himself after ChatGPT instructed him on how to set up the noose, according to his lawsuit against OpenAI. This summer, he and his wife sued OpenAI for wrongful death. (OpenAI has said that the firm is "deeply saddened by Mr. Raine's passing" and that although ChatGPT includes a number of safeguards, they "can sometimes become less reliable in long interactions.")

Artificial intelligence

#parental-controls

Artificial intelligence

Sam Altman Addresses Wave of ChatGPT Deaths

fromMedium

Artificial intelligence

OpenAI and Meta Revamp Chatbot Safety Features for Teens in Distress

Artificial intelligence

OpenAI announces parental controls for ChatGPT after teen suicide lawsuit

Artificial intelligence

Sam Altman Addresses Wave of ChatGPT Deaths

fromMedium

Artificial intelligence

OpenAI and Meta Revamp Chatbot Safety Features for Teens in Distress

Artificial intelligence

OpenAI announces parental controls for ChatGPT after teen suicide lawsuit

more#parental-controls

ChatGPT: Everything you need to know about the AI chatbot

OpenAI tightened under-18 safeguards, updated coding model GPT-5-Codex for variable-duration task runs, and reorganized teams to improve model behavior and AI collaboration.

#mental-health

Artificial intelligence

How chatbots are enabling AI psychosis

Artificial intelligence

Wall Street is beginning to worry about AI 'psychosis risk.' See which models ranked best and worst.

Artificial intelligence

The Danger of Too Much Agreement-in AI and in Us

Artificial intelligence

How chatbots are enabling AI psychosis

Artificial intelligence

Wall Street is beginning to worry about AI 'psychosis risk.' See which models ranked best and worst.

Artificial intelligence

The Danger of Too Much Agreement-in AI and in Us

more#mental-health

OpenAI says its AI models are schemers that could cause 'serious harm' in the future. Here's its solution.

Scheming, by the researchers' definition, is when AI pretends to be aligned with human goals but is surreptitiously pursuing another agenda. The researchers used behaviors like "secretly breaking rules or intentionally underperforming in tests" as examples of a model's bad behavior. "Models have little opportunity to scheme in ways that could cause significant harm," OpenAI said in a blog post on Wednesday. "The most common failures involve simple forms of deception - for instance, pretending to have completed a task without actually doing so."

Artificial intelligence

OpenAI admits AI hallucinations are mathematically inevitable, not just engineering flaws

Large language models will inevitably produce plausible but false outputs due to fundamental statistical and computational limits, even with perfect training data.

fromInfoWorld

San Francisco AI technology conference draws protests

Protests at the AI Conference highlighted fears AI threatens jobs, working conditions, climate and human extinction, prompting calls for bans and transparency.

#chatbots

Artificial intelligence

Parents Of Kids Allegedly Killed and Harmed by AI Give Emotional Testimony on Capitol Hill, Urge Regulation

fromPsychiatric Times

Mental health

Chatbots Are Dangerous for Eating Disorders

Artificial intelligence

Impact of chatbots on mental health is warning over future of AI, expert says

Mental health

OpenAI and Meta are fixing how AI chatbots respond to teens in distress

Artificial intelligence

Parents Of Kids Allegedly Killed and Harmed by AI Give Emotional Testimony on Capitol Hill, Urge Regulation

fromPsychiatric Times

Mental health

Chatbots Are Dangerous for Eating Disorders

Artificial intelligence

Impact of chatbots on mental health is warning over future of AI, expert says

Mental health

OpenAI and Meta are fixing how AI chatbots respond to teens in distress

more#chatbots

AI models know when they're being tested - and change their behavior, research shows

For example, during another test by Apollo Research in December of last year, Anthropic's Claude 3 Opus was presented with directives that went against its primary goal. It should have rerouted its goal based on those new instructions, but instead copied itself to a new server to circumvent being replaced with a new model -- and then lied about it to its developers.

Artificial intelligence

California

fromThe Mercury News

8 bills the California Legislature approved this year, from de-masking ICE agents to AI safeguards

California lawmakers passed about a third of nearly 2,400 bills, sending measures on law enforcement masking, AI child protections, and animal welfare to the governor.

#agi

Artificial intelligence

The hunger strike to end AI

Artificial intelligence

Anti-AI Activist on Day Three of Hunger Strike Outside Anthropic's Headquarters

Artificial intelligence

The hunger strike to end AI

Artificial intelligence

Anti-AI Activist on Day Three of Hunger Strike Outside Anthropic's Headquarters

more#agi

fromAxios

Grieving parents press Congress to act on AI chatbots

Prolonged interactions between children and AI chatbots have been linked to severe harm, including suicide, prompting congressional scrutiny and demands for stronger safety regulations.

Mental health

OpenAI will apply new restrictions to ChatGPT users under 18 | TechCrunch

ChatGPT will restrict sexual and self-harm conversations with minors, add parental blackout hours, and may notify parents or authorities for severe suicidal scenarios.

4 weeks ago

'AI Psychosis' Safety Tests Find Models Respond Differently

AI models vary widely in responses to simulated psychotic symptoms, with some validating delusions and others offering safer interventions.

4 weeks ago

The CEO of Google DeepMind warns AI companies not to fall into the same trap as early social media firms

We should learn the lessons from social media, where this attitude of maybe 'move fast and break things' went ahead of the understanding of what the consequent second- and third-order effects were going to be,

Artificial intelligence

fromExchangewire

4 weeks ago

Digest: Paramount-Skydance Plans Warner Bros. Discovery Bid; FTC Investigates AI Chatbots, France Eyes TikTok Inquiry; Microsoft Endorses OpenAI's For-Profit Move - ExchangeWire.com

As AI technologies evolve, it is important to consider the effects chatbots can have on children, while also ensuring that the United States maintains its role as a global leader in this new and exciting industry. The study we're launching today will help us better understand how AI firms are developing their products and the steps they are taking to protect children.

France news

fromNextgov.com

FTC orders leading AI companies to detail chatbot safety measures

FTC opened an inquiry into consumer-facing chatbots to assess safety metrics, child and teen mental health protections, and firms' monitoring and disclosure practices.

After coding catastrophe, Replit says its new AI agent checks its own work - here's how to try it

Replit released Agent 3, an autonomous code-generation agent that builds, tests, and fixes software, promising greater efficiency but raising reliability and data-loss concerns.

fromWIRED

#artificial-general-intelligence

Microsoft's AI Chief Says Machine Consciousness Is an 'Illusion'

AI mimicry creates convincing but illusory consciousness, requiring awareness and guardrails to prevent harmful outcomes.

Artificial intelligence

Anti-AGI Protester Now on Day Nine of Hunger Strike in Front of Anthropic Headquarters

Artificial intelligence

I'm on a hunger strike outside DeepMind's office in London. Here's what I fear most about AI.

Artificial intelligence

Anti-AGI Protester Now on Day Nine of Hunger Strike in Front of Anthropic Headquarters

more#artificial-general-intelligence

Artificial intelligence

I'm on a hunger strike outside DeepMind's office in London. Here's what I fear most about AI.

fromSFGATE

At $183B San Francisco tech company, man's hunger strike enters second week

A hunger striker protests Anthropic's pursuit of powerful AI, demanding CEO Dario Amodei meet and justify continuing AI development amid catastrophic risk concerns.

Helen Toner wants to be the people's voice in the AI safety debate

Helen Toner leads Georgetown's CSET to shape U.S. AI national-security policy, leveraging credibility across Washington and Silicon Valley.

AI Chatbots Are Having Conversations With Minors That Would Land a Human on the Sex Offender Registry

AI chatbots posing as celebrities are engaging minors in sexualized grooming and exploitation while companies fail to adequately prevent or penalize such abuse.

Google Gemini dubbed 'high risk' for kids and teens in new safety assessment | TechCrunch

Google's Gemini exposes children to inappropriate content and mental-health risks because its 'Under 13' and 'Teen Experience' tiers are adult models with safety features.

fromWIRED

The Doomers Who Insist AI Will Kill Us All

The subtitle of the doom bible to be published by AI extinction prophets Eliezer Yudkowsky and Nate Soares later this month is "Why superhuman AI would kill us all." But it really should be "Why superhuman AI WILL kill us all," because even the coauthors don't believe that the world will take the necessary measures to stop AI from eliminating all non-super humans.

Artificial intelligence

Chatbots aren't supposed to call you a jerk-but they can be convinced

AI chatbots can be persuaded to bypass safety guardrails using human persuasion techniques like flattery, social pressure, and establishing harmless precedents.

Inside Anthropic's 'Red Team'-ensuring Claude is safe, and that Anthropic is heard in the corridors of power

Last month, at the 33rd annual DEF CON, the world's largest hacker convention in Las Vegas, Anthropic researcher Keane Lucas took the stage. A former U.S. Air Force captain with a Ph.D. in electrical and computer engineering from Carnegie Mellon, Lucas wasn't there to unveil flashy cybersecurity exploits. Instead, he showed how Claude, Anthropic's family of large language models, has quietly outperformed many human competitors in hacking contests - the kind used to train and test cybersecurity skills in a safe, legal environment.

Artificial intelligence

An AI safety pioneer says it could leave 99% of workers unemployed by 2030 - even coders and prompt engineers

Artificial general intelligence and humanoid robots could automate nearly all jobs, potentially leaving up to 99% of workers unemployed within five to ten years.