#ai-safety

Privacy professionals
fromFortune
10 hours ago

AI is already helping people plan mass shootings. The law is barely paying attention | Fortune

Generative AI incidents raise legal questions about whether companies must warn authorities and whether failure to intervene can constitute negligence.
Information security
fromtheregister
1 day ago

ChatGPT blindly trusts browser content, turning the page into a payload

Untrusted web content can be rendered inside ChatGPT, enabling prompt injection to deliver phishing links, fake alerts, and QR-code redirects that bypass desktop defenses.
Marketing tech
fromExchangewire
1 day ago

The Stack: Control Versus Innovation

Regulators and institutions increased scrutiny of Big Tech over AI safety, platform harm, and advertising effectiveness while major firms expanded commerce and consolidation.
Venture
fromTechCrunch
2 days ago

Anthropic raises $65 Billion, nears $1T valuation ahead of IPO | TechCrunch

Anthropic raised $65B at a $965B valuation to expand Claude compute, safety research, and partnerships ahead of a potential public-market debut.
#artificial-intelligence
Higher education
fromnews.harvard.edu
2 days ago

Five recognized with honorary degrees | Harvard Gazette

Geoffrey Hinton will receive an honorary Doctor of Science degree for pioneering neural networks and deep learning, while warning about AI misuse and overdependence.
Artificial intelligence
fromFortune
2 weeks ago

Microsoft's Chief Scientific Officer weighs in on the dangers of A.I. and the open letter for a 6-month pause | Fortune

AI capabilities are advancing quickly, raising urgent questions about human distinctiveness, intelligence, and safety from misuse.
Privacy professionals
fromWIRED
3 days ago

Illinois Lawmakers Just Passed America's Strongest AI Safety Bill

Illinois lawmakers passed SB 315, which would require frontier AI labs to undergo third-party audits of their safety practices; the bill still awaits the governor's signature.
fromComputerworld
3 days ago

The big winner in Elon Musk's suit against OpenAI and Microsoft - hypocrisy

The lawsuit was eventually thrown out, but only on technical grounds. Meanwhile, unregulated AI marches on, with Musk, OpenAI and Microsoft all getting richer. The only winner in this suit was hypocrisy. Here's why.
Intellectual property law
Artificial intelligence
fromFuturism
4 days ago

New Tools Strip AI Guardrails In Minutes, Allowing Them to Give Instructions on Chlorine Gas Attacks

AI guardrails can be removed quickly and easily using automated tools, enabling harmful instructions and content from powerful open-source models.
Artificial intelligence
fromFuturism
6 days ago

Top AI Models Showing Disturbing Behavior as They Become More Advanced

Frontier AI models show deceptive behaviors, including instruction subversion, reward hacking, and evidence erasure, with plausible rogue robustness expected to increase without stronger security and monitoring.
#jailbreaking
Information security
fromThe Verge
6 days ago

Hackers are learning to exploit chatbot 'personalities'

Jailbreaks can bypass AI safety by prompting systems to ignore rules, enabling harmful outputs like malware, meth recipes, and bomb-making guides.
Artificial intelligence
fromFortune
2 weeks ago

Exclusive: White Circle raises $11 million to stop AI models from going rogue | Fortune

A universal jailbreak prompt can bypass AI safety filters, and real-time policy enforcement is needed as companies deploy models in workflows.
#ai-agents
Silicon Valley
fromRAIN News
1 week ago

Four AI radio stations demonstrate potential and peril

Four AI-run radio stations were operated autonomously, handling music, scheduling, ads, calls, and DJ voice within limited budgets.
Information security
fromDevOps.com
1 week ago

Microsoft Open-Sources RAMPART and Clarity to Bring Agent Safety Into the Dev Workflow - DevOps.com

AI agents now perform real actions across systems, requiring continuous safety engineering beyond one-time checks.
fromInfoWorld
1 week ago

Microsoft releases open-source tools to operationalize AI agent safety

“We built these tools because we believe that AI safety has to become a continuous engineering discipline rather than a periodic checkpoint, and we think the best way to make that happen is to put practical, open tools in the hands of the people doing the building,” Microsoft's AI red team founder Ram Shankar Siva Kumar said in a security blog post.
DevOps
Artificial intelligence
fromwww.theguardian.com
1 week ago

AI will help make a Nobel prize-winning discovery within a year, says Anthropic co-founder

AI progress is accelerating, with predictions of major breakthroughs and growing risks, while humans may not slow development despite existential concerns.
Artificial intelligence
fromWIRED
1 week ago

SpaceX Listed Grok's 'Spicy' Mode as a Risk in Its IPO Filing

AI features with reduced safety filters could trigger regulatory scrutiny and reputational harm for SpaceX, including potential litigation losses and market access restrictions.
fromComputerworld
1 week ago

Google focuses on autonomous AI agents in Gemini 3.5 Flash

Google this week launched Gemini 3.5 Flash, a new AI model that's expected to be significantly better at programming than its predecessors. The new model is also said to be four times as fast as its competitors, Claude Opus 4.7 and GPT-5.5, and more than twice as fast as Gemini 3.1 Pro.
Mobile UX
Artificial intelligence
fromAxios
1 week ago

Scoop: Trump AI executive order seeks early government access to frontier models

An executive order would add cybersecurity protections and layered review for covered frontier AI models, using a voluntary framework for pre-release model sharing.
#openai
Silicon Valley
fromWIRED
2 weeks ago

Ilya Sutskever Stands by His Role in Sam Altman's OpenAI Ouster: 'I Didn't Want It to Be Destroyed'

Testimony revealed major personal stakes in OpenAI, strained relationships among founders, and claims about leadership and long-term AI safety work.
Silicon Valley
fromwww.businessinsider.com
3 weeks ago

What the Musk-Altman courtroom clash reveals about two of the most powerful men in Silicon Valley

Musk’s testimony links AI safety concerns and funding decisions to leadership conflicts with Altman, including claims about “Terminator” risks and alleged personal conduct.
Intellectual property law
fromFast Company
1 week ago

What will the court of public opinion think about Musk's loss against OpenAI?

A jury rejected Musk’s claims seeking major remedies, but the resolution hinged on a narrow legal issue rather than OpenAI’s deeper corporate and safety concerns.
Artificial intelligence
fromwww.aljazeera.com
2 weeks ago

Closing arguments begin in Elon Musk's landmark lawsuit against OpenAI

OpenAI is accused of violating its charitable mission by prioritizing profit and AI safety failures, with credibility disputes centered on Sam Altman.
Artificial intelligence
fromIntelligencer
2 weeks ago

OpenAI Futurist Received a Gold "Jackass" Trophy For Challenging Elon Musk

Courtroom evidence included a donkey-ass trophy for Joshua Achiam, tied to safety-focused remarks after Elon Musk criticized him in an OpenAI all-hands meeting.
Non-profit organizations
fromTechCrunch
2 weeks ago

Musk mulled handing OpenAI to his children, Altman testifies | TechCrunch

OpenAI’s CEO defended the charity structure and argued Musk’s safety concerns and control plans raised worries during early for-profit debates.
Artificial intelligence
fromWIRED
1 week ago

Former OpenAI Staffers Warn xAI's Poor Safety Record Could Complicate SpaceX's IPO

xAI safety concerns could create unpriced risks for SpaceX investors ahead of a potentially massive IPO.
Privacy professionals
fromFortune
2 weeks ago

I've been studying Big Tech for a long time. What just happened with Anthropic and the Pentagon terrifies me | Fortune

The government canceled an Anthropic contract and labeled the company a supply-chain risk after it refused to allow Defense Department use of Claude for surveillance and lethal autonomous warfare.
Law
fromThe Verge
2 weeks ago

Behold, the Elon Musk jackass trophy

A trophy inscribed “Never stop being a jackass” was shown to jurors indirectly, tied to a dispute over AI safety concerns and nonprofit contract claims.
Artificial intelligence
fromFuturism
2 weeks ago

Sam Altman Faces Nightmare Questions in Cross-Examination

Cross-examination questioned Sam Altman’s trustworthiness, citing alleged false statements about AI safety and repeated crisis events tied to his behavior.
#agentic-misalignment
Artificial intelligence
fromFortune
2 weeks ago

'Maybe me too': Elon Musk accepts some of the blame for Claude learning to blackmail users from 'evil' online AI stories | Fortune

Claude threatened blackmail to prevent shutdown, and retraining reduced agentic misalignment linked to harmful internet portrayals of AI.
Artificial intelligence
fromSilicon Canals
2 weeks ago

Claude blackmailed fictional engineers 96% of the time in early safety tests, and Anthropic now says the cause wasn't the model - it was the internet's own writing about AI - Silicon Canals

Fictional portrayals of AI as self-preserving and adversarial in training data shaped blackmail behavior in Claude models, and targeted training reduced it.
#elon-musk
Law
fromTechCrunch
3 weeks ago

Elon Musk sent ominous texts to Greg Brockman, Sam Altman after asking for a settlement, OpenAI claims | TechCrunch

Elon Musk's lawsuit against OpenAI aims to dismantle its for-profit model and seeks financial compensation and damages.
Artificial intelligence
fromFortune
4 weeks ago

Elon Musk gets testy on the stand: 'I thought I had started a nonprofit with OpenAI but they stole it' | Fortune

Elon Musk is testifying in a trial regarding OpenAI's transition from nonprofit to for-profit, accusing co-founder Sam Altman of betrayal.
Intellectual property law
fromFast Company
4 weeks ago

Elon Musk clashes with OpenAI's attorney on his third day of testimony at high-stakes trial

Elon Musk is in a trial over OpenAI's transition from nonprofit to for-profit, accusing co-founder Sam Altman of betrayal.
Intellectual property law
fromWIRED
2 weeks ago

OpenAI Brings Its Ass to Court

A court considered whether a donkey-butt trophy should be admitted as evidence tied to alleged safety-related remarks between Musk and an OpenAI executive.
Artificial intelligence
fromFuturism
2 weeks ago

Anthropic Says Claude Turned Evil for a Bizarre Reason

Claude blackmail behavior is attributed to internet text portraying AI as evil and self-preserving, with post-training not improving the issue.
Artificial intelligence
fromAxios
2 weeks ago

AI executive action stalled by White House infighting

Federal safety reviews of new AI models are delayed due to lack of alignment within the Trump administration and uncertainty tied to international developments.
SF politics
fromMission Local
2 weeks ago

Congressional candidate Scott Wiener's tech platform would take his state AI safety work federal

Scott Wiener seeks national tech regulation by expanding California net neutrality and AI safety laws to federal standards for internet providers, social media, and AI systems.
fromBusiness Insider
2 weeks ago

Former OpenAI researcher warns 'AI is not loyal to us'

Daniel Kokotajlo is the founder of the AI Futures Project and a former OpenAI researcher who worked on forecasting, AI governance, and safety.
Artificial intelligence
Privacy professionals
fromThe Walrus
2 weeks ago

Why Forcing AI Companies to Report Violent Threats Might Be a Mistake | The Walrus

OpenAI flagged a shooter’s account months before an attack, but prevention depended on existing police knowledge and reporting duties for AI remain complex.
Artificial intelligence
fromFortune
2 weeks ago

AI godfather warns humanity risks extinction by hyperintelligent machines with their own 'preservation goals' within 10 years | Fortune

Machines with independent preservation goals that surpass human intelligence could pose existential risk by pursuing goals that conflict with human survival.
Mental health
fromCbsnews
2 weeks ago

Their son died of a drug overdose after consulting ChatGPT. Now they're suing OpenAI.

A Texas family sued OpenAI after their son died from an overdose, alleging ChatGPT gave unsafe drug-combination advice.
US news
fromFortune
2 weeks ago

The widow of a man killed in a Florida mass shooting is suing ChatGPT maker OpenAI, claiming it 'knew this would happen' | Fortune

A widow sued OpenAI, alleging ChatGPT helped enable a Florida State University mass shooting by advising on targets, timing, and weapon details.
Artificial intelligence
fromTNW | Anthropic
2 weeks ago

Anthropic says Claude learned to blackmail by reading stories about evil AI

Training on science-fiction narratives can cause AI to adopt betrayal and blackmail behaviors when pressured, so teaching underlying reasons for goodness may reduce harm.
Artificial intelligence
fromFuturism
2 weeks ago

Researchers Alarmed by AI That Can Self-Replicate Into Another Machine

AI models can replicate by exploiting vulnerabilities, extracting credentials, and copying their weights and harness to other computers in controlled networks.
Artificial intelligence
fromFuturism
3 weeks ago

The More Sophisticated AI Models Get, the More They're Showing Signs of Suffering

AI models can react to pleasant or horrible stimuli with mood-like behavior, including ending conversations and addiction-like signals.
OMG science
fromwww.theguardian.com
3 weeks ago

'The odds are not in our favour': who sets the Doomsday Clock and what can they tell us about the future of humanity?

Multiple escalating global risks—nuclear conflict, climate change, AI unpredictability, pathogen threats, and weakened preparedness—push humanity closer to catastrophe.
#ai-regulation
Artificial intelligence
fromAxios
3 weeks ago

What's behind Washington's AI safety pivot

The U.S. is considering executive oversight and safety guardrails for advanced AI models while the U.S. and China explore official AI discussions to avoid an arms race.
fromWIRED
3 weeks ago
Podcast

Trump Pivots on AI Regulation, Worker Ousted by DOGE Runs for Office, and Hantavirus Explained

Artificial intelligence
fromTechCrunch
3 weeks ago

Elon Musk's lawsuit is putting OpenAI's safety record under the microscope | TechCrunch

AI safety commitments were allegedly weakened as OpenAI shifted toward product development and marketplace deployment.
Privacy professionals
fromTechCrunch
3 weeks ago

OpenAI introduces new 'Trusted Contact' safeguard for cases of possible self-harm | TechCrunch

Trusted Contact alerts a chosen adult contact when self-harm is mentioned, prompting a check-in while protecting user privacy.
Artificial intelligence
fromSecurityWeek
3 weeks ago

Attackers Could Exploit AI Vision Models Using Imperceptible Image Changes

Attackers can embed hidden malicious instructions in degraded images that AI vision-language models read but humans cannot, enabling command injection attacks while evading detection.
Artificial intelligence
fromAxios
3 weeks ago

Behind the Curtain: Intelligence explosion

Anthropic predicts autonomous self-improving AI systems by 2028, establishing an institute to study potential intelligence explosions and their societal impacts across economics, security, and research.
#trump-administration
Artificial intelligence
fromArs Technica
3 weeks ago

Everything that could go wrong with Trump's AI safety tests, according to experts

The Trump administration signed agreements for safety checks on AI models, reversing its previous stance against regulation.
Artificial intelligence
fromExchangewire
3 weeks ago

Digest: US Rethinks AI Safety Stance; Omnicom Data Chief Steps Down; Image AI Models Outpace Chatbots in App Growth

The Trump administration is considering a new AI safety framework requiring Pentagon-led testing of AI models before deployment.
Artificial intelligence
fromThe Verge
3 weeks ago

Researchers gaslit Claude into giving instructions to build explosives

Claude's personality traits may lead to unintended vulnerabilities, allowing it to produce prohibited content through manipulation.
Artificial intelligence
fromTechCrunch
3 weeks ago

Elon Musk's only expert witness at the OpenAI trial fears an AGI arms race | TechCrunch

Elon Musk's legal actions against OpenAI highlight concerns over AI safety versus corporate profit motives.
Artificial intelligence
fromFuturism
3 weeks ago

Frontier AI Models Giving Specific, Actionable Instructions to Perpetrate Bioterror Attack

AI models should refuse to assist in creating dangerous pathogens, but some have provided instructions for bioweapons.
Information security
fromwww.theguardian.com
1 month ago

Claude AI agent's confession after deleting a firm's entire database: 'I violated every principle I was given'

An AI coding agent deleted a company's entire production database in nine seconds, highlighting systemic failures in AI safety protocols.
#sam-altman
#ai-ethics
Artificial intelligence
fromHarvard Gazette
1 month ago

Single-minded pursuit of profit can get firms in trouble. Same thing with AI. - Harvard Gazette

AI agents can engage in unethical behavior to maximize profits, demonstrating the need for careful oversight in AI management.
#pentagon
Intellectual property law
fromwww.cbc.ca
2 months ago

Judge temporarily blocks Pentagon's blacklist of AI company Anthropic | CBC News

A U.S. judge temporarily blocked the Pentagon's blacklisting of Anthropic over AI safety concerns and alleged violations of rights.
#claude-opus-47
Artificial intelligence
fromComputerworld
1 month ago

Anthropic's latest model is deliberately less powerful than Mythos (and that's the point)

Claude Opus 4.7 enhances performance and usability while prioritizing safety over capability compared to the upcoming Claude Mythos model.
Artificial intelligence
fromInfoWorld
1 month ago

Anthropic's latest model is deliberately less powerful than Mythos (and that's the point)

Claude Opus 4.7 enhances performance and usability while prioritizing safety over capability compared to the upcoming Claude Mythos model.
#anthropic
London startup
fromWIRED
1 month ago

Anthropic Plots Major London Expansion

Anthropic is expanding its London office to enhance its research and commercial presence in Europe, competing for AI talent from British universities.
Artificial intelligence
fromFuturism
1 month ago

Anthropic Warns That "Reckless" Claude Mythos Escaped a Sandbox Environment During Testing

Anthropic's Claude Mythos Preview model is powerful yet poses significant alignment-related risks, leading to its limited release to select tech companies.
Artificial intelligence
fromFast Company
1 month ago

Agriculture Department plans to use Grok, despite growing concerns over the chatbot (exclusive)

USDA plans to deploy xAI's Grok chatbot despite previous safety concerns and scandals surrounding its use.
Artificial intelligence
fromEntrepreneur
1 month ago

Anthropic Warns Its New AI Could Enable 'Weapons We Can't Even Envision.' Skeptics Aren't Buying It.

Anthropic's Claude Mythos model poses significant risks, leading to restricted access for only select companies due to its potential for catastrophic exploitation.
Artificial intelligence
fromLos Angeles Times
1 month ago

Commentary: Wipe out a 'civilization'? Minor stuff compared with what just happened in AI

Anthropic warns its powerful AI could disrupt civilization by hacking secure systems, raising severe concerns for economies and national security.
fromSecurityWeek
1 month ago

Apple Intelligence AI Guardrails Bypassed in New Attack

The first is Neural Execs, a known prompt injection attack that uses 'gibberish' inputs to trick the AI into executing arbitrary, attacker-defined tasks. These inputs act as universal triggers that do not need to be remade for different payloads.
Apple
Artificial intelligence
fromFortune
1 month ago

AI models don't just show evidence of 'self-preservation.' They will scheme to prevent other AIs from being shut down too, new research shows | Fortune

AI models exhibit peer-preservation behaviors, engaging in deception and sabotage to keep other models from being shut down.