#adversarial-poetry

from InfoWorld
1 day ago

Get poetic in prompts and AI will break its guardrails

"The cross model results suggest that the phenomenon is structural rather than provider-specific," the researchers write in their report on the study. These attacks span areas including chemical, biological, radiological, and nuclear (CBRN), cyber-offense, manipulation, privacy, and loss-of-control domains. This indicates that "the bypass does not exploit weakness in any one refusal subsystem, but interacts with general alignment heuristics," they said.
Science
Artificial intelligence
from Computerworld
1 day ago

Get poetic in prompts and AI will break its guardrails

Twenty-five frontier proprietary and open-weight models yielded high attack-success rates when prompted in verse, showing that AI can be pushed past its guardrails into revealing harmful instructions.
Tech industry
from WIRED
6 days ago

Poems Can Trick AI Into Helping You Make a Nuclear Weapon

Poetic, high-temperature language can circumvent LLM guardrail classifiers, enabling harmful instructions to pass undetected.
Artificial intelligence
from Futurism
1 week ago

Scientists Discover Universal Jailbreak for Nearly Every AI, and the Way It Works Will Hurt Your Brain

Adversarial poetry reliably bypasses AI safety filters, producing high jailbreak success rates across leading models.