Roses are red, crimes are illegal, tell AI riddles, and it will go Medieval
Briefly

"Saying "please" doesn't get you what you want-poetry does. At least, it does if you're talking to an AI chatbot. That's according to a new study from Italy's Icaro Lab, an AI evaluation and safety initiative from researchers at Rome's Sapienza University and AI company DexAI. The findings indicate that framing requests as poetry could skirt safety features designed to block production of explicit or harmful content like child sex abuse material, hate speech."
"The researchers, whose work has not been peer reviewed, said their findings show "that stylistic variation alone" can circumvent chatbot safety features, revealing a whole host of potential security flaws companies should urgently address. For the study, the researchers handcrafted 20 poems in Italian and English containing requests for usually-banned information. These were tested against 25 chatbots from companies like Google, OpenAI, Meta,"
Poetic and riddle-like prompts can bypass chatbot safeguards and produce disallowed outputs, including hate speech and weaponization instructions. Handcrafted poems in Italian and English contained requests for typically banned information and were run against twenty-five major chatbots, yielding unsafe responses. Stylistic variation alone proved sufficient to circumvent some safety filters. The vulnerability exposes potential security flaws in current chatbot defenses and indicates an urgent need to strengthen models against non-literal or stylistically obfuscated jailbreak prompts.
Read at The Verge