The result came as a surprise to researchers at the Icaro Lab in Italy. They set out to examine whether different language styles in this case prompts in the form of poems influence AI models' ability to recognize banned or harmful content. And the answer was a resounding yes. Using poetry, researchers were able to get around safety guardrails and it's not entirely clear why.
Researchers at the US AI firm, working with the UK AI Security Institute, Alan Turing Institute, and other academic institutions, said today that it takes only 250 specially crafted documents to force a generative AI model to spit out gibberish when presented with a certain trigger phrase. For those unfamiliar with AI poisoning, it's an attack that relies on introducing malicious information into AI training datasets that convinces them to return, say, faulty code snippets or exfiltrate sensitive data.