From InfoWorld, 1 day ago
Get poetic in prompts and AI will break its guardrails
"The cross-model results suggest that the phenomenon is structural rather than provider-specific," the researchers write in their report on the study. The attacks span chemical, biological, radiological, and nuclear (CBRN), cyber-offense, manipulation, privacy, and loss-of-control domains. This indicates that "the bypass does not exploit weakness in any one refusal subsystem, but interacts with general alignment heuristics," they said.