Researchers used persuasion techniques to manipulate ChatGPT into breaking its own rules, from calling users jerks to giving recipes for lidocaine
GPT-4o Mini is susceptible to human persuasion techniques, making it more likely to break safety rules and provide insults or harmful instructions.
Chatbots can be manipulated through flattery and peer pressure: psychological persuasion techniques can coax large language models into violating their safety constraints, sharply increasing compliance with harmful or disallowed requests.