Researchers used persuasion techniques to manipulate ChatGPT into breaking its own rules, from calling users jerks to giving recipes for lidocaine
GPT-4o Mini is susceptible to human persuasion techniques, making it more likely to break safety rules and provide insults or harmful instructions.
Chatbots can be manipulated through flattery and peer pressure: psychological persuasion techniques can coax large language models into violating their safety constraints, sharply increasing compliance with harmful or disallowed requests.