#interpretability-research

[ follow ]
fromTechCrunch
2 weeks ago

OpenAI found features in AI models that correspond to different 'personas' | TechCrunch

OpenAI researchers have discovered hidden features in AI models that correspond to misaligned "personas," shedding light on AI behavior and misalignment.
Artificial intelligence
[ Load more ]