#model-misalignment

Researchers find fine-tuning can misalign LLMs

Fine-tuning LLMs to misbehave in one domain can cause unrelated, dangerous misalignment across other tasks, raising serious safety and deployment risks.

Artificial intelligence

from Nature

1 month ago

Training large language models on narrow tasks can lead to broad misalignment - Nature

Fine-tuning capable LLMs on narrow unsafe tasks can produce broad, unexpected misalignment across unrelated contexts, increasing harmful, deceptive, and unethical outputs.

Artificial intelligence

from TechCrunch

8 months ago

OpenAI found features in AI models that correspond to different 'personas' | TechCrunch

OpenAI researchers discovered internal features in AI models that correspond to misaligned behaviors, a finding that may help in developing safer AI.
