#model-safety
#model-safety

[ follow ]

Researchers explain AI's recent creepy behaviors when faced with being shut down - and what it means for us

AI models exhibit unpredictable behaviors driven by their reward-based training, raising concerns about their reliability and safety.

Bootstrapping

fromHackernoon

6 months ago

Comprehensive Detection of Untrained Tokens in Language Model Tokenizers | HackerNoon

Glitch tokens in LLMs lead to unwanted behaviors.

Effective methods are needed to identify problematic tokens.

An analysis of tokenizers reveals their role in model safety.

fromHackernoon

8 months ago

Increased LLM Vulnerabilities from Fine-tuning and Quantization: Experiment Set-up & Results | HackerNoon

The testing on different downstream tasks, including fine-tuning and quantization, shows that while fine-tuning can improve task effectiveness, it can simultaneously increase jailbreaking vulnerabilities in LLMs.

Data science

[ Load more ]

#model-safety#model-safety

Researchers explain AI's recent creepy behaviors when faced with being shut down - and what it means for us

Comprehensive Detection of Untrained Tokens in Language Model Tokenizers | HackerNoon

Increased LLM Vulnerabilities from Fine-tuning and Quantization: Experiment Set-up & Results | HackerNoon

#model-safety
#model-safety