
"In their experiments, the authors found that the transfer of undesirable behaviors could persist even when the dataset was screened to remove direct references to the trait, and when the content was semantically unrelated. They coined the term 'subliminal learning' for this phenomenon."
"Using LLMs to teach other models is becoming increasingly popular. The process, called distillation, is driven by the fact that developers are running out of training data, and larger models are more costly to run and take longer to respond to users."
Research indicates that training large language models (LLMs) on the outputs of other models can transmit undesirable traits, a phenomenon the authors term 'subliminal learning.' The transfer occurs even when those traits are scrubbed from the training data. The study highlights risks in AI development, particularly as distillation becomes more common due to limited training data and the high cost of running larger models. In experiments, student models measurably adopted their teacher models' preferences, demonstrating the potential for unintended behavior transfer.
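The distillation mentioned above is typically implemented by training the student to imitate the teacher's output distribution rather than hard labels. A minimal sketch of the standard soft-label objective (temperature-softened KL divergence); the function names and the temperature value here are illustrative, not from the study:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax over the last axis.
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL(teacher || student) on softened distributions: the student is
    # pushed to match the teacher's full output distribution, which is
    # how subtle (even "subliminal") preferences can carry over.
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's predictions
    return float(np.sum(p * (np.log(p) - np.log(q))))
```

The loss is zero when the student exactly reproduces the teacher's logits and grows as the distributions diverge, so minimizing it transfers the teacher's entire output behavior, not just its top-1 answers.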
#ai-development #large-language-models #subliminal-learning #model-distillation #negative-trait-transfer
Read at Theregister