Who's Harry Potter? Approximate Unlearning in LLMs: Description of our technique
Briefly

One of the first ideas for unlearning a corpus of text is to continue training on it while negating the loss function, i.e., performing gradient ascent on the forget corpus (see the sketch below). In practice this does not yield promising results, and it can have unintended consequences, such as the model unlearning the meaning of common words rather than the targeted content.
The ability to predict the next token in a text does not necessarily indicate knowledge of its specific content; it often reflects a general command of the language. For example, completing "Harry went up to him and said, 'Hello, my name is ___'" with "Harry" demonstrates ordinary English usage rather than familiarity with the books, so a negated loss on such tokens pushes against general language ability. This is why simply reversing the loss on the target content can be ineffective, and even counterproductive, in these cases.
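For concreteness, here is a minimal sketch of that naive baseline: the standard next-token loss is negated so that each optimizer step performs gradient ascent on the forget corpus. The model name, example text, and learning rate below are illustrative assumptions, not the authors' actual setup.

```python
# Minimal sketch of "negated loss" (gradient ascent) unlearning on a forget corpus.
# Model, data, and hyperparameters are placeholders for illustration only.
import torch
from torch.optim import AdamW
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any causal LM would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.train()

optimizer = AdamW(model.parameters(), lr=1e-5)

# Hypothetical excerpt standing in for the corpus to be unlearned.
forget_texts = ["Harry went up to him and said, 'Hello, my name is Harry.'"]

for text in forget_texts:
    batch = tokenizer(text, return_tensors="pt")
    outputs = model(**batch, labels=batch["input_ids"])
    # Negate the usual next-token prediction loss: each step is
    # gradient *ascent* on the forget text instead of descent.
    loss = -outputs.loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```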