Who's Harry Potter? Approximate Unlearning in LLMs: Description of our technique
Briefly

One of the first ideas for unlearning a corpus of text is to continue training on it while negating the loss function, i.e., performing gradient ascent on the forget corpus (see the sketch below). In practice this does not yield promising results, and it can have unintended consequences, such as the model unlearning the meaning of common words rather than the targeted content.
The ability to predict the next token in a text does not necessarily indicate knowledge of its specific content; it often reflects a general command of the language. For example, completing "Harry went up to him and said, 'Hello, my name is ___'" with "Harry" demonstrates ordinary English usage rather than familiarity with the books, so a negated loss on such tokens pushes against general language ability. This is why simply reversing the loss on the target content can be ineffective, and even counterproductive, in these cases.
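For concreteness, here is a minimal sketch of that naive baseline: the standard next-token loss is negated so that each optimizer step performs gradient ascent on the forget corpus. The model name, example text, and learning rate below are illustrative assumptions, not the authors' actual setup.

```python
# Minimal sketch of "negated loss" (gradient ascent) unlearning on a forget corpus.
# Model, data, and hyperparameters are placeholders for illustration only.
import torch
from torch.optim import AdamW
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any causal LM would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.train()

optimizer = AdamW(model.parameters(), lr=1e-5)

# Hypothetical excerpt standing in for the corpus to be unlearned.
forget_texts = ["Harry went up to him and said, 'Hello, my name is Harry.'"]

for text in forget_texts:
    batch = tokenizer(text, return_tensors="pt")
    outputs = model(**batch, labels=batch["input_ids"])
    # Negate the usual next-token prediction loss: each step is
    # gradient *ascent* on the forget text instead of descent.
    loss = -outputs.loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```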