
"A proposed class-action lawsuit filed on behalf of Elizabeth Lyon, an author from Oregon, claims that Adobe used pirated versions of numerous books, including her own, to train the company's SlimLM program. Adobe describes SlimLM as a small language model series that can be 'optimized for document assistance tasks on mobile devices.' It states that SlimLM was pre-trained on SlimPajama-627B, a 'deduplicated, multi-corpora, open-source dataset' released by Cerebras in June of 2023."
"Lyon's lawsuit, which was originally reported on by Reuters, says that her writing was included in a processed subset of a manipulated dataset that was the basis of Adobe's program: 'The SlimPajama dataset was created by copying and manipulating the RedPajama dataset (including copying Books3),' the lawsuit says. 'Thus, because it is a derivative copy of the RedPajama dataset, SlimPajama contains the Books3 dataset, including the copyrighted works of Plaintiff and the Class members.'"
"'Books3' (a huge collection of 191,000 books that have been used to train genAI systems) has been an ongoing source of legal trouble for the tech community. RedPajama has also been cited in a number of litigation cases. In September, a lawsuit against Apple claimed the company had used copyrighted material to train its Apple Intelligence model. The litigation mentioned the dataset and accused the tech"
Adobe developed a small language model series called SlimLM, which it describes as optimized for document assistance tasks on mobile devices and says was pre-trained on SlimPajama-627B, a dataset released by Cerebras in June 2023. An author, Elizabeth Lyon, filed a proposed class-action lawsuit claiming that Adobe used pirated versions of numerous books, including her own, to train SlimLM. The lawsuit alleges that SlimPajama was derived from RedPajama and therefore contains the Books3 dataset. Both Books3, a collection of about 191,000 books, and RedPajama have been cited in multiple legal disputes over the use of copyrighted material to train AI.
Read at TechCrunch