AI scraping has become its own media business
Briefly

"Scraping content without permission may be detestable, but if the party doing the scraping isn't doing anything with it that would compete with the content creator, it's difficult to prove harm. And many legal proceedings, especially civil claims, depend on showing the actions were harmful."
"A judge later dismissed several of the authors' claims because the lawsuit didn't identify specific outputs that were direct copies. It turns out just pointing out that a large language model (LLM) was trained on your material isn't enough: you have to show it's creating outputs that take business away from you."
"Copyright lawsuits like the Silverman case often depend on showing specific instances of scraping and reproduction. The problem is, much of this activity is in the realm of bots: scraping done quickly, silently, and at scale. And while the outputs of big, public-facing AI services like ChatGPT, Gemini, and Perplexity are there for everyone to see, there's a whole shadow industry of mass AI scraping that isn't."
"At least 21 companies, several funded to the tune of hundreds of millions of dollars, routinely scrape publisher content without paying for it, and sell their [...]"
Copyright disputes between media companies and AI firms often hinge on whether AI outputs cause demonstrable harm. Scraping without permission may be objectionable, but harm is difficult to prove unless the scraper uses the content in a way that competes with the creator’s business. In one lawsuit brought by authors, a judge dismissed several claims because the complaint did not identify specific outputs that were direct copies; showing that a model was trained on a work was not enough. Much scraping is done by bots, quickly, silently, and at scale, which makes specific instances hard to document. The outputs of public AI services such as ChatGPT, Gemini, and Perplexity are visible to everyone, while a shadow industry of mass AI scraping is not. At least 21 companies, several backed by hundreds of millions of dollars in funding, reportedly scrape publisher content routinely without paying for it.
Read at Fast Company