As part of its mission to preserve the web, the Internet Archive operates crawlers that capture webpage snapshots. Many of these snapshots are accessible through its public-facing tool, the Wayback Machine. But as AI bots scavenge the web for training data to feed their models, the Internet Archive's commitment to free information access has turned its digital library into a potential liability for some news publishers.
Cloudflare announced a new system to block AI companies from accessing websites without permission or compensation, in response to concerns about content scraping.