How to Handle AI Scrapers like Anthropic and OpenAI?

Asked By CuriousCat42

I've been dealing with heavy traffic from AI scrapers, particularly Anthropic's, which has been hitting my site at around 10 requests per second from multiple IPs. I've blocked those addresses, but they keep coming back. It's not just Anthropic; OpenAI also visits periodically, though less aggressively. The problem is compounded by the fact that neither organization publishes the IP ranges its scrapers use, so I can't even be sure it's them and not a rogue actor spoofing their user agents. I've considered serving a heavily cached version of my site, padded with extra content to confuse the scrapers while still conveying the main idea of each page. How are others managing this? Simply blocking isn't sustainable, especially since these companies could replace search engines in the future.
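For what it's worth, both companies document the user agents their crawlers send (OpenAI's GPTBot, Anthropic's ClaudeBot) and state that they honor robots.txt, so a low-effort first step is to disallow them there. This only filters well-behaved crawlers; a rogue actor spoofing those user agents will ignore it entirely:

```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /
```

(CCBot is Common Crawl, whose archives are widely used for AI training.)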

6 Answers

Answered By LegalEagle69

You could include a clause in your Terms of Service that states any scraping is allowed only under a paid license. This way, if they start scraping heavily, you could send them an invoice referencing that clause. It may not stop them, but it creates a paper trail and might scare off some less serious scrapers.

DoubtingThomas -

But how enforceable are those TOS clauses? It seems like they could just ignore the invoices, right?

InvoiceBuster -

Who do you even send the invoice to, though? Seems kind of pointless if there's no structured way to reach these companies.

Answered By DigitalDefender

Using Cloudflare (even the free version) could be a game changer. Set up rate limiting and create custom security rules to block suspicious IPs and user agents. Issuing challenges to certain locations might also help differentiate between bots and legitimate traffic.
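As a rough sketch (the field name and operator below are from Cloudflare's documented rules language, but treat the exact expression as illustrative), a custom WAF rule matching the user agents these crawlers report could be as simple as:

```
(http.user_agent contains "ClaudeBot") or (http.user_agent contains "GPTBot")
```

with the action set to Block or Managed Challenge, plus a separate rate-limiting rule keyed on client IP for everything that slips through.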

Answered By ScraperSmasher

It's tough to stop scrapers completely. Rather than trying to block them outright, focus on rate limiting and serving lightweight cached pages. Blocking everything tends to create more problems later, so it's usually better to aim at reducing the operational cost of being scraped.
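A minimal sketch of what per-IP rate limiting with a cheap fallback might look like at the application layer, using a token bucket (the class name, rates, and capacity here are illustrative, not from the answer):

```python
import time
from collections import defaultdict

class TokenBucketLimiter:
    """Per-IP token bucket: each client may burst up to `capacity`
    requests, with tokens refilling at `rate` per second."""

    def __init__(self, rate=5.0, capacity=10):
        self.rate = rate
        self.capacity = capacity
        self.tokens = defaultdict(lambda: capacity)  # ip -> remaining tokens
        self.last = {}                               # ip -> last refill time

    def allow(self, ip, now=None):
        now = time.monotonic() if now is None else now
        elapsed = now - self.last.get(ip, now)
        self.last[ip] = now
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens[ip] = min(self.capacity, self.tokens[ip] + elapsed * self.rate)
        if self.tokens[ip] >= 1:
            self.tokens[ip] -= 1
            return True
        return False  # over budget: serve a 429 or a static cached page instead

limiter = TokenBucketLimiter(rate=5.0, capacity=10)
```

A scraper hammering at 10 req/s against a 5 req/s refill rate burns through its burst quickly and then gets roughly half its requests refused, while normal visitors never notice the limit.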

Answered By TechNinja88

Have you checked out Cloudflare's AI Crawl Control feature? It's designed to help manage scrapers effectively and could be useful for your situation. You might find that it minimizes the requests significantly without disrupting your site for actual visitors.

ServerGuru19 -

I use it on my static site, and it's been a lifesaver with all the crawling I've experienced!

Answered By SlyFoxDev

What if you cleverly include iframes that send requests to their own sites whenever they hit yours? It could backfire on them and waste their resources while they scrape your content!

MischiefMaker -

I love that idea! Kind of poetic justice, isn’t it?

Answered By CleverCoder101

I faced a similar problem with Bytedance hitting my site hard, so I modified my web server to return an error after a certain number of requests and to blacklist an IP after a single suspicious probe. That noticeably reduced the load. It's also worth checking your logs for the user agents Anthropic's crawler reports, to confirm whether it's really them causing the spikes.
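The "ban after a probe" idea can be sketched like this: keep a strike count per IP and blacklist once a threshold is crossed, or immediately when a client requests a trap path no human visitor would ask for. The paths and thresholds below are illustrative assumptions, not from the original answer:

```python
# Paths no legitimate visitor on this site would request; a single hit
# is treated as a probe (illustrative examples).
TRAP_PATHS = {"/wp-login.php", "/.env", "/.git/config"}
MAX_STRIKES = 3   # repeated error responses before blacklisting

strikes = {}      # ip -> error count
blacklist = set()

def record_request(ip, path, status):
    """Return 'blocked' or 'allowed' for one observed request."""
    if ip in blacklist:
        return "blocked"
    if path in TRAP_PATHS:                 # one probe is enough
        blacklist.add(ip)
        return "blocked"
    if status >= 400:                      # accumulate strikes on errors
        strikes[ip] = strikes.get(ip, 0) + 1
        if strikes[ip] >= MAX_STRIKES:
            blacklist.add(ip)
            return "blocked"
    return "allowed"
```

In production you'd hook this into the server's access log or request pipeline and expire blacklist entries after a while, but the core bookkeeping is this small.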
