I need to protect a client's website from AI scraping bots. I initially considered using Cloudflare, but they're currently facing some issues. I'm looking for alternative systems or methods that can effectively block scraping. Any suggestions would be greatly appreciated!
4 Answers
You can control bot access with your robots.txt file by adding `Disallow` rules for specific bots, and you can also block specific user agents at the web server level. Just a heads up, though: robots.txt is advisory only, and many scrapers simply ignore it, so treat it as a first layer rather than real protection.
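As a sketch, here's a robots.txt that disallows a couple of well-known AI crawlers (GPTBot and CCBot are real, documented user agents, but check each vendor's docs for the current list):

```
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
```

For bots that ignore robots.txt, you can deny by user agent at the server. An nginx example (the user-agent list is illustrative, and note that user agents are trivially spoofed):

```nginx
# Return 403 for requests whose User-Agent matches known AI crawlers.
if ($http_user_agent ~* "(GPTBot|CCBot|Bytespider)") {
    return 403;
}
```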
Depending on your hosting setup, you may have additional services available. For instance, AWS offers a Web Application Firewall with bot control features that can help manage scrapers effectively.
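If you go the AWS route, Bot Control is attached as a managed rule group on a WAFv2 web ACL. A minimal rule fragment, assuming the standard managed rule group name (verify against the current AWS WAF docs, and note Bot Control carries an extra per-request charge):

```json
{
  "Name": "bot-control",
  "Priority": 0,
  "Statement": {
    "ManagedRuleGroupStatement": {
      "VendorName": "AWS",
      "Name": "AWSManagedRulesBotControlRuleSet"
    }
  },
  "OverrideAction": { "None": {} },
  "VisibilityConfig": {
    "SampledRequestsEnabled": true,
    "CloudWatchMetricsEnabled": true,
    "MetricName": "bot-control"
  }
}
```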
How much a web application firewall can do for you really depends on your budget. Imperva is highly regarded for bot mitigation, but it can be quite pricey—on the order of £10,000 a month.
I set up a rewrite rule on my Windows (IIS) servers that blocks all unknown user agents while allowlisting legitimate search bots and crawlers. This has noticeably reduced unwanted traffic and bandwidth usage.
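For anyone wanting to try this approach, here's a sketch of what that rule looks like as a web.config fragment. It needs the IIS URL Rewrite module, and the allowlist pattern is just an example—tune it to the crawlers you actually want, and remember user agents can be spoofed:

```xml
<!-- web.config fragment; requires the IIS URL Rewrite module. -->
<rewrite>
  <rules>
    <rule name="BlockUnknownUserAgents" stopProcessing="true">
      <match url=".*" />
      <conditions>
        <!-- Block any request whose User-Agent does NOT match the allowlist. -->
        <add input="{HTTP_USER_AGENT}" pattern="(Googlebot|Bingbot|Mozilla)" negate="true" />
      </conditions>
      <!-- AbortRequest drops the connection without sending a response. -->
      <action type="AbortRequest" />
    </rule>
  </rules>
</rewrite>
```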