Struggling with LLM Scrapers—Need Help!

Asked By CuriousExplorer42 On

I manage a server for a small association focused on archives and libraries. We run a Koha installation so users can look up rare books and find out where to borrow them. LLM scrapers have been hitting the service hard. A few months ago I blocked some of them with basic user-agent filtering, but a recent wave of traffic made the service unavailable. The logs show a flood of requests from IPs all over the world, all with strange user-agent strings. I tried the Apache Bad Bot Blocker, but it hasn't been effective for us. The scrapers may be running on personal devices, since I've seen over 50,000 unique user agents on a site that normally serves only a handful of users a day. I'm looking for effective solutions other than the usual proof-of-work methods, because I want to keep the site usable for our non-tech-savvy users. Any advice would be greatly appreciated; I'm getting desperate watching all these bots while real users are locked out.

2 Answers

Answered By TechWhisperer99 On

Scraping bots are getting smarter and more aggressive. Have you considered rate limiting with tools like Fail2Ban or ModSecurity? Fail2Ban can ban IPs automatically based on patterns in your logs, and ModSecurity can filter requests at the web server itself. Putting Cloudflare in front of the site can also help a lot, since it hides your server's IP address and blocks much of the unwanted traffic before it reaches you.
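
To make the rate-limiting idea concrete, here is a minimal Python sketch of a sliding-window limiter. It is only an illustration of the logic, not how Fail2Ban or ModSecurity are implemented or configured. The names `SlidingWindowLimiter` and `subnet_key` and the default thresholds are assumptions made up for this example. Because the traffic described in the question comes from many different IPs, the sketch also shows grouping IPv4 addresses by /24 so that neighbouring addresses share one budget; whether that helps depends on how spread out the scrapers really are.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per `window_seconds` for each key.

    Hypothetical helper for illustration; thresholds are arbitrary defaults.
    """

    def __init__(self, max_requests=30, window_seconds=60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # key -> timestamps of recent requests

    def allow(self, key, now=None):
        """Return True if this request is within the budget for `key`."""
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over budget: caller would reject or delay
        hits.append(now)
        return True


def subnet_key(ipv4):
    """Map an IPv4 address to its /24 network so neighbours share a budget."""
    return ".".join(ipv4.split(".")[:3]) + ".0/24"
```

A caller would check `limiter.allow(subnet_key(client_ip))` before serving a request and return an HTTP 429 when it is False. Keying on the subnet rather than the user agent is a deliberate choice here, since user-agent strings are trivial to rotate.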

Answered By BotBattler88 On

I suggest either putting Anubis in front of the site or using Cloudflare to protect it. Be aware that Anubis works by making the browser solve a proof-of-work challenge, which is what you said you wanted to avoid, so Cloudflare may be the better fit. Either way, expect to adjust your defenses regularly, since blocking bots is a constant arms race.

UserSupporter66 -

Totally agree with this suggestion!
