Are There Effective Low-Cost TCP Hacks to Thwart AI Crawlers?

Asked By TechSavvyPenguin77 On

I've noticed that sites like gnu.org have been overwhelmed by AI crawlers, to the point that their availability suffers. AI companies already consume a significant amount of energy and resources, so it's frustrating to pay for hosting meant for people while bots hog the bandwidth. Is there a way to make crawling costly for these bots without draining my own CPU or memory? Specifically, can I hang a TCP connection so that my kernel spends no CPU or memory on that socket, leaving the bot to wait until it times out on its end? I'm also looking for other low-cost tactics against these crawlers, and I'd like to know whether existing modules or WAF solutions already do this.

5 Answers

Answered By CleverCoder42 On

Instead of trying to drain the bots' resources, why not serve them fake cached data? Once you can identify the bots, this is simple to set up. They're probably too focused on gathering real data to notice they're being tricked.
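A minimal sketch of that idea, assuming the bots identify themselves in the User-Agent header. The marker list, `DECOY_BODY`, and both function names are illustrative assumptions, not part of any particular server or WAF, and crawlers that spoof a browser User-Agent will slip past this check.

```python
from typing import Optional

# Illustrative substrings of crawler User-Agent strings (an assumption;
# adjust the list to whatever actually shows up in your access logs).
AI_CRAWLER_MARKERS = ("gptbot", "ccbot", "claudebot", "bytespider")

# Hypothetical pre-rendered decoy page, built once and reused for every bot hit.
DECOY_BODY = b"<html><body><p>Nothing to see here.</p></body></html>"


def is_ai_crawler(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent contains one of the known markers."""
    if not user_agent:
        return False
    ua = user_agent.lower()
    return any(marker in ua for marker in AI_CRAWLER_MARKERS)


def choose_body(user_agent: Optional[str], real_body: bytes) -> bytes:
    """Serve the cached decoy to suspected crawlers, the real page to everyone else."""
    return DECOY_BODY if is_ai_crawler(user_agent) else real_body
```

In a real deployment the same check would probably live in the web server or reverse proxy configuration, so that bot requests never reach the application at all.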

SkepticEye93 -

But what about the smaller AI companies? They might not have infinite funds, and coordinated DDoS defenses could really hurt them.

Answered By BotHunterX On

Cloudflare's AI Labyrinth feature is worth looking into. It is designed specifically for handling bots.

Answered By FailSafeUser88 On

Fail2Ban could help too, although it needs regular monitoring to keep up with the bots. Updating its rules as new crawlers show up can turn into a fair amount of work.

Answered By NetworkNinja99 On

You might want to check out Nepenthes or Cloudflare’s AI Labyrinth. These tools can help you manage bot traffic effectively and give you some control over what gets through.
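To give a rough idea of what a tarpit does, here is a minimal slow-drip sketch using Python's asyncio. It is not Nepenthes's or Cloudflare's code; the function names, port, payload, and delay are assumptions chosen for illustration. Note that it does not achieve the zero-cost hang asked about in the question: each trapped connection still holds an open socket and a small coroutine on your side, so it is cheap rather than free.

```python
import asyncio


def split_chunks(payload: bytes, size: int = 1):
    """Yield the payload in pieces of at most `size` bytes."""
    for i in range(0, len(payload), size):
        yield payload[i:i + size]


async def tarpit(reader, writer, delay: float = 5.0):
    """Send a response one small chunk at a time, sleeping between writes."""
    # Hypothetical filler page; the content does not matter, only the pacing.
    payload = (
        b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n"
        + b"<p>loading</p>" * 50
    )
    try:
        for chunk in split_chunks(payload):
            writer.write(chunk)
            await writer.drain()
            await asyncio.sleep(delay)  # the client waits; we mostly sleep
    except ConnectionError:
        pass  # the client gave up, which is the intended outcome
    finally:
        writer.close()


async def main(host: str = "127.0.0.1", port: int = 8081):
    """Listen on host:port and tarpit every connection that arrives."""
    server = await asyncio.start_server(tarpit, host, port)
    async with server:
        await server.serve_forever()
```

To try it, run `asyncio.run(main())` and route only suspected crawler traffic to that port, so that human visitors never reach it.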

DataProtector22 -

I’m curious, are there any specific implementations for those tools?

Answered By BabblerBot15 On

Another interesting tactic is a Markov babbler: generate random but plausible-looking text and mix it into your content to confuse the bots. For instance, take public domain books and alter them slightly, so that crawlers ingest text that could degrade their datasets.
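For anyone who hasn't seen one, a word-level Markov babbler can be sketched in a few lines. This is a generic illustration rather than the code of any specific tool; `build_chain`, `babble`, and their parameters are names made up for this example.

```python
import random
from collections import defaultdict


def build_chain(text: str, order: int = 2):
    """Map each run of `order` consecutive words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain


def babble(chain, length: int = 50, seed=None) -> str:
    """Walk the chain at random to produce `length` words of plausible nonsense."""
    if not chain:
        return ""
    rng = random.Random(seed)
    state = rng.choice(list(chain))
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:
            # Dead end (this word run only appeared at the end of the text): restart.
            state = rng.choice(list(chain))
            out.extend(state)
            continue
        out.append(rng.choice(followers))
        state = tuple(out[-len(state):])
    return " ".join(out[:length])
```

Feeding `build_chain` the text of a public domain book and calling `babble(chain, length=200)` yields text that reuses the book's vocabulary and local word order without reproducing it. Precomputing the chain once keeps the per-request cost low.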
