How can I stop GPTBot from endlessly scraping my site?

Asked by CuriousCoder24

I've noticed that GPTBot has been scraping the same page on my website nonstop for about a day now. The page's URL carries a hashed return URL as a path parameter, so there are a huge number of unique URLs that all serve the same content. It seems GPTBot doesn't honor canonical tags yet, so it's stuck in a loop. I tried throttling it to one request every three seconds, but it was still overwhelming. It's starting to feel like harassment! How are others handling this?
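For reference, a "one request every three seconds" throttle like the one described above can be sketched as a minimum-interval limiter keyed by bot. This is a minimal sketch, assuming you can run a check in your own request handler before doing any expensive work; the class `MinIntervalThrottle` and the helper `bot_key` are hypothetical names, and matching on the user-agent string is an assumption (user agents can be spoofed).

```python
import time
from typing import Callable, Dict


class MinIntervalThrottle:
    """Allow at most one request per `interval` seconds for each key."""

    def __init__(self, interval: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.interval = interval
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self._last_allowed: Dict[str, float] = {}

    def allow(self, key: str) -> bool:
        now = self.clock()
        last = self._last_allowed.get(key)
        if last is not None and now - last < self.interval:
            return False  # too soon: the caller would answer with HTTP 429
        self._last_allowed[key] = now
        return True


def bot_key(user_agent: str) -> str:
    # Hypothetical helper: bucket every user agent containing "GPTBot" together,
    # so URL variations can't spread requests across separate buckets.
    return "gptbot" if "gptbot" in user_agent.lower() else user_agent
```

Because the limiter is keyed by bot rather than by URL, the endless stream of unique URLs still shares one budget. It only caps the request rate, though; each allowed request still renders the page, which may be why throttling alone felt insufficient.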

2 Answers

Answered by TechieTinker99

One way to deal with this is to block GPTBot's IP address or IP range on ports 80 and 443. That cuts off its HTTP and HTTPS access to your site.
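If you can't change the firewall, the same range check can run in the application. The sketch below uses Python's standard `ipaddress` module; `parse_ranges` and `is_blocked` are hypothetical helper names, and the CIDR blocks in the example are documentation placeholders (192.0.2.0/24 and 2001:db8::/32), not GPTBot's real ranges. Substitute the ranges OpenAI publishes for GPTBot.

```python
import ipaddress
from typing import Iterable, List, Union

IPNetwork = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def parse_ranges(cidrs: Iterable[str]) -> List[IPNetwork]:
    """Turn CIDR strings such as '192.0.2.0/24' into network objects."""
    return [ipaddress.ip_network(c, strict=False) for c in cidrs]


def is_blocked(client_ip: str, blocked: List[IPNetwork]) -> bool:
    """Return True if client_ip falls inside any of the blocked ranges."""
    addr = ipaddress.ip_address(client_ip)
    # `in` simply returns False when the address and network are different IP versions.
    return any(addr in net for net in blocked)


# Placeholder ranges for illustration only.
BLOCKED = parse_ranges(["192.0.2.0/24", "2001:db8::/32"])
```

A request handler would then check `is_blocked(remote_addr, BLOCKED)` and return 403 before doing any work. Blocking at the firewall is cheaper still, since the connection never reaches your web server.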

CasualCoder_88 -

Yeah, I thought about that too, but I have to admit that tarpitting it sounds way more fun!

Answered by DigitalGuru_17

At my company, we actually want to allow GPTBot since we get decent traffic from ChatGPT. We set up a Cloudflare rule that caches everything for requests identifying as GPTBot, and the rule also strips all query parameters from the cache key. That cut our server load almost immediately! We later extended the rule to all bots, and now our servers handle human traffic much better.
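The key idea in that rule is the cache key: for bot requests, every query-string variant of a URL collapses into one cached copy. The sketch below shows that logic in plain Python; it is not Cloudflare's rule syntax. The function names and the `bot_markers` parameter are hypothetical, with the default matching only GPTBot and a broader tuple standing in for the "all bots" extension.

```python
from typing import Sequence
from urllib.parse import urlsplit, urlunsplit


def is_bot(user_agent: str, bot_markers: Sequence[str]) -> bool:
    """Case-insensitive substring match of the user agent against the markers."""
    ua = user_agent.lower()
    return any(marker in ua for marker in bot_markers)


def cache_key(url: str, user_agent: str, bot_markers: Sequence[str] = ("gptbot",)) -> str:
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if is_bot(user_agent, bot_markers):
        # Bots: drop the query string so every variant maps to one cached copy.
        return urlunsplit((parts.scheme, host, parts.path, "", ""))
    # Everyone else: keep the query string (fragments never reach the server anyway).
    return urlunsplit((parts.scheme, host, parts.path, parts.query, ""))
```

Note that this only collapses query parameters. For URLs that vary in the path, like the hashed return URL in the question, the path would need a similar normalization step.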

CuriousCoder24 -

This is super helpful, thanks! I’ll definitely look into this.
