I'm working on checking around 22,000 URLs, mostly backlinks from various sites (not just mine), to see whether they're indexed by Google. I initially built a Python script that runs Google "site:url" queries, rotating proxies and user-agents and adding random delays between requests. However, Google blocks my requests quickly: I get a 200 response, but the content is often empty, and the success rate is quite low. Has anyone managed to run indexing checks at this scale, or are there APIs or services better suited to it? Is it worth outsourcing the checks to a SERP API or another paid service? I'm also open to collaboration and to help debugging my script!
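For reference, here's a stripped-down sketch of the kind of check I'm running; the user-agent list, proxy entries, and URLs below are placeholders rather than my real configuration:

```python
import random
import time
import requests
from urllib.parse import quote_plus

# Placeholder rotation pools -- the real script loads much longer lists.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15",
]
PROXIES = []  # e.g. "http://user:pass@host:port"

def is_indexed(url: str) -> bool:
    """Run a site: query for the URL and look for it in the returned HTML."""
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    proxy = random.choice(PROXIES) if PROXIES else None
    resp = requests.get(
        "https://www.google.com/search?q=" + quote_plus(f"site:{url}"),
        headers=headers,
        proxies={"http": proxy, "https": proxy} if proxy else None,
        timeout=15,
    )
    # This is where it falls over: I often get a 200 with an empty or
    # interstitial page, so the status code alone isn't a useful signal.
    return resp.status_code == 200 and url in resp.text

if __name__ == "__main__":
    for u in ["https://example.com/some-page"]:
        print(u, is_indexed(u))
        time.sleep(random.uniform(5, 15))  # random delay between requests
```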
5 Answers
You might want to check Google's Terms of Service before proceeding; they have strict rules against automated queries. It's tough because they use sophisticated detection for automated traffic, and you're likely hitting their limits.
It sounds like Google is flagging your requests as too frequent; they likely just don't allow that kind of search at that rate.
But I'm already adding waits between requests and sending headers that look like a real browser's.
You might try curl-cffi to make your requests look more like a real browser; it sometimes helps get past bot detection.
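Something along these lines, assuming the curl_cffi package is installed (`pip install curl_cffi`); the impersonate argument makes the request carry a real browser's fingerprint:

```python
from curl_cffi import requests  # requests-like API provided by curl_cffi

resp = requests.get(
    "https://www.google.com/search?q=site:example.com/some-page",
    impersonate="chrome",  # mimic Chrome's TLS/HTTP2 fingerprint
    timeout=15,
)
print(resp.status_code, "example.com/some-page" in resp.text)
```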
I tried it but couldn't get it to work.
Rate limiting your requests could help avoid getting blocked. Just be cautious with how many you send.
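A rough sketch of the pacing I mean; the 20-second gap is just a guess, and check() is a placeholder for whatever request you're making per URL:

```python
import random
import time

def check(url: str) -> None:
    print("checking", url)  # placeholder for the actual request

urls = ["https://example.com/a", "https://example.com/b"]
MIN_GAP = 20.0  # seconds between requests -- a guess, not a documented limit

last = 0.0
for url in urls:
    elapsed = time.monotonic() - last
    if elapsed < MIN_GAP:
        time.sleep(MIN_GAP - elapsed + random.uniform(0, 5))  # pacing plus jitter
    last = time.monotonic()
    check(url)
```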
Do you know how many requests per minute Google allows?
Consider using a service like serper.dev. Manual checks can be a pain, and this could save you some hassle.
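A hedged sketch of what that could look like; the endpoint, X-API-KEY header, and "organic" response field reflect my reading of serper.dev's docs, so double-check them before queuing up 22,000 queries:

```python
import requests

SERPER_API_KEY = "YOUR_API_KEY"  # placeholder

def is_indexed(url: str) -> bool:
    # site: query via serper.dev's JSON search endpoint (verify against their docs)
    resp = requests.post(
        "https://google.serper.dev/search",
        headers={"X-API-KEY": SERPER_API_KEY, "Content-Type": "application/json"},
        json={"q": f"site:{url}"},
        timeout=30,
    )
    resp.raise_for_status()
    return bool(resp.json().get("organic"))

print(is_indexed("https://example.com/some-page"))
```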
That's a good option; it would cost $23 to check all of the URLs.
Yeah, they certainly are smart about it.