I need to check whether about 22,000 URLs (mainly backlinks) are indexed by Google. These URLs come from various websites, not just my own. I've tried a Python script that uses the Python 'requests' library to send a "site:url" query for each URL, rotating proxies and user-agents and adding random delays between requests. Despite this, Google blocks my requests after a short time and returns a 200 response with no useful data. Some proxies get blocked right away, while others fail after a few attempts.

What I'd like to know is: has anyone successfully run large-scale indexing checks on Google? Are there specific services, APIs, or scraping strategies that work well for this? Should I consider alternatives like Bing's API or rely on third-party SEO tools instead? And would outsourcing the checks to SERP APIs or paid services be worthwhile? Any insights would be greatly appreciated. I'm happy to share parts of my script for collaboration or debugging.
4 Answers
You might want to try a tool like curl-cffi, which can impersonate a real browser's TLS fingerprint so your requests look more legitimate. That could make Google more likely to accept them.
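For anyone who wants to try the curl-cffi suggestion, here is a minimal sketch of what a single check might look like. The helper names (`build_site_query_url`, `fetch_with_impersonation`) are made up for illustration, and the `impersonate="chrome"` alias is an assumption about your installed version; older curl_cffi releases may need a specific target such as `"chrome110"`. This does not guarantee Google will accept the request.

```python
from urllib.parse import urlencode


def build_site_query_url(target_url: str) -> str:
    """Build a Google search URL that runs a site: query for one target URL."""
    return "https://www.google.com/search?" + urlencode({"q": f"site:{target_url}"})


def fetch_with_impersonation(target_url: str, timeout: float = 15.0) -> str:
    """Fetch the results page using a browser-like TLS fingerprint.

    Hypothetical helper: requires `pip install curl_cffi`.
    """
    # Imported lazily so build_site_query_url works without curl_cffi installed.
    from curl_cffi import requests as cffi_requests

    response = cffi_requests.get(
        build_site_query_url(target_url),
        impersonate="chrome",  # assumption: alias supported by your version
        timeout=timeout,
    )
    return response.text
```

You would still need to inspect the returned HTML yourself, since a 200 status alone does not mean you got real results.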
Consider using a service like serper.dev. It may simplify the process quite a bit; otherwise, you're in for a hassle at this volume. It'll cost around $23 in total to check every URL, though.
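A rough sketch of how a check through a SERP API could look, using only the standard library. The endpoint, the `X-API-KEY` header, and the `organic`/`link` response fields reflect my understanding of serper.dev and should be treated as assumptions to verify against their documentation; `normalize`, `is_indexed`, and `check_url` are hypothetical helper names.

```python
import json
import urllib.request

# Assumed endpoint; confirm against the provider's documentation.
SERPER_ENDPOINT = "https://google.serper.dev/search"


def normalize(url: str) -> str:
    """Crude normalization: drop scheme, leading 'www.', and trailing slash."""
    url = url.strip().lower()
    for prefix in ("https://", "http://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
            break
    if url.startswith("www."):
        url = url[4:]
    return url.rstrip("/")


def is_indexed(search_result: dict, target_url: str) -> bool:
    """Return True if any organic result links to the target URL."""
    wanted = normalize(target_url)
    organic = search_result.get("organic", [])  # assumed response field
    return any(normalize(item.get("link", "")) == wanted for item in organic)


def check_url(target_url: str, api_key: str) -> bool:
    """Send one site: query to the API and interpret the JSON response."""
    payload = json.dumps({"q": f"site:{target_url}"}).encode("utf-8")
    request = urllib.request.Request(
        SERPER_ENDPOINT,
        data=payload,
        headers={"X-API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return is_indexed(json.load(response), target_url)
```

At the quoted figure of roughly $23 for 22,000 URLs, that works out to about a tenth of a cent per check.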
That sounds like a solid option, especially if it saves time!
It looks like you're running into Google's request limits. If you hit their servers too often, they'll block you. Try cutting your request rate significantly and spacing requests out more so the traffic looks natural.
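One way to space requests out is a jittered delay that grows after each blocked attempt. This is only a sketch: Google does not publish its limits, so the `base` and `cap` values here are guesses, and `check` and `is_blocked` are placeholders for your own request and block-detection functions.

```python
import random
import time


def backoff_delay(attempt: int, base: float = 5.0, cap: float = 300.0) -> float:
    """Delay in seconds: exponential in `attempt`, capped, with random jitter.

    The result lies between half the ceiling and the ceiling, so there is
    always a minimum pause. base and cap are assumed values, not known limits.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return ceiling / 2 + random.uniform(0, ceiling / 2)


def run_throttled(urls, check, is_blocked, max_attempts=5, sleep=time.sleep):
    """Check each URL, backing off longer each time a response looks blocked."""
    results = {}
    for url in urls:
        for attempt in range(max_attempts):
            outcome = check(url)
            if not is_blocked(outcome):
                results[url] = outcome
                sleep(backoff_delay(0))  # normal pause between requests
                break
            sleep(backoff_delay(attempt + 1))  # longer pause after a block
        else:
            results[url] = None  # gave up after max_attempts
    return results
```

With these guessed defaults, 22,000 URLs would take more than a day even with no blocks, which is worth weighing against a paid service.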
Do you happen to know what those limits are for requests per minute?
You should definitely check Google's Terms of Service first. They don’t allow automated queries or any method that violates their robots.txt. Essentially, what you’re trying to do might get flagged because Google has strong mechanisms to detect this kind of behavior. It's a tough nut to crack!
Yeah, their detection system is pretty advanced!
I gave it a shot, but I had trouble getting it to work.