I'm building a price comparison platform and need help scraping product data from various e-commerce websites. We're aiming to pull roughly 25,000 products daily, but our current system isn't holding up: it breaks whenever the target sites change their layout. We've tried managing this in-house, but our developers lack the specialized experience that effective web scraping requires, and it's become a heavy time commitment.
I'm looking for a reputable web scraping agency that can create a reliable and scalable solution. I've explored a few options, but many either get blocked quickly or require frequent fixes. What I need is a provider that understands how to handle changes in website structures and can utilize the right tools and proxies. I've heard mixed reviews about Lexis Solutions—has anyone had actual experience with them, or can you suggest other trustworthy options? What agencies have you successfully used for ongoing scraping at scale? Any insights or cautionary tales would be greatly appreciated.
2 Answers
Check out some of the newer frameworks designed for web scraping; they feature self-healing capabilities. I found a blog post about them on Kadoa that might be worth looking into. They won't eliminate the headaches completely, but they can help ease some of the ongoing maintenance stress. Here's the link if you're interested: https://www.kadoa.com/blog/autogenerate-self-healing-web-scrapers. Just know that no matter what, scraping is always going to have its challenges!
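To illustrate the core idea behind those self-healing tools, here's a minimal Python sketch of a fallback-chain extractor. The patterns, the layout variants they target, and the `extract_price` helper are all hypothetical, just a simplified stand-in for what real frameworks do with learned selectors:

```python
import re

# Hypothetical fallback chain: each pattern targets a layout the site has
# used at some point. When the primary pattern stops matching after a
# redesign, the scraper falls through to the alternates instead of failing.
PRICE_PATTERNS = [
    r'<span class="price">\s*\$([\d.]+)',      # current layout
    r'data-price="([\d.]+)"',                  # previous layout
    r'itemprop="price"\s+content="([\d.]+)"',  # schema.org microdata fallback
]

def extract_price(html: str):
    """Try each known pattern in order; return the first match, else None."""
    for pattern in PRICE_PATTERNS:
        m = re.search(pattern, html)
        if m:
            return float(m.group(1))
    return None  # every strategy failed -> alert a human, don't crash

old_layout = '<span class="price"> $19.99</span>'
new_layout = '<div data-price="19.99">Widget</div>'
print(extract_price(old_layout))  # 19.99
print(extract_price(new_layout))  # 19.99
```

The point is that a redesign degrades gracefully: extraction keeps working off an older or more generic pattern while you update the primary one, which is exactly the maintenance relief those frameworks promise.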
I've been in the scraping game for a while, and honestly, what you're asking for is complex. Scrapers will always need ongoing maintenance because the markup they depend on is brittle and changes without warning. You can invest in better tooling, like custom curl binaries and residential proxy rotation, but that won't make the problem disappear. It's essential to maintain multiple scrapers per source and be ready to adapt constantly. Just be prepared for ongoing challenges and maintenance.
What have you noticed about handling new protections like Cloudflare? Is it just more hoops to jump through?

Totally get what you're saying! I manage a scraping solution for Amazon, and it’s always a dance with ongoing maintenance. The slightest change on the source side means we have to jump back in and fix things. Have you found any particular strategies that help with keeping everything up to date?