Should I Use a Managed Service for Web Scraping on AWS or Build My Own?

Asked By CraftyPineapple32 On

Hey all! I'm developing a SaaS app that gathers pricing and product information from various e-commerce websites, and I'm running into the typical scraping challenges: CAPTCHAs, IP bans, dynamic JavaScript content, and the hassle of managing proxy pools and browser instances. I've started experimenting with Crawlbase, which provides a scraping API with proxy rotation, browser rendering, and CAPTCHA solving. It can also deliver output directly to S3 or via webhooks. Since I'm already on AWS, I'm wondering whether it's better to rely on a managed service like this or to build our own scraping infrastructure on ECS/Fargate with headless Chrome and rotating proxies. If you've tackled this on AWS, how did you go about it?

4 Answers

Answered By SkepticalWizard45 On

If web scraping isn't your main business focus, I’d recommend renting or buying a service instead of building from scratch. These companies specialize in web scraping, so they’ll handle updates and bug fixes, saving you from the hassle of constant maintenance.

Answered By MarketplaceGuru33 On

Have you thought about using something from the AWS Marketplace for your web crawling needs? There are some good products listed that might fit well with your setup.

Answered By CuriousRover87 On

I'd lean towards using a managed service. It's risky to run your own crawlers; AWS has been known to close accounts if it gets enough complaints about your scraping activity. If you do decide to build your own, definitely check out the AWS Prescriptive Guidance material on web crawling. It can help you avoid common pitfalls.
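For anyone who does go the build-your-own route, here is a rough sketch of one way to reduce the complaints mentioned above: honor each site's robots.txt and space out requests per host. This is only an illustration under my own assumptions, not something taken from the AWS guidance. `PoliteGate`, the `example-bot` user agent, and the `shop.example.com` host are all hypothetical names; the only library call relied on is Python's standard `urllib.robotparser`.

```python
import time
from urllib import robotparser
from urllib.parse import urlparse


class PoliteGate:
    """Hypothetical helper: decides whether and when a URL may be fetched."""

    def __init__(self, user_agent, default_delay=1.0, clock=time.monotonic):
        self.user_agent = user_agent
        self.default_delay = default_delay  # seconds between hits to one host
        self.clock = clock                  # injectable so it can be tested
        self.rules = {}                     # host -> RobotFileParser
        self.next_ok = {}                   # host -> earliest next request time

    def load_robots(self, host, robots_txt):
        # Parse robots.txt text that the caller has already downloaded.
        parser = robotparser.RobotFileParser()
        parser.parse(robots_txt.splitlines())
        self.rules[host] = parser

    def allowed(self, url):
        parser = self.rules.get(urlparse(url).netloc)
        # No rules loaded for this host yet: be conservative and refuse.
        return parser is not None and parser.can_fetch(self.user_agent, url)

    def wait_time(self, url):
        # Seconds the caller should sleep before hitting this host again.
        host = urlparse(url).netloc
        return max(0.0, self.next_ok.get(host, 0.0) - self.clock())

    def record_fetch(self, url):
        # Use the site's Crawl-delay if it declares one, else the default.
        host = urlparse(url).netloc
        parser = self.rules.get(host)
        delay = parser.crawl_delay(self.user_agent) if parser else None
        self.next_ok[host] = self.clock() + (delay or self.default_delay)


if __name__ == "__main__":
    gate = PoliteGate("example-bot")
    gate.load_robots(
        "shop.example.com",
        "User-agent: *\nDisallow: /checkout\nCrawl-delay: 5\n",
    )
    url = "https://shop.example.com/products/1"
    if gate.allowed(url):
        time.sleep(gate.wait_time(url))
        # ... fetch the page here with whatever HTTP client you use ...
        gate.record_fetch(url)
```

In an ECS/Fargate setup with several workers, the per-host state would need to live somewhere shared rather than in process memory; that part is left out of the sketch.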

Answered By CandidCat196 On

Just a heads-up, a lot of websites make it difficult to scrape because they don’t want automated access. Most will have a clause in their ToS against this, so just keep that in mind when proceeding.

InquisitiveFox90 -

That does raise an interesting point—where do all these AI models get their training data if not by breaching those ToS?
