Is Lambda a Good Choice for Web Scraping Without Getting Blocked?

Asked By CuriousCoder42 On

I want to scrape a specific site, and I already have a proof of concept that works fine with Python and Selenium on my local machine. Each scrape takes about 2-3 minutes per request, and I want to scale this up without having to run the script manually multiple times. I'm considering AWS Lambda for this, but I'm concerned about IP bans because the site I'm targeting uses Cloudflare. I've heard free proxies are unlikely to help, since they tend to be blocked as well. I'd also like to know how much it would cost to run multiple Lambda functions to scrape data once a day.

5 Answers

Answered By WebWizard101 On

I suggest checking out the Zyte API. It's designed specifically for web scraping and takes care of a lot of the issues you might face, plus their pricing is pretty reasonable.

Answered By DataHacker99 On

It really varies depending on the site and how they set up their bot protections. While Lambda might offer changing IPs, they’re often all categorized as data center IPs, which can still lead to roadblocks with sites that ban those ranges.
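To make the "data center IP" point concrete, here is a minimal sketch of how a site (or you, when debugging) could test whether an address falls inside published AWS ranges. AWS publishes its ranges as JSON at https://ip-ranges.amazonaws.com/ip-ranges.json, with IPv4 entries under a `prefixes` key. The function name `matching_aws_prefixes` and the sample data are made up for illustration; the sample uses reserved documentation ranges, not real AWS prefixes.

```python
import ipaddress

# Illustrative stand-in for the parsed ip-ranges.json document.
# The prefix below is a reserved documentation range, NOT a real AWS range.
SAMPLE_RANGES = {
    "prefixes": [
        {"ip_prefix": "203.0.113.0/24", "region": "example-region", "service": "EC2"},
    ]
}

def matching_aws_prefixes(ip, ip_ranges):
    """Return the IPv4 prefix entries in ip_ranges that contain ip."""
    addr = ipaddress.ip_address(ip)
    matches = []
    for entry in ip_ranges.get("prefixes", []):
        network = ipaddress.ip_network(entry["ip_prefix"])
        if addr in network:
            matches.append(entry)
    return matches

if __name__ == "__main__":
    # An address inside the sample prefix is reported; one outside is not.
    print(matching_aws_prefixes("203.0.113.7", SAMPLE_RANGES))
    print(matching_aws_prefixes("198.51.100.1", SAMPLE_RANGES))
```

In practice you would load the real JSON file instead of `SAMPLE_RANGES`. Bot-protection vendors likely use richer signals than a plain range lookup, so treat this only as an illustration of why rotating among Lambda IPs does not hide that the traffic comes from AWS.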

Answered By ProxyFinderX On

Using a proxy is a good idea here. Keep in mind that Lambda runs on AWS-owned IP ranges, which many sites already block. As for costs, the AWS Pricing Calculator will give you a rough estimate; depending on your usage, a once-a-day job may well fit within the Lambda free tier.
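For a quick back-of-the-envelope figure before opening the calculator, a rough sketch like the one below may help. The prices and free-tier allowances are assumptions based on commonly quoted x86 list prices for us-east-1 (about $0.0000166667 per GB-second, $0.20 per million requests, 400,000 GB-seconds and 1 million requests free per month); check them against current AWS pricing. The function name and the example workload (50 scrapes a day at 180 seconds each with 2048 MB of memory) are hypothetical.

```python
# Assumed list prices and free-tier allowances; verify against current AWS pricing.
PRICE_PER_GB_SECOND = 0.0000166667
PRICE_PER_MILLION_REQUESTS = 0.20
FREE_GB_SECONDS = 400_000
FREE_REQUESTS = 1_000_000

def monthly_lambda_cost(invocations_per_day, seconds_per_invocation,
                        memory_mb, days=30, free_tier=True):
    """Estimate monthly Lambda compute and request charges in USD."""
    invocations = invocations_per_day * days
    # Lambda bills duration multiplied by configured memory, in GB-seconds.
    gb_seconds = invocations * seconds_per_invocation * (memory_mb / 1024)

    billable_gb_seconds = gb_seconds
    billable_requests = invocations
    if free_tier:
        billable_gb_seconds = max(0.0, gb_seconds - FREE_GB_SECONDS)
        billable_requests = max(0, invocations - FREE_REQUESTS)

    compute_cost = billable_gb_seconds * PRICE_PER_GB_SECOND
    request_cost = billable_requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    return {
        "gb_seconds": gb_seconds,
        "cost_usd": round(compute_cost + request_cost, 2),
    }

if __name__ == "__main__":
    # Hypothetical workload: 50 scrapes per day, 3 minutes each, 2048 MB.
    print(monthly_lambda_cost(50, 180, 2048))
    # A lighter workload of 10 scrapes per day.
    print(monthly_lambda_cost(10, 180, 2048))
```

This ignores proxy fees, data transfer, and anything else running in the account that shares the free tier, so the real bill could be higher. A 2-3 minute scrape does fit under Lambda's 15-minute maximum timeout.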

Answered By ScrapMaster3000 On

You're likely to get blocked, since the IPs Lambda uses are often flagged as data center IPs. I've run into this when scraping from AWS before, although not with Lambda specifically. You might want to look into residential proxy services, which are designed for scraping, but I'm not sure what they cost.

Answered By ScrapingNinja On

I've used AWS for smaller projects, and it works fine for one-off scrapes. Just make sure to set realistic request headers and tweak a few settings to lower the chance of being detected. It's not ideal for large-scale scraping, though.
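As a rough illustration of the header adjustment, here is a minimal sketch using only the standard library. The header values and the `build_request` helper are assumptions for illustration, not settings known to pass any particular site's checks, and the request is only built here, not sent.

```python
import urllib.request

# Example browser-like headers; the exact values are illustrative assumptions.
BROWSER_LIKE_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}

def build_request(url, extra_headers=None):
    """Build (but do not send) a request carrying browser-like headers."""
    headers = dict(BROWSER_LIKE_HEADERS)
    if extra_headers:
        headers.update(extra_headers)
    return urllib.request.Request(url, headers=headers)

if __name__ == "__main__":
    req = build_request("https://example.com/")
    # urllib stores header names capitalized, e.g. "User-agent".
    for name, value in req.header_items():
        print(f"{name}: {value}")
```

With Selenium the browser sends its own headers, so this matters mostly for plain HTTP clients. Headers alone are unlikely to get past Cloudflare when the requests come from a data center IP.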
