How can I use AWS Lambda to scrape pages without running into issues?

Asked By CuriousCoder2021 On

I'm building a web app where users can monitor specific URLs and get notifications via email whenever the content on those pages changes. I have some experience with AWS Lambda, and I'm planning to set up a workflow where I:

1. Store a list of URLs on a server.
2. Use a Lambda function triggered every 10 minutes to fetch this list.
3. Scrape the content from each page.
4. Send the scraped data back to my server for processing and notifying the users of any changes.

I believe this setup could work, but I'm concerned about potential problems, especially if the number of monitored pages or users increases. I'd love to hear any advice about my architecture and workflow. Does this method sound feasible? What should I consider?
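
To make this concrete, here's a rough sketch of the scheduled handler I have in mind. The endpoint paths and the idea of hashing each page body so the server can diff it against the previous run are just my assumptions at this point:

import hashlib
import json
import urllib.request

# Placeholder endpoints on my server; the real API doesn't exist yet.
URL_LIST_ENDPOINT = "https://example.com/api/urls"
RESULTS_ENDPOINT = "https://example.com/api/results"

def handler(event, context):
    # Step 2: fetch the list of URLs to monitor from my server.
    with urllib.request.urlopen(URL_LIST_ENDPOINT, timeout=10) as resp:
        urls = json.loads(resp.read())

    results = []
    for url in urls:
        # Step 3: scrape each page; hash the body so the server can
        # compare it with the previous run and detect changes.
        try:
            with urllib.request.urlopen(url, timeout=10) as page:
                body = page.read()
            results.append({"url": url, "hash": hashlib.sha256(body).hexdigest()})
        except Exception as exc:
            results.append({"url": url, "error": str(exc)})

    # Step 4: send everything back to the server, which decides whom to notify.
    req = urllib.request.Request(
        RESULTS_ENDPOINT,
        data=json.dumps(results).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req, timeout=10)
    return {"processed": len(results)}

The 10-minute trigger itself would just be an EventBridge schedule rule pointed at this function, so no code should be needed for that part.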

10 Answers

Answered By ResourcefulRex22 On

Have you checked out this project: https://github.com/dgtlmoon/changedetection.io? It might be worth looking into, especially if they have webhook capabilities that could fit your needs.

CuriousCoder2021 -

Thanks for the suggestion! I’ll look into it. Webhooks could be a great addition for my app.

Answered By CloudGuru_101 On

Consider a better architecture. You could use Step Functions to orchestrate the process: one Lambda fetches the page list, then fans out to worker Lambdas that each scrape a page and write the result to your database. Splitting the work per page scales much better than one function looping over every URL, and a failure on one site doesn't take down the whole run.
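
A minimal sketch of that fan-out, leaving Step Functions aside and simply having a dispatcher Lambda invoke a worker Lambda per URL asynchronously (the worker function name and the URL source here are made up):

import json
import boto3

lambda_client = boto3.client("lambda")

# Hypothetical worker function that scrapes a single page and stores the result.
WORKER_FUNCTION = "scrape-single-page"

def dispatcher_handler(event, context):
    urls = fetch_url_list()  # however you store them: DynamoDB, S3, or your server's API
    for url in urls:
        # InvocationType="Event" fires the worker asynchronously, so one
        # slow or blocked site doesn't hold up the rest of the run.
        lambda_client.invoke(
            FunctionName=WORKER_FUNCTION,
            InvocationType="Event",
            Payload=json.dumps({"url": url}),
        )
    return {"dispatched": len(urls)}

def fetch_url_list():
    # Placeholder; replace with your real data source.
    return ["https://example.com/page-a", "https://example.com/page-b"]

With Step Functions proper, a Map state over the URL list gives you the same fan-out plus built-in retries and a concurrency cap, which matters once the list gets long.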

Answered By ScrappyDev89 On

Be careful about where your requests originate. Websites will see them coming from AWS IP ranges, and many run anti-bot measures that block traffic from known cloud providers outright. You'll want a plan for handling that.
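
One common mitigation, assuming you have access to some proxy service (the proxy URL and headers below are placeholders), is to route requests through the proxy and send browser-like headers, e.g. with the requests library:

import requests

# Placeholder proxy; a real setup would use a rotating or residential
# proxy service instead of hitting sites straight from AWS IP space.
PROXIES = {
    "http": "http://user:pass@proxy.example.com:8080",
    "https": "http://user:pass@proxy.example.com:8080",
}

# Browser-like headers; many anti-bot checks reject the default
# python-requests User-Agent outright.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
}

def fetch_page(url):
    response = requests.get(url, headers=HEADERS, proxies=PROXIES, timeout=15)
    response.raise_for_status()
    return response.text

None of this guarantees access to sites with aggressive bot detection, and note that requests isn't in the default Lambda runtime, so it has to be bundled with your deployment package or added as a layer.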
