I'm building a web app where users can monitor specific URLs and get notifications via email whenever the content on those pages changes. I have some experience with AWS Lambda, and I'm planning to set up a workflow where I:
1. Store a list of URLs on a server.
2. Use a Lambda function triggered every 10 minutes to fetch this list.
3. Scrape the content from each page.
4. Send the scraped data back to my server for processing and notifying the users of any changes.
I believe this setup could work, but I'm concerned about potential problems, especially if the number of monitored pages or users increases. I'd love to hear any advice about my architecture and workflow. Does this method sound feasible? What should I consider?
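To make the question concrete, here's a rough sketch of how I imagine the change-detection part (steps 3–4). Names and the in-memory `seen_hashes` dict are placeholders, not real code from my app; the real store would live server-side:

```python
import hashlib

def content_changed(url, body, seen_hashes):
    """Return True if the page body differs from the last fetch.

    seen_hashes maps url -> SHA-256 hex digest of the last-seen body.
    (Placeholder in-memory dict; my real store would be on the server.)
    """
    digest = hashlib.sha256(body).hexdigest()
    if seen_hashes.get(url) == digest:
        return False  # unchanged since the last check
    seen_hashes[url] = digest
    return True  # first fetch, or content changed
```

The first fetch of a URL counts as "changed", which keeps the Lambda side stateless; the server can decide whether that first event actually triggers a notification.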
10 Answers
Have you checked out this project: https://github.com/dgtlmoon/changedetection.io? It might be worth looking into, especially if they have webhook capabilities that could fit your needs.
Consider a more scalable architecture. You could use Step Functions to orchestrate the process: one Lambda fetches the page list, then fans out to worker Lambdas that do the scraping and write results to your database. That fan-out pattern scales much better than one function doing everything in a single 10-minute window.
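A minimal sketch of the fan-out step, assuming a worker function named `scrape-worker` (a hypothetical name). The client is passed in so it can be stubbed in tests; in the real Lambda you'd pass `boto3.client("lambda")`:

```python
import json

def fan_out_scrapes(urls, lambda_client, worker_name="scrape-worker"):
    """Fire one async worker invocation per URL.

    InvocationType="Event" makes each call fire-and-forget, so the
    dispatcher Lambda returns quickly no matter how many pages there are.
    """
    for url in urls:
        lambda_client.invoke(
            FunctionName=worker_name,
            InvocationType="Event",
            Payload=json.dumps({"url": url}).encode(),
        )
    return len(urls)
```

With Step Functions you'd get the same effect from a Map state without writing the dispatch loop yourself, plus retries and visibility into failed scrapes for free.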
Be careful with outbound requests from AWS. Websites will see your requests coming from AWS IP addresses, and many have anti-bot measures that block AWS ranges outright. You'll want a plan for handling that.
Thanks for the suggestion! I’ll look into it. Webhooks could be a great addition for my app.