I'm developing a web app that allows users to monitor certain URLs for changes and sends notifications via email when content updates occur. My idea is to set up an AWS Lambda function to check these pages every 10 minutes. Here's the workflow I have in mind:
1. The Lambda function fetches a list of URLs from a server.
2. It scrapes the content from those URLs.
3. The scraped data gets sent back to the server, which handles identifying changes and notifying users.
I'm a bit concerned about potential issues that could arise if the number of monitored pages or users grows. Does this plan seem feasible? What should I consider for scaling and performance?
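For reference, steps 2 and 3 could be sketched roughly like this. It assumes the URL list arrives in the Lambda event under a hypothetical `urls` key, and that sending a SHA-256 digest of each page (rather than the full body) is enough for the server to spot changes. The names `fetch_page`, `scrape`, and `handler` are only illustrative.

```python
import hashlib
import urllib.request


def fetch_page(url, timeout=10):
    """Download one page and return its body as bytes."""
    req = urllib.request.Request(url, headers={"User-Agent": "page-monitor-sketch"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


def scrape(urls, fetch=fetch_page):
    """Fetch each URL and return one result dict per URL.

    A failure on one page is recorded and does not stop the others.
    """
    results = []
    for url in urls:
        try:
            body = fetch(url)
            results.append({"url": url, "sha256": hashlib.sha256(body).hexdigest()})
        except Exception as exc:
            results.append({"url": url, "error": str(exc)})
    return results


def handler(event, context):
    """Lambda entry point; assumes the URL list is passed in event["urls"]."""
    return {"results": scrape(event.get("urls", []))}
```

The server would then compare each digest with the last one it stored for that URL and notify users when they differ.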
5 Answers
Scaling shouldn't be a big deal with Lambda; it's designed for this type of task. The more likely problem is target websites blocking your IP if you send them too many requests. Keep the scraping frequency per site reasonable and monitor how many requests you're making as you scale up.
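One way to keep the request rate per site in check is a small per-host throttle. This is only a sketch: the `DomainThrottle` class and the 5-second default are made up for illustration, and the state lives in memory, so it only spaces out requests made within a single invocation.

```python
import time
from urllib.parse import urlparse


class DomainThrottle:
    """Enforce a minimum gap between requests to the same host."""

    def __init__(self, min_interval=5.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between hits on one host (assumed value)
        self._clock = clock
        self._sleep = sleep
        self._last = {}  # host -> time of the most recent request

    def wait(self, url):
        """Sleep if the host was hit too recently; return the delay applied."""
        host = urlparse(url).netloc
        now = self._clock()
        last = self._last.get(host)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self.min_interval - (now - last))
        if delay > 0:
            self._sleep(delay)
        self._last[host] = now + delay
        return delay
```

Call `throttle.wait(url)` right before each fetch. Requests to different hosts are not delayed by each other.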
One thing to watch: the sites you scrape will see your requests coming from AWS IP ranges. Some websites block these ranges as part of their bot protection, so keep that in mind.
Check out the GitHub project called changedetection.io. It seems like it could be a good fit if you're looking for something with webhook support!
Consider structuring your app with AWS Step Functions. You could have one Lambda function fetch the page list and send it to a queue, which spawns additional Lambda workers to handle the scraping and data storage. That separates listing from scraping, so the workers scale with the queue backlog instead of one function having to process every page.
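A rough sketch of the fan-out part, assuming the queue is SQS and that `sqs_client` is a boto3 SQS client (e.g. `boto3.client("sqs")`). The function names and the one-URL-per-message layout are assumptions for illustration, not something prescribed above.

```python
import json


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def enqueue_urls(sqs_client, queue_url, urls, batch_size=10):
    """Send one message per URL; SendMessageBatch accepts at most 10 entries."""
    sent = 0
    for batch in chunked(urls, batch_size):
        entries = [
            {"Id": str(i), "MessageBody": json.dumps({"url": url})}
            for i, url in enumerate(batch)
        ]
        response = sqs_client.send_message_batch(QueueUrl=queue_url, Entries=entries)
        # Count only the entries SQS reports as accepted.
        sent += len(response.get("Successful", []))
    return sent


def worker_handler(event, context):
    """Worker Lambda triggered by SQS: pull the URL out of each record body."""
    return [json.loads(record["body"])["url"] for record in event["Records"]]
```

The worker here just extracts the URLs; the actual scraping and storage would go where it returns the list. Entries reported under `Failed` in the batch response would need a retry, which this sketch leaves out.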
Thanks! This looks interesting. It could work for me if they offer webhooks of some sort.