Hey everyone! I'm dealing with some pretty slow web-scraping code that takes ages because I'm gathering a large amount of data. Since I'm new to the cloud, I'm looking for advice on which service or instance type would get my code running in a reasonable time. I've already tried a t2.xlarge instance, but it still takes too long. Any suggestions?
3 Answers
Understanding what specifically is causing your performance bottleneck is crucial. If your code processes URLs one by one, it spends most of its time waiting on server responses rather than using the CPU, so a bigger instance won't help much. Fetching pages in parallel will definitely speed things up. Scraping libraries for your programming language might also simplify this for you!
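To illustrate why parallel fetching matters, here's a minimal Python sketch using the standard-library `ThreadPoolExecutor`. The `fetch` function is a hypothetical stand-in: `time.sleep` simulates server latency where a real scraper would call something like `requests.get(url)`, and the URLs are placeholders.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a network request: a real scraper would call
# requests.get(url) here; time.sleep simulates the server's response time.
def fetch(url: str) -> str:
    time.sleep(0.2)  # pretend the server takes 200 ms to respond
    return f"<html>content of {url}</html>"

urls = [f"https://example.com/page/{i}" for i in range(10)]

# Sequential: total time is roughly 10 * 0.2 s, almost all of it waiting.
start = time.perf_counter()
sequential = [fetch(u) for u in urls]
seq_time = time.perf_counter() - start

# Parallel: threads overlap the waiting, so total time is close to 0.2 s.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    parallel = list(pool.map(fetch, urls))
par_time = time.perf_counter() - start

print(f"sequential: {seq_time:.2f}s, parallel: {par_time:.2f}s")
```

The point is that the speedup comes from overlapping waits, not from more CPU, which is why it works even on a small instance.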
Upgrading from a t2.xlarge, which is fairly small and an older generation, to something massive like a c6i.48xlarge could definitely help, but consider the jump in cost; there are plenty of intermediate sizes to try before going straight to a 48xlarge! It's also important to make sure you're fully utilizing the resources you already have, for example with multi-threading or async requests. Have you looked into that?
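As a sketch of the async-requests approach, here's a minimal `asyncio` example. The `fetch` coroutine is hypothetical: `asyncio.sleep` stands in for network latency where a real scraper would use an async HTTP library such as aiohttp, and the URLs are placeholders.

```python
import asyncio

# Hypothetical async fetch: a real scraper would use an async HTTP client
# such as aiohttp here; asyncio.sleep simulates waiting on the server.
async def fetch(url: str) -> str:
    await asyncio.sleep(0.1)  # simulated network latency
    return f"fetched {url}"

async def main() -> list:
    urls = [f"https://example.com/item/{i}" for i in range(20)]
    # gather() runs all requests concurrently on a single thread, so the
    # instance's CPU cores stay free for parsing work.
    return await asyncio.gather(*(fetch(u) for u in urls))

results = asyncio.run(main())
print(len(results))
```

With this pattern, hundreds of in-flight requests cost almost nothing in CPU, which is often far cheaper than scaling up the instance.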
It sounds like you're not sure what's actually causing the slowdown. Just throwing bigger hardware at the problem is usually a bad strategy. First investigate whether CPU, memory, storage, or network is the bottleneck; profile your code, then decide on your next move, so you don't waste money on AWS resources you don't need.
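One way to do that investigation is with Python's built-in `cProfile`. The sketch below uses hypothetical `fetch_all` and `parse_all` functions (with `time.sleep` standing in for network round-trips) to show how the profile reveals whether you're I/O-bound or CPU-bound.

```python
import cProfile
import io
import pstats
import time

# Hypothetical scraper phases: fetching is I/O-bound, parsing is CPU-bound.
def fetch_all(n: int) -> list:
    pages = []
    for i in range(n):
        time.sleep(0.05)  # stands in for a network round-trip
        pages.append(f"page {i} " * 100)
    return pages

def parse_all(pages: list) -> int:
    return sum(len(p.split()) for p in pages)

profiler = cProfile.Profile()
profiler.enable()
pages = fetch_all(5)
total_words = parse_all(pages)
profiler.disable()

# Sort by cumulative time: if sleep/socket calls dominate, you're
# I/O-bound and a bigger instance won't help; parallel fetching will.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

If the profile shows most of the cumulative time inside network waits rather than your parsing code, concurrency is the fix; only if parsing dominates does more CPU pay off.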