I'm currently working on a platform where users can deploy their own Python trading bots, each running in a separate Docker container. Since I have 10 users with 3 strategies each, that results in 30 containers running at the same time. I'm encountering some performance issues, especially when a user clicks to stop all their strategies, which causes significant lag as I try to shut down all their containers. Additionally, I'm fetching user balances and other information every 30 seconds, making the web interface feel sluggish. Given these challenges, what's the best way to scale this architecture for over 500 users? Should I reconsider my entire setup? Any insights from those who have experience with similar systems would be greatly appreciated! Currently, I'm using an m5.xlarge EC2 instance.
2 Answers
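One way to handle untrusted code is to run it in AWS Lambda with restricted IAM permissions inside a VPC. Each function runs in its own isolated environment and scales without a host for you to manage, which could help with both security and performance. Keep in mind that a single Lambda invocation is capped at 15 minutes, so this fits strategies that run on a schedule better than ones that have to stay running continuously.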
What exactly do you mean by 'the system lags'? A system can scale horizontally and still be slow on individual operations, especially ones that wait on API requests.
I have buttons in the UI for starting and stopping containers. When I press the stop button, it tries to stop all of that user's containers at once. This interferes with other requests, such as fetching data, and makes the interface unstable.
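One common way to soften the "stop everything at once" problem is to cap how many shutdowns run in parallel and to run them outside the request that handles the button click. Here is a minimal sketch of that idea, not a drop-in fix. `stop_container`, `stop_all`, and `MAX_CONCURRENT_STOPS` are hypothetical names, and the placeholder body only sleeps where the real (slow) stop call would go.

```python
import asyncio

# Assumed limit on simultaneous shutdowns; tune it for the host.
MAX_CONCURRENT_STOPS = 5


async def stop_container(container_id: str) -> None:
    """Hypothetical placeholder for the real stop call.

    With the Docker SDK for Python this could be a blocking
    container.stop() pushed to a worker thread via asyncio.to_thread,
    so it does not block the event loop. Here it only sleeps.
    """
    await asyncio.sleep(0.01)


async def stop_all(container_ids, limit=MAX_CONCURRENT_STOPS):
    """Stop containers with at most `limit` shutdowns in flight.

    Returns a dict mapping container id to True (stopped) or
    False (the stop call raised an exception).
    """
    semaphore = asyncio.Semaphore(limit)

    async def stop_one(cid):
        # Wait for a free slot before starting this shutdown.
        async with semaphore:
            try:
                await stop_container(cid)
                return cid, True
            except Exception:
                # One failed stop should not abort the others.
                return cid, False

    pairs = await asyncio.gather(*(stop_one(cid) for cid in container_ids))
    return dict(pairs)


if __name__ == "__main__":
    ids = [f"bot-{i}" for i in range(12)]
    outcome = asyncio.run(stop_all(ids, limit=3))
    print(sum(outcome.values()), "of", len(outcome), "containers stopped")
```

In an async web handler, the call could be scheduled with `asyncio.create_task(stop_all(ids))` so the response returns right away and the UI polls for status. Past a single host, the same pattern usually moves to a job queue with worker processes.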