I'm currently working on a trading platform where users interact with a chatbot to develop trading strategies. Here's how it works today: a user chats with the bot to generate a strategy, the bot converts it into code, and a FastAPI backend saves that code in PostgreSQL (on Supabase). Each strategy then runs in its own Docker container, which fetches price data and checks for signals every 10 seconds, updating profit/loss (PNL) data at the same interval. Trades are executed whenever a signal fires.
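For concreteness, each per-strategy container today runs a loop of roughly this shape; `fetch_price`, `check_signal`, `execute_trade`, and `update_pnl` are hypothetical placeholders for the real calls:

```python
import time

POLL_INTERVAL = 10  # seconds, matching the cadence described above

def run_strategy(strategy_id: str) -> None:
    """Main loop of one per-strategy container: poll, check signals, record PNL."""
    while True:
        price = fetch_price(strategy_id)           # hypothetical market-data call
        signal = check_signal(strategy_id, price)  # hypothetical strategy logic
        if signal:
            execute_trade(strategy_id, signal)     # hypothetical broker call
        update_pnl(strategy_id, price)             # hypothetical per-container DB write
        time.sleep(POLL_INTERVAL)
```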
The challenge I'm facing is supporting over 1000 concurrent users, each possibly running 2 strategies; that means more than 2000 containers, which isn't feasible. The entire stack runs on AWS.
I'm considering moving to a multi-tenant architecture where one container can handle multiple user strategies (potentially 50-100 per container, depending on complexity) and containers can scale up based on demand.
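One way to structure such a multi-tenant container is to run each strategy as a lightweight asyncio task, so 50-100 strategies share one process. Here is a minimal sketch, assuming async variants of the same hypothetical helpers as above:

```python
import asyncio

POLL_INTERVAL = 10

class StrategyRunner:
    """One container hosting many strategies as asyncio tasks."""

    def __init__(self) -> None:
        self.tasks: dict[str, asyncio.Task] = {}

    def start(self, strategy_id: str) -> None:
        if strategy_id not in self.tasks:
            self.tasks[strategy_id] = asyncio.create_task(self._loop(strategy_id))

    def stop(self, strategy_id: str) -> None:
        task = self.tasks.pop(strategy_id, None)
        if task is not None:
            task.cancel()

    async def _loop(self, strategy_id: str) -> None:
        while True:
            price = await fetch_price(strategy_id)           # hypothetical async call
            signal = await check_signal(strategy_id, price)  # hypothetical async call
            if signal:
                await execute_trade(strategy_id, signal)     # hypothetical async call
            await asyncio.sleep(POLL_INTERVAL)
```

Since the loops are I/O-bound (network fetches and DB writes), tasks are far cheaper than containers, and the strategies-per-container count becomes a tuning knob for the autoscaling policy.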
I still need to figure out a few things:

1. How to efficiently start and stop individual strategies, possibly using an event-driven approach (see the LISTEN/NOTIFY sketch after this list).
2. How to write the latest price and PNL to the database without overloading it; previously every container performed its own update in parallel every 10 seconds.
3. Whether this architecture is appropriate for handling 1000+ users.
4. Whether PostgreSQL LISTEN/NOTIFY can cope at this scale.
5. How to design a sensible batching strategy for those writes (see the batching sketch below).
6. Which AWS services would be most suitable in this context.
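On points 1 and 4, a common pattern is to use LISTEN/NOTIFY purely as a low-volume control channel: the FastAPI backend emits `pg_notify` when a user starts or stops a strategy, and each runner container subscribes. A sketch, assuming the asyncpg driver and the `StrategyRunner` above; the channel name and payload shape are my own invention:

```python
import asyncio
import json

import asyncpg  # assumption: asyncpg as the Postgres driver

async def listen_for_commands(dsn: str, runner: "StrategyRunner") -> None:
    """Subscribe to a control channel; the API emits NOTIFY on start/stop."""
    conn = await asyncpg.connect(dsn)

    def on_notify(connection, pid, channel, payload: str) -> None:
        msg = json.loads(payload)  # e.g. {"action": "start", "strategy_id": "abc"}
        if msg["action"] == "start":
            runner.start(msg["strategy_id"])
        elif msg["action"] == "stop":
            runner.stop(msg["strategy_id"])

    await conn.add_listener("strategy_commands", on_notify)
    # The backend side is a single statement:
    #   SELECT pg_notify('strategy_commands', '{"action": "start", "strategy_id": "abc"}');
    while True:  # keep the connection open; notifications fire in the background
        await asyncio.sleep(3600)
```

At 1000+ users this should sit comfortably within LISTEN/NOTIFY's limits, since start/stop commands are rare events; what it is not suited for is high-frequency price/PNL traffic, which is where batching comes in.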
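On points 2 and 5, instead of one write per strategy every 10 seconds, each runner can buffer PNL values in memory and flush the whole batch in one call. A sketch, again assuming asyncpg and a hypothetical `strategies(id, pnl, updated_at)` table:

```python
import asyncio

import asyncpg  # assumption: asyncpg as the Postgres driver

FLUSH_INTERVAL = 10  # one flush per interval instead of one write per strategy

class PnlBatcher:
    """Buffer PNL updates in memory and flush them as a single batch."""

    def __init__(self, pool: asyncpg.Pool) -> None:
        self.pool = pool
        self.pending: dict[str, float] = {}  # strategy_id -> latest PNL

    def record(self, strategy_id: str, pnl: float) -> None:
        self.pending[strategy_id] = pnl  # later values overwrite earlier ones

    async def flush_forever(self) -> None:
        while True:
            await asyncio.sleep(FLUSH_INTERVAL)
            if not self.pending:
                continue
            rows, self.pending = list(self.pending.items()), {}
            # One batched call per interval; schema here is hypothetical.
            await self.pool.executemany(
                "UPDATE strategies SET pnl = $2, updated_at = now() WHERE id = $1",
                rows,
            )
```

With 50-100 strategies per container, 2000 strategies collapse to roughly 20-40 containers, so the write load drops from 2000+ parallel statements every 10 seconds to a few dozen batched calls.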
2 Answers
Transitioning to a multi-tenant architecture sounds like a smart move! Consolidating user strategies into fewer Docker containers will simplify scaling, and ECS is a good fit: an ECS service can add tasks automatically once CPU usage crosses a threshold. Just keep in mind that fetching and writing price data every 10 seconds may become a bottleneck as your user base grows. Have you considered something like DynamoDB for the price data? It handles high-frequency key-value updates well and sidesteps the lock contention you can run into with PostgreSQL. I'm also curious how you're currently using LISTEN/NOTIFY: is it delivering timely updates without overwhelming your system?
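To illustrate the DynamoDB suggestion, here's a minimal boto3 sketch; the `latest_prices` table name and `symbol` key are assumptions, not anything from your setup. Note that boto3 requires `Decimal` rather than `float` for DynamoDB number attributes:

```python
from decimal import Decimal

import boto3  # assumption: a hypothetical "latest_prices" table keyed on "symbol"

table = boto3.resource("dynamodb").Table("latest_prices")

def put_price(symbol: str, price: Decimal) -> None:
    # Overwrites the previous item, so the table always holds the latest price.
    table.put_item(Item={"symbol": symbol, "price": price})

def get_price(symbol: str) -> dict | None:
    return table.get_item(Key={"symbol": symbol}).get("Item")
```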
For better scalability, you might also consider a central strategy-executor microservice instead of running each strategy in its own container. Requests come in through an API or a queue, the executor runs each strategy in an isolated environment (Kubernetes pods, for example), and results are returned to the caller. This centralizes control and lets you scale on the executor's processing capacity alone, which makes resource management much easier. Happy to go deeper into this setup if you'd like!
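As a sketch of the queue-driven variant, here's what the executor's consume loop might look like with SQS; the queue URL is a placeholder and `run_strategy_once` is a hypothetical entry point:

```python
import json

import boto3  # assumption: an SQS queue feeding strategy jobs to the executor

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/strategy-jobs"  # placeholder

def poll_and_execute() -> None:
    """Long-poll the queue, run each job, and delete the message on success."""
    while True:
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20
        )
        for msg in resp.get("Messages", []):
            job = json.loads(msg["Body"])
            run_strategy_once(job["strategy_id"])  # hypothetical executor entry point
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```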
That sounds like a solid idea! A centralized execution system could really simplify things.
I agree, DynamoDB could help a lot! Just manage your reads/writes carefully to avoid throttling.
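To make "manage your reads/writes" concrete: boto3's `batch_writer` groups puts into `BatchWriteItem` calls and automatically retries unprocessed items, which smooths out write spikes. Same hypothetical `latest_prices` table as above:

```python
from decimal import Decimal

import boto3

table = boto3.resource("dynamodb").Table("latest_prices")  # hypothetical table

def put_prices(prices: dict[str, Decimal]) -> None:
    # batch_writer buffers puts into BatchWriteItem requests (25 items max each)
    # and retries any unprocessed items for you.
    with table.batch_writer() as batch:
        for symbol, price in prices.items():
            batch.put_item(Item={"symbol": symbol, "price": price})
```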