Hey everyone! I maintain a Java-based Spring Boot batch process that handles millions of users, and I'm considering moving it to Python. Currently, our system connects to several databases, processes queued users in batches of 100–1000 across multiple threads, and works with message brokers like RabbitMQ or Kafka. Given this background, I'm looking for recommendations on the best stack or architecture to implement this in Python. I know Python has trouble with CPU-bound multithreading, but I'm aware of options like multiprocessing. If you have any insights or solutions that fit within the Python ecosystem, I'd love to hear them!
5 Answers
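You might want to avoid switching to Python unless absolutely necessary. If your current system works well in Java, refactor that code rather than porting it. Python can introduce performance problems, especially with concurrency: in standard CPython builds the global interpreter lock (GIL) keeps threads from running Python bytecode in parallel, so CPU-bound work doesn't scale across threads.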
Lol, I see where you're coming from, but OP really wants to switch! Let's see if Python can work.
I'd recommend Celery with RabbitMQ as the broker. Celery is a Python-native task queue, so it's straightforward to set up and manage. If your batch tasks are reasonably independent, it should be efficient enough for your use case.
If you're thinking about the execution environment, consider going serverless with GCP Cloud Functions triggered by Pub/Sub for better scalability. It might also make the migration simpler.
If you're set on Python, consider using multiple independent processes. Each process can manage a specific batch of users and connect to the database directly. You can also implement a master node to pull data and send tasks to worker instances via RabbitMQ.
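To make the multi-process idea concrete, here's a rough sketch using only the standard library. It splits user IDs into batches and hands each batch to a separate worker process. The names `chunked`, `process_batch`, and `run` are made up for illustration, and the worker body is a placeholder: in a real system each worker would open its own database connection there. A process pool stands in for the RabbitMQ hop, so treat this as a single-machine approximation and not the full master/worker setup.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import islice


def chunked(ids, size):
    """Yield consecutive lists of at most `size` items from `ids`."""
    it = iter(ids)
    while batch := list(islice(it, size)):
        yield batch


def process_batch(user_ids):
    """Hypothetical worker: handle one batch and return how many users it saw.

    Assumption: real code would open a per-process DB connection here,
    since connections generally should not be shared across processes.
    """
    return len(user_ids)


def run(all_user_ids, batch_size=500, workers=4):
    """Dispatch batches to a pool of worker processes and total the results."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() submits every batch up front and yields results in order.
        return sum(pool.map(process_batch, chunked(all_user_ids, batch_size)))
```

Call `run(...)` from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely on platforms that spawn instead of fork. Because each batch runs in its own process, CPU-bound work isn't serialized by the GIL the way threads would be.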
I use Temporal to orchestrate workflows in Python, and it works well for complex setups. It's more heavyweight than Celery, but if you need robust task management and orchestration, it's worth checking out.
If there's a compelling reason to move, let's find the best approach in Python!