What Python stack should I use for multi-threaded batch processing?

Asked By TechieNomad92 On

Hey everyone! I'm transitioning from a Java-based Spring Boot batch process that handles millions of users, and I'm considering switching to Python. Currently, our system connects to various databases, processes user queues in batches of 100–1000 across multiple threads, and works with message brokers like RabbitMQ or Kafka. Given this background, I'm looking for recommendations on the best stack or architecture to implement this in Python. I know Python struggles with CPU-bound multithreading, but I'm aware of options like multiprocessing. If you have any insights or solutions that fit within the Python ecosystem, I'd love to hear them!

5 Answers

Answered By PythonLover33 On

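You might want to avoid switching to Python unless absolutely necessary. Given that your current system is working well in Java, consider refactoring that code rather than porting it. Python might lead to performance issues, especially with concurrency: in standard CPython, the global interpreter lock (GIL) lets only one thread run Python bytecode at a time, so CPU-bound work does not get faster with more threads.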

OptimistCoder21 -

If there's a compelling reason to move, let's find the best approach in Python!

CuriousDev99 -

Lol, I see where you're coming from, but OP really wants to switch! Let's see if Python can work.

Answered By CodeGuru77 On

I'd recommend using Celery with RabbitMQ as the broker. It's a straightforward way to manage task queues in Python. If your batch tasks are reasonably independent, workers can process them in parallel, and it should be efficient enough for your use case.

Answered By BiteSizedCoder On

If you're also rethinking the execution environment, consider a serverless setup: Google Cloud Functions triggered by Pub/Sub messages, which scales with the message volume. It might also ease some of the migration complexity.

Answered By DataWhisperer On

If you're set on Python, consider using multiple independent processes. Each process can manage a specific batch of users and connect to the database directly. You can also implement a master node to pull data and send tasks to worker instances via RabbitMQ.
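
A rough, standard-library-only sketch of the independent-processes idea, without the broker: the function names, the batch size of 500, and the placeholder work inside `process_batch` are all assumptions for illustration.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import islice


def chunked(items, size):
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def process_batch(user_ids):
    """Hypothetical per-batch work; runs inside a worker process.

    A real worker would open its own database connection here,
    since connections should not be shared across processes.
    """
    return sum(uid * 2 for uid in user_ids)  # placeholder for real work


def run(user_ids, batch_size=500, workers=4):
    """Fan batches out to worker processes and collect results in order."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_batch, chunked(user_ids, batch_size)))


if __name__ == "__main__":
    results = run(range(2_000))
    print(f"processed {len(results)} batches")
```

Each worker is a separate interpreter with its own GIL, so CPU-bound batches run in parallel. The same `process_batch` function could later be called from a RabbitMQ consumer instead of a local pool.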

Answered By StackMaster88 On

I'm using Temporal for orchestrating workflows in Python, which is great for complex setups. It's a bit more heavyweight than Celery, but if you need robust task management and orchestration, it's worth checking out.
