What’s the best Python stack for multi-threaded batch processing?

April 22, 2025

Asked By CuriousCoder99 On April 22, 2025

Hey folks! I'm transitioning from a Java background and have a legacy Spring Boot batch process that's currently managing millions of users. We're looking at migrating this system to Python and I'm hoping to get your input on this. Here's how the current setup works:

- It connects to various databases (all major ones are supported).
- Each batch service operates on separate servers and processes a queue of about 100-1000 users at a time.
- We utilize a thread pool where each queue item is handled by a different thread.
- After processing, the tasks send messages to RabbitMQ or Kafka.

Given all this, what Python stack or architecture would you recommend for effectively managing this type of work? I know Python has its quirks with CPU-bound threads, but I've also read about solutions involving multiprocessing. I'd really appreciate your suggestions that fit within the Python ecosystem!
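For readers mapping the thread-pool setup described above onto Python, here is a minimal sketch using the standard library's `concurrent.futures.ThreadPoolExecutor`. The names `process_user` and `run_batch` and the placeholder work are hypothetical, not part of the original system. Threads fit this shape when each item mostly waits on a database or a broker; CPU-heavy work would likely need processes instead.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_user(user_id: int) -> dict:
    """Hypothetical per-user job: stands in for the DB reads and broker publish."""
    # Real code would query the database and publish to RabbitMQ or Kafka here.
    return {"user_id": user_id, "status": "done"}


def run_batch(user_ids: list[int], max_workers: int = 16) -> list[dict]:
    """Process one batch, handing each queue item to a pool thread."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its user so failures can be attributed.
        futures = {pool.submit(process_user, uid): uid for uid in user_ids}
        for future in as_completed(futures):
            uid = futures[future]
            try:
                results.append(future.result())
            except Exception as exc:  # one bad item should not sink the batch
                results.append({"user_id": uid, "status": "failed", "error": str(exc)})
    return results


if __name__ == "__main__":
    batch = run_batch(list(range(1, 101)))
    print(len(batch), "users processed")
```

Results arrive in completion order, not submission order, so anything that depends on ordering should sort or key on `user_id`.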

5 Answers

Answered By CeleryFan91 On April 24, 2025

Using Celery with RabbitMQ is popular for this kind of task! Just be aware that while it's great for straightforward jobs, its workflow orchestration capabilities can be quite basic, which might not suit complex tasks well.

Answered By SkepticalSteve On April 23, 2025

Honestly, unless you have a compelling reason, I wouldn't rush into migrating to Python. The performance drop can be significant for what you're trying to do, especially for multi-threaded work, since standard CPython's global interpreter lock (GIL) keeps CPU-bound threads from running in parallel. Just a thought!

Answered By TechieTommy On April 23, 2025

I’d recommend using Celery along with RabbitMQ. Celery is great for background task processing, especially if you need to handle asynchronous workloads. Combine it with RabbitMQ for effective message queuing and you'll have a solid setup for your batch tasks.

Answered By PerformanceWatcher On April 22, 2025

I would caution against using Celery; it can feel bloated. If you’re not using Redis, check out Dramatiq. It’s lightweight and performs well without all the extra features.

Answered By ProcessPro On April 22, 2025

If you can split the workload in advance among independent Python processes, you'll see better performance. You might want to have a single master node fetch data from the DB, then distribute it to worker nodes using RabbitMQ.
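A minimal sketch of that idea, assuming local worker processes stand in for the worker nodes and a `ProcessPoolExecutor` stands in for the RabbitMQ hop. The function names `partition`, `process_chunk`, and `run` are hypothetical.

```python
from concurrent.futures import ProcessPoolExecutor


def partition(items: list[int], n_parts: int) -> list[list[int]]:
    """Split items into n_parts near-equal slices, decided before any work starts."""
    return [items[i::n_parts] for i in range(n_parts)]


def process_chunk(user_ids: list[int]) -> int:
    """Hypothetical worker: handle one pre-assigned slice, return how many it handled."""
    # Real code would open its own DB connection here; connections should not
    # be shared across processes.
    return len(user_ids)


def run(user_ids: list[int], n_workers: int = 4) -> int:
    """The 'master' role: partition once, then fan slices out to worker processes."""
    chunks = partition(user_ids, n_workers)
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(process_chunk, chunks))


if __name__ == "__main__":
    print(run(list(range(1000))))
```

Because each process gets its slice up front, workers never contend for a shared queue, and each runs its own interpreter, so CPU-bound steps can proceed in parallel.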
