I'm working on Rhesis.ai, which tests LLM applications. The stack is a FastAPI backend with Celery for task processing. The workload is heavily I/O-bound: we make many external API calls and then query LLM APIs (e.g., OpenAI) to evaluate the outputs.
Currently we parallelize within a single Celery task using asyncio. For example, given a test set of 50 tests, we fire off all the requests at once and evaluate the results concurrently. The bottleneck is the fixed number of Celery worker processes; scaling them up drives up RAM usage.
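For context, the in-task fan-out looks roughly like this (a simplified sketch: `run_test` and `evaluate` here just sleep to stand in for our real external API and LLM calls):

```python
import asyncio

async def run_test(test_id: int) -> str:
    # Placeholder for the external API call that exercises the app under test
    await asyncio.sleep(0.01)
    return f"output-{test_id}"

async def evaluate(output: str) -> dict:
    # Placeholder for the LLM judgment call (e.g., OpenAI)
    await asyncio.sleep(0.01)
    return {"output": output, "passed": True}

async def run_test_set(test_ids: list[int]) -> list[dict]:
    # Fan out all tests at once, then evaluate all outputs concurrently
    outputs = await asyncio.gather(*(run_test(t) for t in test_ids))
    return await asyncio.gather(*(evaluate(o) for o in outputs))

# Inside a Celery task this gets driven with asyncio.run(...), so the
# concurrency within one task is bounded only by the event loop.
results = asyncio.run(run_test_set(list(range(50))))
```

This is fine per task; the problem is that across tasks we're still capped at one task per worker process.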
We're looking for a setup that fully exploits async: tasks continuously scheduled onto an event loop rather than bound to a fixed number of worker processes or threads. FastAPI handles many concurrent requests well, but it has no task queue.
We considered Dramatiq, but it appears to have the same constraint as Celery: workers execute tasks sequentially even when the tasks are internally async. Ideally we want a stable, mature library or architecture that keeps scheduling tasks onto an event loop without waiting for earlier tasks to finish. We're wary of betting a core piece of our infrastructure on something experimental.
6 Answers
You might not even need to switch from Celery. The gevent worker pool monkey-patches blocking I/O, which could give you the concurrency you're after. Just start your worker like this: `celery -A yourapp worker -P gevent --concurrency=100`.
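To expand on this a bit: the gevent pool pays off when the task's time is spent in monkey-patched blocking I/O (e.g. `requests`). A sketch of what such a task might look like (app name, broker URL, and endpoint are placeholders, and I haven't run this exact code):

```python
# tasks.py -- sketch of an I/O-bound task suited to the gevent pool.
# When started with `-P gevent`, Celery applies gevent monkey-patching at
# startup, so the blocking HTTP call below yields to other greenlets
# while waiting on the network.
import requests
from celery import Celery

app = Celery("rhesis", broker="redis://localhost:6379/0")  # placeholder broker

@app.task
def evaluate_output(test_id: int) -> dict:
    # Under gevent this blocks only the current greenlet, not the process
    resp = requests.post("https://api.example.com/evaluate", json={"id": test_id})
    return resp.json()
```

With `--concurrency=100`, a single worker process can keep 100 of these in flight, since each greenlet yields during its HTTP wait.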
Have you thought about using ZeroMQ to build your own solution? It's not overly complex; you could do it in about 50 lines of code. Depending on how your tasks behave, though, make sure to read up on task routing first.
Check out Oban. I’ve used the Elixir version and really liked it. It’s backed by PostgreSQL/SQLite for infrastructure, which is solid.
Have you checked out TaskIQ? It’s basically async Celery, and it might fit your needs perfectly.
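For a feel of the model, here's a rough sketch (untested; the two helpers are hypothetical placeholders for your app call and LLM judge, and `InMemoryBroker` is TaskIQ's built-in test broker — in production you'd swap in a real one, e.g. from the taskiq-redis package):

```python
# tasks.py -- sketch of an async TaskIQ task definition.
from taskiq import InMemoryBroker

broker = InMemoryBroker()

async def call_app_under_test(test_id: int) -> str:
    ...  # placeholder: hit the application being tested

async def judge_with_llm(output: str) -> dict:
    ...  # placeholder: score the output via an LLM API

@broker.task
async def run_and_evaluate(test_id: int) -> dict:
    # Both steps await network I/O, so one worker's event loop can keep
    # many of these in flight at once -- no extra processes needed.
    output = await call_app_under_test(test_id)
    return await judge_with_llm(output)

# Enqueue from FastAPI:  await run_and_evaluate.kiq(42)
# Run a worker:          taskiq worker tasks:broker
```

The key difference from Celery is that the worker itself runs an event loop, so scheduling more tasks doesn't mean more processes.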
There’s also the ARQ library, created by the Pydantic guy, but just a heads-up: it's still in beta and may not be the most reliable option right now.
You might want to look into Temporal. It's designed for workflows and allows a single worker to pick up new tasks while waiting for the I/O to finish. Just keep in mind it might be a bit overkill for your use case since each request becomes a workflow.