I'm using Gunicorn with the gthread worker type for my Flask app, and I'm curious about how to keep track of the number of concurrent requests I'm handling over time. I want to know if the number of requests exceeds the total number of workers times threads, which in my setup is 10 workers and 10 threads—so up to 100 concurrent requests. What's the best way to monitor this to determine if I need to add more threads? For context, I'm running Flask with Gunicorn, Docker, and Nginx in front, and I have metadata enabled.
1 Answer
The right number of workers and threads depends largely on your CPU core count. As a starting point, Gunicorn's documentation recommends (2 × cores) + 1 workers, so 2 cores gives 5 workers and 4 cores gives 9. Treat this as a baseline to tune from, not a hard rule.
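
To make the rule of thumb concrete, here is a minimal sketch that computes the baseline worker count. The helper name `recommended_workers` is made up for illustration and is not part of Gunicorn. Note that inside Docker, `multiprocessing.cpu_count()` may report the host's cores rather than the container's CPU limit, so you may want to pass the core count in explicitly.

```python
import multiprocessing


def recommended_workers(cores: int) -> int:
    """Baseline worker count from the rule of thumb: (2 x cores) + 1.

    Hypothetical helper for illustration; not a Gunicorn API.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return 2 * cores + 1


if __name__ == "__main__":
    # Assumption: cpu_count() reflects the cores actually available to the app.
    cores = multiprocessing.cpu_count()
    print(f"{cores} cores -> {recommended_workers(cores)} workers (baseline)")
```

The same expression could be assigned to `workers` in a Gunicorn config file, since that file is plain Python.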

If your workload is mostly I/O-bound, though, you may want more threads than you have CPU cores. A thread that is waiting on I/O (a database query, an upstream HTTP call, a disk read) is not competing for CPU, so exceeding the core count is fine as long as your tasks aren't CPU-heavy.
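
A rough way to see why extra threads help I/O-bound work is to simulate it. The sketch below uses `time.sleep` as a stand-in for a blocking I/O call; `fake_io_task` and `run_tasks` are hypothetical names for this demo, and it runs in a plain thread pool rather than inside Gunicorn, so take it as an illustration of the principle, not a benchmark of your app.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_task(delay: float) -> float:
    """Stand-in for a blocking I/O call (e.g. a database query)."""
    time.sleep(delay)  # the thread waits here without using CPU
    return delay


def run_tasks(num_tasks: int, num_threads: int, delay: float = 0.1) -> float:
    """Run num_tasks simulated I/O calls on num_threads threads; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # list() forces all tasks to finish before we stop the clock
        list(pool.map(fake_io_task, [delay] * num_tasks))
    return time.perf_counter() - start


if __name__ == "__main__":
    for threads in (1, 4, 16):
        elapsed = run_tasks(num_tasks=16, num_threads=threads)
        print(f"{threads:>2} threads: {elapsed:.2f}s")
```

With sleeping tasks, elapsed time should drop as the thread count rises, regardless of how many cores the machine has. Replacing the sleep with a CPU-bound loop would not show the same gain in CPython because of the GIL.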