I keep running into the same challenge in my Go applications: managing a set of worker pods that have to split up work among themselves, such as consuming from Kafka topics or handling batch jobs and pipelines. The setup usually involves:
- N workers (typically fewer than 50 Kubernetes pods)
- M work units, which correspond to topic partitions
- A need for each worker to "own" a fair portion of the workload
- Frequent worker changes due to deployments, crashes, or autoscaling
- A requirement for throttling control on the tasks assigned
Most approaches I've seen rely on Redis locks, a central scheduler, or a queue that workers compete over, and they tend to behave unpredictably or drift into eventual-consistency problems when one component fails. I'm curious how others handle this in production, especially on Kubernetes, and which patterns have worked well. I'd love to hear your insights!
5 Answers
You might be overthinking this. See the Kafka answers below: consumer groups handle partitioning and work distribution natively, which should simplify your workload management considerably.
With Kafka, you can utilize consumer groups to handle partition allocation. Each consumer in a group gets assigned different partitions, and if one consumer fails, Kafka automatically redistributes those partitions among the remaining consumers. Are you trying to achieve something different? If your workers depend on each other, you might need locks; otherwise, they should share work without issue.
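As a rough illustration (not a drop-in solution), this is what a consumer-group member looks like with the segmentio/kafka-go client; the broker address, group ID, and topic are placeholders you'd replace with your own:

```go
package main

import (
	"context"
	"log"

	"github.com/segmentio/kafka-go"
)

func main() {
	// Every pod that uses the same GroupID joins one consumer group.
	// Kafka assigns each partition to exactly one member and rebalances
	// automatically when pods join, crash, or scale down.
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"kafka:9092"}, // placeholder broker address
		GroupID: "worker-pool",          // placeholder group name
		Topic:   "tasks",                // placeholder topic
	})
	defer r.Close()

	for {
		// With a GroupID set, ReadMessage commits the offset after a successful read.
		msg, err := r.ReadMessage(context.Background())
		if err != nil {
			log.Printf("read error: %v", err)
			break
		}
		log.Printf("partition %d offset %d: %s", msg.Partition, msg.Offset, msg.Value)
	}
}
```

Run the same binary in every pod; no extra coordination layer is needed for partition ownership.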
One viable approach is deterministic sharding: each pod watches the current replica count (for example via the Kubernetes API) and recomputes which work units it owns whenever the worker count changes, so ownership stays consistent without a central scheduler. A rough sketch follows.
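This is only a sketch, assuming the workers run in a StatefulSet (so a pod can derive its index from its hostname) and that the replica count is injected via an environment variable; in a real setup you'd watch the StatefulSet and recompute assignments on change. The names here are illustrative:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"os"
	"strconv"
	"strings"
)

// owns reports whether this worker should handle the given work unit.
// Work units are mapped to workers by hashing their ID modulo the current
// replica count, so every pod computes the same assignment independently.
func owns(workUnitID string, myIndex, replicas int) bool {
	h := fnv.New32a()
	h.Write([]byte(workUnitID))
	return int(h.Sum32())%replicas == myIndex
}

func main() {
	// In a StatefulSet the pod name ends in its ordinal, e.g. "worker-3".
	host, _ := os.Hostname()
	parts := strings.Split(host, "-")
	myIndex, _ := strconv.Atoi(parts[len(parts)-1])

	// REPLICAS is assumed to be injected by the manifest; a real implementation
	// would watch the StatefulSet spec and re-run this logic on every change.
	replicas, _ := strconv.Atoi(os.Getenv("REPLICAS"))
	if replicas <= 0 {
		replicas = 1
	}

	for _, unit := range []string{"partition-0", "partition-1", "partition-2"} {
		if owns(unit, myIndex, replicas) {
			fmt.Println("this pod owns", unit)
		}
	}
}
```

The trade-off is a brief window of double or missed ownership while pods converge on the new replica count, so work units should be idempotent or protected by fencing.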
Consider RabbitMQ if task ownership and locking are your main concern. With manual acknowledgements and a per-consumer prefetch, an in-flight message is not delivered to other pods until it is acked (or requeued if the consumer dies), which gives you the isolation you're after. For autoscaling, look into KEDA; it can scale consumers on queue depth and integrates smoothly with RabbitMQ.
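A minimal sketch with the rabbitmq/amqp091-go client; the connection URL and queue name are placeholders. Prefetch of 1 plus manual acks is what keeps a message reserved for one pod at a time and doubles as a simple per-pod throttle:

```go
package main

import (
	"log"

	amqp "github.com/rabbitmq/amqp091-go"
)

func main() {
	conn, err := amqp.Dial("amqp://guest:guest@rabbitmq:5672/") // placeholder URL
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	ch, err := conn.Channel()
	if err != nil {
		log.Fatal(err)
	}
	defer ch.Close()

	// Prefetch of 1: the broker hands this consumer one unacked message
	// at a time, throttling how much work a single pod holds.
	if err := ch.Qos(1, 0, false); err != nil {
		log.Fatal(err)
	}

	// autoAck=false: the message stays invisible to other consumers until
	// it is acked here; if this pod crashes, the broker redelivers it.
	msgs, err := ch.Consume("tasks", "", false, false, false, false, nil) // "tasks" is a placeholder queue
	if err != nil {
		log.Fatal(err)
	}

	for d := range msgs {
		log.Printf("processing %s", d.Body)
		d.Ack(false)
	}
}
```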
You say you're avoiding leader election, but it's a common, well-supported pattern, especially in Go on Kubernetes. Could you explain why it doesn't suit your needs? A single elected coordinator makes assignment and throttling decisions much easier to reason about, particularly at high throughput.
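In case the objection is implementation overhead rather than the pattern itself, here is a condensed sketch using client-go's leaderelection package with a Lease lock. The lease name, namespace, and identity are placeholders, in-cluster config is assumed, and POD_NAME is assumed to be injected via the Downward API:

```go
package main

import (
	"context"
	"log"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// All pods compete for the same Lease; whoever holds it does the assigning.
	lock := &resourcelock.LeaseLock{
		LeaseMeta: metav1.ObjectMeta{Name: "worker-coordinator", Namespace: "default"}, // placeholders
		Client:    client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{
			Identity: os.Getenv("POD_NAME"), // assumed injected via the Downward API
		},
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:            lock,
		ReleaseOnCancel: true,
		LeaseDuration:   15 * time.Second,
		RenewDeadline:   10 * time.Second,
		RetryPeriod:     2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				log.Println("became leader: compute and publish work assignments here")
				<-ctx.Done()
			},
			OnStoppedLeading: func() {
				log.Println("lost leadership: stop assigning work")
			},
		},
	})
}
```

The non-leader pods simply wait; if the leader pod dies, another one acquires the Lease within roughly the lease duration, which bounds how long assignments can go stale.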
