I'm a new backend developer trying to figure out how databases are typically set up in production environments. For local development I'm using Docker Compose and everything runs in containers, but I'm curious about what happens when an application goes live. Specifically, I'm interested in the storage for databases like PostgreSQL, Elasticsearch, Redis, and queuing systems like Kafka and RabbitMQ.
I thought Kubernetes was mainly for stateless applications like API servers and web services, but I'm unsure where to put data stores. Are they generally hosted in Kubernetes, or do they sit in managed services from cloud providers like AWS? Or do companies use virtual machines and handle everything manually? I've seen mentions of StatefulSets and PersistentVolumeClaims but don't fully understand them. Can anyone shed some light on this?
5 Answers
It really varies! People are using Kubernetes, managed services from cloud platforms, or even straightforward VMs. The choice hinges on the company's requirements and the team's familiarity with the tooling. If simplicity is what you're after and budget allows, managed services are typically the easiest route.
Our PostgreSQL databases are on-cluster because we have some performance needs to meet. We're mostly on-premises and try not to be too dependent on a specific cloud provider. However, if we were fully cloud-based, we might look for more abstract solutions for database management, unless there's a performance benefit to keeping them on the cluster.
I've been running stateful applications in Kubernetes in production. It's definitely manageable with the right understanding. Using operators can simplify things a lot, but you need to know how they work under the hood too. At the end of the day, it all runs on a kernel, so how you orchestrate it really depends on your needs.
Traditionally, the industry has tended to avoid running databases inside Kubernetes. However, times are changing, and I'm successfully managing PostgreSQL, Kafka, and RabbitMQ in Kubernetes now. While it may not be perfect, it's certainly easier than handling multiple VMs, and it tends to be more cost-effective than managed services.
We're learning as we migrate to Kubernetes. For us, databases are set up as StatefulSets with PersistentVolumeClaims, so each pod gets its own volume and the data survives when a pod restarts or is rescheduled. If there's an operator for your database, it's smart to use it, since its authors have usually worked through a lot of the complexities already.
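To make the StatefulSet and PersistentVolumeClaim part a bit more concrete, here's a rough sketch of what such a manifest looks like, built as a Python dict and dumped to JSON (which `kubectl apply -f` accepts alongside YAML). Everything specific in it is an assumption for illustration only: the `postgres:16` image, the `postgres-credentials` Secret, the `10Gi` size, and the helper names `postgres_statefulset` and `pvc_names`. It also assumes a headless Service named `postgres` already exists. It is not a production-ready setup, just the shape of the thing.

```python
import json


def postgres_statefulset(name="postgres", replicas=1, storage="10Gi"):
    """Build a minimal StatefulSet manifest (illustrative values only)."""
    labels = {"app": name}
    container = {
        "name": name,
        "image": "postgres:16",  # assumed image tag
        "ports": [{"containerPort": 5432}],
        "env": [
            {
                "name": "POSTGRES_PASSWORD",
                # Hypothetical Secret; create it separately.
                "valueFrom": {
                    "secretKeyRef": {
                        "name": "postgres-credentials",
                        "key": "password",
                    }
                },
            },
            # Keep the data in a subdirectory of the mount, since the
            # volume root may already contain files such as lost+found.
            {"name": "PGDATA", "value": "/var/lib/postgresql/data/pgdata"},
        ],
        # Must match the volumeClaimTemplates entry name below.
        "volumeMounts": [
            {"name": "data", "mountPath": "/var/lib/postgresql/data"}
        ],
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,  # headless Service assumed to exist
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
            # One PersistentVolumeClaim is created per pod from this template.
            "volumeClaimTemplates": [
                {
                    "metadata": {"name": "data"},
                    "spec": {
                        "accessModes": ["ReadWriteOnce"],
                        "resources": {"requests": {"storage": storage}},
                    },
                }
            ],
        },
    }


def pvc_names(manifest):
    """List the PVC names Kubernetes derives: <template>-<statefulset>-<ordinal>."""
    sts = manifest["metadata"]["name"]
    template = manifest["spec"]["volumeClaimTemplates"][0]["metadata"]["name"]
    return [f"{template}-{sts}-{i}" for i in range(manifest["spec"]["replicas"])]


if __name__ == "__main__":
    print(json.dumps(postgres_statefulset(), indent=2))
```

The key piece is `volumeClaimTemplates`: each pod (`postgres-0`, `postgres-1`, ...) gets its own claim, and a replacement pod with the same ordinal re-attaches to the same claim, which is why the data outlives the pod. Note that running several replicas this way only gives you several independent PostgreSQL instances; replication and failover are the part an operator typically handles.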

Just be cautious! Some database operators might look good but can lack essential features like backups and clustering. Always do your research before settling on one!