I'm looking for some real-world advice on how to approach my leadership about our current Kubernetes situation. One of my clients has various internal business applications operating on Azure that connect with on-prem data sources like Databricks, SQL Server, and Postgres. These applications mainly deal with heavy data workloads rather than user traffic, with about 1,000 internal users across all apps.
A year ago, everything was quite decentralized, with different teams managing their apps and infrastructure choices. However, a platform manager initiated a move to centralize our workloads into a few AKS clusters to enhance management and cut costs. Fast forward to now, and the current setup is chaotic. Non-production environments have a lot of wasted resources, costs are rising, and developers are being reckless because they view AKS as an unlimited resource.
Here's the main issue: a few platform engineers know how to use AKS, but most of the developers don't, which leads to deployment bottlenecks and other problems. Batch jobs, experiments, and scripts are all crammed into the same clusters, alongside over-provisioned resources that just sit idle.
I'm noticing that AKS has become the go-to solution for nearly every issue. From running simple scripts to one-off jobs, it seems there's no consideration for whether other solutions, like Functions or VMs, might be a better fit. I want to know: how can I effectively communicate with leadership to stop this over-engineering trend and explore more suitable alternatives to Kubernetes? What arguments or data points have worked for you?
3 Answers
It looks like the real issue isn't Kubernetes itself, but poor management and over-centralization. Each application has different requirements. To convince leadership, highlight the concrete drawbacks, like overprovisioning and the inefficiency of using AKS for everything, and show them why not every task needs such heavy orchestration. Then propose matching each workload to the simplest platform that meets its needs instead of relying solely on Kubernetes.
You're spot on about over-centralization affecting velocity and costs. Show leadership that governance can be driven by data rather than opinion. You could bring metrics from your monitoring tools to demonstrate which applications truly need Kubernetes orchestration and which could run more cheaply and efficiently elsewhere.
From what you've described, it sounds like Kubernetes has become the default option without proper evaluation. To sway leadership, consider presenting the financial impact of inefficiencies—like those idle nodes and cost sprawl from running unnecessary workloads. Use data to show how a mix of tools can reduce costs and increase productivity.
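One way to put a number on this is to compare what each workload requests against what it actually uses. The sketch below is only an illustration: the workload names, CPU figures, and the per-core monthly cost are all made-up placeholders, and the real values would have to come from your own monitoring and billing data.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    cpu_requested: float  # vCPU cores reserved through resource requests
    cpu_used_avg: float   # average vCPU cores actually consumed


def utilization(w: Workload) -> float:
    """Fraction of the requested CPU that is actually used."""
    return w.cpu_used_avg / w.cpu_requested if w.cpu_requested else 0.0


def idle_cost(w: Workload, cost_per_core_month: float) -> float:
    """Monthly cost of the cores that are reserved but sit unused."""
    idle_cores = max(w.cpu_requested - w.cpu_used_avg, 0.0)
    return idle_cores * cost_per_core_month


def rank_by_idle_cost(workloads, cost_per_core_month):
    """Return (name, utilization, idle cost) tuples, worst offender first."""
    rows = [(w.name, utilization(w), idle_cost(w, cost_per_core_month))
            for w in workloads]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical figures for illustration only.
COST_PER_CORE_MONTH = 30.0
workloads = [
    Workload("reporting-api", cpu_requested=4.0, cpu_used_avg=3.0),
    Workload("nightly-batch", cpu_requested=8.0, cpu_used_avg=0.5),
]

ranking = rank_by_idle_cost(workloads, COST_PER_CORE_MONTH)
for name, util, cost in ranking:
    print(f"{name}: {util:.0%} utilized, about {cost:.2f} per month idle")
```

A ranked list like this tends to land better with leadership than a general complaint, because it points at specific workloads (a mostly idle batch job, for example) that are candidates for a scheduler, a Function, or a VM.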

Exactly! If they understand the different needs of each app, it could change the whole approach. Sometimes, a VM or a simple scheduler could be far more efficient.