
Help! My K8s kube-apiserver Keeps Getting OOM-Killed – Any Ideas?

October 20, 2025

Asked By CloudyCoder21 On October 20, 2025

Hey folks! We recently had a serious issue in our Kubernetes setup that's left us scratching our heads. We're running a somewhat odd configuration with 6 control plane nodes (not ideal, I know). Our storage solution is Longhorn, and we have various stateful apps running, including Vault, Loki, and Prometheus.

Here's the situation: three of our master nodes went down simultaneously, which rendered the whole cluster non-functional for a bit. They rebooted about 5-10 minutes later, and everything eventually came back online.

After investigating, we found that the kube-apiserver process was OOM-killed on the affected nodes due to high RAM usage. We also found kernel-level logs indicating significant disk and I/O errors, and an iostat check showed a very high I/O percentage.

We initially suspected Vault, since it's running on the master nodes, which is usually not recommended. But curiously, the nodes that failed were not the ones hosting the Vault pods. Given that this odd setup had been working fine until now, we're stumped.

Could Longhorn's heavy lifting (like replication or snapshotting) have triggered an I/O storm that caused the kube-apiserver to balloon in memory and get killed? Or could etcd struggling under heavy disk I/O have set off this cascading failure? Has anyone here seen a similar scenario?

4 Answers

Answered By DataGuru93 On October 22, 2025

First off, having 6 control plane nodes isn't common practice. An even member count buys no extra fault tolerance: if each of those nodes runs an etcd member, 6 members still need 4 for quorum, so you can lose only 2, the same as with 5, and losing 3 at once takes the cluster down, which matches what you saw. It's also generally best to avoid running workloads on those nodes unless absolutely necessary. If one service hogs resources, critical services like the kube-apiserver can end up OOM-killed, as you described.
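A quick way to sanity-check control plane sizing is to work out the majority an etcd cluster needs. This is only a sketch of the standard majority-quorum arithmetic; it assumes one etcd member per control plane node (a stacked setup), which the question doesn't actually confirm, and the function names are made up for illustration.

```python
def quorum(members):
    """Smallest majority of a cluster with `members` voting members."""
    return members // 2 + 1


def failures_tolerated(members):
    """How many members can be down while a majority is still reachable."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in range(3, 8):
        print(f"{n} members: quorum={quorum(n)}, "
              f"tolerates {failures_tolerated(n)} failure(s)")
```

Going from 5 to 6 members raises the quorum from 3 to 4 without raising the number of failures tolerated, so the sixth node adds load but no resilience.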

TechieTom - October 23, 2025

Exactly! It's crucial to keep core services insulated from potential resource starvation.

SysAdminSally - October 23, 2025

I think high I/O could definitely be a factor. When the disks can't keep up, pending writes and the requests waiting on them can pile up in memory, so heavy I/O and high memory usage often show up together.

Answered By IOMasterFlex On October 22, 2025

Totally agree! High I/O might be behind your kube-apiserver's memory issues, especially if it was processing large requests. Was there anything unusual about the types of requests the API servers were handling? Also, is swap enabled? And if audit logging is turned on, it can add a lot of disk writes, which might explain the I/O spike.

QuickFixMike - October 23, 2025

+1 to that! Auditing can really cause disk writes to skyrocket.

DBAdminDan - October 23, 2025

And don't forget to check how close to the memory limit your servers usually run. Knowing this can clarify a lot.
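For a rough look at memory headroom and swap on a control plane node, something like the following could be run on each one. It's a sketch, not a monitoring solution: it reads the standard Linux `/proc/meminfo` fields `MemTotal`, `MemAvailable`, and `SwapTotal`, the function names are invented here, and the `SAMPLE` numbers are made up so the script also runs off-box.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most values are in kB; a few fields (e.g. HugePages_Total) are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def memory_headroom(fields):
    """Fraction of RAM the kernel estimates is still available (0.0 to 1.0)."""
    return fields["MemAvailable"] / fields["MemTotal"]


def swap_enabled(fields):
    """True if any swap space is configured."""
    return fields.get("SwapTotal", 0) > 0


# Made-up sample values, used only when /proc/meminfo is not present.
SAMPLE = """\
MemTotal:       16384000 kB
MemFree:          512000 kB
MemAvailable:    1638400 kB
SwapTotal:             0 kB
SwapFree:              0 kB
"""

if __name__ == "__main__":
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            text = f.read()
    else:
        text = SAMPLE
    info = parse_meminfo(text)
    print(f"available: {memory_headroom(info):.0%} of RAM")
    print(f"swap enabled: {swap_enabled(info)}")
```

If a node routinely sits at only a few percent available, even a modest spike in API server memory could be enough to trigger the OOM killer.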

Answered By QueryQueen On October 22, 2025

It's also worth considering whether any services in your cluster are hammering the API server with requests, almost treating it like a database. Large numbers of custom resources (objects backed by CRDs), or other bulky objects that get listed over and over, can create a query storm that overwhelms etcd and the API server, balloons their memory use, and leads to OOM kills.
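One way to spot a resource type that has grown out of control is the API server's `apiserver_storage_objects` metric, which reports the stored object count per resource on recent Kubernetes versions. The sketch below ranks resources from metrics text such as the output of `kubectl get --raw /metrics`. The helper name and the sample numbers are made up, and `widgets.example.com` is a hypothetical custom resource.

```python
import re

# Matches lines such as: apiserver_storage_objects{resource="pods"} 1500
LINE = re.compile(r'^apiserver_storage_objects\{resource="([^"]+)"\}\s+(\S+)$')


def top_resources(metrics_text, limit=5):
    """Return up to `limit` (resource, object_count) pairs, largest first."""
    counts = {}
    for line in metrics_text.splitlines():
        match = LINE.match(line.strip())
        if match:
            counts[match.group(1)] = int(float(match.group(2)))
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:limit]


# Made-up sample in Prometheus text format, for illustration only.
SAMPLE = """\
# TYPE apiserver_storage_objects gauge
apiserver_storage_objects{resource="pods"} 1500
apiserver_storage_objects{resource="events"} 12000
apiserver_storage_objects{resource="widgets.example.com"} 250000
apiserver_storage_objects{resource="secrets"} 800
"""

if __name__ == "__main__":
    for resource, count in top_resources(SAMPLE, limit=3):
        print(f"{count:>8}  {resource}")
```

A custom resource with counts far above everything else is a good candidate for whatever is issuing the expensive list calls.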

DevOpsDude - October 23, 2025

Right? Services like Trivy are notorious for doing this.

Answered By StorageSavant On October 21, 2025

I'm curious about your underlying storage. I've seen SAN issues lead to similar problems, especially with etcd being sensitive to any disk interruptions, which can crash system pods. What kind of storage setup do you have? That might be a crucial aspect to look into!
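To check whether the disks are fast enough for etcd, its `etcd_disk_wal_fsync_duration_seconds` histogram is a good place to look; the etcd documentation suggests the 99th percentile should stay below 10 ms. The sketch below reads the histogram buckets from metrics text and returns the upper bound of the bucket containing the 99th percentile. It's a coarse estimate with no interpolation, the function name is invented, and the sample counts are made up.

```python
import re

# Matches lines such as:
# etcd_disk_wal_fsync_duration_seconds_bucket{le="0.008"} 950
BUCKET = re.compile(
    r'^etcd_disk_wal_fsync_duration_seconds_bucket\{le="([^"]+)"\}\s+(\S+)$'
)


def fsync_quantile_upper_bound(metrics_text, quantile=0.99):
    """Upper bound (seconds) of the bucket holding the given quantile.

    Returns None if there are no buckets or no observations.
    """
    buckets = []
    for line in metrics_text.splitlines():
        match = BUCKET.match(line.strip())
        if match:
            # float() accepts "+Inf", which sorts after every finite bound.
            buckets.append((float(match.group(1)), float(match.group(2))))
    buckets.sort()
    if not buckets or buckets[-1][1] == 0:
        return None
    total = buckets[-1][1]  # cumulative count in the +Inf bucket
    for bound, cumulative in buckets:
        if cumulative >= quantile * total:
            return bound
    return None


# Made-up cumulative bucket counts, for illustration only.
SAMPLE = """\
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.001"} 100
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.002"} 400
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.004"} 800
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.008"} 950
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.016"} 980
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.032"} 995
etcd_disk_wal_fsync_duration_seconds_bucket{le="+Inf"} 1000
"""

if __name__ == "__main__":
    bound = fsync_quantile_upper_bound(SAMPLE)
    print(f"p99 WAL fsync is at most {bound * 1000:.0f} ms")
```

With the sample data the estimate lands in the 32 ms bucket, well above the 10 ms guideline, which is the kind of result that would point at the storage layer.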
