How to Safely Scale Down Node Pools in Oracle Kubernetes Engine?

Asked By CuriousCoder42 On

I'm seeking real-world insights on managing node pool scale-down operations in Oracle Kubernetes Engine (OKE). My objective is to perform a zero-downtime Kubernetes upgrade while minimizing the risks associated with node termination.

Currently, I have a node pool with 3 nodes, and I'm planning to scale it up to 6 nodes. The strategy I've mapped out includes:
- Scale the node pool from 3 to 6 nodes so workloads have new nodes to reschedule onto.
- Cordon and drain the old nodes.
- Scale the pool back down from 6 to 3 nodes.

However, I have concerns about the scale-down behavior in OKE. In AWS EKS, I know that the oldest instances are terminated first, but I couldn't find any documentation for OKE that confirms the order of node removal when scaling down a node pool.

My questions are:
1. Is there any documented behavior regarding the order of node termination during scale-down in OKE?
2. Does cordoning or draining old nodes affect which nodes OKE decides to remove?

I'm looking for any insights or best practices from those who've handled this in production OKE clusters. Thanks!

2 Answers

Answered By TechieTim On

It sounds like you're on the right track! Cordoning a node marks it unschedulable, so no new pods land there; draining then evicts the existing pods so they reschedule onto the new nodes. If you drain the old nodes before you scale down, and your workloads have enough replicas (ideally with PodDisruptionBudgets) to tolerate the evictions, you shouldn't face any downtime.
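
As a rough sketch of that cordon-then-drain sequence, the Python snippet below only builds the `kubectl` commands so you can review them before running anything. The node names and the 300-second timeout are made-up placeholders, and `drain_commands` is a hypothetical helper, not something OKE or kubectl ships.

```python
import shlex


def drain_commands(old_nodes, timeout_seconds=300):
    """Build kubectl commands: cordon every old node first, then drain each one."""
    commands = []
    # Cordon all old nodes up front so pods evicted from one old node
    # cannot be rescheduled onto another old node.
    for node in old_nodes:
        commands.append(["kubectl", "cordon", node])
    # Drain one node at a time; kubectl drain honors PodDisruptionBudgets.
    for node in old_nodes:
        commands.append([
            "kubectl", "drain", node,
            "--ignore-daemonsets",       # DaemonSet pods are not evicted
            "--delete-emptydir-data",    # allow evicting pods that use emptyDir volumes
            f"--timeout={timeout_seconds}s",
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical node names; substitute the names from `kubectl get nodes`.
    for cmd in drain_commands(["old-node-1", "old-node-2", "old-node-3"]):
        print(shlex.join(cmd))
```

Cordoning everything before the first drain is a deliberate choice here: otherwise a pod evicted from one old node can land on another old node and get evicted a second time.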

For the scale-down order, unfortunately, OKE doesn’t appear to document which specific nodes get terminated first the way EKS does. In practice, scale-down often seems to pick cordoned nodes, but that isn't guaranteed, so keep an eye on your workloads during the process!

Answered By ClusterGuru77 On

You're considering a common approach! Just remember that draining a node doesn't remove it from the pool. If you scale the pool down from 6 to 3 without explicitly deleting the drained nodes, you might find the old nodes still lingering afterwards. They won't serve workloads since they're drained and cordoned, but they still count toward the pool's node count and limits. So it's usually a good idea to delete those specific nodes once you've confirmed everything is running smoothly on the new ones!
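
To double-check which nodes are the drained ones before and after the scale-down, something like the sketch below can help. It parses the JSON that `kubectl get nodes -o json` prints and lists the cordoned nodes, oldest first. The sample data and the `cordoned_nodes` helper are hypothetical and only illustrate the shape of the output; how you then delete those nodes in OKE isn't covered here.

```python
import json

# Hypothetical, trimmed-down stand-in for `kubectl get nodes -o json` output.
SAMPLE = json.dumps({
    "items": [
        {"metadata": {"name": "new-node-1", "creationTimestamp": "2024-06-01T09:00:00Z"},
         "spec": {}},
        {"metadata": {"name": "old-node-2", "creationTimestamp": "2024-01-02T09:00:00Z"},
         "spec": {"unschedulable": True}},
        {"metadata": {"name": "old-node-1", "creationTimestamp": "2024-01-01T09:00:00Z"},
         "spec": {"unschedulable": True}},
    ]
})


def cordoned_nodes(nodes_json):
    """Return (name, creationTimestamp) for every cordoned node, oldest first."""
    items = json.loads(nodes_json)["items"]
    cordoned = [
        (node["metadata"]["name"], node["metadata"]["creationTimestamp"])
        for node in items
        # spec.unschedulable is only set to true on cordoned nodes
        if node.get("spec", {}).get("unschedulable", False)
    ]
    # Kubernetes writes these timestamps in one fixed UTC format,
    # so plain string comparison orders them chronologically.
    return sorted(cordoned, key=lambda pair: pair[1])


if __name__ == "__main__":
    for name, created in cordoned_nodes(SAMPLE):
        print(f"{name}\tcreated {created}")
```

If any of these nodes are still listed after the pool is back at 3, the scale-down removed new nodes instead of the old ones.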
