Hey everyone! I'm running into a problem setting up custom networking with an ENIConfig on my EKS cluster. My VPC has a few subnets already configured, but I'm struggling with the custom networking setup itself. Specifically, I have one CIDR block for pod networking, 100.64.0.0/16, which limits me to deploying worker nodes only in the ca-central-1a availability zone.

I followed the steps to enable custom networking and created the ENIConfig resource, but now my new node group gets stuck in the creating state and ultimately fails, reporting that the nodes cannot join the cluster. I suspect it has something to do with the security group settings or the timing of when custom networking was enabled, but I can't figure it out. Has anyone dealt with this before and found a solution?
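For reference, the ENIConfig I created looks roughly like this (the subnet and security group IDs below are placeholders, not my real ones):

```yaml
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: ca-central-1a                # named after the AZ (assuming ENI_CONFIG_LABEL_DEF=topology.kubernetes.io/zone)
spec:
  subnet: subnet-0123456789abcdef0   # placeholder: pod subnet carved from 100.64.0.0/16 in ca-central-1a
  securityGroups:
    - sg-0123456789abcdef0           # placeholder: security group for the pod ENIs
```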
3 Answers
Did you enable custom networking before or after creating the nodes? If it was after, you'll likely need a full node rollout; the docs suggest that's necessary in that case. Also, check the kubelet logs on the nodes for more detailed error messages; those usually tell you exactly what's failing.
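To check both of those quickly, something along these lines should work (the journalctl part needs to run on the affected node itself):

```
# Is custom networking actually turned on in the VPC CNI?
kubectl -n kube-system get ds aws-node \
  -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG")].value}'

# On a node that fails to join, the kubelet logs usually say why
journalctl -u kubelet --no-pager | tail -n 50
```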
Just a heads up: if one of your pod subnets is as small as a /27, you only get about 27 usable IPs (AWS reserves 5 addresses per subnet), which caps how many pods can land there. Since you're planning to use 100.64.0.0/16 for pod networking, make sure the subnets you carve out of it are sized for the pod density you expect. That said, if the node group fails with or without the ENIConfig changes, the problem is probably unrelated to CIDR sizing.
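If you want to sanity-check how much room each pod subnet actually has, something like this works (the subnet ID is a placeholder):

```
aws ec2 describe-subnets --subnet-ids subnet-0123456789abcdef0 \
  --query 'Subnets[].{CIDR:CidrBlock,FreeIPs:AvailableIpAddressCount}' \
  --output table
```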
Right, I planned to use 100.64.0.0/16 for pod networking. What's strange is that the new node group can't even come up with the ENIConfig changes in place. I'm really stuck!
First things first, have you checked that your security groups are properly configured for the nodes? The control plane and the worker nodes have to be able to talk to each other for anything to work; if the security groups are too restrictive, the nodes will never manage to join.
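A quick way to verify is to look up the security group EKS attached to the cluster and dump its rules; the cluster name and group ID below are placeholders:

```
# Find the cluster security group EKS created for control plane <-> node traffic
aws eks describe-cluster --name my-cluster \
  --query 'cluster.resourcesVpcConfig.clusterSecurityGroupId' --output text

# Inspect its inbound rules
aws ec2 describe-security-groups --group-ids sg-0123456789abcdef0 \
  --query 'SecurityGroups[].IpPermissions'
```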
I think they're set up right. Before I did anything fancy with the ENIConfig, the node group could join the cluster. But after making the changes and creating a new node group, the old nodes went 'NotReady' and the new one fails. The current SG attached to the control plane should allow all traffic, so I'm puzzled.
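In the meantime I'll grab what the nodes and the aws-node pods are reporting, roughly like this (the node name is just a placeholder):

```
kubectl get nodes -o wide
kubectl -n kube-system get pods -l k8s-app=aws-node -o wide
kubectl describe node ip-10-0-1-23.ca-central-1.compute.internal | grep -A 6 Conditions
```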
I created the worker node group after setting up the custom ENIConfig, so I thought I was in the clear. I'll definitely look into the kubelet logs, thanks for the tip!