Hey everyone! I'm a bit new to using EKS and I'm running into an issue with my cluster. I've created a VPC and set up EKS using some Terraform code. My configuration specifies that I want a public endpoint for the cluster and that I'm using an Amazon EKS managed node group. However, my node group is stuck in the 'Creating' state and eventually fails with the error: 'NodeCreationFailure: Instances failed to join the Kubernetes cluster.'
I have two EC2 worker instances up and running, but they never join the cluster. The nodes sit in private subnets, and I've already checked the security groups, IAM roles, and attached policies. I really need some guidance here. Does anyone have an idea how to resolve this?
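For reference, here's roughly the relevant Terraform (heavily simplified; the IAM role and subnet references are placeholders for my actual resources):

```hcl
# Simplified sketch of my setup; role and subnet references are placeholders.
resource "aws_eks_cluster" "this" {
  name     = "demo-cluster"
  role_arn = aws_iam_role.cluster.arn

  vpc_config {
    subnet_ids             = [aws_subnet.private_a.id, aws_subnet.private_b.id]
    endpoint_public_access = true
  }
}

resource "aws_eks_node_group" "workers" {
  cluster_name    = aws_eks_cluster.this.name
  node_group_name = "workers"
  node_role_arn   = aws_iam_role.node.arn
  subnet_ids      = [aws_subnet.private_a.id, aws_subnet.private_b.id]

  scaling_config {
    desired_size = 2
    min_size     = 2
    max_size     = 2
  }
}
```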
3 Answers
Thanks for all the tips, everyone! It turned out I'd hit my Elastic IP limit; once I resolved that and brought the NAT Gateway back up, everything started working fine!
You might want to check the AMI you're using. Make sure it's an Amazon EKS optimized AMI; a generic AMI doesn't ship the bootstrap script (/etc/eks/bootstrap.sh) that the node runs to join the cluster. If you launched your EC2s with an SSH key, SSH in and read /var/log/cloud-init-output.log for more insight into why the join failed.
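For a managed node group the simplest route is to let EKS pick the image via ami_type (e.g. AL2_x86_64), but if you're supplying your own AMI you can look up the current EKS optimized one from the public SSM parameter AWS publishes. A sketch, assuming Kubernetes 1.29 on Amazon Linux 2; swap in your cluster's version:

```hcl
# Resolve the current Amazon EKS optimized AL2 AMI for a given Kubernetes
# version from the SSM parameter AWS maintains in every region.
data "aws_ssm_parameter" "eks_ami" {
  name = "/aws/service/eks/optimized-ami/1.29/amazon-linux-2/recommended/image_id"
}

output "eks_optimized_ami_id" {
  # SSM parameter values are marked sensitive; unwrap for display.
  value = nonsensitive(data.aws_ssm_parameter.eks_ami.value)
}
```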
If your nodes are in a private subnet without a NAT Gateway, they can't reach the internet, and they need it to pull container images and to reach the cluster's public API endpoint. Fix your Elastic IP issue and re-enable the NAT Gateway. If you already have VPC endpoints for every service the nodes use, you might be fine without NAT, but I'd get the NAT Gateway working first; a sketch of the wiring is below.
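Roughly, the NAT wiring looks like this in Terraform (a minimal sketch; the public subnet and private route table references are placeholders for your own resources):

```hcl
# The NAT Gateway lives in a PUBLIC subnet and needs an Elastic IP.
# This EIP allocation is exactly what fails if you've hit your EIP quota.
resource "aws_eip" "nat" {
  domain = "vpc"
}

resource "aws_nat_gateway" "this" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public_a.id # placeholder: one of your public subnets
}

# Default route for the PRIVATE subnets goes through the NAT Gateway.
resource "aws_route" "private_to_nat" {
  route_table_id         = aws_route_table.private.id # placeholder: your private route table
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = aws_nat_gateway.this.id
}
```

If the Elastic IP allocation fails on quota, the NAT Gateway creation fails with it, and the private subnets are left with no route to the internet, which matches your nodes timing out on the join.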
Actually, you can run an EKS cluster with no internet access at all. Enable the cluster's private endpoint and set up VPC endpoints for the AWS services the nodes depend on, and that should do the trick.
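This is roughly the endpoint set a private EKS data plane needs (a sketch, not exhaustive; the region, VPC, subnet, route table, and security group references are placeholders, and workloads that touch other services, e.g. Elastic Load Balancing, need those endpoints too):

```hcl
locals {
  region = "us-east-1" # placeholder: your region
  # Interface endpoints a private EKS data plane commonly needs.
  interface_services = ["ec2", "ecr.api", "ecr.dkr", "sts", "logs"]
}

resource "aws_vpc_endpoint" "interface" {
  for_each            = toset(local.interface_services)
  vpc_id              = aws_vpc.this.id # placeholder: your VPC
  service_name        = "com.amazonaws.${local.region}.${each.value}"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = [aws_subnet.private_a.id, aws_subnet.private_b.id]
  private_dns_enabled = true
  security_group_ids  = [aws_security_group.endpoints.id] # must allow 443 from the nodes
}

# S3 is a Gateway endpoint; ECR stores image layers in S3, so you need it
# even if you never touch S3 directly.
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.this.id
  service_name      = "com.amazonaws.${local.region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [aws_route_table.private.id]
}
```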
Bingo, that’s exactly what I was thinking! Also, check fck-nat.dev if you’re looking to save on costs while handling NAT.