Hey everyone, I keep getting the error "Instances failed to join the Kubernetes cluster" while trying to set up my EKS node group, and the error details show the node group ending up in the `CREATE_FAILED` state. I have shared a snippet of my Terraform code for reference. Can anyone help me troubleshoot what might be going wrong?
5 Answers
Another thing to consider is your security group settings. The current configuration might be blocking communication between the cluster API server and the kubelets on your nodes. Make sure the node security group allows inbound TCP 10250 from the cluster security group, and verify that the AmazonEKSClusterPolicy is attached to your cluster role.
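Here's a minimal sketch of what that could look like in Terraform; the resource names (`aws_security_group.nodes`, `aws_security_group.cluster`, `aws_iam_role.cluster`) are assumptions standing in for whatever you've defined in your own config:

```hcl
# Allow the control plane to reach the kubelets on the worker nodes (TCP 10250).
resource "aws_security_group_rule" "cluster_to_kubelet" {
  type                     = "ingress"
  from_port                = 10250
  to_port                  = 10250
  protocol                 = "tcp"
  security_group_id        = aws_security_group.nodes.id    # node security group (assumed name)
  source_security_group_id = aws_security_group.cluster.id  # cluster security group (assumed name)
  description              = "Cluster API to node kubelets"
}

# Attach the managed EKS cluster policy to the cluster role (assumed name).
resource "aws_iam_role_policy_attachment" "cluster_policy" {
  role       = aws_iam_role.cluster.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
}
```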
Network problems or an invalid AMI ID could also cause the failure. And if your nodes sit in private subnets without internet access and you haven't created ECR VPC endpoints, they won't be able to pull the container images they need.
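For fully private subnets, a rough sketch of the endpoints involved looks like this (resource names and the `us-east-1` region are placeholders for your own setup):

```hcl
# Interface endpoints so nodes in private subnets can reach ECR without internet access.
resource "aws_vpc_endpoint" "ecr_api" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.us-east-1.ecr.api"  # placeholder region
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  security_group_ids  = [aws_security_group.endpoints.id]
  private_dns_enabled = true
}

resource "aws_vpc_endpoint" "ecr_dkr" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.us-east-1.ecr.dkr"  # placeholder region
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  security_group_ids  = [aws_security_group.endpoints.id]
  private_dns_enabled = true
}

# Image layers are served from S3, so a gateway endpoint is needed too.
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.us-east-1.s3"  # placeholder region
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [aws_route_table.private.id]
}
```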
You should also check the logs on the nodes themselves, particularly the cloud-init output (on Amazon Linux nodes it usually lands in /var/log/cloud-init-output.log); it often reveals networking or permissions issues during bootstrap. Also confirm that a CNI plugin is actually installed; nodes never become Ready without working networking components, which produces the same join failure.
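If you want the CNI managed through Terraform as well, a minimal sketch (assuming your cluster resource is named `aws_eks_cluster.this`) would be:

```hcl
# Install the VPC CNI as a managed EKS add-on (cluster resource name is assumed).
resource "aws_eks_addon" "vpc_cni" {
  cluster_name = aws_eks_cluster.this.name
  addon_name   = "vpc-cni"
}
```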
First off, make sure your networking is set up correctly. Check that the route tables for the node subnets let the nodes reach the cluster endpoint, ECR, and S3, either through a NAT gateway or through VPC endpoints. Issues like this often come down to a bad network configuration.
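As a sketch, for private node subnets that typically means a default route to a NAT gateway (the resource names here are assumptions for whatever your VPC config defines):

```hcl
# Default route from the private node subnets out through a NAT gateway.
resource "aws_route" "private_default" {
  route_table_id         = aws_route_table.private.id
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = aws_nat_gateway.this.id
}
```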
Did you tag the subnets properly? Any subnet that hosts nodes needs the tag `kubernetes.io/cluster/myclustername: shared` (with your actual cluster name). If you're already using the public VPC module, consider using the EKS module as well; it takes care of these details and simplifies things quite a bit.
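With plain resources, the tag would look roughly like this (the CIDR and resource names are placeholders):

```hcl
# Subnet hosting worker nodes, tagged for the cluster.
resource "aws_subnet" "private" {
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.0.1.0/24"

  tags = {
    "kubernetes.io/cluster/myclustername" = "shared"
  }
}
```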
Right, I’ve had a similar problem where my nodes weren't able to connect to the internet, which caused this error.