Hey everyone, I'm facing an issue where my EKS worker nodes are failing to join the Kubernetes cluster. Specifically, I'm getting the error "Instances failed to join the kubernetes cluster." I've pasted a snippet of my Terraform code below for reference. Terraform reports that the node group creation fails with a status of 'CREATE_FAILED'. Does anyone have suggestions on what I should check or correct to resolve this? Thanks in advance!
4 Answers
First off, double-check your network settings and make sure your routing is properly configured. A misconfiguration here can prevent the nodes from reaching the cluster's API endpoint at all, which is exactly what this error reports.
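If your nodes live in private subnets, a sketch like the one below shows what that routing usually looks like; the resource names (`aws_vpc.my_vpc`, `aws_nat_gateway.my_nat`, `aws_subnet.node_subnet`) are placeholders for whatever your own configuration defines.

```hcl
# Placeholder names throughout -- adapt to your existing resources.
resource "aws_route_table" "nodes" {
  vpc_id = aws_vpc.my_vpc.id

  # Worker nodes in private subnets need an outbound path (here via a NAT
  # gateway) to reach the EKS API endpoint, ECR, and other AWS services.
  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.my_nat.id
  }
}

resource "aws_route_table_association" "nodes" {
  subnet_id      = aws_subnet.node_subnet.id
  route_table_id = aws_route_table.nodes.id
}
```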
You should also take a look at the logs on your nodes, particularly the cloud-init logs (e.g. `/var/log/cloud-init-output.log`), and check for any errors related to networking or permissions. Also, ensure you've got a CNI plugin installed, as a missing CNI can produce the same join failure.
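If you want the CNI managed from Terraform, one option is to declare it explicitly as an EKS add-on. A minimal sketch, assuming your cluster resource is called `aws_eks_cluster.this`:

```hcl
# Manage the VPC CNI as an EKS add-on so it is tracked in Terraform state.
resource "aws_eks_addon" "vpc_cni" {
  cluster_name = aws_eks_cluster.this.name
  addon_name   = "vpc-cni"
}
```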
Make sure to tag your subnets correctly, specifically with `kubernetes.io/cluster/myclustername: shared`. If you're using the public VPC module from the Terraform registry, consider using the companion EKS module as well to simplify your setup.
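For example, with the registry VPC module the cluster tag can go on the node subnets like this; the name, CIDR, and AZ values are illustrative, and `myclustername` stands in for your actual cluster name:

```hcl
module "vpc" {
  source = "terraform-aws-modules/vpc/aws"

  # Illustrative values only -- keep your existing VPC settings.
  name            = "eks-vpc"
  cidr            = "10.0.0.0/16"
  azs             = ["us-east-1a", "us-east-1b"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24"]

  # The subnets that host worker nodes should carry the cluster tag.
  private_subnet_tags = {
    "kubernetes.io/cluster/myclustername" = "shared"
  }
}
```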
Good call! The tagging part is often overlooked but can cause these types of errors.
Consider adjusting your security group settings. You might need to allow traffic on port 10250 so the cluster control plane can reach the kubelet on each node. Also, ensure that the cluster role has the AmazonEKSClusterPolicy attached for proper operation.
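Roughly like the sketch below; the security group and role names (`aws_security_group.nodes`, `aws_security_group.cluster`, `aws_iam_role.cluster`) are assumptions about how your configuration is laid out:

```hcl
# Allow the control plane's security group to reach the kubelet port on nodes.
resource "aws_security_group_rule" "cluster_to_kubelet" {
  description              = "Cluster control plane to node kubelets"
  type                     = "ingress"
  from_port                = 10250
  to_port                  = 10250
  protocol                 = "tcp"
  security_group_id        = aws_security_group.nodes.id
  source_security_group_id = aws_security_group.cluster.id
}

# Attach the managed cluster policy to the EKS cluster role.
resource "aws_iam_role_policy_attachment" "eks_cluster_policy" {
  role       = aws_iam_role.cluster.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
}
```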
I had a similar issue before. It turned out my nodes couldn't reach the internet, which they needed in order to reach the cluster endpoint and pull images.