Hey everyone! I'm facing an issue with my EKS setup where my instances are failing to join the Kubernetes cluster. Specifically, I'm getting the error: "Instances failed to join the kubernetes cluster" with a message indicating a NodeCreationFailure. I've shared my Terraform script below for context. Could anyone guide me on what I might be missing or what to check to resolve this issue? Thanks!
5 Answers
A NodeCreationFailure generally means the instances launched but never registered with the control plane. Two common causes are an invalid or incompatible AMI ID and a networking problem that keeps the nodes from reaching the cluster endpoint, so check each component carefully to narrow down the root cause!
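If the AMI is the suspect, one way to avoid hard-coding an ID is to resolve the current EKS-optimized AMI from SSM. A minimal sketch, assuming Amazon Linux 2 nodes; the Kubernetes version in the parameter path should match your cluster:

```hcl
# Resolve the current EKS-optimized Amazon Linux 2 AMI for the given
# Kubernetes version instead of hard-coding an AMI ID.
data "aws_ssm_parameter" "eks_ami" {
  name = "/aws/service/eks/optimized-ami/1.29/amazon-linux-2/recommended/image_id"
}

# Reference it wherever your node configuration expects an image ID,
# e.g. image_id = data.aws_ssm_parameter.eks_ami.value
```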
If you don't have VPC endpoints for ECR set up, nodes in private subnets without internet access can't pull the container images they need (the VPC CNI and kube-proxy images come from ECR). They also need DNS resolution, so make sure the VPC has DNS support and DNS hostnames enabled. Finally, check that your security group rules allow the node group and the cluster control plane to talk to each other (HTTPS/443 from the nodes to the cluster endpoint in particular).
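For reference, here's a minimal sketch of the endpoints nodes in private subnets typically need to pull images; the resource names (aws_vpc.main, aws_subnet.private, aws_security_group.vpc_endpoints, aws_route_table.private, var.region) are placeholders for whatever your configuration actually defines:

```hcl
# Interface endpoints so nodes in private subnets can reach ECR,
# plus an S3 gateway endpoint for the underlying image layers.
resource "aws_vpc_endpoint" "ecr_api" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.${var.region}.ecr.api"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true
}

resource "aws_vpc_endpoint" "ecr_dkr" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.${var.region}.ecr.dkr"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true
}

resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [aws_route_table.private.id]
}
```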
It sounds like a networking issue. Misconfigured route tables are a common cause: private subnets need a route to a NAT gateway (or the relevant VPC endpoints), and public subnets need a route to an internet gateway, otherwise the nodes can't reach the cluster endpoint to register.
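A minimal sketch of what that looks like for private subnets routed through a NAT gateway; resource names are placeholders for your own:

```hcl
# Private route table sending outbound traffic through a NAT gateway so
# worker nodes can reach the EKS endpoint, ECR, and other AWS services.
resource "aws_route_table" "private" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main.id
  }
}

# Associate the table with every private subnet that hosts nodes.
resource "aws_route_table_association" "private" {
  count          = length(aws_subnet.private)
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private.id
}
```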
Check the logs on the failing nodes, especially the cloud-init output (/var/log/cloud-init-output.log) and the kubelet logs (journalctl -u kubelet). They usually show whether the problem is IAM permissions or networking. Also make sure the VPC CNI plugin is in place; without a working CNI the nodes never reach the Ready state and node group creation eventually times out.
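If the CNI turns out to be missing, one option is to install it as an EKS managed add-on from Terraform. A small sketch, with aws_eks_cluster.main standing in for your cluster resource:

```hcl
# Install the VPC CNI as a managed add-on; without a CNI the kubelet
# never reports Ready and the node group fails to come up.
resource "aws_eks_addon" "vpc_cni" {
  cluster_name = aws_eks_cluster.main.name
  addon_name   = "vpc-cni"
}
```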
Make sure you've tagged the subnets correctly. Each subnet that hosts nodes should carry the tag `kubernetes.io/cluster/<your-cluster-name> = shared` (or `owned`). If you're already using the community terraform-aws-modules/vpc module, consider pairing it with the terraform-aws-modules/eks module, which handles most of this wiring for you.
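A quick sketch of the tagging in Terraform; the subnet resource, CIDR/AZ variables, and cluster name are placeholders for your own:

```hcl
# Every subnet that hosts worker nodes needs the cluster discovery tag.
resource "aws_subnet" "private" {
  count             = length(var.private_subnet_cidrs)
  vpc_id            = aws_vpc.main.id
  cidr_block        = var.private_subnet_cidrs[count.index]
  availability_zone = var.availability_zones[count.index]

  tags = {
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
  }
}
```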
Great point! Beyond fixing the tags, the bootstrap script is worth checking too: if it isn't running or isn't being passed the correct cluster name, the kubelet never registers with the control plane.
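If you're on self-managed nodes or a custom AMI behind a launch template, the user_data has to call the EKS bootstrap script with the exact cluster name. A sketch with placeholder names (managed node groups using the default AMI handle this automatically):

```hcl
resource "aws_launch_template" "workers" {
  name_prefix = "eks-workers-"
  image_id    = var.node_ami_id # placeholder for your EKS-optimized AMI

  # bootstrap.sh ships with the EKS-optimized AMI and registers the kubelet
  # with the named cluster.
  user_data = base64encode(<<-EOT
    #!/bin/bash
    set -o xtrace
    /etc/eks/bootstrap.sh ${var.cluster_name}
  EOT
  )
}
```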
Absolutely! I've had the same issue when my nodes couldn't access the internet, so definitely check your networking setup.