Hi everyone! I'm relatively new to EKS and I'm running into a problem with my cluster setup. I've created a VPC and I'm deploying EKS with Terraform using the following configuration:
```hcl
module "eks" {
# source = "terraform-aws-modules/eks/aws"
# version = "20.37.1"
source = "git::https://github.com/terraform-aws-modules/terraform-aws-eks?ref=4c0a8fc4fd534fc039ca075b5bedd56c672d4c5f"
cluster_name = var.cluster_name
cluster_version = "1.33"
cluster_endpoint_public_access = true
enable_cluster_creator_admin_permissions = true
vpc_id = var.vpc_id
subnet_ids = var.subnet_ids
eks_managed_node_group_defaults = {
ami_type = "AL2023_x86_64_STANDARD"
}
eks_managed_node_groups = {
one = {
name = "node-group-1"
instance_types = ["t3.large"]
ami_type = "AL2023_x86_64_STANDARD"
min_size = 2
max_size = 3
desired_size = 2
iam_role_additional_policies = {
AmazonEBSCSIDriverPolicy = "arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy"
}
}
}
tags = {
Terraform = "true"
Environment = var.env
Name = "eks-${var.cluster_name}"
Type = "EKS"
}
}
```
My VPC is set up, and so is my EKS cluster. However, the node group is stuck in the 'Creating' status and eventually fails with the following error:
```
Error: waiting for EKS Node Group (tgs-horsprod:node-group-1-20250709193647100100000002) create: unexpected state 'CREATE_FAILED', wanted target 'ACTIVE'. last error: i-0a1712f6ae998a30f, i-0fe4c2c2b384b448d: NodeCreationFailure: Instances failed to join the kubernetes cluster
```
The two EC2 worker instances get created, but they can't join the EKS cluster. All the nodes are in private subnets. I've checked the security groups, IAM roles, and policies, but I'm still stuck. Does anyone have ideas or suggestions on how to resolve this? Thanks in advance!
3 Answers
I'd recommend checking the AMI you're using; it may be missing the bootstrap scripts nodes need to join EKS. AWS publishes EKS-optimized AMIs for this. If you can SSH into one of the EC2 instances, check the cloud-init logs (/var/log/cloud-init-output.log) for the error that's blocking the join. Good luck!
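If SSH isn't configured on the nodes, AWS Systems Manager Session Manager is an easy way to get that shell. A minimal sketch, assuming you go the SSM route: the AmazonSSMManagedInstanceCore policy ARN is real, and the fragment below merges into the `eks_managed_node_groups` block from the question (everything else stays as you have it).
```hcl
eks_managed_node_groups = {
  one = {
    # ... keep the question's existing node group settings ...

    iam_role_additional_policies = {
      # Allows "aws ssm start-session --target <instance-id>" onto the node
      AmazonSSMManagedInstanceCore = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
      AmazonEBSCSIDriverPolicy     = "arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy"
    }
  }
}
```
One caveat: in a private subnet with no internet path, SSM itself needs the ssm, ssmmessages, and ec2messages VPC endpoints, which ties into the other answers below.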
Thanks for the help, everyone! I managed to fix the Elastic IP issue and re-enabled the NAT Gateway. Now everything is working perfectly.
If your nodes sit in a private subnet without a NAT Gateway, they have no outbound internet path for things like pulling container images or reaching the cluster's public API endpoint. Make sure you address that Elastic IP issue and re-enable your NAT Gateway!
While you're at it, check out fck-nat.dev to save on those costs.
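For reference, the Terraform shape of that fix is roughly the following. A minimal sketch, assuming your VPC exposes a public subnet and the private subnets' route table; `aws_subnet.public` and `aws_route_table.private` are placeholder names, not resources from the question.
```hcl
# Elastic IP for the NAT Gateway
resource "aws_eip" "nat" {
  domain = "vpc"
}

# The NAT Gateway itself must sit in a public subnet
resource "aws_nat_gateway" "this" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public.id # placeholder public subnet
}

# Send the private subnets' outbound traffic through the NAT Gateway
resource "aws_route" "private_nat" {
  route_table_id         = aws_route_table.private.id # placeholder route table
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = aws_nat_gateway.this.id
}
```
Without that default route, the kubelet on a private-subnet node has no way to reach the cluster's public endpoint, which matches the NodeCreationFailure in the question.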
Actually, we've run EKS clusters with no direct internet access at all. VPC endpoints for the AWS services the nodes depend on (ECR, S3, EC2, STS, CloudWatch Logs) can make the whole setup work entirely inside the VPC.
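To sketch what that can look like: the endpoint list below is the usual set AWS documents for private clusters, but `aws_security_group.endpoints` and `aws_route_table.private` are placeholders you'd swap for your own resources, and the cluster also needs private endpoint access enabled (`cluster_endpoint_private_access = true` in the module).
```hcl
data "aws_region" "current" {}

locals {
  # Interface endpoints the worker nodes need when there is no internet path
  eks_interface_endpoints = ["ecr.api", "ecr.dkr", "ec2", "sts", "logs"]
}

resource "aws_vpc_endpoint" "interface" {
  for_each            = toset(local.eks_interface_endpoints)
  vpc_id              = var.vpc_id
  service_name        = "com.amazonaws.${data.aws_region.current.name}.${each.value}"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.subnet_ids
  security_group_ids  = [aws_security_group.endpoints.id] # placeholder SG allowing 443 from the nodes
  private_dns_enabled = true
}

# ECR stores image layers in S3, so a gateway endpoint is required as well
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = var.vpc_id
  service_name      = "com.amazonaws.${data.aws_region.current.name}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [aws_route_table.private.id] # placeholder route table
}
```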
Bingo!