Help Needed: EKS Pods Failing Startup Checks

Asked By TechieNinja42 On

I'm working with an AWS EKS cluster that I mirrored from another production account, and I'm hitting some frustrating pod startup failures. Some of the pods pass their liveness and readiness checks, but others, like ArgoCD and Prometheus, fail with a "permission denied" error when the kubelet tries to reach their health check endpoints:

```
Readiness probe failed: Get "http://10.2.X.X:8082/healthz": dial tcp 10.2.X.X:8082: connect: permission denied
Liveness probe failed: Get "http://10.2.X.X:8082/healthz": dial tcp 10.2.X.X:8082: connect: permission denied
```

Conversely, apps on ports 3000, 8081, and 9090 seem to work fine. I deployed ArgoCD and Prometheus via their Helm charts without a hitch on other clusters or even locally with Kind.
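
For reference, the failing probes in the rendered manifests look roughly like this (only the port and path come from the errors above; the rest is illustrative):

```
# Illustrative probe block; the port and path match the probe errors above,
# the timings are placeholders and will differ per chart defaults.
readinessProbe:
  httpGet:
    path: /healthz
    port: 8082
livenessProbe:
  httpGet:
    path: /healthz
    port: 8082
  initialDelaySeconds: 10
  periodSeconds: 10
```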

Additionally, I've run into an error while trying to deploy the Amazon EKS Pod Identity Agent:

```
{"level":"error","msg":"Unable to configure family {0a 666430303a6563323a3a32332f313238}: unable to create route for addr fd00:ec2::xx/xx: permission denied","time":"2025-09-16T15```

The worker nodes run on custom hardened Amazon Linux 2023 AMIs, but my earlier setup with the same cluster was fine. We're currently using EKS version 1.33.

My first suspicion was networking, security groups, or NACLs, but I've verified that there are no restrictions on the necessary ports. The cluster was created with terraform-aws-cluster, so the security groups should already permit the required ports. I can also curl the failing pod's IP and port just fine from the worker node itself, so I'm stuck on what could be causing this.
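
For what it's worth, here's roughly what I ran from one of the worker nodes (pod IP redacted as above; the pod name is a placeholder):

```
# From the node that hosts the failing pod, the endpoint answers fine:
curl -v http://10.2.X.X:8082/healthz

# The permission denied errors only show up in the pod's probe events:
kubectl describe pod <argocd-application-controller-pod> -n argocd
```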

2 Answers

Answered By CloudySkyWatcher On

The error you're seeing, "connect: permission denied", usually points to something other than AWS networking; security groups and NACLs that block traffic typically surface as connection timeouts rather than a permission error. Have you checked whether there are any Kubernetes network policies in place? You can run `kubectl get networkpolicies -A -o yaml` to see if any of them affect access to port 8082.
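
For example, a policy shaped like the one below (purely illustrative; the name, namespace, and selectors are assumptions) would restrict ingress to the probed port, and depending on the CNI, kubelet probe traffic from the node can also be subject to such rules:

```
# Purely illustrative NetworkPolicy; name, namespace, and selectors are placeholders.
# Anything not matched by the "from" block would be denied ingress on 8082.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-argocd-metrics
  namespace: argocd
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: argocd-application-controller
  policyTypes:
    - Ingress
  ingress:
    - from:
        - ipBlock:
            cidr: 10.2.0.0/16   # node CIDR assumed; probes originate from the node IP
      ports:
        - protocol: TCP
          port: 8082
```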

DevOpsWhiz -

I checked, and there are no network policies set up. We're using the AWS VPC CNI add-on, which doesn't enforce network policies unless that feature is explicitly enabled, so that shouldn't be impacting it.

NetSecGuru -

Just a thought: could it be related to IAM permissions for the pods? Maybe something is blocking the connection at that level.

Answered By KubeSleuth On

Have you checked the IAM role configurations for your nodes and pods? Sometimes issues like this can stem from problems with IAM roles, especially if you're using IRSA (IAM Roles for Service Accounts). Making sure the roles have sufficient permissions can really help.
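
As a quick check (the service account name below assumes the default Helm release naming, and the role name is a placeholder), you can confirm whether IRSA is wired up:

```
# Does the service account carry the IRSA role annotation?
kubectl get serviceaccount argocd-application-controller -n argocd -o yaml | grep role-arn

# If it does, confirm the role exists and its trust policy references the cluster's OIDC provider:
aws iam get-role --role-name <irsa-role-name>
```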
