Help with DNS Resolution Issues in EKS Cluster

Asked By NinjaNerd123 On

Hey everyone! I'm running into a problem with my newly set up EKS cluster. After installing external DNS through Helm, I'm getting this error on the pods:

`external-dns-7d4fb4b755-42ffn time="2025-10-19T12:02:19Z" level=error msg="Failed to do run once: soft error\nrecords retrieval failed: soft error\nfailed to list hosted zones: operation error Route 53: ListHostedZones, exceeded maximum number of attempts, 3, get identity: get credentials: failed to refresh cached credentials, failed to retrieve credentials, operation error STS: AssumeRoleWithWebIdentity, exceeded maximum number of attempts, 3, https response error StatusCode: 0, RequestID: , request send failed, Post "https://sts.us-east-1.amazonaws.com/": dial tcp: lookup sts.us-east-1.amazonaws.com: i/o timeout (consecutive soft errors: 1)"`

It looks like the pod can't resolve the STS endpoint: the DNS lookup for sts.us-east-1.amazonaws.com times out. My cluster is private and sits in private subnets, with internet access through a NAT gateway in each Availability Zone. I've also set up a VPC endpoint for sts.amazonaws.com in all private subnets. CoreDNS isn't logging any errors.
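To show how I'm reading that log line, here is a rough sketch that buckets an AWS SDK error message by its innermost cause. The function name `classify_aws_sdk_error` and the substrings it matches are my own assumptions based on the message above, not part of external-dns or the AWS SDK.

```python
def classify_aws_sdk_error(message: str) -> str:
    """Roughly bucket an error line as 'dns', 'network', 'iam', or 'unknown'.

    The substrings below are assumptions taken from the log in this post
    and from typical access-denied wording; they are not an official list.
    """
    lowered = message.lower()
    # "dial tcp: lookup <host>" means the failure happened during name resolution.
    if "dial tcp: lookup" in lowered:
        return "dns"
    # A dial error without "lookup" means DNS worked but the connection did not.
    if "dial tcp" in lowered and ("i/o timeout" in lowered or "connection refused" in lowered):
        return "network"
    # Permission problems usually come back as an HTTP error from the service itself.
    if "accessdenied" in lowered or "not authorized" in lowered:
        return "iam"
    return "unknown"


sample = (
    'Post "https://sts.us-east-1.amazonaws.com/": dial tcp: '
    "lookup sts.us-east-1.amazonaws.com: i/o timeout"
)
print(classify_aws_sdk_error(sample))  # prints "dns"
```

The `StatusCode: 0` and empty `RequestID` in the log fit the same reading: the request never reached STS, so no response came back.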

I'm using Kubernetes 1.33, CoreDNS v1.12.4-eksbuild.1, external-dns 0.19.0, and Karpenter 1.8.1 (the latest). Does anyone have ideas on how to debug this, or potential fixes? I would really appreciate any help!

1 Answer

Answered By CuriousCoder88 On

It looks like external-dns can't list your Route 53 hosted zones, which is likely due to IAM permissions. external-dns needs permission to access Route 53, so you'll need to create an IAM role with those permissions that its service account can assume. Check out the external-dns guide from Kubernetes SIGs for setting this up with IRSA (IAM Roles for Service Accounts).

Here’s a quick checklist:
- Create the IAM role with the required permissions.
- Make sure the service account (used by external-dns) can assume this role.
- Use the service account name during the external-dns installation.
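For the second item in the checklist, here is a minimal sketch of what an IRSA trust policy generally looks like, built as a Python dict so the placeholders are easy to spot. The account ID, OIDC provider ID, namespace, and service account name below are made-up placeholders, and `irsa_trust_policy` is just a helper name for this example; substitute the values from your own cluster.

```python
import json


def irsa_trust_policy(account_id: str, oidc_provider: str,
                      namespace: str, service_account: str) -> dict:
    """Build a trust policy that lets one Kubernetes service account assume the role.

    oidc_provider is the cluster's OIDC issuer URL without the https:// prefix.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Restrict the role to a single service account.
                        f"{oidc_provider}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                        f"{oidc_provider}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


# All four values here are hypothetical placeholders.
policy = irsa_trust_policy(
    account_id="111122223333",
    oidc_provider="oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE0000000000",
    namespace="external-dns",
    service_account="external-dns",
)
print(json.dumps(policy, indent=2))
```

The `sub` condition has to match the namespace and service account name that the Helm release actually uses, otherwise `AssumeRoleWithWebIdentity` is rejected.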

Try those steps, and hopefully, you’ll be closer to resolving the issue! Good luck!

UserFriendly9 -

Thanks for the detailed answer! I'm also using IRSA, with the trust policy you suggested. The role can't be assumed because the request times out before it reaches STS. Do you think this could be a network issue?
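One way to test the network theory is to check name resolution from inside a pod on the affected nodes. The sketch below assumes a debug pod with Python available; `check_dns` is a hypothetical helper name, and the `resolver` parameter exists only so the lookup can be swapped out. It uses the standard library's `socket.getaddrinfo`, which goes through the pod's `/etc/resolv.conf` and therefore through CoreDNS.

```python
import ipaddress
import socket
import time


def check_dns(hostname: str, resolver=socket.getaddrinfo) -> dict:
    """Resolve hostname and report the addresses, or the error, with timing."""
    start = time.monotonic()
    try:
        results = resolver(hostname, 443, proto=socket.IPPROTO_TCP)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return {
            "host": hostname,
            "ok": False,
            "error": str(exc),
            "seconds": round(time.monotonic() - start, 2),
        }
    # Each getaddrinfo entry is (family, type, proto, canonname, sockaddr).
    addresses = sorted({info[4][0] for info in results})
    return {
        "host": hostname,
        "ok": True,
        "addresses": addresses,
        # Private addresses suggest the name resolves to a VPC endpoint.
        "all_private": all(ipaddress.ip_address(a).is_private for a in addresses),
        "seconds": round(time.monotonic() - start, 2),
    }


if __name__ == "__main__":
    # The regional name comes from the error log; the global one from the question.
    for host in ("sts.us-east-1.amazonaws.com", "sts.amazonaws.com"):
        print(check_dns(host))
```

If the lookup fails or takes several seconds here too, the problem sits between the pod and CoreDNS (or between CoreDNS and the VPC resolver), not in IAM. If it returns quickly with private addresses, the name is most likely resolving to the VPC endpoint, and the endpoint's security group is the next thing to check.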
