Help! We’re Facing Stale Endpoints After Our EKS Upgrade

Asked By TechExplorer123 On

We recently upgraded our EKS cluster from version 1.32 to 1.33 on March 7, 2026, and updated the AMIs for all nodes at the same time. Post-upgrade, we started seeing significant service timeouts even though all our pods looked healthy. After some troubleshooting, we found that deleting the Endpoints objects fixed the issue, so stale Endpoints may have been the cause, and we're reaching out to the AWS EKS team for clarification.

During the upgrade, the kube-controller-manager briefly restarted, and the AMI update caused full node replacement and new pod assignments. If the Endpoints weren't reconciled while the controller was restarting, they may still have been pointing at pod IPs from the replaced nodes. We've also seen the issue recur regularly in production: deleting a CoreDNS pod causes internal services to time out again. We need to understand whether stale Endpoints were really the root cause and how we can prevent this in the future.
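For anyone hitting something similar, here's a rough way to confirm staleness before deleting anything: compare the IPs listed in a Service's Endpoints object against the pod IPs the Service's selector actually matches. The namespace, service name, and `app=my-service` label below are placeholders; substitute your own.

```shell
# Placeholders; substitute your own namespace, service, and selector.
NS=default
SVC=my-service

# IPs currently listed in the Endpoints object
kubectl -n "$NS" get endpoints "$SVC" \
  -o jsonpath='{.subsets[*].addresses[*].ip}' | tr ' ' '\n' | sort > endpoint-ips.txt

# IPs of the pods the Service's selector actually matches
kubectl -n "$NS" get pods -l app=my-service \
  -o jsonpath='{.items[*].status.podIP}' | tr ' ' '\n' | sort > pod-ips.txt

# Any difference here suggests the Endpoints object is stale
diff endpoint-ips.txt pod-ips.txt
```

If `diff` shows IPs present in the Endpoints object but absent from the pod list, the controller hasn't reconciled that Service since the node replacement.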

3 Answers

Answered By SysAdminSamantha On

Make sure your control plane logs are enabled, as they'll give you much better visibility into what happened during those updates. Definitely submit a support case if you haven't yet; AWS might have specific advice or know of bugs in 1.33 affecting Endpoints reconciliation.
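If logging isn't on yet, you can enable it from the CLI. A minimal sketch, assuming your cluster is named `my-cluster` (the `controllerManager` log type is the one most relevant to Endpoints reconciliation):

```shell
# Enable api, controllerManager, and scheduler control plane logs;
# "my-cluster" is a placeholder for your cluster name.
aws eks update-cluster-config \
  --name my-cluster \
  --logging '{"clusterLogging":[{"types":["api","controllerManager","scheduler"],"enabled":true}]}'
```

Logs then land in CloudWatch under the cluster's log group, where you can search them after the next incident.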

Answered By DevOpsDude77 On

It's interesting that deleting a CoreDNS pod causes cascading timeouts! It might be worth checking for admission webhooks that could be interfering with the reconciliation of your Endpoints objects. That happened to us once and took forever to troubleshoot. Hope you get it all sorted out soon!

TechExplorer123 -

Thanks! I’ll keep that in mind. It’s great to hear from someone who faced similar challenges.

Answered By CloudGuru99 On

It sounds like you're running into issues with the Kubernetes Endpoints resource, which Services depend on to route traffic to pod IPs. When your kube-controller-manager restarted, it may have missed reconciling the new Endpoints, especially with the simultaneous AMI-driven node replacement happening underneath it. I recommend checking the control plane logs for errors and filing a support case with AWS, since they can see the upgrade specifics on their side.
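Once control plane logging is enabled, you can search the controller-manager logs for endpoint-related errors from the upgrade window. A sketch, assuming the standard EKS log group naming, a cluster named `my-cluster`, and GNU `date`:

```shell
# Search the controller-manager log streams for endpoint-related messages
# in the last hour; log group follows the standard /aws/eks/<cluster>/cluster
# pattern, and "my-cluster" is a placeholder.
aws logs filter-log-events \
  --log-group-name /aws/eks/my-cluster/cluster \
  --log-stream-name-prefix kube-controller-manager \
  --filter-pattern "endpoint" \
  --start-time "$(date -d '1 hour ago' +%s000)"
```

Errors about failing to update or sync Endpoints around the restart would support the stale-Endpoints theory, and they're also exactly the kind of evidence AWS support will ask for.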

KubeMaster42 -

Yes, that makes sense. If the controller missed updates due to the restart, it definitely could lead to stale Endpoints causing those timeouts. Good luck with AWS support!
