I've come across a puzzling difference in how legacy and newly created Network Load Balancers (NLBs) handle DNS for Availability Zones (AZs) that have no targets registered. Here's the setup: I have an internet-facing NLB linked to a target group consisting of K8s nodes, and I've disabled cross-zone load balancing to manage cost and latency. In a scenario with three AZs, one AZ has no healthy nodes. With older NLBs, the DNS record correctly omits the empty AZ's IP address. New NLBs keep publishing that IP, so traffic is routed to the inactive AZ and connections fail. AWS Support has claimed this is expected behavior, but it feels like a breaking change that disrupts standard disaster recovery and failover techniques. I'm curious whether others have experienced this and whether any attribute controls this behavior. Is forcing cross-zone load balancing or manually adjusting subnet mappings the only workaround?
6 Answers
Make sure you’re conducting `dig` tests against the right DNS entries. Sometimes client-side caching or provider settings can throw things off. If everything is set up correctly, and you're still seeing inbound traffic to that inactive AZ, there's likely something flawed in the NLB's handling of DNS during failovers.
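If it helps to script the check instead of eyeballing `dig` output, here is a minimal sketch in Python. It assumes you already know which public IP the NLB uses in each AZ (for example from its subnet mappings); the hostname, AZ names, and addresses in the comments are placeholders, not values from this thread. It only reports which AZ addresses the resolver is currently handing out.

```python
import socket


def resolve_ipv4(hostname: str) -> set[str]:
    """Return the IPv4 addresses the system resolver currently gives for hostname.

    This goes through the OS resolver, so any local caching still applies.
    """
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return {info[4][0] for info in infos}


def published_azs(resolved_ips: set[str], az_ips: dict[str, str]) -> dict[str, bool]:
    """Map each AZ to True if its NLB address appears in the resolved set."""
    return {az: ip in resolved_ips for az, ip in az_ips.items()}


# Hypothetical usage (all names and addresses below are placeholders):
# az_ips = {
#     "us-east-1a": "203.0.113.10",
#     "us-east-1b": "203.0.113.20",
#     "us-east-1c": "203.0.113.30",
# }
# ips = resolve_ipv4("my-nlb-example.elb.us-east-1.amazonaws.com")
# print(published_azs(ips, az_ips))
```

Running this against both the old and the new NLB from the same host should make it obvious whether the empty AZ's address is still being published, independent of how the `dig` output is read.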
Good call on the testing—those mismatches can cause a lot of confusion!
Check your target group attributes via `aws elbv2 describe-target-group-attributes`. It's possible AWS changed some defaults recently that you may not be accounting for in your Terraform setup.
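To compare the old and new target groups side by side, something like the following sketch might help. It assumes `boto3` is installed and credentials are configured; the ARNs are placeholders you would fill in yourself. It calls the same API as the CLI command above and prints only the keys whose values differ.

```python
def attributes_to_dict(attributes: list[dict]) -> dict[str, str]:
    """Flatten the [{'Key': ..., 'Value': ...}] list returned by the API."""
    return {item["Key"]: item["Value"] for item in attributes}


def diff_attributes(old: dict[str, str], new: dict[str, str]) -> dict[str, tuple]:
    """Return {key: (old_value, new_value)} for every key that differs.

    A key missing on one side shows up with None for that side.
    """
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


def fetch_target_group_attributes(target_group_arn: str) -> dict[str, str]:
    """Fetch target group attributes via boto3 (third-party, imported lazily)."""
    import boto3

    client = boto3.client("elbv2")
    response = client.describe_target_group_attributes(TargetGroupArn=target_group_arn)
    return attributes_to_dict(response["Attributes"])


# Hypothetical usage (ARNs are placeholders):
# old = fetch_target_group_attributes("arn:aws:elasticloadbalancing:...old...")
# new = fetch_target_group_attributes("arn:aws:elasticloadbalancing:...new...")
# for key, (old_value, new_value) in diff_attributes(old, new).items():
#     print(f"{key}: old={old_value} new={new_value}")
```

The same diff can be run on the load balancer attributes (`describe_load_balancer_attributes`) of the two NLBs, since a changed default could sit at that level and not on the target group.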
I checked, and the minimum healthy targets setting appears to be the same on both the new and the old target groups.
Sounds like you’re on top of it. Keep digging into those attributes!
Definitely present your findings to AWS Support. If their docs claim DNS failover is supposed to happen, and it isn't, it's either a bug or a change they haven't documented. Are you sure all targets are healthy? That might be part of the problem.
I can confirm all targets are healthy for the NLB; there might be something else going on.
You might be onto something—there could be hidden settings that need adjusting.
This situation is odd indeed, and I commend your thorough documentation! I haven't replicated the problem myself, but AWS sometimes introduces mysterious changes. It'll be interesting to see if someone from AWS can explain why this is happening now.
I've seen this behavior as well. It seems like a regression of sorts during updates. Let’s hope they address it soon!
I’m in the same boat; I've been looking for answers too, but no luck.
It's frustrating when support tells you nothing has changed, only to find out later that there were indeed changes. I think you're right to point this out; it doesn't make sense that the new NLBs behave this way when the old ones clearly don't. Keep pushing for clarity!
Yeah, it's crazy how often we hear that from support, but it can be worth it if you keep escalating.
Definitely frustrating! It seems like they just want to close tickets without addressing the actual issues.
It's worth noting that Terraform's behavior might have changed due to an API update. Some users have reported issues with stale IPs remaining in target groups. It's a conundrum for sure—definitely follow up on the differences between the two NLBs you've got.

We use AWS's DNS resolver, so I don’t think it’s a client-side issue, but we’ve double-checked everything.