I'm looking for a clear understanding of how AWS's Network Load Balancer (NLB) scales and stays highly available. I know that the Application Load Balancer (ALB) scales by putting multiple IPs behind DNS to distribute traffic, but I'm curious about the NLB. It provides static IPs, so I'm confused about how it scales under heavy traffic without becoming a bottleneck or a single point of failure. Any simple explanations would be greatly appreciated!
2 Answers
The NLB actually distributes traffic across multiple nodes and doesn’t rely on a single IP or node for handling requests. It gets one static IP per enabled Availability Zone (AZ), so its DNS name resolves to multiple IPs, which reduces the risk of a single point of failure. Behind those static IPs, AWS maintains a distributed and scalable system that dynamically adjusts resources. If an AZ fails, its IP is simply removed from DNS responses, and traffic goes to the remaining healthy AZs. So, while it looks like a static IP setup, it’s really the front end of a robust, auto-scaling network.
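To make the DNS part concrete, here is a minimal sketch of the idea, not AWS's actual implementation. The AZ names, the IPs (taken from a documentation address range), and the `dns_answers` helper are all made up for illustration; the only thing it models is "one static IP per AZ, and an unhealthy AZ's IP drops out of the answers".

```python
from typing import Dict, List, Set


def dns_answers(az_ips: Dict[str, str], healthy_azs: Set[str]) -> List[str]:
    """Return the IPs a resolver would hand out: one per healthy AZ."""
    return [ip for az, ip in sorted(az_ips.items()) if az in healthy_azs]


# Hypothetical AZ names and documentation-range IPs (not real AWS values).
NLB_IPS = {
    "us-east-1a": "198.51.100.10",
    "us-east-1b": "198.51.100.20",
    "us-east-1c": "198.51.100.30",
}

# All three AZs healthy: every static IP is returned.
all_up = dns_answers(NLB_IPS, {"us-east-1a", "us-east-1b", "us-east-1c"})

# us-east-1b fails: its IP disappears, the other two stay unchanged.
one_down = dns_answers(NLB_IPS, {"us-east-1a", "us-east-1c"})

print(all_up)
print(one_down)
```

Note that the surviving IPs never change, which is what "static" means here: clients that pin or allow-list those addresses keep working while the failed AZ is out of rotation.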
The static IP for an NLB is linked to a network interface that manages traffic to back-end resources. This setup means that the NLB can scale rapidly without the delays caused by DNS propagation that you’d see with load balancers that scale by changing the IPs behind their DNS name. Essentially, it uses custom technology to distribute load efficiently internally, making it less prone to bottlenecks, though there are still scaling limits.
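As a rough illustration of distributing load internally behind one IP: AWS's documentation describes NLB target selection for TCP as a flow hash over connection fields such as protocol, source and destination address, and ports. The sketch below assumes that general idea only. The real hash function is not public, so SHA-256 is a stand-in, and `pick_target`, the target IPs, and the sample flow are hypothetical.

```python
import hashlib
from typing import List, Tuple

# (protocol, source IP, source port, destination IP, destination port)
FlowKey = Tuple[str, str, int, str, int]


def pick_target(flow: FlowKey, targets: List[str]) -> str:
    """Map a flow to one target deterministically (SHA-256 is a stand-in hash)."""
    key = "|".join(str(part) for part in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(targets)
    return targets[index]


# Hypothetical back-end targets and client flow.
targets = ["10.0.1.5", "10.0.1.6", "10.0.2.7"]
flow = ("tcp", "203.0.113.9", 50123, "198.51.100.10", 443)

# The same flow always lands on the same target, with no shared state needed.
first = pick_target(flow, targets)
second = pick_target(flow, targets)
print(first == second)
```

Because each packet's target can be computed from the packet itself, any number of nodes behind the static IP can make the same decision independently, which is one plausible reason a single IP does not have to mean a single box.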

Thanks! That clears things up for me. It sounds like BGP and ECMP techniques are used internally for load balancing.