I'm dealing with a service that maintains long-lived websocket connections. When an instance hits my configured capacity limit, I want the Application Load Balancer (ALB) to stop directing traffic to it. I've tried separate liveness and readiness endpoints, with the ALB health check pointed at the readiness endpoint, but as soon as that endpoint reports a degraded status, the instance gets drained and rescheduled. Has anyone tackled something like this before?
How about setting up your service so that the ALB's health check fails when it hits capacity and passes again once it's back under capacity? This should work well if you're running directly on EC2.
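A minimal sketch of the health-check side, assuming Python's standard-library HTTP server, a hypothetical `/ready` path on port 8081, and a made-up cap of 1,000 connections. The websocket server is assumed to call the hypothetical `opened()`/`closed()` hooks on the tracker as connections come and go.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class ConnectionTracker:
    """Thread-safe count of open websocket connections (hypothetical helper)."""

    def __init__(self, max_connections: int) -> None:
        self.max_connections = max_connections
        self._active = 0
        self._lock = threading.Lock()

    def opened(self) -> None:
        with self._lock:
            self._active += 1

    def closed(self) -> None:
        with self._lock:
            self._active = max(0, self._active - 1)

    def health_status(self) -> int:
        # 200 while there is headroom, 503 once the cap is reached,
        # so the ALB health check only fails at capacity.
        with self._lock:
            return 200 if self._active < self.max_connections else 503


TRACKER = ConnectionTracker(max_connections=1000)  # assumed capacity limit


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/ready":  # assumed health-check path
            self.send_error(404)
            return
        status = TRACKER.health_status()
        body = b"ok" if status == 200 else b"at capacity"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8081) -> None:
    """Blocks while serving health checks; run it in its own thread."""
    ThreadingHTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

You'd start `serve()` in a background thread next to the websocket server and point the target group's health check at `/ready`. The ALB's default success code is 200, so the 503 is what marks the target unhealthy.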
One approach is to size your machines so they can handle the load in the first place. If you do want to cap connections per instance, though, consider having the instance deregister itself from the target group when it reaches capacity, then re-register once it's ready for more traffic.
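A rough sketch of the self-deregistration idea, assuming the `client` is a boto3 `elbv2` client (its `register_targets`/`deregister_targets` calls take `TargetGroupArn` and `Targets`) and that the target group uses instance targets. The ARN, instance ID, port and watermark values are all hypothetical. The two watermarks add hysteresis so the instance doesn't flap in and out of the target group right at the limit.

```python
from typing import Any, Optional


class TargetGroupMembership:
    """Deregisters this instance at a high watermark and re-registers it
    once load drops to a lower watermark.

    `client` is assumed to be a boto3 "elbv2" client, or anything exposing
    register_targets / deregister_targets with the same keyword arguments.
    """

    def __init__(self, client: Any, target_group_arn: str, instance_id: str,
                 port: int, high_water: int, low_water: int) -> None:
        if low_water >= high_water:
            raise ValueError("low_water must be below high_water")
        self.client = client
        self.target_group_arn = target_group_arn
        self.targets = [{"Id": instance_id, "Port": port}]
        self.high_water = high_water
        self.low_water = low_water
        self.registered = True  # assumes the instance starts registered

    def update(self, active_connections: int) -> Optional[str]:
        """Call periodically with the current count; returns the action taken."""
        if self.registered and active_connections >= self.high_water:
            self.client.deregister_targets(
                TargetGroupArn=self.target_group_arn, Targets=self.targets)
            self.registered = False
            return "deregistered"
        if not self.registered and active_connections <= self.low_water:
            self.client.register_targets(
                TargetGroupArn=self.target_group_arn, Targets=self.targets)
            self.registered = True
            return "registered"
        return None
```

In practice you'd construct it with `boto3.client("elbv2")` and call `update()` on a timer. One thing worth verifying before relying on this is how the target group's deregistration delay treats websockets that are still open on the instance.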
Have you thought about using a Lambda function to add or remove a port on the ALB's security group? This would effectively stop any new traffic from getting through.
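A sketch of what that Lambda could look like, assuming a boto3 `ec2` client (`authorize_security_group_ingress` / `revoke_security_group_ingress` with `GroupId` and `IpPermissions`). The security group ID, port 443, the open CIDR and the `{"action": "open" | "close"}` event shape are all hypothetical. Because the rule lives on the ALB's security group, closing it blocks new connections to the whole load balancer, not to a single instance.

```python
from typing import Any, Callable, Dict

# Hypothetical values: the ALB's security group and its listener port.
ALB_SECURITY_GROUP_ID = "sg-0123456789abcdef0"
LISTENER_PORT = 443

INGRESS_RULE = [{
    "IpProtocol": "tcp",
    "FromPort": LISTENER_PORT,
    "ToPort": LISTENER_PORT,
    "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
}]


def make_handler(ec2: Any) -> Callable[[Dict[str, Any], Any], Dict[str, str]]:
    """Builds a Lambda-style handler around a boto3 "ec2" client.

    The event shape {"action": "open" | "close"} is an assumption; wire it
    to whatever signals capacity in your setup.
    """
    def handler(event: Dict[str, Any], context: Any) -> Dict[str, str]:
        action = event.get("action")
        if action == "close":
            # Remove the inbound rule so no new connections reach the ALB.
            ec2.revoke_security_group_ingress(
                GroupId=ALB_SECURITY_GROUP_ID, IpPermissions=INGRESS_RULE)
        elif action == "open":
            # Restore the inbound rule to accept traffic again.
            ec2.authorize_security_group_ingress(
                GroupId=ALB_SECURITY_GROUP_ID, IpPermissions=INGRESS_RULE)
        else:
            raise ValueError(f"unknown action: {action!r}")
        return {"action": action}

    return handler
```

In the deployed function you'd do something like `handler = make_handler(boto3.client("ec2"))` at module level. Note that authorizing a rule that already exists raises an error, so the caller should avoid sending "open" twice in a row.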
You might want to use the "Least outstanding requests" routing algorithm for the target group. This method routes traffic to targets with the fewest active requests, which is handy when request load varies a lot. Check it out in the AWS documentation if you're interested!
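If you want to flip that setting from code, a minimal sketch with a boto3 `elbv2` client might look like this; the target group ARN is whatever yours is. The attribute key `load_balancing.algorithm.type` and value `least_outstanding_requests` are the target group attribute names AWS documents for this algorithm.

```python
from typing import Any


def enable_least_outstanding_requests(elbv2: Any, target_group_arn: str) -> None:
    """Switches a target group's routing algorithm via a boto3 "elbv2" client."""
    elbv2.modify_target_group_attributes(
        TargetGroupArn=target_group_arn,
        Attributes=[{
            "Key": "load_balancing.algorithm.type",
            "Value": "least_outstanding_requests",
        }],
    )
```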
Just a heads up: least outstanding requests only affects where fresh connections are routed; it won't do anything about websockets that are already open.
Failing the health check on purpose does require turning off ELB health checks in your Auto Scaling Group (otherwise the group replaces any instance the ALB marks unhealthy), which is generally not recommended.
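For reference, switching the group back to EC2-only health checks is a single call with a boto3 `autoscaling` client; the group name is hypothetical. With `HealthCheckType="EC2"`, the group only replaces instances that fail EC2 status checks, while the ALB still uses its own target health check to decide routing. That split is exactly the trade-off being warned about here.

```python
from typing import Any


def use_ec2_health_checks(autoscaling: Any, group_name: str) -> None:
    """Stops the ASG from replacing instances based on ALB target health."""
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=group_name,
        HealthCheckType="EC2",
    )
```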