Issue with Outbound WebSocket Messages Not Being Delivered Until the Connection Closes

Asked By TechWizard42 On

I'm having a peculiar issue with my WebSocket API that's deployed in an EKS cluster using a Network Load Balancer (NLB). Here's the situation: I built my API using ASP.NET Core and deployed it as a Docker image on a single pod within the cluster. While I'm able to establish a WebSocket connection successfully and can see from the logs that messages are being sent, the responses from the server aren't reaching the client until the client closes the connection.

My tests connect to the server and expect several messages in response; if none arrive within 30 seconds, the test times out and closes the connection. What's puzzling is that when the connection does close, all the messages the server sent are delivered at once, along with the close acknowledgment. When I bypass the load balancer with a port-forward, everything works fine, which leads me to believe the problem lies with the NLB. Has anyone else encountered this?
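
To see exactly when each frame reaches the client, independent of any test framework, a raw client that timestamps every frame can help: if the timestamps all cluster right before the close frame, something on the path is holding frames back. This is a minimal sketch using only the Python standard library. It speaks plain `ws://` (no TLS, matching the port-80 TCP listener), assumes the server starts sending as soon as the upgrade completes, and never sends an application message or answers the server's close. `watch`, `read_frame`, and `accept_key` are hypothetical helper names, not part of any library.

```python
import base64
import hashlib
import os
import socket
import struct
import time

# Fixed GUID from RFC 6455, used to derive Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(client_key: str) -> str:
    """Return the Sec-WebSocket-Accept value a server must send for client_key."""
    digest = hashlib.sha1((client_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def read_frame(read_exact):
    """Read one frame; read_exact(n) must return exactly n bytes.

    Returns (fin, opcode, payload). Server frames are normally unmasked,
    but the mask bit is honored anyway.
    """
    b0, b1 = read_exact(2)
    fin = bool(b0 & 0x80)
    opcode = b0 & 0x0F
    length = b1 & 0x7F
    if length == 126:
        (length,) = struct.unpack("!H", read_exact(2))
    elif length == 127:
        (length,) = struct.unpack("!Q", read_exact(8))
    mask = read_exact(4) if b1 & 0x80 else None
    payload = read_exact(length)
    if mask is not None:
        payload = bytes(b ^ mask[i % 4] for i, b in enumerate(payload))
    return fin, opcode, payload


def watch(host: str, port: int, path: str = "/", timeout: float = 30.0) -> None:
    """Upgrade to WebSocket and print when each frame arrives, until close or timeout."""
    # The timeout applies to each blocking socket read, not to the whole session.
    sock = socket.create_connection((host, port), timeout=timeout)
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    sock.sendall(
        (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}:{port}\r\n"
            "Upgrade: websocket\r\n"
            "Connection: Upgrade\r\n"
            f"Sec-WebSocket-Key: {key}\r\n"
            "Sec-WebSocket-Version: 13\r\n\r\n"
        ).encode("ascii")
    )
    stream = sock.makefile("rb")
    start = time.monotonic()
    try:
        status = stream.readline()
        if b" 101 " not in status:
            raise RuntimeError(f"upgrade failed: {status!r}")
        headers = {}
        while (line := stream.readline()) not in (b"\r\n", b""):
            name, _, value = line.decode("latin-1").partition(":")
            headers[name.strip().lower()] = value.strip()
        if headers.get("sec-websocket-accept") != accept_key(key):
            raise RuntimeError("unexpected Sec-WebSocket-Accept")

        def read_exact(n: int) -> bytes:
            data = stream.read(n)
            if len(data) != n:
                raise EOFError("connection closed mid-frame")
            return data

        while True:
            _fin, opcode, payload = read_frame(read_exact)
            # Seconds elapsed since the upgrade request was sent, per frame.
            print(f"{time.monotonic() - start:8.3f}s opcode=0x{opcode:x} bytes={len(payload)}")
            if opcode == 0x8:  # close frame
                break
    except (socket.timeout, EOFError) as exc:
        print(f"{time.monotonic() - start:8.3f}s stopped: {exc!r}")
    finally:
        stream.close()
        sock.close()
```

Running it once against the NLB hostname and once against the port-forwarded address (for example `watch("localhost", 8080, "/ws")`, where the host, port, and `/ws` path are placeholders for your own values) and comparing the timestamps shows whether frames arrive as they are sent or only at close.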

Also, here's my LoadBalancer service configuration for reference:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "external"
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
```

2 Answers

Answered By DevOpsDynamo On

The NLB operates at layer 4, which might not be the best fit for long-lived WebSocket connections. If everything else (such as Security Groups and NACLs) checks out, consider increasing the NLB's idle timeout or switching to an Application Load Balancer (ALB), which has native WebSocket support.
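
If an idle timeout is involved, an alternative to raising it is to keep the connection from ever going idle. On the server side, ASP.NET Core's `WebSocketOptions.KeepAliveInterval` controls how often keep-alive control frames are sent. On the client side, a minimal Python sketch of periodic pings could look like the following. It writes raw RFC 6455 frames to an already-upgraded socket and masks every frame, since the spec requires masking for client-to-server traffic. The socket and the `build_client_frame`/`keepalive` names are hypothetical.

```python
import os
import struct
import threading
from typing import Optional

OP_TEXT, OP_CLOSE, OP_PING, OP_PONG = 0x1, 0x8, 0x9, 0xA


def build_client_frame(opcode: int, payload: bytes = b"", mask_key: Optional[bytes] = None) -> bytes:
    """Build one unfragmented (FIN=1) client-to-server frame.

    RFC 6455 requires every client frame to be masked with a 4-byte key.
    """
    # Opcodes 0x8-0xF are control frames, limited to 125 payload bytes.
    if opcode >= 0x8 and len(payload) > 125:
        raise ValueError("control frames carry at most 125 bytes of payload")
    if mask_key is None:
        mask_key = os.urandom(4)
    length = len(payload)
    header = bytes([0x80 | opcode])  # FIN bit + opcode
    if length < 126:
        header += bytes([0x80 | length])  # MASK bit + 7-bit length
    elif length < (1 << 16):
        header += bytes([0x80 | 126]) + struct.pack("!H", length)
    else:
        header += bytes([0x80 | 127]) + struct.pack("!Q", length)
    masked = bytes(b ^ mask_key[i % 4] for i, b in enumerate(payload))
    return header + mask_key + masked


def keepalive(sock, interval: float, stop: threading.Event) -> None:
    """Send a ping every `interval` seconds until `stop` is set."""
    # Event.wait returns False on timeout and True once stop.set() is called.
    while not stop.wait(interval):
        sock.sendall(build_client_frame(OP_PING, b"keepalive"))
```

It would typically run on a daemon thread next to the read loop, e.g. `threading.Thread(target=keepalive, args=(sock, 20.0, stop), daemon=True).start()`. The 20-second interval is an arbitrary assumption; it only needs to be shorter than whatever idle timeout is in effect on the path.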

TechWizard42 -

Thanks! I'm thinking of trying out the ALB ingress. Do I need to configure anything specific to handle outbound (egress) traffic?

Answered By CodeGuru99 On

It sounds like the NLB is buffering the outbound WebSocket frames until it sees a full response, which would break WebSocket streaming. Have you configured your target group to use TCP rather than HTTP? Also check the idle timeout setting on your NLB, since it might be a factor here. By default, NLBs can behave this way with WebSockets.
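
The buffering hypothesis can be checked from the client side: record the arrival time and opcode of every frame, then see whether the data frames trickled in while the server was logging them or all landed just before the close frame. A small helper along those lines, with an arbitrary 0.5-second window and a hypothetical name, might look like this:

```python
from typing import Iterable, Tuple

OP_CONT, OP_TEXT, OP_BINARY, OP_CLOSE = 0x0, 0x1, 0x2, 0x8


def arrived_in_burst(events: Iterable[Tuple[float, int]], window: float = 0.5) -> bool:
    """Return True if every data frame arrived within `window` seconds of the first close frame.

    `events` holds (arrival_time_seconds, opcode) pairs recorded by the client.
    """
    events = list(events)
    close_times = [t for t, op in events if op == OP_CLOSE]
    data_times = [t for t, op in events if op in (OP_CONT, OP_TEXT, OP_BINARY)]
    if not close_times or not data_times:
        return False  # nothing to compare
    first_close = min(close_times)
    # Data frames arriving long before the close mean frames were not held back.
    return all(first_close - t <= window for t in data_times)
```

A burst result through the NLB alongside a non-burst result through the port-forward would support the idea that something between the client and the pod is holding frames; on its own it would not prove the NLB is the component doing it.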

TechWizard42 -

I've set protocol: TCP in my service YAML. How can I check the idle timeout for the NLB?
