Hey everyone! I'm running into some trouble deploying my NestJS gRPC server on AWS ECS. I've set up a Network Load Balancer (NLB) that routes traffic to my service through a target group, but calls to the defined services don't get the expected responses. For instance, the health check call returns this error: `Error: 2 UNKNOWN: Server method handler threw error stream.call.once is not a function`, even though the same request works perfectly fine on my local setup and returns `{ status: 'SERVING' }`. I suspect this error indicates that the request is reaching the service but something is going wrong afterward. Why might my handler work locally but fail with this error when deployed behind the NLB? Also, here's my health.proto code for context. By the way, I have not implemented a health check endpoint for this target group and am currently using TCP health checks. I also tried an Application Load Balancer (ALB) with the health check path `/grpc.health.v1.Health/Check`, but that didn't work either.
2 Answers
It sounds like you're running into a common point of confusion with gRPC and NLBs. gRPC requires HTTP/2, but an NLB works at layer 4: with a TCP listener it forwards the bytes untouched and never inspects HTTP/2, so there is no HTTP/2 setting to enable on the NLB itself. HTTP/2 has to work end to end between your client and your container. Also, your current TCP health check only verifies that the port accepts connections, without calling the gRPC method itself. I’d recommend two things: ensure your gRPC client is sending requests over HTTP/2, and confirm that your NestJS server is set up to accept gRPC traffic on the port the target group forwards to. You can find more details in the AWS documentation on target group health checks.
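To make the difference between a TCP check and a real gRPC check concrete, here is a minimal sketch of a health probe built only on Node's `http2` module. It assumes the server speaks plaintext HTTP/2 (no TLS) and implements the standard `grpc.health.v1.Health/Check` method; the function names are made up for illustration, and the address in the comment is a placeholder, not something from the question.

```typescript
import * as http2 from "http2";

// ServingStatus enum values from the standard grpc.health.v1 proto.
const STATUS_NAMES = ["UNKNOWN", "SERVING", "NOT_SERVING", "SERVICE_UNKNOWN"];

// gRPC frames every message as: 1-byte compression flag,
// 4-byte big-endian payload length, then the protobuf payload.
function frameGrpcMessage(payload: Uint8Array): Uint8Array {
  const frame = new Uint8Array(5 + payload.length); // flag byte stays 0 (uncompressed)
  new DataView(frame.buffer).setUint32(1, payload.length, false);
  frame.set(payload, 5);
  return frame;
}

// Decodes a framed HealthCheckResponse. Its only field is `status`
// (field 1, varint), so a non-empty payload starts with tag byte 0x08.
function parseHealthStatus(frame: Uint8Array): string {
  if (frame.length < 5) throw new Error("no gRPC message in response");
  const view = new DataView(frame.buffer, frame.byteOffset, frame.byteLength);
  const payload = frame.subarray(5, 5 + view.getUint32(1, false));
  if (payload.length === 0) return "UNKNOWN"; // proto3 omits fields equal to 0
  const tag = payload[0];
  const code = payload[1];
  if (tag !== 0x08 || code === undefined) throw new Error("unexpected payload");
  return STATUS_NAMES[code] ?? `UNRECOGNIZED(${code})`;
}

// Hypothetical helper: calls Health/Check over plaintext HTTP/2.
// `authority` is a placeholder such as "http://<task-ip>:<port>".
function checkHealth(authority: string, timeoutMs = 3000): Promise<string> {
  return new Promise((resolve, reject) => {
    const session = http2.connect(authority);
    session.on("error", reject);
    const req = session.request({
      ":method": "POST",
      ":path": "/grpc.health.v1.Health/Check",
      "content-type": "application/grpc",
      te: "trailers",
    });
    const chunks: Buffer[] = [];
    req.setTimeout(timeoutMs, () => {
      req.close();
      session.close();
      reject(new Error("health check timed out"));
    });
    req.on("data", (chunk: Buffer) => chunks.push(chunk));
    req.on("error", reject);
    req.on("end", () => {
      session.close();
      try {
        // A trailers-only error response carries no message and rejects here.
        resolve(parseHealthStatus(Buffer.concat(chunks)));
      } catch (err) {
        reject(err);
      }
    });
    // An empty HealthCheckRequest (service = "") encodes to zero payload bytes.
    req.end(frameGrpcMessage(new Uint8Array(0)));
  });
}
```

Running `checkHealth(...)` from inside the VPC against the task's private address, and then against the load balancer, should show whether the failure sits in the service or in the path to it. A TCP health check stops after the connection is accepted, so it never exercises any of this.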
Thanks for the insight! I ended up switching to an ALB because of NLB limitations, but I still face the same error. Could port mapping be an issue? Even with the target group health checks being successful, I'm getting the same error when deployed through ECS.
This is a classic issue with combining NLB and gRPC. Your code seems fine, but if the NLB target group uses an HTTP or HTTPS health check, it sends plain HTTP requests rather than gRPC calls, and a gRPC server can't answer those. You might want to change the health check from HTTP to TCP and remove the health check path; just let it check if the port is open. If you really need proper gRPC health checks, I suggest using an ALB with the target group's protocol version set to gRPC, or creating a separate HTTP health check endpoint. Keep in mind that a TCP check only proves the port accepts connections, not that the service is healthy, but for many setups that is enough.
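If you go the ALB route, a rough sketch of the target group settings that matter for gRPC looks like this. The field names mirror the ELBv2 `CreateTargetGroup` parameters; the port `50051`, the `ip` target type, and the `findGrpcTargetGroupIssues` helper are assumptions for illustration. As far as I recall, the default success code for a gRPC target group is 12 (UNIMPLEMENTED), so a health path that really returns OK needs `GrpcCode` set to `0`, and the ALB listener in front of a gRPC target group has to be HTTPS. Please double-check both against the AWS docs.

```typescript
// Subset of the ELBv2 CreateTargetGroup parameters that matter for gRPC.
interface TargetGroupParams {
  Protocol: "HTTP" | "HTTPS";
  ProtocolVersion?: "HTTP1" | "HTTP2" | "GRPC";
  Port: number;
  TargetType: "instance" | "ip";
  HealthCheckPath?: string;
  Matcher?: { GrpcCode?: string; HttpCode?: string };
}

const grpcTargetGroup: TargetGroupParams = {
  Protocol: "HTTP",        // assumption: plaintext between the ALB and the task
  ProtocolVersion: "GRPC", // makes the ALB speak gRPC to the targets
  Port: 50051,             // hypothetical port, use your own
  TargetType: "ip",        // assumption: tasks run in awsvpc network mode
  HealthCheckPath: "/grpc.health.v1.Health/Check",
  Matcher: { GrpcCode: "0" }, // 0 = OK, returned when Health/Check succeeds
};

// Hypothetical helper: flags settings that commonly break gRPC behind an ALB.
function findGrpcTargetGroupIssues(tg: TargetGroupParams): string[] {
  const issues: string[] = [];
  if (tg.ProtocolVersion !== "GRPC") {
    issues.push("ProtocolVersion should be GRPC for gRPC targets");
  }
  if (tg.HealthCheckPath !== undefined && tg.HealthCheckPath.charAt(0) !== "/") {
    issues.push("HealthCheckPath must start with '/'");
  }
  if (tg.HealthCheckPath !== undefined && tg.Matcher?.GrpcCode === undefined) {
    issues.push("custom HealthCheckPath without Matcher.GrpcCode");
  }
  return issues;
}

console.log(findGrpcTargetGroupIssues(grpcTargetGroup));
```

The same values can be passed to the CLI, CloudFormation, or Terraform; the object form here is only meant to show which settings go together.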
Got it, thanks for clarifying that! Just so you know, I'm not using a separate health check endpoint with NLB; it's set to TCP. I had issues with ALB too, where health checks kept failing for different reasons.
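Yeah, TCP checks would be fine in most setups. Just make sure you’re binding your service to `0.0.0.0` in your container instead of `localhost`. A server bound to `localhost` only accepts connections from inside the container, so the load balancer can never reach it, and this is easy to overlook because it works fine locally.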
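Port mapping could definitely be part of the problem. If you're on an ALB, make sure the target group's protocol version is set to gRPC. In your ECS task definition, check that the container port in the port mapping is the port your NestJS server actually listens on, and that the ECS service registers that same container port with the target group. A mismatch in any of these can leave health checks passing against one port while traffic is sent to another.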
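The bind-address and port-mapping advice in this thread can be turned into a small pre-deploy check. This is only a sketch: `findPortIssues` is a hypothetical helper, `listenUrl` stands for the `host:port` string you pass as the `url` option of the NestJS gRPC transport, `portMappings` mirrors the `portMappings` array of an ECS container definition, and `lbContainerPort` mirrors `containerPort` in the ECS service's `loadBalancers` entry.

```typescript
// Shape of one entry in an ECS container definition's portMappings array.
interface PortMapping {
  containerPort: number;
  hostPort?: number;
  protocol?: "tcp" | "udp";
}

const LOOPBACK_HOSTS = ["localhost", "127.0.0.1", "[::1]"];

// Hypothetical helper: compares where the server listens with what ECS exposes.
function findPortIssues(
  listenUrl: string,
  portMappings: PortMapping[],
  lbContainerPort: number
): string[] {
  const sep = listenUrl.lastIndexOf(":");
  const host = listenUrl.slice(0, sep);
  const port = Number(listenUrl.slice(sep + 1));
  if (sep < 0 || isNaN(port)) {
    return [`cannot parse "${listenUrl}" as host:port`];
  }

  const issues: string[] = [];
  if (LOOPBACK_HOSTS.indexOf(host) >= 0) {
    issues.push(`bound to ${host}: unreachable from outside the container`);
  }
  // ECS treats a missing protocol as tcp.
  const mapped = portMappings.some(
    (m) => m.containerPort === port && (m.protocol ?? "tcp") === "tcp"
  );
  if (!mapped) {
    issues.push(`no tcp port mapping for container port ${port}`);
  }
  if (lbContainerPort !== port) {
    issues.push(`service registers port ${lbContainerPort}, server listens on ${port}`);
  }
  return issues;
}

// Example with made-up values: a loopback bind is flagged, the ports line up.
console.log(findPortIssues("localhost:50051", [{ containerPort: 50051 }], 50051));
```

If I remember the NestJS docs correctly, the gRPC transport defaults to `localhost:5000` when no `url` is given, so leaving the option out would trip both the loopback check and, most likely, the port checks.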