Is it worth switching from a simple HTTP proxy to gRPC for an internal LLM service?

Asked By TechGuru99 On

Hey everyone! I need some advice on whether I should move away from my current setup for accessing an LLM service that runs on a VM within our company network. Right now, I've got a straightforward configuration: the LLM service runs an HTTP server on the VM, and a lightweight nginx pod in a Kubernetes cluster acts as a proxy. Users hit an endpoint, nginx forwards the request to the VM, and it works well for us.

However, it has been suggested that I explore gRPC for communication between the gateway and the backend service, possibly with gRPC-Gateway. The gateway in Kubernetes would talk to the VM over gRPC, while users would still access the service through HTTP/JSON, which the gateway would convert automatically.
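For readers unfamiliar with that pattern, here is a toy Python sketch of the translation step such a gateway performs: an HTTP/JSON request is matched to a route, decoded into a typed request, and handed to an RPC. This is not gRPC-Gateway's actual API. `CompletionRequest`, `complete`, and `ROUTES` are hypothetical names, and the RPC is a local stand-in rather than a real gRPC call.

```python
import json
from dataclasses import dataclass
from typing import Tuple


# Hypothetical request type standing in for a protobuf-generated message.
@dataclass
class CompletionRequest:
    prompt: str
    max_tokens: int = 16


def complete(req: CompletionRequest) -> dict:
    # Stand-in for the gRPC call to the backend on the VM.
    # Truncates by characters, not tokens, purely for illustration.
    text = f"echo: {req.prompt}"
    return {"text": text[: req.max_tokens]}


# Route table: (HTTP method, path) -> (request type, RPC handler).
ROUTES = {("POST", "/v1/completions"): (CompletionRequest, complete)}


def handle_http(method: str, path: str, body: bytes) -> Tuple[int, bytes]:
    """Translate one HTTP/JSON request into a typed RPC call and back."""
    route = ROUTES.get((method, path))
    if route is None:
        return 404, b'{"error": "not found"}'
    req_type, rpc = route
    try:
        # Unknown fields or malformed JSON are rejected before the RPC runs.
        req = req_type(**json.loads(body))
    except (TypeError, ValueError) as exc:
        return 400, json.dumps({"error": str(exc)}).encode()
    return 200, json.dumps(rpc(req)).encode()
```

Calling `handle_http("POST", "/v1/completions", b'{"prompt": "hi"}')` returns a status code and a JSON body. In a real deployment the schema would come from a .proto file and the handler would be generated code, which is where the extra tooling comes in.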

I've started researching Protocol Buffers, buf, and gRPC, but I'm new to all of it. My existing HTTP API is pretty basic (it mostly follows the "/v1/completions" style). Here are some questions I have:
- What benefits would this gRPC setup provide for my situation?
- Is the added complexity with things like .proto definitions, code generation, and buf justified?
- Will there be noticeable improvements in performance, observability, or maintainability?
- Are there any potential pitfalls or operational challenges I should anticipate?

I'd really appreciate any insights, especially from those who've implemented gRPC in similar internal service configurations. Thanks a lot!

5 Answers

Answered By KuberNerd76 On

Also, remember that in a Kubernetes environment, load balancing is often done per TCP connection rather than per request, which might influence how you decide to architect your solution. Just something to think about!
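To make the connection-based point concrete: gRPC usually sends many requests over one long-lived HTTP/2 connection, so a balancer that picks a backend once per connection routes all of them to the same place. Below is a toy Python simulation of the two strategies. The backend names are hypothetical, and real balancers are more involved than a round-robin counter.

```python
from collections import Counter
from itertools import cycle

BACKENDS = ["pod-a", "pod-b", "pod-c"]  # hypothetical backend names


def per_connection(num_connections: int, requests_per_connection: int) -> Counter:
    """Connection-level balancing: a backend is picked once per TCP connection."""
    picker = cycle(BACKENDS)
    hits = Counter()
    for _ in range(num_connections):
        backend = next(picker)                    # chosen at connect time
        hits[backend] += requests_per_connection  # every request reuses it
    return hits


def per_request(num_requests: int) -> Counter:
    """Request-level balancing: a backend is picked for every request."""
    picker = cycle(BACKENDS)
    return Counter(next(picker) for _ in range(num_requests))


if __name__ == "__main__":
    # One long-lived connection carrying 300 requests vs. 300 balanced requests.
    print("per connection:", dict(per_connection(1, 300)))
    print("per request:   ", dict(per_request(300)))
```

With a single VM as the only backend, as in the question, this makes no difference. It starts to matter once there is more than one backend replica behind the gateway.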

Answered By CodeSlinger88 On

There’s a tool called LiteLLM you might want to check out. It could simplify your life, and you can learn a lot by emulating what they do!

Answered By JsonMaster On

I thought gRPC was touted for efficiency, but it might not be the right fit for low-volume traffic like JSON requests from a browser to an endpoint. The translation layer might slow things down instead of speeding them up!
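One way to sanity-check the efficiency question is to compare payload sizes. The Python sketch below hand-encodes a request in the protobuf wire format (a length-delimited string field and a varint field) and compares it with the equivalent JSON. The schema, `string prompt = 1; int32 max_tokens = 2;`, is an assumption made up for illustration, and the comparison covers message bytes only, not HTTP framing, headers, or latency.

```python
import json
from typing import Tuple


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_request(prompt: str, max_tokens: int) -> bytes:
    # Hypothetical schema: string prompt = 1; int32 max_tokens = 2;
    data = prompt.encode("utf-8")
    return (
        b"\x0a" + encode_varint(len(data)) + data  # field 1, length-delimited
        + b"\x10" + encode_varint(max_tokens)      # field 2, varint
    )


def sizes(prompt: str, max_tokens: int) -> Tuple[int, int]:
    """Return (JSON bytes, protobuf bytes) for the same request."""
    as_json = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode("utf-8")
    return len(as_json), len(encode_request(prompt, max_tokens))


if __name__ == "__main__":
    for prompt in ("hi", "Summarize this report. " * 200):
        j, p = sizes(prompt, 256)
        print(f"prompt chars={len(prompt)} json={j}B proto={p}B")
```

Because an LLM request is mostly prompt text, which both formats carry as plain UTF-8 bytes, the binary encoding saves only the fixed overhead of field names and punctuation. The relative saving shrinks as prompts get longer.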

Answered By CloudyCoder1 On

It sounds like you might be overcomplicating things! If your LLM service is simply sitting behind an nginx proxy without running in Kubernetes itself, moving to a full gRPC setup may add unnecessary complexity. gRPC can be beneficial if you're building services based around protobuf APIs, but if your current HTTP API is working fine, it might be best to stick with that for now.

Using Kubernetes just to run a reverse proxy seems excessive. You could run nginx directly on the VM instead and keep things simple. gRPC is great, but maybe not for passing JSON back and forth in your scenario.

NinjaDev42 -

Exactly! If your primary goal is just proxying to the VM, then traditional methods might serve you better. Opting for the simpler route keeps your stack cleaner.

Answered By DevWizard23 On

You could definitely consider the Gateway API as an alternative! It might streamline things for you without delving into the complexities of gRPC and protobufs.
