Best AWS Service for Streaming Voice and Text to AI?

Asked By SparkyLemonade32 On

Hey everyone! I'm trying to figure out the best AWS service to stream a voice recording along with some text to an AI provider. The plan is to stream directly from the user's computer, and I also want a backup option using an HTTP request. Here's what I envision: `User computer >---stream/http--> AWS >---http--> AI provider`.
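The stream-first, HTTP-backup idea can be sketched in Python like this. It is only an illustration under assumptions: `stream_send` and `http_send` are hypothetical stand-ins for whatever WebSocket client and HTTPS POST call you end up using. The recording is kept in memory as a list of chunks so it can be resent in full if the stream drops part-way.

```python
from typing import Callable, Iterable

# Hypothetical transport callables: in a real app these would wrap a WebSocket
# client (streaming path) and an HTTPS POST (backup path).
StreamSend = Callable[[Iterable[bytes], str], str]
HttpSend = Callable[[bytes, str], str]


def send_with_fallback(
    audio_chunks: list[bytes],
    text: str,
    stream_send: StreamSend,
    http_send: HttpSend,
) -> tuple[str, str]:
    """Try the streaming path first; on a connection failure, resend over HTTP.

    Returns (path_used, response).
    """
    try:
        return "stream", stream_send(iter(audio_chunks), text)
    except (ConnectionError, TimeoutError):
        # The stream may have died mid-way, so the backup sends the whole recording.
        return "http", http_send(b"".join(audio_chunks), text)


# Usage with fake transports: the stream fails, so the HTTP path is used.
def flaky_stream(chunks: Iterable[bytes], text: str) -> str:
    raise ConnectionError("socket closed")


def http_post(audio: bytes, text: str) -> str:
    return f"received {len(audio)} bytes + {text!r}"


print(send_with_fallback([b"ab", b"cd"], "hi", flaky_stream, http_post))
# ('http', "received 4 bytes + 'hi'")
```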

My understanding is that AWS would act as a middleman: it authenticates the request, processes it, and returns a response. But there's a catch: Lambda has a 6 MB payload limit for synchronous invocations, and I'm concerned that the initial stream or HTTP request will often exceed it. Ideally, I need something that can handle requests of at least 10-20 MB.
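To see why 10-20 MB is awkward for Lambda: audio sent inside a JSON body has to be text-encoded, typically with Base64, which adds about a third to its size (every 3 raw bytes become 4 characters). The sketch below computes that overhead and splits a recording into Base64 pieces that each stay under a limit. Assumptions: the 6 MB quota is treated as 6,000,000 bytes to be conservative, and `overhead` is a hypothetical allowance for JSON keys and the text field.

```python
import base64

# Assumption: the Lambda synchronous payload quota ("6 MB") is treated as
# 6,000,000 bytes to stay on the safe side; check the current AWS quotas.
LAMBDA_SYNC_LIMIT = 6_000_000


def base64_size(raw_bytes: int) -> int:
    """Length of the padded Base64 encoding of `raw_bytes` bytes."""
    return 4 * ((raw_bytes + 2) // 3)


def split_for_lambda(data: bytes, limit: int = LAMBDA_SYNC_LIMIT,
                     overhead: int = 1024) -> list[str]:
    """Split `data` into Base64 strings that each fit in `limit - overhead` bytes.

    `overhead` is a hypothetical allowance for JSON keys and the text field.
    """
    usable = limit - overhead
    # 3 raw bytes -> 4 Base64 characters, so cut on multiples of 3;
    # that also means only the last piece carries '=' padding.
    max_raw = (usable // 4) * 3
    if max_raw <= 0:
        raise ValueError("limit too small for any payload")
    return [base64.b64encode(data[i:i + max_raw]).decode("ascii")
            for i in range(0, len(data), max_raw)]


recording = bytes(20 * 1024 * 1024)       # a 20 MiB placeholder recording
print(base64_size(len(recording)))         # 27962028
print(len(split_for_lambda(recording)))    # 5
```

Even a 20 MiB recording would need five separate invocations under this scheme, plus logic on the receiving side to reassemble the pieces in order.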

For context, I've already implemented user authentication with Supabase, but Supabase Edge Functions aren't an option here because of latency. By the way, I'm also experimenting with AWS using their $200 free trial. I'd really appreciate any guidance you can provide! Thanks!

3 Answers

Answered By Longjumping-Iron-450 On

Great question! You could use a WebSocket that feeds a Kinesis Video Streams stream, then have a Lambda function process it in smaller chunks. Just a heads-up: this approach is untested, so it might not be foolproof!
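Whatever carries the stream, the "smaller chunks" step amounts to cutting raw audio into fixed-duration slices on frame boundaries. A minimal sketch, assuming uncompressed 16 kHz, 16-bit mono PCM (a hypothetical format choice, not something Kinesis Video Streams dictates); it does not call any Kinesis API.

```python
from typing import Iterator


def pcm_chunks(pcm: bytes, chunk_ms: int = 100, sample_rate: int = 16_000,
               channels: int = 1, sample_width: int = 2) -> Iterator[bytes]:
    """Yield fixed-duration slices of raw PCM audio, cut on frame boundaries.

    The last slice may be shorter if the audio doesn't divide evenly.
    """
    frame_size = channels * sample_width            # bytes per sample frame
    frames_per_chunk = sample_rate * chunk_ms // 1000
    chunk_bytes = frames_per_chunk * frame_size
    if chunk_bytes == 0:
        raise ValueError("chunk_ms too small for this sample rate")
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


one_second = bytes(16_000 * 2)   # 1 s of silence at 16 kHz, 16-bit mono
sizes = [len(c) for c in pcm_chunks(one_second)]
print(len(sizes), sizes[0])      # 10 3200
```

Smaller `chunk_ms` values mean less buffering delay per chunk but more messages to send and process.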

Another solid option is the Amazon Chime SDK, which you can embed directly in your web page to capture the WebRTC stream and hand it off for backend processing.

Lastly, consider an API backed by ECS to get around the API Gateway and Lambda limits. AWS published a Nova Sonic demo that used a Dockerized Python backend to handle voice streaming, and following that pattern could be one of your best bets!
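One advantage of a long-running container (for example on ECS) is that it can hold a single connection open and receive text and audio as a sequence of small messages instead of one large request. The framing below is a hypothetical wire format (1-byte type, 4-byte big-endian length, payload), not the format used in the Nova Sonic demo; it shows how a server can reassemble messages that arrive split across network reads.

```python
import struct

# Hypothetical wire format: 1-byte message type, 4-byte big-endian length, payload.
TEXT, AUDIO, END = 1, 2, 3
HEADER = struct.Struct(">BI")  # 5 bytes, no padding


def encode_frame(kind: int, payload: bytes = b"") -> bytes:
    """Prefix `payload` with its type and length."""
    return HEADER.pack(kind, len(payload)) + payload


def decode_frames(buffer: bytearray) -> list[tuple[int, bytes]]:
    """Pop every complete frame off the front of `buffer`; leave partial data in place."""
    frames = []
    while len(buffer) >= HEADER.size:
        kind, length = HEADER.unpack_from(buffer)
        end = HEADER.size + length
        if len(buffer) < end:
            break  # wait for the rest of this frame to arrive
        frames.append((kind, bytes(buffer[HEADER.size:end])))
        del buffer[:end]
    return frames


# Simulate one message stream arriving in two network reads.
stream = encode_frame(TEXT, b"hello") + encode_frame(AUDIO, b"\x01\x02") + encode_frame(END)
buf = bytearray(stream[:7])
print(decode_frames(buf))   # [] (first frame is still incomplete)
buf.extend(stream[7:])
print(decode_frames(buf))   # [(1, b'hello'), (2, b'\x01\x02'), (3, b'')]
```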

SparkyLemonade32 -

I’ll definitely check out the Chime SDK! And I'll keep the ECS option in mind as a backup if I run into issues with the others. Thanks a bunch! <3

Answered By AudioWhisperer88 On

Are you planning to send a recorded voice file, or stream in real time? The two paths differ significantly. If you're sending files, the LLM will need the file in a compatible format (referenced by a file ID) so it can turn the audio into text as usual.

Just curious: how are you planning to make the LLM calls? Will they be initiated from AWS or from Supabase functions? I'm interested in how you'd handle those audio files without much hassle!

SparkyLemonade32 -

I'm streaming to the cloud, where the audio gets processed (e.g., sped up) and then forwarded to ElevenLabs. The LLM calls will indeed be made from AWS.
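For the "speeding up" step, the simplest possible approach for a WAV recording is to keep the samples and raise the declared frame rate so the audio plays faster. This is a naive sketch using only Python's `wave` module, assuming uncompressed WAV input. It raises the pitch along with the speed, so changing tempo without changing pitch would need a proper time-stretching algorithm instead. It also does not check whether the downstream service accepts the resulting sample rate.

```python
import io
import wave


def speed_up_wav(wav_bytes: bytes, factor: float) -> bytes:
    """Return a WAV that plays `factor` times faster by raising the declared frame rate.

    Naive: the samples are unchanged, so pitch rises along with speed.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        frames = src.readframes(params.nframes)
    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setnchannels(params.nchannels)
        dst.setsampwidth(params.sampwidth)
        dst.setframerate(round(params.framerate * factor))
        dst.writeframes(frames)
    return out.getvalue()


# Build 1 s of 16 kHz, 16-bit mono silence, then speed it up 1.5x.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16_000)
    w.writeframes(bytes(32_000))

fast = speed_up_wav(buf.getvalue(), 1.5)
with wave.open(io.BytesIO(fast), "rb") as r:
    print(r.getframerate(), r.getnframes() / r.getframerate())  # 24000 0.6666666666666666
```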

Answered By LatencyWatcher21 On

If you're aiming for real-time streaming, I would caution against it: the latency could cause serious issues. Maybe take a look at Nova Sonic for some ideas.
