Best AWS Service for Streaming Voice and Text to AI?

August 23, 2025

Asked By SparkyLemonade32 On August 23, 2025

Hey everyone! I'm trying to figure out the best AWS service to stream a voice recording along with some text to an AI provider. The plan is to stream directly from the user's computer, and I also want a backup option using an HTTP request. Here's what I envision: `User computer >---stream/http--> AWS >---http--> AI provider`.
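A minimal sketch of the "stream first, HTTP as backup" idea described above, assuming both transports can be wrapped as plain callables. `stream_send` and `http_send` are hypothetical placeholders, not real AWS or provider APIs.

```python
from typing import Callable, Tuple


def send_with_fallback(
    payload: bytes,
    stream_send: Callable[[bytes], str],  # hypothetical streaming transport
    http_send: Callable[[bytes], str],    # hypothetical plain HTTP transport
) -> Tuple[str, str]:
    """Try the streaming path first; fall back to HTTP on connection errors.

    Returns a (transport_name, response) pair so the caller can see which
    path was actually used.
    """
    try:
        return "stream", stream_send(payload)
    except (ConnectionError, TimeoutError):
        # The stream could not be opened or timed out: use the backup request.
        return "http", http_send(payload)
```

Only connection-style failures trigger the fallback here; other exceptions propagate so that real bugs are not silently hidden.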

I heard that AWS might act as a middleman to authenticate the request, process it, and return a response. But there's a catch: I've seen that Lambda functions have a 6MB payload limit, and I'm concerned my first stream or HTTP request might often exceed that. Ideally, I'd be looking for something that can manage requests of at least 10-20MB.

For context, I've already implemented user authentication with Supabase, but using Supabase edge functions isn't an option for this due to latency issues. By the way, I'm also experimenting with AWS using their $200 free trial. I'd really appreciate any guidance you all can provide! Thanks!

3 Answers

Answered By Longjumping-Iron-450 On August 25, 2025

Great question! You might want to consider a WebSocket connection that feeds a Kinesis Video Streams stream, with a Lambda function processing the audio in smaller chunks. Just a heads-up: I haven't tested this approach myself, so it might not be foolproof!
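To illustrate the "smaller chunks" part, here is a rough sketch that splits a recording into pieces that each stay under the 6 MB limit mentioned in the question. The 4 MB chunk size is an assumption chosen to leave headroom (base64 encoding inflates data by about a third), and `split_into_chunks` is a hypothetical helper, not part of any AWS SDK.

```python
from typing import Iterator

LAMBDA_PAYLOAD_LIMIT = 6 * 1024 * 1024  # 6 MB limit cited in the question
CHUNK_SIZE = 4 * 1024 * 1024            # assumed chunk size, leaves headroom


def split_into_chunks(data: bytes, max_chunk: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield consecutive slices of `data`, each at most `max_chunk` bytes."""
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    for start in range(0, len(data), max_chunk):
        yield data[start:start + max_chunk]


# Example: a 20 MB recording becomes five 4 MB chunks.
recording = bytes(20 * 1024 * 1024)
chunks = list(split_into_chunks(recording))
```

Each chunk would then be sent as its own request or message; reassembly on the other side is just concatenation in order.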

Another solid option is the Chime SDK, which can be integrated directly into your webpage to capture the WebRTC stream and hand it off for backend processing.

Lastly, think about an API with an ECS backend to bypass the API Gateway and Lambda restrictions. AWS once did a demo with a Nova Sonic project that used a Dockerized Python backend to handle voice streaming, and this could be one of your best bets!
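As a rough idea of what the core of such a containerized Python backend could look like, the sketch below reads an incoming byte stream piece by piece and forwards each piece upstream, with a cap on total size. This is not the code from the AWS demo; `relay` and the `forward` callback are hypothetical names, and a real service would sit behind a web framework and handle authentication.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable


async def read_chunks(
    reader: asyncio.StreamReader, chunk_size: int = 64 * 1024
) -> AsyncIterator[bytes]:
    """Yield data from the reader until the client signals end of stream."""
    while True:
        chunk = await reader.read(chunk_size)
        if not chunk:
            break
        yield chunk


async def relay(
    reader: asyncio.StreamReader,
    forward: Callable[[bytes], Awaitable[None]],  # hypothetical upstream sender
    max_bytes: int,
) -> int:
    """Forward each chunk as it arrives; reject uploads above max_bytes."""
    total = 0
    async for chunk in read_chunks(reader):
        total += len(chunk)
        if total > max_bytes:
            raise ValueError("upload exceeds the configured size cap")
        await forward(chunk)
    return total
```

Because nothing is buffered in full, a 10-20 MB upload never has to fit into a single request body on the backend side.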

SparkyLemonade32 - August 25, 2025

I’ll definitely check out the Chime SDK! And I'll keep the ECS option in mind as a backup if I run into issues with the others. Thanks a bunch! <3

Answered By AudioWhisperer88 On August 24, 2025

Are you planning to send a recorded voice file, or is it real-time streaming? The two paths can differ significantly. If you're sending files, the LLM will need a file ID for the upload, in a format it supports, so it can process the audio into text as usual.

Just curious, how are you planning to make the LLM calls? Are they initiated from AWS or Supabase functions? I’m interested in how you handle those audio files without much hassle!

SparkyLemonade32 - August 25, 2025

I'm streaming to the cloud, where the audio is processed (for example, sped up) and then forwarded to ElevenLabs. The LLM calls will indeed be made from AWS.
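For the "speeding up the audio" step, a naive sketch using only Python's standard `wave` module is shown below. It assumes uncompressed WAV input and simply declares a higher frame rate, which shortens playback but also raises the pitch; a production pipeline would more likely use a proper time-stretching tool. `speed_up_wav` is a hypothetical helper name.

```python
import io
import wave


def speed_up_wav(wav_bytes: bytes, factor: float) -> bytes:
    """Return a WAV whose playback is `factor` times faster (pitch shifts too)."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    # Read the original parameters and raw sample data.
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())
    # Write the same samples back with a scaled frame rate.
    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setnchannels(params.nchannels)
        dst.setsampwidth(params.sampwidth)
        dst.setframerate(int(round(params.framerate * factor)))
        dst.writeframes(frames)
    return out.getvalue()
```

The sample data is untouched, so the output is the same size as the input; only the header's frame rate changes.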

Answered By LatencyWatcher21 On August 24, 2025

If you're aiming for real-time streaming, I would be cautious: the added latency could cause serious issues. Maybe take a look at Nova Sonic for some insights.
