How Can I Reduce the High Costs of TTS and STT for My AI Voice Agent?

Asked By CuriousCoder99 On

Hey everyone! I'm working on an AI voice agent using the ESP32-S3 DevKit module, but I'm facing a big hurdle: the costs for Text-to-Speech (TTS) and Speech-to-Text (STT) are really adding up. Currently, I'm using OpenAI Whisper for STT and ElevenLabs for TTS, and I estimate I'll need about 60 minutes of usage each day, at roughly 600 characters per minute. Here's what the breakdown looks like:

- Whisper (STT): ~$0.36/hour
- ElevenLabs (TTS, Creator plan): ~$9.00/hour
- Total: around $9.36 per hour, which translates to about $280 a month (30 days) for just an hour of use each day. Plus, this doesn't even cover cloud and infrastructure costs.
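
To sanity-check the monthly figure and see how much caching could help, here is a minimal sketch that uses only the per-hour rates quoted above. The 30-day month and the `cache_hit_rate` parameter (the fraction of TTS requests served from a local cache instead of the paid API) are assumptions for illustration, not measured values.

```python
# Per-hour rates taken from the breakdown above.
STT_COST_PER_HOUR = 0.36   # Whisper
TTS_COST_PER_HOUR = 9.00   # ElevenLabs, Creator plan


def monthly_cost(hours_per_day: float, days: int = 30,
                 cache_hit_rate: float = 0.0) -> float:
    """Estimate the monthly STT + TTS bill in dollars.

    cache_hit_rate is a hypothetical fraction (0.0 to 1.0) of TTS
    requests answered from cached audio, which costs nothing per call.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    hours = hours_per_day * days
    stt = STT_COST_PER_HOUR * hours
    tts = TTS_COST_PER_HOUR * hours * (1.0 - cache_hit_rate)
    return round(stt + tts, 2)


if __name__ == "__main__":
    for rate in (0.0, 0.25, 0.5):
        print(f"cache hit rate {rate:.0%}: ${monthly_cost(1.0, cache_hit_rate=rate):.2f}/month")
```

Since TTS makes up almost all of the bill, any reduction in paid TTS calls moves the total far more than anything done on the STT side.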

I'm curious if anyone has tips on how to cut these costs or alternative approaches I should consider!

1 Answer

Answered By TechSavvySam On

First off, what are you trying to achieve? Is this a product for sale, or just for personal use? If it's for a product, what compromises can you make to save on costs? For instance, are you okay with a less realistic TTS voice or lower accuracy in speech recognition?

CuriousCoder99 -

This is for a product I plan to sell. I can cache common phrases and am fine with higher latency, but the TTS needs to sound realistic. I could fall back to a hosted TTS model for minor questions and reserve ElevenLabs for key queries, but mixing the two means the voices may not match. Any ideas?
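
One way the phrase caching mentioned above might look is sketched below. It stores synthesized audio on disk, keyed by voice and normalized text, so a repeated phrase is only paid for once and always plays back in the same voice. `PhraseCache`, `synthesize`, and `voice_id` are hypothetical names: `synthesize` stands in for whatever function actually calls the TTS provider, and no real provider API is shown here.

```python
import hashlib
from pathlib import Path
from typing import Callable


class PhraseCache:
    """Disk cache for TTS audio, keyed by voice and normalized text."""

    def __init__(self, cache_dir: str, synthesize: Callable[[str], bytes],
                 voice_id: str = "default") -> None:
        # synthesize is a placeholder for the real (paid) TTS call.
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.synthesize = synthesize
        self.voice_id = voice_id
        self.hits = 0
        self.misses = 0

    def _key(self, text: str) -> str:
        # Collapse case and whitespace so trivial variants share one entry.
        normalized = " ".join(text.lower().split())
        raw = f"{self.voice_id}:{normalized}".encode("utf-8")
        return hashlib.sha256(raw).hexdigest()

    def get_audio(self, text: str) -> bytes:
        path = self.cache_dir / f"{self._key(text)}.bin"
        if path.exists():
            self.hits += 1
            return path.read_bytes()
        # Cache miss: pay for synthesis once, then store the result.
        audio = self.synthesize(text)
        path.write_bytes(audio)
        self.misses += 1
        return audio
```

Including the voice in the key means a later voice change does not replay stale audio. The `hits` and `misses` counters give a rough hit rate, which can help judge whether a second, cheaper TTS model is needed at all.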
