How Can I Filter Out Low-Confidence Responses from the Whisper API?

Asked By EchoWanderer77 On

I've been playing around with the OpenAI Whisper API for speech-to-text conversions, but I've run into a problem. When I send empty audio, the API often returns random words, and I've noticed a lot of it seems to be in Chinese. Is there any way to get a confidence score or some sort of metric to help me filter out these low-confidence responses?

1 Answer

Answered By RhythmicCoder99 On

You won't get a single confidence score in the default response; the transcription models don't work that way. They're designed for actual speech, not blank audio, so silence tends to come back as made-up text. What you really need to do is segment your audio beforehand and only send the portions that contain speech, without long pauses. I built an app this weekend that summarizes police scanner calls by chunking audio from mp3s. Without segmenting, I ended up with nonsense transcripts like "Thank you Thank you Thank you..." repeating endlessly.
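As a complement to segmenting: there's no overall confidence score, but if you request `response_format="verbose_json"` from the transcription endpoint, each returned segment should carry `no_speech_prob`, `avg_logprob`, and `compression_ratio` fields that can act as a rough proxy. Below is a minimal sketch of filtering on those fields. It assumes the segments have already been parsed into plain dicts; the helper names are made up for illustration, and the thresholds are borrowed from the defaults in the open-source `whisper` package, so treat them as starting points to tune rather than recommended values.

```python
from typing import Any

# Starting thresholds, taken from the open-source whisper package's defaults.
# They are assumptions for this sketch; tune them against your own audio.
NO_SPEECH_THRESHOLD = 0.6          # above this, the segment is probably silence
LOGPROB_THRESHOLD = -1.0           # below this, the model was unsure of its tokens
COMPRESSION_RATIO_THRESHOLD = 2.4  # above this, the text is suspiciously repetitive


def is_reliable(segment: dict[str, Any]) -> bool:
    """Return False if any of the three per-segment metrics looks bad.

    This is stricter than what the whisper package itself does; it is a
    heuristic, not a true confidence score.
    """
    if segment.get("no_speech_prob", 0.0) > NO_SPEECH_THRESHOLD:
        return False
    if segment.get("avg_logprob", 0.0) < LOGPROB_THRESHOLD:
        return False
    if segment.get("compression_ratio", 0.0) > COMPRESSION_RATIO_THRESHOLD:
        return False
    return True


def filter_transcript(segments: list[dict[str, Any]]) -> str:
    """Join the text of the segments that pass the reliability check."""
    return " ".join(s["text"].strip() for s in segments if is_reliable(s))


# Example with hand-written segments shaped like the verbose_json output.
example_segments = [
    {"text": " Hello there.", "no_speech_prob": 0.02,
     "avg_logprob": -0.25, "compression_ratio": 1.1},
    {"text": " Thank you Thank you Thank you", "no_speech_prob": 0.10,
     "avg_logprob": -0.30, "compression_ratio": 3.5},
    {"text": " 谢谢", "no_speech_prob": 0.93,
     "avg_logprob": -1.40, "compression_ratio": 0.7},
]
print(filter_transcript(example_segments))
```

Hallucinated text on silence can still score well on `avg_logprob`, so this kind of filter works best alongside pre-segmentation, not instead of it.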

CuriousListener42 -

Thanks, that's super helpful! Quick question: what's the ideal length for those audio chunks? Can it be anything, as long as there are no big gaps? Also, I’m using Expo Audio to meter audio levels in real time, but I’m struggling with timing. My plan is to buffer and save audio whenever there's a pause, but if I wait for sound before I start recording, I might miss the beginning of the speech. Have you run into voice detection issues like this before?
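One common way around the "missed start" problem is to record continuously and keep a short pre-roll buffer of the most recent quiet frames, then prepend it to the chunk once sound is detected. Here is a minimal sketch of that idea using a simple energy threshold. Everything in it is an assumption for illustration: the function names, the threshold, and the frame counts are hypothetical, and it works on raw sample frames rather than on Expo's metering values (which, as far as I know, are reported in dB, so the threshold would need to be expressed in dB there).

```python
import math
from collections import deque
from typing import Iterable, Iterator, Sequence

Frame = Sequence[float]  # one short block of samples in the range [-1.0, 1.0]


def rms(frame: Frame) -> float:
    """Root-mean-square level of a frame; 0.0 for an empty frame."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def segment_speech(
    frames: Iterable[Frame],
    threshold: float = 0.02,  # hypothetical level separating sound from quiet
    pre_roll: int = 3,        # quiet frames kept from before the sound starts
    hangover: int = 5,        # consecutive quiet frames that end a chunk
) -> Iterator[list[Frame]]:
    """Yield chunks of frames, each starting slightly before the detected sound."""
    history: deque[Frame] = deque(maxlen=pre_roll)  # rolling pre-roll buffer
    current: list[Frame] | None = None
    quiet_run = 0

    for frame in frames:
        loud = rms(frame) >= threshold
        if current is None:
            if loud:
                # Sound detected: start a chunk with the buffered pre-roll
                # so the onset is not clipped.
                current = list(history) + [frame]
                history.clear()
                quiet_run = 0
            else:
                history.append(frame)
        else:
            current.append(frame)
            quiet_run = 0 if loud else quiet_run + 1
            if quiet_run >= hangover:
                # A long enough pause closes the chunk.
                yield current
                current = None
                quiet_run = 0

    if current is not None:
        yield current  # flush a chunk still open at end of input


# Tiny synthetic example: quiet and loud frames of four samples each.
quiet, loud = [0.0] * 4, [0.5] * 4
stream = [quiet] * 5 + [loud] * 3 + [quiet] * 6 + [loud] * 2 + [quiet]
chunks = list(segment_speech(stream, pre_roll=2, hangover=3))
```

On chunk length, Whisper processes audio in windows of up to 30 seconds, so chunks at or under that length with no long silences are a reasonable place to start. A plain energy threshold is easily fooled by background noise; a dedicated voice activity detector would be more robust but follows the same buffer-then-flush pattern.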
