I'm trying to find a way to translate audio from my desktop in real-time. I've searched a lot but mostly come across solutions that require speaking into a microphone. I want something that runs on my computer, listens to the audio output (like if I'm on a call), and then displays a translated version of what I hear. For instance, if my friend speaks German, I'd love to see the English translation on my screen. Any ideas or tools that could help with this?
1 Answer
You might want to check out Whisper for the transcription, paired with a small OpenAI model such as GPT-4o mini or GPT-4.1 nano for the translation step. Whisper can process the audio in short chunks, which lets it output transcriptions in near real time. There’s some useful info in a developer community thread about setting this up.
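To make the chunking idea a bit more concrete, here's a rough Python sketch, not a finished tool. It assumes you already have some way of getting 16 kHz mono samples from your desktop output (loopback capture is platform-specific and not shown here). The names `chunk_samples`, `translate_stream`, and `whisper_translate_chunk` are made up for illustration. The last function assumes the open-source `openai-whisper` package and `numpy` are installed, and it uses Whisper's own `task="translate"` option (which translates speech into English) rather than a separate chat model.

```python
from functools import lru_cache
from typing import Callable, Iterable, Iterator, List

SAMPLE_RATE = 16_000  # Whisper models work on 16 kHz mono audio


def chunk_samples(
    samples: Iterable[float],
    chunk_seconds: float = 5.0,
    sample_rate: int = SAMPLE_RATE,
) -> Iterator[List[float]]:
    """Group a stream of mono samples into fixed-length chunks."""
    size = int(chunk_seconds * sample_rate)
    if size <= 0:
        raise ValueError("chunk_seconds * sample_rate must be at least 1")
    buffer: List[float] = []
    for sample in samples:
        buffer.append(sample)
        if len(buffer) == size:
            yield buffer
            buffer = []
    if buffer:
        yield buffer  # trailing partial chunk


def translate_stream(
    samples: Iterable[float],
    translate_chunk: Callable[[List[float]], str],
    chunk_seconds: float = 5.0,
    sample_rate: int = SAMPLE_RATE,
) -> Iterator[str]:
    """Yield one piece of translated text per chunk, skipping empty results."""
    for chunk in chunk_samples(samples, chunk_seconds, sample_rate):
        text = translate_chunk(chunk).strip()
        if text:
            yield text


@lru_cache(maxsize=1)
def _load_whisper(model_name: str):
    # Assumption: the `openai-whisper` package is installed.
    import whisper
    return whisper.load_model(model_name)


def whisper_translate_chunk(chunk: List[float], model_name: str = "base") -> str:
    """Hypothetical adapter: run one chunk through Whisper's translate task."""
    import numpy as np  # assumption: numpy is installed
    audio = np.asarray(chunk, dtype=np.float32)
    result = _load_whisper(model_name).transcribe(audio, task="translate")
    return result["text"]
```

You'd then loop over `translate_stream(your_sample_source, whisper_translate_chunk)` and print each line as it arrives. Shorter chunks give lower latency but tend to cut words in half, so something like 3 to 5 seconds is a reasonable place to start experimenting.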
Is this method reliable? I found a GitHub repository that claims to do this: [Speech-Translate](https://github.com/Dadangdut33/Speech-Translate/releases/tag/1.1.0). What do you think?
Is there a beginner-friendly guide available? I’m not familiar with how all this works!