I've got some voice recordings that I want to transcribe and potentially ask questions about or request summaries. I'm curious why leading AI models like OpenAI's ChatGPT, Anthropic's Claude, and Google's Gemini don't allow audio file inputs yet, especially since they already have multi-modal capabilities.
3 Answers
Actually, Gemini does support audio inputs! You just need to use it through Google AI Studio, where you can upload an audio file directly into the prompt. I've found that the 2.5 Pro model does an impressive job transcribing recordings, way better than I expected!
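Beyond AI Studio, the same capability is exposed through the Gemini REST API, so you can script transcription too. Here's a minimal stdlib-only sketch that sends a prompt plus base64-encoded inline audio to the `generateContent` endpoint. The model name, file name, and MIME type are assumptions; check Google's docs for the models currently available to you.

```python
import base64
import json
import os
import urllib.request

# Gemini REST endpoint; the model name is an assumption -- swap in
# whichever audio-capable model your account has access to.
API_URL = ("https://generativelanguage.googleapis.com/v1beta/"
           "models/gemini-1.5-pro:generateContent")


def build_request(audio_bytes: bytes, mime_type: str, prompt: str) -> dict:
    """Build the JSON body: a text prompt plus base64-encoded inline audio."""
    return {
        "contents": [{
            "parts": [
                {"text": prompt},
                {"inline_data": {
                    "mime_type": mime_type,
                    "data": base64.b64encode(audio_bytes).decode("ascii"),
                }},
            ]
        }]
    }


def transcribe(path: str, api_key: str) -> str:
    """POST the audio file to Gemini and return the model's text reply."""
    with open(path, "rb") as f:
        body = build_request(f.read(), "audio/mp3",
                             "Transcribe this recording.")
    req = urllib.request.Request(
        f"{API_URL}?key={api_key}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    # Pull the first candidate's text out of the response envelope.
    return reply["candidates"][0]["content"]["parts"][0]["text"]


if __name__ == "__main__":
    key = os.environ.get("GOOGLE_API_KEY")  # only runs if a key is set
    if key:
        print(transcribe("recording.mp3", key))  # placeholder file name
```

Inline base64 is fine for short clips; for long recordings the official SDK's file-upload path is the better fit, since inline request bodies have size limits.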
It's not just transcription, either: Gemini can analyze songs, identify genres and instruments, and even break down the structure of a track!
For sure, Gemini is ahead here. The other models will likely need to catch up soon.
Yeah, Gemini can handle audio and video inputs really well! It shows how advanced they've gotten with multi-modal processing.
Is that feature only available in AI Studio, or can it be used in Gemini Advanced as well?