I recently watched 'Charlie's Angels', and it features a cool device that can listen to audio and tell you who is speaking. I'm curious: how realistic is this with current technology? What would it take, in theory, to build something like this?
1 Answer
Absolutely, it's quite feasible! Just think about how humans recognize voices: we can often tell who is talking after hearing only a few words, and technology can mimic that. For instance, devices like the Amazon Echo can identify different speakers to personalize responses. For such software to work, though, it needs a database of recorded voices labeled with who they belong to, so it can only name people it has already heard. Accuracy is the big question: how many different voices can it tell apart before it starts confusing them?
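To make the "database of voices" idea concrete, here is a minimal sketch in Python. It assumes each voice has already been boiled down to a short list of numbers (a "voiceprint"); the numbers, the names, the `identify` function, and the 0.8 threshold are all made up for illustration. A real system would compute the voiceprints from audio with a trained model, which is the hard part and is not shown here.

```python
import math

def cosine_similarity(a, b):
    """Similarity of two vectors: close to 1.0 means they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identify(sample, enrolled, threshold=0.8):
    """Return (name, score) for the closest enrolled voice, or (None, score)
    if nothing in the database is similar enough. The threshold is an
    arbitrary illustrative value, not a recommended setting."""
    best_name, best_score = None, -1.0
    for name, voiceprint in enrolled.items():
        score = cosine_similarity(sample, voiceprint)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score

# Hypothetical database: made-up voiceprints for two known speakers.
enrolled = {
    "alice": [0.9, 0.1, 0.3],
    "bob": [0.2, 0.8, 0.5],
}

name, score = identify([0.85, 0.15, 0.25], enrolled)
print(name)  # the closest enrolled speaker, or None if no match
```

The threshold is what keeps it from naming someone who isn't in the database at all. The more enrolled voices there are, the more likely two of them end up close together, which is where the accuracy problem comes from.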
And I'd add, the more audio you feed it, the better it gets! I once saw a guy guess people's hometowns from their accents after just a couple of sentences. It's fascinating how much inflection can reveal about someone!