I'm extracting audio features, such as mel spectrograms and fundamental frequencies, from around 5,000 audio files. Unfortunately, Librosa in Python has proven quite slow: processing just 10 two-second files takes about 3 seconds. I'm looking for libraries, or even other programming languages, that can do this extraction more efficiently. Here's a snippet of my current code for reference:
```python
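import librosa
import numpy as np

def mel_spectrogram(audio: np.ndarray, sr: int | float) -> np.ndarray:
    S = librosa.feature.melspectrogram(y=audio, sr=sr, power=2)  # power mel spectrogram
    S_db = librosa.power_to_db(S, ref=np.max)  # convert to dB relative to the peak
    return S_db[0]  # row 0 only: for 1-D (mono) input this is the lowest mel band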
```
I also have similar functions for spectral rolloff, fundamental frequency extraction using pyin, and MFCC. Any suggestions for faster alternatives would be greatly appreciated!
1 Answer
Performance in Python can be a bit tricky, especially for audio processing tasks. Librosa itself is written in Python, but the heavy lifting is done by compiled code in NumPy and SciPy plus Numba-compiled routines, so you might not find anything much faster while staying in Python. C++ is really the go-to for heavy audio work. Plenty of libraries there might do the job better, but if you're new to C++, expect a steep learning curve. Also keep in mind that some of this processing is inherently computationally intensive (pitch tracking such as pyin more so than a mel spectrogram), so switching libraries will not necessarily give you a dramatic speedup.
Thanks for clarifying that! I’m not quite ready to dive into C++, so I might focus on optimizing what I have in Python for now.
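If you stay in Python, one thing that may help is spreading the files across CPU cores, since each file is processed independently. Below is a minimal sketch of that idea, not a benchmarked solution. The names `mel_db_from_file` and `extract_all` and the `workers` and `chunksize` defaults are made up for illustration. It also assumes that loading with `sr=None` (which keeps the file's native sample rate and skips Librosa's default resampling to 22,050 Hz) is acceptable for your features.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def mel_db_from_file(path: str):
    """Hypothetical worker: load one file and return its mel spectrogram in dB."""
    # Imported inside the worker so each process loads the libraries itself.
    import librosa
    import numpy as np

    # sr=None keeps the native sample rate, so no resampling is done on load.
    audio, sr = librosa.load(path, sr=None)
    S = librosa.feature.melspectrogram(y=audio, sr=sr, power=2)
    return librosa.power_to_db(S, ref=np.max)


def extract_all(
    paths: Iterable[str],
    extract: Callable[[str], T],
    workers: int = 4,
    chunksize: int = 16,
) -> List[T]:
    """Apply `extract` to every path, optionally in parallel processes.

    Results come back in the same order as `paths`.
    """
    paths = list(paths)
    if workers <= 1:
        # Serial fallback: handy for debugging and for timing a baseline.
        return [extract(p) for p in paths]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # chunksize batches several files per task to cut inter-process overhead.
        return list(pool.map(extract, paths, chunksize=chunksize))
```

You would call it as `extract_all(paths, mel_db_from_file, workers=8)`, from under an `if __name__ == "__main__":` guard so that worker processes can start cleanly on every platform. It is worth timing `workers=1` against a parallel run on a few hundred files first: with clips this short, process start-up and result transfer can eat into the gain.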