I'm working on a project that involves using Whisper to generate subtitles from different media formats. To do bulk translations, I need to create a text file containing absolute paths to my media files. I've been able to generate this list with a find command like this:
find /mnt/media/ -iname '*.mkv' -o -iname '*.m4v' -o -iname '*.mp4' -o -iname '*.avi' -o -iname '*.mov' -o -iname '*.mpg' > media.txt
However, I want to exclude any media files that already have an accompanying .srt file with the same name. For instance, if I have show.mkv and show.srt, I don't want show.mkv to appear in the results. I suspect I might need to pipe the output somewhere else, but I'm not exactly sure how. Any suggestions?
1 Answer
You can test for this with a simple bash loop over the files in the current directory:
```bash
# Print each media file in the current directory that has no matching .srt
for f in *; do
  [[ $f == *.@(mkv|m4v|mp4|avi|mov|mpg) && ! -f ${f%.*}.srt ]] && echo "$f"
done
```
Just replace `echo` with whatever command you need to run. By the way, I'd love to check out your repo if you're setting up this bulk translation.
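Since the question asks for a list covering a whole tree under /mnt/media rather than a single directory, the same test can probably be combined with `find`. This is only a rough sketch: `list_missing_subs` is a made-up helper name, and it assumes the subtitle sits next to the media file with exactly the same base name and a lowercase `.srt` extension.

```bash
#!/usr/bin/env bash
# Sketch: list media files under a directory tree that have no sibling .srt.
# list_missing_subs is a hypothetical helper name, not part of any tool.
list_missing_subs() {
  local root=$1
  # -print0 with read -d '' keeps names containing spaces or newlines intact.
  find "$root" -type f \( -iname '*.mkv' -o -iname '*.m4v' -o -iname '*.mp4' \
      -o -iname '*.avi' -o -iname '*.mov' -o -iname '*.mpg' \) -print0 |
    while IFS= read -r -d '' f; do
      # Strip the extension and look for a subtitle with the same base name.
      [[ -f ${f%.*}.srt ]] || printf '%s\n' "$f"
    done
}

# Example (paths are absolute because the starting directory is absolute):
# list_missing_subs /mnt/media > media.txt
```

Files named like `show.en.srt` or `show.SRT` would not count as a match here, so the check would need adjusting if your subtitles follow that naming.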
Thanks for pointing me to this! I'm currently trying out this repository: [subsai](https://github.com/absadiki/subsai), but there's a known bug related to a missing dependency. If you're testing it out, be prepared to manually install pydub or add it to the dependency list before you build the Docker container. There's also another tool, [subgen](https://github.com/McCloudS/subgen), which seems to automate everything and connects to Bazarr. I'm not set up with Bazarr yet, so I'm exploring more manual options.