I've been playing around with OpenAI's Whisper (https://openai.com/blog/whisper/), thanks to the Google Colab notebook that DotCSV made available in this YouTube video: https://www.youtube.com/watch?v=JuMEmF-2FsA&list=PL0z-YLwhf5znzg3ZzzugEsxlmVKWC5bf-
I transcribed a video interview in Spanish and the result was, let's say, good enough. The transcript has a lot of errors and needs a lot of cleanup, but having a tool like this is very promising.
Check it out here: https://umerez.eu/categories/blog-legal/