TP-7: A tip for transcribing non-English voice memos

TP-7 is such a perfect memo recorder. But the transcription app is only on iOS and only for English. Here’s an alternative: if you have an Apple Silicon (M1/M2/M3) Mac and you’re proficient with the macOS terminal, you can use OpenAI’s Whisper for amazing transcription in tons of languages:

First, if you don’t have ffmpeg at the ready:
brew install ffmpeg
(install Homebrew first if this doesn’t work)

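Get the whisper.cpp port with Metal hardware acceleration (Apple Silicon) and build it; the remaining commands are run from inside the whisper.cpp folder:
git clone --depth=1 https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
make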

Get the large Whisper model, which has multilingual support:
bash ./models/download-ggml-model.sh large-v3

Mount TP-7 using FieldKit, then drag a file from Finder into the terminal to get the path where it says <filename>:
ffmpeg -i "<filename>" -sample_fmt s16 -ar 16000 "output.wav"
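
whisper.cpp expects 16 kHz WAV input, so it can be worth confirming the conversion before transcribing. This is a minimal sketch and not part of the original steps: the helper name wav_sample_rate is made up for illustration, and it assumes the plain WAV layout that ffmpeg normally writes (sample rate stored as a little-endian 32-bit number at byte offset 24) and a little-endian machine such as an Apple Silicon Mac.

```shell
# Print the sample rate stored in a WAV header.
# Assumption: canonical header with the "fmt " chunk first, so the sample
# rate sits at byte offset 24 as a little-endian 32-bit integer.
wav_sample_rate() {
    od -An -t u4 -j 24 -N 4 "$1" | tr -d ' \n'
}

# Warn if output.wav exists but is not 16 kHz.
if [ -f output.wav ] && [ "$(wav_sample_rate output.wav)" != "16000" ]; then
    echo "output.wav is not 16 kHz; rerun the ffmpeg step" >&2
fi
```

If the warning appears, the ffmpeg command above probably ran against a different file or was missing -ar 16000.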

You can now transcribe the file. Using Apple Silicon’s GPU is at least 20 times as fast as using CPU, eating through half an hour of voice memos in just a few minutes. This command assumes Dutch (nl); change the code after -l for another language.
./main -l nl -m models/ggml-large-v3.bin output.wav

As a bonus, once you have this working you can batch convert by creating a script. Copy your TP-7 voice memo wavs to a folder called audio-TP7 and create empty folders audio-16K and transcripts, all on the same level as the whisper.cpp folder, then run these from that parent folder:

find audio-TP7 -name "*.wav" -type f -print0 | xargs -t -0 -I {} sh -c 'ffmpeg -n -i "{}" -sample_fmt s16 -ar 16000 "audio-16K/$(basename "{}")"'
find audio-16K -name "*.wav" -type f -print0 | xargs -t -0 -I {} sh -c 'whisper.cpp/main -l nl -m whisper.cpp/models/ggml-large-v3.bin -otxt -of "transcripts/$(basename "{}")" "{}"'
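
The same two steps can also be written as a plain loop, which avoids pasting filenames into a quoted sh -c string. This is only a sketch under a few assumptions: WHISPER_DIR, LANG_CODE and both function names are invented for the example, it looks only at the top level of audio-TP7 (find also descends into subfolders), and it drops the .wav from the transcript name.

```shell
#!/bin/sh
# Sketch only: same folders as the post (audio-TP7, audio-16K, transcripts),
# run from the folder that contains whisper.cpp. WHISPER_DIR, LANG_CODE and
# both function names are assumptions made up for this example.
WHISPER_DIR="${WHISPER_DIR:-whisper.cpp}"
LANG_CODE="${LANG_CODE:-nl}"

# Transcript path without extension, e.g. "audio-TP7/a b.wav" -> "transcripts/a b".
transcript_base() {
    printf 'transcripts/%s\n' "$(basename "$1" .wav)"
}

convert_and_transcribe() {
    mkdir -p audio-16K transcripts
    for src in audio-TP7/*.wav; do
        [ -e "$src" ] || continue   # folder empty: the glob matched nothing
        wav16="audio-16K/$(basename "$src")"
        # -n keeps ffmpeg from overwriting files converted on an earlier run
        ffmpeg -n -i "$src" -sample_fmt s16 -ar 16000 "$wav16"
        # -otxt appends ".txt" to the -of path
        "$WHISPER_DIR/main" -l "$LANG_CODE" \
            -m "$WHISPER_DIR/models/ggml-large-v3.bin" \
            -otxt -of "$(transcript_base "$src")" "$wav16"
    done
}

# Only start when the input folder is present.
if [ -d audio-TP7 ]; then
    convert_and_transcribe
fi
```

Setting LANG_CODE in the environment before running switches the language without editing the script.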

A bit specific but maybe someone is happy with this. YMMV!


Stupid question maybe, but you write about transcribing and the output file is a .wav? Shouldn’t it be a text file?

No such thing as stupid questions :slight_smile:

The output is text indeed. In the example, output.wav is the output of ffmpeg which ensures the audio format is correct, and it is the input file for whisper. The transcribed text is actually printed to the console.

The batch script outputs to proper text files using -otxt (output as txt) and -of (output filename).
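
To make the naming concrete: -of takes a path without extension and -otxt adds ".txt" to it, so with the batch commands from the first post the original .wav stays in the transcript name. A small sketch, with a function name made up for the example:

```shell
# Where the batch command puts the transcript for one converted file.
# The function name is made up for this example.
transcript_for() {
    # -of gets "transcripts/<name>.wav"; -otxt then adds ".txt"
    printf 'transcripts/%s.txt\n' "$(basename "$1")"
}

transcript_for "audio-16K/memo.wav"
```

This prints transcripts/memo.wav.txt, which is where the text for memo.wav ends up.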

Hope this helps!


Ha! Thanks, that makes sense. For a second there I thought the people at OpenAI had produced a tool that would take an input audio file and recreate a translated version with the same original voice.
We’re probably six months away from that anyway, right?


Those tools exist already. We are using one at my company to translate lectures into other languages.