I think a lot of people have heard of OpenAI’s local-friendly Whisper model, but I don’t see enough self-hosters talking about WhisperX, so I’ll hop on the soapbox:
Whisper is extremely good when you have lots of audio with one person talking, but fails hard in a conversational setting with people talking over each other. It’s also hard to sync up transcripts with the original audio.
Enter WhisperX: an improved Whisper implementation that automatically tags who is talking (speaker diarization) and timestamps each line of speech.
I’ve found it great for DMing TTRPGs: simply record your session with a conference mic, run a transcript with WhisperX, and pass the output to a long-context LLM for easy session summaries. It saves you from slowing the game down to take notes on minor events and NPCs.
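For anyone wanting to try the session-summary trick, here’s a rough sketch of the glue between WhisperX and the LLM. It assumes the WhisperX output is already loaded as a list of segment dicts with `start`, `end`, `text`, and `speaker` keys (modeled on its JSON output after diarization; check what your version actually emits). `fmt_ts` and `segments_to_transcript` are made-up helper names. The idea is just to merge back-to-back lines from the same speaker and prefix each with a timestamp, which gives the LLM a compact, readable transcript.

```python
from itertools import groupby

# Assumed segment shape, modeled on WhisperX's JSON output after diarization:
# {"start": float_seconds, "end": float_seconds, "text": str, "speaker": "SPEAKER_00"}
# Field names may differ between versions; adjust to what your run produces.


def fmt_ts(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS (fractions of a second are dropped)."""
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def segments_to_transcript(segments: list[dict]) -> str:
    """Merge back-to-back segments from the same speaker into one timestamped line."""
    lines = []
    # groupby only groups *consecutive* items, so a speaker who talks,
    # gets interrupted, and talks again produces separate lines.
    for speaker, group in groupby(segments, key=lambda seg: seg.get("speaker", "UNKNOWN")):
        items = list(group)
        text = " ".join(seg["text"].strip() for seg in items)
        lines.append(f"[{fmt_ts(items[0]['start'])}] {speaker}: {text}")
    return "\n".join(lines)


if __name__ == "__main__":
    example = [
        {"start": 0.0, "end": 2.5, "text": " Okay, you enter the tavern.", "speaker": "SPEAKER_00"},
        {"start": 2.5, "end": 4.0, "text": " The innkeeper is named Bram.", "speaker": "SPEAKER_00"},
        {"start": 65.2, "end": 67.0, "text": " I ask Bram about the map.", "speaker": "SPEAKER_01"},
    ]
    print(segments_to_transcript(example))
```

Swapping `SPEAKER_00` and friends for player and DM names before sending it off makes the summary much easier to read.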
I’ve also used it in a hacky script pipeline to bulk download podcast episodes with yt-dlp, create searchable transcripts, and scrub ads by having an LLM sniff out timestamps to cut with ffmpeg.
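And a sketch of the ad-cutting step, assuming audio-only episodes (e.g. pulled with `yt-dlp -x`) and an LLM that has been prompted to reply with ad ranges as a JSON list of `[start, end]` pairs in seconds. That reply format and the helper names are hypothetical. The ffmpeg part uses the `aselect` filter to drop audio frames whose timestamp falls inside any `between(t,start,end)` range, then `asetpts=N/SR/TB` to rebuild contiguous timestamps.

```python
import json
import shlex


def parse_ad_ranges(llm_reply: str) -> list[tuple[float, float]]:
    """Parse an LLM reply assumed to be JSON like [[start, end], ...] in seconds.

    The reply format is hypothetical: it is whatever you prompt the LLM to emit.
    Ranges with end <= start are discarded.
    """
    ranges = []
    for start, end in json.loads(llm_reply):
        start, end = float(start), float(end)
        if end > start:
            ranges.append((start, end))
    return ranges


def merge_intervals(intervals):
    """Sort (start, end) ranges and merge any that overlap or touch."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def build_ffmpeg_cut(src: str, dst: str, ad_ranges) -> list[str]:
    """Return an ffmpeg argv that drops the given time ranges from an audio-only file."""
    merged = merge_intervals(ad_ranges)
    if not merged:
        # Nothing to cut: copy the streams unchanged.
        return ["ffmpeg", "-i", src, "-c", "copy", dst]
    # between(t,a,b) is 1 while a frame's timestamp t is inside an ad, so
    # not(sum of betweens) keeps only frames outside every ad range.
    cond = "+".join(f"between(t,{a},{b})" for a, b in merged)
    # The single quotes are for ffmpeg's filtergraph parser, so the commas
    # inside between(...) are not read as filter separators.
    audio_filter = f"aselect='not({cond})',asetpts=N/SR/TB"
    return ["ffmpeg", "-i", src, "-af", audio_filter, dst]


if __name__ == "__main__":
    reply = "[[600, 660], [30, 90], [80, 120]]"
    cmd = build_ffmpeg_cut("episode.mp3", "episode_noads.mp3", parse_ad_ranges(reply))
    # Print a shell-quoted version for inspection; run it with
    # subprocess.run(cmd, check=True) once it looks right.
    print(shlex.join(cmd))
```

Building an argv list instead of a shell string sidesteps shell quoting entirely. Merging overlapping ranges first keeps the filter expression short when the LLM flags the same ad break twice.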
Privacy-friendly, modest hardware requirements, and good at what it does. WhisperX, apply directly to the forehead.
Half sarcastic, but the overall premise: rig something into a local voice assistant, and when an argument starts, “OK Nabu, record this conversation.” Then two weeks later, in another argument: “OK Nabu, search our last argument for the cabinet.” It would be like having a court transcriber on call.
I have a lady friend who does a good enough job of that. LOL
‘You remember back in 1979…it was a Friday at 2:11 PM, and you said…’ ‘Babe, I don’t remember what I had for breakfast yesterday.’
Does she do it for her fuckups, though?
What kind of stupid-ass question is that? LOL All kidding aside, she’s a good soul. We’re not married; we’ve just known each other for 45+ years. It just kind of clicked. She lives in her house, and I in mine, and we get together as often as possible.
Hmm… Would be interesting to find out what kind of effect that has on the average marriage or relationship 😅
“You love the robot more than me!” 💔️
I mean, I’d imagine probably not a good one :) Somehow I imagine asking the AI to record a conversation is an instant argument escalator, as is asking it to read the facts back, and usually the topic would get switched rather than one side admitting their fault.
Actually, I think there’s a Black Mirror episode roughly on that (not a device that records audio when asked, but everyone having a chip in their head that automatically records their memories), and a huge fight breaks out when a husband discovers his wife deleted a few hours of recordings.