Two of the most common requests since 1.0 launched: "Can I use Vext in Spanish?" and "Why does my meeting transcript still mix up speakers when people talk over each other?"
1.2.0 answers both.
The entire interface is now available in five languages. And the meeting diarization engine got a fundamental architectural change — a second offline pass that re-examines your full recording after it ends and re-attributes every speaker label from scratch. The results are noticeably better for fast-moving, overlapping conversations.
Here's what changed.
The interface is now multilingual — and so is the website
Dictation has always worked in whatever language you speak — that's Whisper doing its thing. What wasn't localized was the app itself: the sidebar, settings, onboarding, menus, permission prompts. Everything you read rather than say.
1.2.0 fixes that. The full interface is now available in English, Spanish, Russian, Hindi, and Thai. The app follows your macOS system locale automatically, or you can pin a specific language in Settings → General. The switch is instant, with no restart.
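As a rough illustration of that selection rule, here is a minimal Python sketch. It is not Vext's actual code: the function name, the language codes, and the fallback to English for unsupported locales are all assumptions made for the example.

```python
# Hypothetical codes for the five interface languages listed above.
SUPPORTED = ("en", "es", "ru", "hi", "th")


def resolve_ui_language(system_locale, pinned=None, default="en"):
    """Pick the interface language: a pinned choice wins, else the system locale.

    Falling back to `default` for unsupported locales is an assumption.
    """
    if pinned in SUPPORTED:
        return pinned
    # Normalize "es-MX" / "es_MX" down to the base language "es".
    base = system_locale.replace("-", "_").split("_")[0].lower()
    return base if base in SUPPORTED else default
```

With this sketch, `resolve_ui_language("es_MX")` picks Spanish, while `resolve_ui_language("en_US", pinned="th")` keeps the pinned Thai setting regardless of the system locale.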
The website is updated to match. If you've been recommending Vext to teammates who don't work in English, now you can send them somewhere that speaks their language.
More languages are coming. This is a foundation release: the translation infrastructure is now in place, and adding a new language is a matter of translating one file.
A dedicated Speakers tab
Speaker management moved out of meeting transcripts and into its own section in the sidebar.
The Speakers tab shows everyone Vext has learned by voice across all your meetings. You can rename any speaker, pick from 8 badge colors, or — the most useful one — merge two entries into one. If Vext treated the same person as two different speakers over time, you can collapse them: the higher-quality voice profile wins, and every future meeting recognizes the merged identity correctly.
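To make the merge rule concrete, here is a small Python sketch of "the higher-quality voice profile wins". The `SpeakerProfile` fields, the numeric `quality` score, and the choice to union the meeting lists are hypothetical; the post doesn't describe how Vext stores profiles or measures quality.

```python
from dataclasses import dataclass, field


@dataclass
class SpeakerProfile:
    name: str
    quality: float        # hypothetical score for how reliable the voice profile is
    embedding: list       # hypothetical voice embedding
    meeting_ids: set = field(default_factory=set)


def merge_profiles(a, b):
    """Collapse two entries for the same person into one.

    The higher-quality profile supplies the name and embedding;
    the meeting history of both entries is kept (an assumption).
    """
    winner, loser = (a, b) if a.quality >= b.quality else (b, a)
    return SpeakerProfile(
        name=winner.name,
        quality=winner.quality,
        embedding=list(winner.embedding),
        meeting_ids=winner.meeting_ids | loser.meeting_ids,
    )
```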
Click any speaker and the right pane filters to only the meetings they appear in. Click a meeting row to jump straight there. For people who record a lot of recurring meetings (standups, client calls, team reviews), this makes it practical to actually manage who's who, rather than re-labeling the same voices every week.
Two-pass diarization: the thing that actually fixes meetings
The original speaker detection worked in a single streaming pass. Each audio chunk got labeled as it arrived, one embedding per chunk. That's fast, but it has a structural weakness: brisk back-and-forth and overlapping speech break it. A 30-second chunk with four speaker turns got one label. Two voices that sound similar early in the call might get merged before the engine has enough data to tell them apart.
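The weakness is easy to reproduce in a toy version. The Python sketch below is not the original engine: it is a generic streaming labeler that assumes one embedding per chunk, a running centroid per speaker, and a made-up similarity threshold of 0.8.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def label_stream(chunks, threshold=0.8):
    """Label chunks as they arrive; each chunk is a single embedding."""
    centroids = []  # one running centroid per speaker seen so far
    counts = []     # number of chunks folded into each centroid
    labels = []
    for emb in chunks:
        sims = [cosine(emb, c) for c in centroids]
        if sims and max(sims) >= threshold:
            i = sims.index(max(sims))
            n = counts[i]
            # Fold the chunk into the matching speaker's running mean.
            centroids[i] = [(c * n + e) / (n + 1) for c, e in zip(centroids[i], emb)]
            counts[i] = n + 1
        else:
            # No speaker is close enough: start a new one.
            centroids.append(list(emb))
            counts.append(1)
            i = len(centroids) - 1
        labels.append(i)
    return labels
```

Each chunk gets exactly one label, and the decision is final the moment it is made. If the first two chunks come from two people with similar voices, they land in the same cluster, and nothing later in the stream can pull them apart.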
1.2.0 adds a second pass that runs after the meeting ends.
Once the provisional transcript is saved, Vext goes back over the full per-stream audio using a more thorough pipeline: pyannote Community-1 for segmentation, WeSpeaker embeddings with overlap-frame masking, and VBx Bayesian refinement. It re-examines every chunk and re-attributes it to the globally best cluster, then writes the corrected labels back into the transcript. If it recognizes a known speaker, it also updates their voice profile in the database, so recognition improves in future meetings.
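The re-attribution step is the part that's easiest to picture in code. The Python sketch below is a deliberately simplified stand-in: it assumes the global clusters have already been computed and reduces "globally best cluster" to nearest centroid by cosine similarity. It does not use the pyannote, WeSpeaker, or VBx APIs, and the function and variable names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def reattribute(chunk_embeddings, centroids):
    """Return, for each chunk, the label of the most similar global cluster.

    `centroids` maps a speaker label to that cluster's centroid, computed
    from the whole recording rather than from the audio seen so far.
    """
    labels = []
    for emb in chunk_embeddings:
        best = max(centroids, key=lambda name: cosine(emb, centroids[name]))
        labels.append(best)
    return labels
```

The difference from the live pass is when the decision happens: every chunk is compared against clusters built from the full meeting, so an early chunk can end up with a different label than the one it got in real time.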
You don't do anything. The corrected transcript just appears. The temp audio archives are deleted once refinement is done.
This matters most for exactly the meetings where diarization used to struggle: product reviews with rapid iteration, client calls with three people from their side, any meeting where two people have similar voices or regularly talk over each other.
Splitting speaker turns inside a single chunk
There's a related improvement to the live recording pass itself.
Previously, if a single VAD chunk contained multiple speaker turns, it transcribed as one block under a single speaker label. The offline pass would eventually fix the attribution, but the transcript came out looking wrong while you were still in the meeting.
1.2.0 detects speaker changes inside a chunk as it records. When the timeline shows two distinct speakers in the same audio segment, Vext slices it at the change-point and transcribes each turn separately. Very short flickers under 300 ms get absorbed into the adjacent run, because you don't want the transcript fragmented on Sortformer noise, but real speaker turns now show up correctly in real time, not just after the offline pass completes.
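Here is one way the slicing rule could look, as a Python sketch. The 300 ms cutoff comes from the description above; everything else is an assumption, including the `(speaker, start_ms, end_ms)` input format, absorbing a flicker into the preceding run when there is one, and the function name.

```python
MIN_TURN_MS = 300  # flickers shorter than this are absorbed


def split_turns(runs, min_ms=MIN_TURN_MS):
    """Split one chunk into the turns to transcribe separately.

    `runs` is a list of contiguous (speaker, start_ms, end_ms) tuples
    taken from the live speaker timeline for a single chunk.
    """
    turns = []
    pending_start = None  # start of a leading flicker waiting for a real run
    for speaker, start, end in runs:
        if end - start < min_ms:
            if turns:
                # Extend the previous turn over the flicker.
                prev_speaker, prev_start, _ = turns[-1]
                turns[-1] = (prev_speaker, prev_start, end)
            elif pending_start is None:
                pending_start = start
            continue
        if pending_start is not None:
            # A flicker opened the chunk: give it to this first real run.
            start = pending_start
            pending_start = None
        if turns and turns[-1][0] == speaker:
            # Same speaker on both sides of an absorbed flicker: one turn.
            turns[-1] = (speaker, turns[-1][1], end)
        else:
            turns.append((speaker, start, end))
    if not turns and runs:
        # Nothing but flickers: keep the chunk whole under the first label.
        return [(runs[0][0], runs[0][1], runs[-1][2])]
    return turns
```

For a chunk where speaker A talks for two seconds, B flickers for 150 ms, A continues, and then B takes a real three-second turn, the sketch produces two turns: one for A covering the flicker, and one for B.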
Reliability improvements
A few things that were silently broken and now aren't.
Hotkeys come back after sleep. The global keyboard tap could go stale after sleep, fast user switching, or certain system timeouts — still reporting as enabled but quietly dropping events. It now reinstalls itself cleanly on wake and monitors the cases where macOS disables it automatically.
Echo cancellation removed. Previous versions applied Apple's VoiceProcessingIO to the microphone input. That API changes shared hardware state and bleeds AGC and noise suppression into every other app reading the same mic — video calls, recording software, anything else running. Meeting recordings capture participant audio through a separate system-audio tap, so the mic and call audio are already physically separate. Echo cancellation was never needed there, and removing it stops Vext from inadvertently making your voice sound worse in other apps while a meeting is recording.
Settings redesign. The settings sidebar is replaced by a segmented picker: General, Hotkeys, Audio & STT, Language & LLM, License, About. Cleaner, and easier to navigate on smaller screens.
Update
brew upgrade muvon/tap/vext
Or download Vext 1.2.0 directly. Existing meetings and speaker profiles carry over — the offline diarization pass will run automatically the next time you open a meeting that was recorded before this update.
If you record meetings with more than two people, open a few older ones after updating. The re-attributed transcripts tend to be a meaningful improvement.