Apple Dictation vs Vext — When Built-In Stops Being Enough

Apple's built-in Dictation got a quiet upgrade in macOS Tahoe. The new on-device foundation model is fast, accurate on everyday speech, and free with your OS. For a lot of people that's the end of the conversation — they don't need anything else.

For other people it stops being enough within a week. Here's where the line is, and what you do when you cross it.

What Apple Dictation does well

Three things, mostly:

It's already on your Mac. No download, no account, no permission dance. System Settings > Keyboard > Dictation, toggle on, pick a hotkey, done.

It's on-device. With the on-device variant, audio doesn't leave your machine. Apple's privacy story here is real: there's no cloud round-trip and no recording stored anywhere after transcription.

The accuracy on conversational English is good. Better than the old engine. Better than most people remember macOS dictation being. Punctuation inferred from cadence works most of the time. Common tech terms ("React", "TypeScript", "API") come out correctly.

For dashing off a Slack message, dictating a quick note, or doing a sentence-long email reply, Apple Dictation is enough. A lot of users never need to go beyond this.

Where it breaks

Watch what happens when you push it:

Technical vocabulary. Library names, function names, CLI commands, file paths. "kubectl get pods" becomes "cube control get pods". "useEffect" becomes "you sufficed". "src/components/auth/AuthGuard" becomes... something. If your work involves named technical things, you spend more time fixing transcripts than you save dictating them.

Filler word cleanup. Apple Dictation transcribes exactly what you said, including "um", "uh", "so basically", false starts, and run-ons. Spoken language is messier than written. Without a cleanup pass, dictated text reads like a transcript of someone thinking out loud — because that's exactly what it is.

Long passages. Apple Dictation is designed for short bursts. There's no transcript history, no way to capture more than a few sentences cleanly, no notes to come back to. If you want to dictate a 300-word document, you do it in 30-second chunks that you stitch together.

Meetings. Apple Dictation isn't a meeting tool. It only captures from one input source at a time, doesn't have speaker labels, doesn't summarize. If you want meeting transcription, this isn't the right product.

Translation. English in, English out. No multilingual flow.

Hotkey ergonomics. The press-twice-quickly trigger is fine for occasional use, awkward for frequent use. There's no push-to-talk or hold-to-dictate option, no per-app overrides.
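To make the filler-word point concrete, here is a minimal sketch of a naive cleanup pass in Python. It is not how Apple Dictation or any particular app works. The filler list and the `strip_fillers` name are hypothetical, chosen only to illustrate the idea.

```python
import re

# Hypothetical filler list for illustration; real speech needs far more than this.
FILLERS = ("um", "uh", "so basically")

# Match a whole filler word or phrase, an optional trailing comma, and trailing spaces.
_FILLER_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(f) for f in FILLERS) + r")\b,?\s*",
    flags=re.IGNORECASE,
)


def strip_fillers(transcript: str) -> str:
    """Remove listed fillers and collapse any leftover runs of whitespace."""
    cleaned = _FILLER_PATTERN.sub("", transcript)
    return re.sub(r"\s{2,}", " ", cleaned).strip()


print(strip_fillers("Um, so basically the build uh fails on main"))
# prints: the build fails on main
```

Even on this easy input the result loses its capital letter, and a word list like this does nothing for false starts or run-ons. That is the part a real cleanup pass has to handle.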

A test that tells you which group you're in

Try this for one day: use Apple Dictation for everything you type that's longer than a sentence. Slack, email, notes, code comments, AI prompts.

By end of day you'll either:

a) Notice that it worked surprisingly well. Keep using it.
b) Notice you keep fighting it on technical terms, or that the lack of cleanup makes your messages sound off, or that you wished you could dictate longer passages.

If (b), you're in the group that needs more than what Apple ships.

What Vext adds, and why

Vext is a Mac dictation app we build, sold as a $49 one-time purchase. It follows the same on-device principle Apple does (nothing leaves your Mac) but addresses the specific limitations above.

Here are the actual differences:

Speech engine. Vext defaults to NVIDIA Parakeet via CoreML. On M2 it runs at around 150x real-time and handles technical vocabulary better than Apple's foundation model, particularly for code-adjacent terms. You can also pick Whisper Small/Medium/Large for higher accuracy on noisy audio or multilingual content. Apple Dictation uses Apple's foundation model, with no option to switch engines.

Enhance (LLM cleanup). Vext runs a small local LLM (default Gemma 3 4B, around 2.8 GB) over the transcript before pasting. Filler words go. Sentence structure tightens. The meaning is preserved. The raw transcript is still saved if you want it. Apple Dictation has nothing equivalent.

Hotkey options. Hold-to-talk, hands-free toggle, configurable threshold. Apple Dictation gives you one trigger style.

Meeting mode. Captures microphone + system audio simultaneously, adds speaker labels via local diarization, runs a summary pass through the LLM. Works with Zoom, Meet, FaceTime — anything that produces audio on your Mac.

Translation. Speak any of 99+ languages, get text in your target language. With Enhance enabled, cleanup and translation happen in a single pass.

YOLO Mode. Auto-submit after pasting. Built specifically for AI coding tools.

Screenshot capture during dictation. Drag-select a screen region while talking, image gets pasted alongside the transcript. Useful for prompting AI tools about something visible on screen.
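To put the "around 150x real-time" figure from the speech engine item in concrete terms, here is a small arithmetic sketch. The function name is hypothetical, the 150x factor is the approximate M2 number quoted above, and the result covers speech recognition only, not the Enhance pass or meeting summaries.

```python
def transcription_seconds(audio_seconds: float, realtime_factor: float = 150.0) -> float:
    """Estimate processing time for audio at a given real-time speed factor."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


# One minute of speech at roughly 150x real-time: about 0.4 seconds.
print(round(transcription_seconds(60), 2))

# A 30-minute recording (1800 seconds): about 12 seconds.
print(round(transcription_seconds(30 * 60), 2))
```

At that speed the recognition step is effectively instant for a single dictation, so any delay you notice is more likely to come from the cleanup pass.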

The honest case for staying on Apple Dictation

If your usage looks like this, don't bother with anything else:

Short messages a few times an hour
General English vocabulary
One device, one workflow
You don't mind the press-twice trigger
You're not doing meetings

The on-device foundation model is genuinely good now. Apple shipped a real improvement, and for casual use it's enough.

The honest case for switching

If your usage looks like this, you'll save real time:

Multiple dictations per hour, including longer passages
Technical vocabulary regularly (code, library names, CLI commands)
You want cleanup so your dictated text reads like written text
You take meetings and want transcripts of them
You work in more than one language
You write to AI tools a lot

For that profile, the math on a paid local dictation app works out within a couple of weeks. The friction Apple Dictation creates per use is small, but it compounds.
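Here is a rough sketch of that math. The $49 price comes from this article. The minutes saved per day and the hourly value of your time are assumptions you should replace with your own numbers, and the function name is hypothetical.

```python
import math


def days_to_break_even(price_usd: float, minutes_saved_per_day: float, hourly_value_usd: float) -> int:
    """Working days until the time saved is worth more than the purchase price."""
    daily_value = minutes_saved_per_day / 60 * hourly_value_usd
    if daily_value <= 0:
        raise ValueError("time saved and hourly value must both be positive")
    return math.ceil(price_usd / daily_value)


# Assumed inputs: 10 minutes saved per working day, time valued at $30 per hour.
print(days_to_break_even(49, 10, 30))  # prints 10, i.e. two working weeks
```

If you save only a minute or two a day, the same calculation stretches to months, which is the case for staying on Apple Dictation.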

Coexistence is fine

This isn't a "switch entirely" recommendation. A lot of people use both: Apple Dictation for one-off quick messages where the press-twice trigger is convenient, and Vext (or Superwhisper, or whichever local app you prefer) for the longer-form work where cleanup and accuracy matter.

The free version of Vext gives you 100 dictations, 50 notes, and 10 meetings before asking for $49. That's enough to see which side of the line your usage falls on.

What macOS will probably ship next

Apple is heading somewhere specific here. The on-device foundation model in Tahoe is a meaningful upgrade. Future versions will likely bring better cleanup, longer context, and possibly a meeting mode in Notes. The gap between built-in and paid local apps will narrow.

But it'll narrow slowly. Apple isn't going to ship a feature-for-feature Vext or Superwhisper any time soon — they'll add the most common 80% and leave the long tail to third parties. If you're in the long tail (developers, multilingual users, meeting-heavy workflows), the third-party apps stay relevant for the foreseeable future.

For everyone else: Apple Dictation is fine. If you've never tried it on Tahoe, try it. You might not need anything else.