Voice to Text for Obsidian on Mac — Dictating Into Your Vault

Obsidian users tend to be people who think by writing. The vault is an extension of how you process the world — meetings, ideas, research, daily notes, project plans. Voice fits this pattern unusually well, because the friction of typing kills the thoughts you'd otherwise capture.

This is a guide to dictating into Obsidian on Mac, the options that work, and the workflow patterns that actually pay off.

Why voice and note-taking pair so well

When you type a thought, you usually shorten it. You compress to the point of the thought, drop the texture, lose the chain of reasoning that got you there. Two weeks later you read the note and have no idea why you wrote it.

When you speak a thought, the texture survives. You say things like "I think the issue is X, but I'm not sure because Y, and the way to test it would be Z." That's the kind of note that's still useful months later. You can't easily type it because typing is too slow to keep up with the reasoning chain. Speaking matches the speed.

For Obsidian specifically — which rewards capturing the messy first version and refining it later through linking and revisiting — voice removes the bottleneck on capture.

What "dictating into Obsidian" can mean

Three different things:

Inline dictation while editing a note. Your cursor is in a note, you hit a hotkey, you talk, words appear at the cursor. Same as dictating into any other text field. This is the most common case.

Voice notes that become Obsidian notes. You record audio outside Obsidian (in a dictation app or voice memo tool), and the transcript gets dropped into a new note in your vault. Better for longer-form capture or meetings.

Mobile capture that syncs. You speak on iPhone, the note ends up in the same vault. Different workflow, usually needs iCloud or Obsidian Sync.

This post is mostly about the first two on Mac. Mobile is a different problem.

Option 1: Apple Dictation

Free, ships with macOS. Click into any Obsidian note, hit your Apple Dictation hotkey (default is press control twice, configurable in System Settings > Keyboard > Dictation), talk, hit it again to stop.

Where it works:

Quick capture in a daily note
Adding a paragraph to a meeting note
Short sentences into bullet lists
Filling in template fields

Where it doesn't:

Technical vocabulary. PKM and second-brain terms like PARA and "Zettelkasten" rarely transcribe correctly, and library names, software names, and other jargon fare just as badly.
Filler words. "Um", "uh", and false starts go straight into the note. You either edit them out manually or live with notes that read like a transcript.
Long passages. Apple Dictation is built for short bursts. For a 5-minute braindump, you're going to fight it.
Linking. "Open bracket bracket Project X close bracket bracket" is not a fun way to make a wikilink.

For light use — sprinkling voice into typed notes — it works. For voice-first note-taking, it doesn't scale.

Option 2: A local Mac dictation app

This is where the workflow changes from "occasional voice" to "voice is the primary capture method."

Local apps like Vext, Superwhisper, MacWhisper Pro, and VoiceInk run a speech recognition model (Whisper or Parakeet) on your Mac and paste the result at your cursor. The differences that matter for Obsidian:

Better technical vocabulary. Whisper Medium and Parakeet both handle "Zettelkasten", "Andy Matuschak", "Obsidian", "Logseq", "PARA", and software names noticeably better than Apple Dictation's built-in speech model.

Cleanup. Vext's Enhance and Superwhisper's mode-based prompts remove filler words and tighten sentence structure before the text reaches Obsidian. Your notes read like written prose, not a transcript.
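To make the difference concrete, here is a crude rule-based stand-in for just the filler-removal part of that cleanup. It is not how Enhance or Superwhisper's modes work (those rely on language models); the filler list and the name `strip_fillers` are illustrative assumptions, and the sketch expects a single paragraph of text.

```python
import re

# Illustrative filler list (an assumption); model-based cleanup judges from context instead.
FILLERS = ("um", "uh", "erm")

# Match a filler only as a whole word, together with a neighbouring comma and
# leading whitespace, so "is, uh, caching" becomes "is caching".
_FILLER_RE = re.compile(
    r",?\s*\b(?:" + "|".join(FILLERS) + r")\b,?",
    re.IGNORECASE,
)


def strip_fillers(paragraph: str) -> str:
    """Drop standalone filler words from one paragraph of dictated text."""
    cleaned = _FILLER_RE.sub("", paragraph)
    cleaned = re.sub(r"\s{2,}", " ", cleaned)  # collapse any doubled spaces left behind
    return cleaned.strip()
```

A regex like this can delete "um" and "uh" without touching words like "umbrella", but it cannot restructure a sentence or drop a false start, which is why model-based cleanup is what makes spoken notes read like prose.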

Long-form dictation. Hold-to-talk is comfortable for bursts of about 30 seconds; hands-free mode (toggle on, toggle off) handles 5-minute sessions. A braindump or stream-of-consciousness note isn't a fight.

Privacy. Audio stays on your Mac. For people whose notes contain sensitive thinking — work strategies, personal reflection, draft writing — this matters more than for, say, dictating a Slack message.

Setting up Vext for Obsidian

The workflow is the same as dictating into any other text field, but a few settings help:

Install: brew install muvon/tap/vext
Open Settings > Modes
For dictation mode, enable Enhance with the default Gemma 3 4B model — cleanup is what makes spoken notes readable
Disable YOLO Mode for Obsidian — you don't want auto-Enter inside a note; that creates accidental line breaks
Pick a hotkey that doesn't clash with Obsidian shortcuts (default Shift is usually fine; the app distinguishes a short tap from a hold)

Open Obsidian, click into a note, hold the hotkey, talk, release. Cleaned text appears at the cursor.

For longer dictation:

Use hands-free mode (press-once-to-start, press-again-to-stop) for braindumps, daily notes, or capture-everything sessions
Combine with Enhance — the cleanup catches the rambling that hands-free produces

Option 3: Voice notes as standalone Obsidian notes

Some workflows are better served by full audio capture with a transcript landing in a new note. Examples:

Recording a meeting and importing the transcript into a project folder
Talking out a problem for 10 minutes on a walk and getting the result as a note
Capturing a phone conversation (with consent) for later reference

Tools that do this well on Mac:

Vext's notes mode. Hit a hotkey, talk for as long as you want, release. The recording is transcribed, cleaned up via Enhance, and stored in Vext. You can then drag the text into Obsidian or copy and paste it into a new note. Audio is also kept if you want to reference it later.

MacWhisper. Drop an audio file in, get a transcript out. Good for after-the-fact processing of voice memos.

Just Apple's Voice Memos + manual transcription. Free, ugly. Works in a pinch.

For the "transcript becomes a note" flow, the friction is whether the transcript gets dropped automatically into your vault or whether you have to copy-paste. None of the local Mac dictation apps integrate directly with Obsidian's vault yet, so it's a copy-paste step either way. (If you wire up a community plugin or a Hazel rule, you can automate it — but that's its own setup project.)
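If you do take on that setup project, the core step is small: write the transcript to a Markdown file inside the vault folder, and Obsidian picks it up like any other note. A minimal sketch, assuming a hypothetical `Inbox` folder, a timestamp naming scheme, and a `created` frontmatter property of our own choosing; the function name is also hypothetical, and how the script gets triggered (a Hazel rule, a shortcut, a plugin) is left out.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def save_transcript_note(vault: Path, transcript: str, folder: str = "Inbox",
                         now: Optional[datetime] = None) -> Path:
    """Write a transcript as a new note inside the vault and return its path."""
    now = now or datetime.now()
    target_dir = vault / folder  # hypothetical inbox folder inside the vault
    target_dir.mkdir(parents=True, exist_ok=True)  # create it on first use
    # Timestamped names sort chronologically and avoid colons, which macOS
    # filenames and Obsidian links handle poorly.
    note = target_dir / (now.strftime("%Y-%m-%d %H%M%S") + " Voice note.md")
    # YAML frontmatter shows up as a note property in Obsidian.
    frontmatter = f"---\ncreated: {now.isoformat(timespec='seconds')}\n---\n\n"
    note.write_text(frontmatter + transcript.strip() + "\n", encoding="utf-8")
    return note
```

For example, `save_transcript_note(Path.home() / "Vault", transcript_text)`, with the path pointing at your own vault. Two notes saved within the same second would share a name, so a busier pipeline would need a collision check.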

Patterns that work

A few workflows we've seen people land on:

Daily note with voice paragraphs. Open the daily note in the morning, dictate yesterday's reflection. Dictate a status entry mid-day. The note gets thicker than it would if you typed everything.

Voice-first capture, typed refinement. Speak the messy first draft. Read it back. Edit. The first draft is 60 seconds, the edit is 2 minutes. Total time matches typing, but the captured thought is richer.

Meeting note with voice summary. Type the agenda and action items during the call. After the call, dictate the recap — "What we actually decided was..." — in a single block.

Walking notes. Hands-free mode + AirPods + a phone hotspot lets you dictate into Obsidian while on a walk. You come back with a note instead of a half-remembered idea.

Where this fails

A few honest limits:

Markdown syntax doesn't dictate well. You can train yourself to say "open bracket bracket" for wikilinks, but it's friction. Most people dictate the prose and type the markdown separately. Vext's Enhance can convert "link to Project X" into [[Project X]] if you prompt for it, but the more reliable pattern is: dictate text, type the links.
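If you do train yourself on the spoken form, a small post-processing pass can turn it into real wikilinks after the text lands. This sketch assumes the engine writes the cue out as words ("open bracket bracket Project X close bracket bracket"); many engines will instead emit bracket characters or mishear the phrase, so check what yours actually produces. The function name is hypothetical, and this is a plain regex, not a Vext feature.

```python
import re

# Assumption: the dictation engine spells the cue out as words, optionally with
# commas, e.g. "open bracket bracket Project X close bracket bracket".
_SPOKEN_LINK = re.compile(
    r"open bracket[,\s]+bracket[,\s]+(.+?)[,\s]+close bracket[,\s]+bracket",
    re.IGNORECASE,
)


def spoken_links_to_wikilinks(text: str) -> str:
    """Replace each spoken wikilink cue with Obsidian's [[...]] syntax."""
    # The lazy (.+?) stops at the nearest "close bracket bracket", so several
    # links in one paragraph stay separate.
    return _SPOKEN_LINK.sub(lambda m: f"[[{m.group(1)}]]", text)
```

It only helps if you say the cue consistently; for occasional links, typing them is still less effort.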

Code blocks and technical content. Dictating code is a bad idea. Dictating explanations of code is fine.

Multilingual vaults. If you write notes in multiple languages, Apple Dictation will fight you. Whisper-based apps handle this better: Whisper can detect the spoken language on its own, and its translate mode turns non-English speech into English text in a single pass.

iCloud and sync timing. If your vault is in iCloud and you dictate on Mac, the note sometimes doesn't sync to iPhone for a few minutes. Not voice-specific, but worth knowing.

Picking one

Decision tree:

Light use, occasional dictation: Apple Dictation. Free, already there.
Voice-first note-taking, paying once for the polish: Vext ($49), Superwhisper ($249), or MacWhisper Pro (€64).
Open-source-only: VoiceInk.
Capturing long-form spoken content as standalone notes: Vext's notes mode, or MacWhisper for after-the-fact transcription of voice memos.

The thing that changes after a week of voice notes isn't speed. It's volume. You capture more thoughts because the friction is lower. The vault gets richer. The Zettelkasten flywheel turns faster because you have more atoms to link.

That's the actual payoff. Speed is incidental.