Voice to Text for Cursor AI on Mac — Speaking Prompts to the Composer

Cursor's composer is where most of the AI heavy lifting happens — multi-file edits, refactors, scaffolding a new feature. It's also where typing falls apart fastest. The prompts that produce good edits are long: context, constraints, what to touch, what not to touch, why. Typing 200 words while you're already mid-task is friction.

This is a guide to using voice for Cursor specifically — composer, inline chat, and ask mode — on Mac.

Why voice fits Cursor better than other AI tools

Cursor uses your prompt to decide which files to load into context. The longer and more specific the prompt, the better its file selection. A two-line typed prompt gets a two-file context window. A spoken paragraph with file paths, behaviors, and constraints gets the right ten files.

The other reason: composer prompts are batch operations. You're describing a unit of work, not having a conversation. Batches favor front-loaded context. Voice naturally front-loads — you start with the situation, get into the request, end with constraints. That's the shape Cursor wants.

What a good Cursor prompt looks like spoken

Typed (17 words):

Refactor the AuthGuard to use the new permission system. It's currently checking roles directly which won't scale.

Spoken (about 120 words):

The AuthGuard component in src/components/auth/AuthGuard.tsx is currently checking user roles directly through user.role equals admin or user.role equals editor. We just shipped a new permission system in src/lib/permissions.ts that exposes hasPermission and useHasPermission. I want to refactor AuthGuard so it accepts a required permission as a prop instead of checking roles. Look at how the new permission system is used in src/components/admin/UserList.tsx for the pattern. Keep the existing API surface backward compatible by allowing either a roles prop or a permission prop — if both are passed, prefer permission. Update the three callsites in src/pages that use AuthGuard with roles to use permission instead. Don't touch the legacy admin panel under src/pages/admin-legacy — that's getting deleted next sprint and isn't worth migrating.

Cursor handles the second one in one shot. The first one gets you a diff that does the wrong thing on three files and skips the callsites.

The difference isn't intelligence on your end — it's whether the friction of typing forced you to compress.
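Stripped of React, the behavior that spoken prompt asks for fits in one small decision function. The sketch below is hypothetical: `Role`, `Permission`, `GuardProps`, `canAccess`, and the `PermissionCheck` signature are illustrative assumptions, not the real exports of src/lib/permissions.ts or the actual AuthGuard props.

```typescript
// Hypothetical stand-ins: the real src/lib/permissions.ts may model these differently.
type Role = "admin" | "editor" | "viewer";
type Permission = string;

interface User {
  role: Role;
}

// The two props AuthGuard would accept after the refactor.
interface GuardProps {
  roles?: Role[];
  permission?: Permission;
}

// Assumed signature for a permission check, standing in for hasPermission.
type PermissionCheck = (user: User, permission: Permission) => boolean;

// Decide whether the guarded content renders.
function canAccess(user: User, props: GuardProps, hasPermission: PermissionCheck): boolean {
  // If both props are passed, permission wins, as the prompt specifies.
  if (props.permission !== undefined) {
    return hasPermission(user, props.permission);
  }
  // Legacy roles path, kept for backward compatibility.
  if (props.roles !== undefined) {
    return props.roles.includes(user.role);
  }
  // Neither prop: deny. This default is an assumption; the prompt doesn't say.
  return false;
}

// Example permission table, invented for illustration.
const rolePermissions: Record<Role, Permission[]> = {
  admin: ["users:read", "users:write"],
  editor: ["users:read"],
  viewer: [],
};
const checkPermission: PermissionCheck = (user, permission) =>
  rolePermissions[user.role].includes(permission);

const editor: User = { role: "editor" };
console.log(canAccess(editor, { roles: ["admin"] }, checkPermission)); // false
console.log(canAccess(editor, { roles: ["admin"], permission: "users:read" }, checkPermission)); // true
```

Every rule in the function maps to a sentence in the spoken prompt, which is exactly the detail the 17-word typed version leaves Cursor to guess.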

Setting up voice for Cursor on Mac

Cursor is an Electron app, so anything that pastes text at the cursor works. Three options:

Apple's built-in Dictation

Free, on-device on macOS Tahoe. Enable in System Settings > Keyboard > Dictation. Hit your hotkey, talk, hit it again. Works in any Cursor panel.

Where it struggles: technical vocabulary. File paths, variable names, library names, CLI commands — Apple's dictation mangles all of these. "src slash components slash auth guard dot tsx" turns into something unusable. Fine for natural language, painful for code-heavy prompts.

Wispr Flow, Superwhisper, or other cloud/local dictation apps

These apps use speech recognition that handles technical vocabulary better. They paste at the cursor like Apple Dictation does, but add cleanup, better accuracy on long dictations, and (depending on the app) cloud or local processing.

Vext with YOLO Mode

Vext is the option we make. It's $49 once, runs Parakeet locally on Apple Silicon, and has a feature specifically built for AI tools: YOLO Mode auto-submits the prompt after pasting. You talk, release the hotkey, and the composer is already running.

The Vext setup for Cursor:

Install: brew install muvon/tap/vext
Grant Accessibility permission when prompted
Open Settings > Modes, enable Enhance for dictation (filler word cleanup)
Enable YOLO Mode if you want auto-submit
Open Cursor, click into the composer panel, hold your hotkey, talk

The Enhance step matters more than you'd think. Raw transcription gives the composer messy input that costs tokens and confuses file selection. Cleaned-up input ("the issue is..." instead of "so the issue is like uh...") gets the same intent across with less noise.
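To make "cleanup" concrete, here is a deliberately naive, rule-based version. It is a hypothetical sketch, not how Vext's Enhance works (the article doesn't describe that); `FILLERS` and `stripFillers` are made-up names.

```typescript
// Hypothetical, rule-based filler removal. Model-based cleanup handles far more cases.
const FILLERS = new Set(["uh", "um", "uhm", "erm"]);

function stripFillers(transcript: string): string {
  const kept: string[] = [];
  for (const word of transcript.trim().split(/\s+/)) {
    // Compare without case or trailing punctuation, so "Uh," matches too.
    const bare = word.toLowerCase().replace(/[.,!?]+$/, "");
    if (!FILLERS.has(bare)) kept.push(word);
  }
  // Drop a discourse "so" only when it opens the transcript.
  const first = kept[0];
  if (first !== undefined && first.toLowerCase().replace(/[.,]+$/, "") === "so") {
    kept.shift();
  }
  return kept.join(" ");
}

console.log(stripFillers("so the issue is uh the save fires um twice"));
console.log(stripFillers("Uh, it looks like the cache is stale"));
```

The word "like" is left alone on purpose: in "looks like" it carries meaning, and telling that apart from "the issue is like uh..." is the kind of judgment a word list can't make, which is why cleanup features tend to lean on a language model.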

Where to use voice in Cursor

Composer prompts — biggest payoff. Multi-file edits where you need to describe the situation, the change, and the constraints.

Inline chat (Cmd+K) — works well for medium prompts. "Convert this function to use async/await, keep the error handling shape" reads naturally spoken.

Ask mode — good for exploration questions. "Why are we re-rendering this component every time the user types? Trace through the props and look at any context providers it depends on."

Tab autocomplete — don't bother. The flow is too fast for voice to help.

Chat panel for follow-ups — voice for the substantive replies ("look at the implementation in fooService.ts and explain why we're catching the validation error there"), type for short ones ("yes", "try again", "different approach").
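The inline-chat prompt in the list above ("Convert this function to use async/await, keep the error handling shape") asks for an edit like the one sketched here. `fetchUser` and `loadUserName` are hypothetical names with a stubbed network call; the point is that the `.catch` mapping failures to `null` survives as a `try`/`catch` with the same result.

```typescript
// Hypothetical stand-in for a network call: resolves for positive ids, rejects otherwise.
function fetchUser(id: number): Promise<{ id: number; name: string }> {
  return id > 0
    ? Promise.resolve({ id, name: `user${id}` })
    : Promise.reject(new Error("not found"));
}

// Before: promise chain; any failure becomes a null result.
function loadUserName(id: number): Promise<string | null> {
  return fetchUser(id)
    .then((user) => user.name)
    .catch(() => null);
}

// After: async/await with the same error-handling shape (failure still yields null).
async function loadUserNameAsync(id: number): Promise<string | null> {
  try {
    const user = await fetchUser(id);
    return user.name;
  } catch {
    return null;
  }
}
```

"Keep the error handling shape" is the part worth saying out loud: without it, a model may let the error propagate instead of returning `null`, which changes behavior for every caller.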

Workflows that change once voice is set up

The cold-start prompt

The first message in a Cursor session is the highest-leverage one. Cursor uses it to seed context for the entire conversation. With typing, you compress. With voice, you front-load:

I'm building out the billing settings page. We're using TanStack Query for data fetching, Zustand for client state, and the design system in src/ui. The Stripe customer portal flow needs to be embedded — there's a stripeService.ts that has createPortalSession but nothing wired up to the frontend. I want a billing page at app/settings/billing that shows the current plan, has a button to open the Stripe portal in a new tab, and shows the next invoice date and amount. Use the existing card components from the design system. Don't add new dependencies.

That gives Cursor enough to scaffold the whole page in one pass.

Bug reports to the AI

Bugs are narrative — they happened in sequence. Speaking the timeline is faster and more accurate than typing it:

The autosave feature in the document editor is sometimes saving stale content. Reproducing it is hard but I think I've got it. When the user types fast and the network request is slow, the optimistic update sets the local state to the new content, but if a previous save's response comes back after the new one, it overwrites local state with the old content. The race is somewhere in the useAutoSave hook in src/hooks/useAutoSave.ts. Look at the request ordering and fix it. Use an incrementing sequence number so out-of-order responses get dropped.
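The fix that bug report asks for is a small amount of logic. This is a minimal sketch, assuming a save function that resolves with the content the server stored; `AutoSaver` and `SaveFn` are hypothetical names, and the React parts of the real useAutoSave hook are left out so the ordering logic stands alone.

```typescript
// Resolves with the content the (hypothetical) server confirms it saved.
type SaveFn = (content: string) => Promise<string>;

// Hypothetical stand-in for the state logic inside useAutoSave.
// In a React hook, latestSeq would typically live in a useRef.
class AutoSaver {
  localContent = "";
  private latestSeq = 0; // sequence number of the most recently sent save
  private save: SaveFn;

  constructor(save: SaveFn) {
    this.save = save;
  }

  async edit(content: string): Promise<void> {
    this.localContent = content; // optimistic update
    const seq = ++this.latestSeq; // tag this request with an incrementing number
    const confirmed = await this.save(content);
    // A newer save has been sent since this one, so this response is stale
    // whatever order it arrived in: drop it instead of overwriting local state.
    if (seq !== this.latestSeq) return;
    this.localContent = confirmed;
  }
}

// Simulate the race: the first save is slow, the second is fast.
const delays = [50, 10];
let call = 0;
const slowThenFast: SaveFn = (content) =>
  new Promise((resolve) => setTimeout(() => resolve(content), delays[call++]));

const saver = new AutoSaver(slowThenFast);
Promise.all([saver.edit("old"), saver.edit("new")]).then(() => {
  console.log(saver.localContent); // prints new
});
```

Without the `seq !== this.latestSeq` check, the slow first response lands last and sets `localContent` back to "old", which is the stale-content bug described in the prompt.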

Code review on a teammate's PR

Open Cursor's diff view, dictate your comments through the composer with "leave a comment that says..." — much faster than typing review feedback. Particularly good when you want to articulate reasoning, not just point at lines.

Common questions

Does Cursor have voice built in?

Not really. There's no native dictation in Cursor itself — you're relying on macOS or a third-party app. Cursor is just an Electron text surface as far as voice tools are concerned, which is actually convenient because anything that types into a Mac app types into Cursor.

Won't the AI get confused by spoken-style phrasing?

Models like GPT-4o and Claude Sonnet handle conversational phrasing, filler words, and restarts without trouble. The risk isn't comprehension; it's wasted tokens. That's what cleanup tools like Vext's Enhance solve.

Should I dictate code?

No. Dictate the natural-language parts — context, intent, constraints. When you need to include actual code in a prompt, paste it. Voice is for the part that takes longer to type than to think.

The honest tradeoff

System-wide dictation costs something — money, a model download, accessibility permission, the awkward first week of talking to your computer. Wispr Flow is $15/month and works cross-platform. Vext is $49 once and is local-only on Apple Silicon. Apple Dictation is free but limited.
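Taking the two paid prices above at face value (Wispr Flow at $15/month, Vext at $49 once) and ignoring annual-plan discounts, taxes, or upgrade fees, the subscription overtakes the one-time price during the fourth month:

```latex
\frac{49}{15} \approx 3.27
\qquad
3 \times 15 = 45 < 49 < 60 = 4 \times 15
```

If you expect to keep dictating for four months or longer, the one-time price comes out ahead on cost alone.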

If you live in Cursor — meaning you write more prompts to it than emails — a paid dictation tool pays for itself in a couple of weeks of saved typing. If you only use Cursor occasionally, Apple Dictation is enough.

Either way, the unlock is the same: prompts get longer because talking is easier than typing, and Cursor responds better to long prompts. The cleanup is gravy.