Voice Dictation for Claude Code, Cursor, and AI Coding Tools

Voice input fits naturally with AI coding tools because they're conversational — you describe what you want, the AI responds, you iterate. The bottleneck isn't the AI. It's how fast you can talk to it.

Why voice works better for AI prompts

When you type a prompt, you compress. A task that needs 80 words of context gets compressed to 12 because typing is slow and you instinctively skip the "obvious" parts. The AI then guesses wrong and you spend three follow-ups correcting it.

When you speak, that compression goes away. You naturally include background, constraints, and reasoning, so the AI is much more likely to get what it needs on the first try.

Typed prompt:

"Refactor the auth middleware"

Spoken prompt:

"The auth middleware in middleware/auth.ts is doing too many things — it validates the JWT, checks permissions, loads the user object, and sets rate limit headers. I want to split it into separate middleware functions so we can compose them per route. Keep the JWT validation as the base, and make the others optional."

Same developer, same intent. The spoken version gives the AI enough to work correctly without follow-up questions.

Setting up Vext for coding

1. Install Vext

brew install muvon/tap/vext

Launch the app and grant it Accessibility permission when prompted. This allows the hotkey system to work globally.

2. Configure your hotkey

The default hotkey is Shift: hold it for half a second to start dictation. Shift works well because quick presses, such as those for capitalization, are ignored.

You can change the hotkey and hold threshold in Settings > Hotkeys.
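The hold-to-talk rule in step 2 can be modeled as a small detector over key-down, key-up, and timer events. This is a minimal sketch under stated assumptions, not Vext's implementation. Only the 0.5-second default comes from the article. `HoldDetector`, its method names, and the choice to check the threshold on a periodic `tick` (so dictation can begin while Shift is still held) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

HOLD_THRESHOLD_S = 0.5  # default from the article: hold Shift for half a second


@dataclass
class HoldDetector:
    """Hypothetical hold-to-talk detector; an illustration, not Vext's code."""
    threshold: float = HOLD_THRESHOLD_S
    pressed_at: Optional[float] = None
    triggered: bool = False

    def key_down(self, t: float) -> None:
        # Ignore auto-repeat key-down events while the key is already held.
        if self.pressed_at is None:
            self.pressed_at = t
            self.triggered = False

    def tick(self, t: float) -> bool:
        """Return True exactly once, when the current hold crosses the threshold."""
        if (
            self.pressed_at is not None
            and not self.triggered
            and t - self.pressed_at >= self.threshold
        ):
            self.triggered = True
            return True
        return False

    def key_up(self, t: float) -> str:
        """Classify the finished press as 'tap' (ordinary Shift) or 'hold' (dictation)."""
        if self.pressed_at is None:
            return "none"  # key-up without a matching key-down
        held = t - self.pressed_at
        kind = "hold" if self.triggered or held >= self.threshold else "tap"
        self.pressed_at = None
        self.triggered = False
        return kind
```

A press released before the threshold comes back as `"tap"` and is left alone, which is how a capitalizing Shift press avoids triggering dictation.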

3. Enable YOLO Mode

YOLO Mode is what makes this work with AI coding tools. When enabled, Vext automatically presses Return after pasting your transcription. Your prompt goes straight to the AI — no manual submission needed.

This is safe with Claude Code and other terminal-based agents because you can always interrupt them. The time saved by not reviewing every prompt outweighs the occasional rephrase.

4. Try Enhance

Enable Enhance to clean up filler words and fix sentence structure before pasting. AI tools handle messy speech fine, but clean prompts produce marginally better results and are easier to re-read in your conversation history.
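The article doesn't describe how Enhance works internally, and Enhance also fixes sentence structure, which no word list can do. As a deliberately naive stand-in for only the filler-word part, the sketch below removes a hypothetical list of fillers with a regular expression, then tidies the leftover spacing and capitalization.

```python
import re

# Hypothetical filler list for this sketch only.
FILLERS = ["um", "uh", "erm", "you know"]

# Match a whole filler word or phrase, plus an optional comma before and after it.
FILLER_RE = re.compile(
    r"(?:,\s*)?\b(?:" + "|".join(re.escape(f) for f in FILLERS) + r")\b(?:,\s*)?",
    re.IGNORECASE,
)


def strip_fillers(text: str) -> str:
    """Remove filler words, then tidy spacing and capitalize the first letter."""
    cleaned = FILLER_RE.sub(" ", text)
    cleaned = re.sub(r"\s+([,.?!])", r"\1", cleaned)  # no space before punctuation
    cleaned = re.sub(r"\s{2,}", " ", cleaned).strip()  # collapse runs of spaces
    return cleaned[:1].upper() + cleaned[1:]


print(strip_fillers("Um, so the auth middleware is, uh, doing too many things, you know."))
# prints: So the auth middleware is doing too many things.
```

Because the pattern also swallows a comma in front of a filler, "first, um, second" becomes "first second". Edge cases like that are one reason a fixed-pattern approach is only an illustration.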

Workflows that benefit most from voice

The initial context dump

The first message to an AI coding tool is the most important. It sets up the entire conversation. Voice excels here because you naturally front-load context:

"I'm working on the checkout flow. We have a React frontend with a Node backend. The cart state is managed with Zustand. Right now the payment step calls Stripe directly from the frontend which is insecure — I need to move it to a server-side endpoint. Create a POST /api/checkout endpoint that takes the cart items, creates a Stripe session, and returns the session URL."

That is about 30 seconds of speaking. Typing it would take over a minute, and most developers would skip half the context.
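The timing claim can be checked with back-of-the-envelope arithmetic. The checkout prompt above is 64 words. The speaking and typing rates below are assumed round numbers for conversational speech and everyday typing, not measurements from the article.

```python
def seconds_to_produce(words: int, words_per_minute: float) -> float:
    """Seconds needed to produce `words` at a steady words-per-minute rate."""
    return words * 60 / words_per_minute


PROMPT_WORDS = 64   # word count of the checkout prompt above
SPEAKING_WPM = 130  # assumed conversational speaking rate
TYPING_WPM = 40     # assumed typing rate, ignoring time spent fixing typos

speaking_s = seconds_to_produce(PROMPT_WORDS, SPEAKING_WPM)
typing_s = seconds_to_produce(PROMPT_WORDS, TYPING_WPM)
print(f"speaking ~{speaking_s:.0f} s, typing ~{typing_s:.0f} s")
# prints: speaking ~30 s, typing ~96 s
```

At these rates the prompt takes roughly 30 seconds to say and about a minute and a half to type, which lines up with the estimate above.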

Describing bugs

Bugs are inherently narrative — what happened, what should have happened, what you already tried. This maps perfectly to speech:

"When I click the save button on the settings page and the network request is slow, the loading spinner appears but if I navigate away before it completes and then come back, the old settings are shown even though the save actually succeeded on the backend. I think the issue is that we're reading from a stale local cache instead of re-fetching after navigation."

Code review comments

Typed code review comments tend to be terse. Voice removes the friction that keeps feedback short:

"This function is doing three things — fetching the user, checking permissions, and formatting the response. I'd split the permission check into its own middleware so we can reuse it on the admin routes. Also the error handling on line 42 swallows the original error message which makes debugging harder in production."

Architecture decisions

When you need to think through an approach, voice is faster than typing and more organized than just thinking in your head:

"I'm trying to decide between WebSockets and server-sent events for the real-time notifications. WebSockets give us bidirectional communication but we only need server-to-client for notifications. SSE is simpler, works through proxies and load balancers more reliably, and we can use a simple EventSource on the frontend. The tradeoff is that if we ever need the client to send messages back we'd have to add a separate endpoint. What do you think?"

Voice + screenshot, fully hands-free

Voice prompts work well on their own, but coding often needs visual context — an error message, a UI bug, a chart, a diagram on a colleague's screen. Vext handles this with a feature no other voice-to-text tool offers: capture a screenshot during hands-free dictation, and the image pastes into the AI tool alongside your transcribed prompt.

The flow:

1. Start hands-free dictation.
2. Speak your prompt: "Look at this layout: the sidebar is overlapping the main content on narrow viewports. Fix the flexbox so it collapses cleanly."
3. Drag to capture the bug on screen.
4. Press the dictation key to stop.

Both the transcribed text and the screenshot land in Claude Code (or Cursor, or ChatGPT) at your cursor. With YOLO Mode enabled, the prompt submits automatically. You never touch the keyboard.
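The hands-free flow can be modeled as a small session that collects what happens between start and stop, then decides what to deliver at the cursor. This is a hypothetical model, not Vext's code. `DictationSession`, the action tuples, pasting text before images, and sending Return only when YOLO Mode is on and something was pasted are all assumptions drawn from the description above. Real screen capture and pasting depend on OS APIs and are left out.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DictationSession:
    """Hypothetical hands-free session: collects speech and screenshots until stopped."""
    yolo_mode: bool = False
    transcript_parts: List[str] = field(default_factory=list)
    screenshots: List[bytes] = field(default_factory=list)
    active: bool = False

    def start(self) -> None:
        self.active = True
        self.transcript_parts.clear()
        self.screenshots.clear()

    def add_speech(self, text: str) -> None:
        # Transcribed speech only counts while the session is running.
        if self.active:
            self.transcript_parts.append(text.strip())

    def add_screenshot(self, png: bytes) -> None:
        # Screen capture itself is out of scope; only the image bytes are kept.
        if self.active:
            self.screenshots.append(png)

    def stop(self) -> List[Tuple[str, object]]:
        """End the session and return the actions to perform at the cursor, in order."""
        self.active = False
        actions: List[Tuple[str, object]] = []
        text = " ".join(p for p in self.transcript_parts if p)
        if text:
            actions.append(("paste_text", text))
        for png in self.screenshots:
            actions.append(("paste_image", png))
        # YOLO Mode: submit automatically, but only if something was pasted.
        if self.yolo_mode and actions:
            actions.append(("press_key", "Return"))
        return actions


session = DictationSession(yolo_mode=True)
session.start()
session.add_speech("Look at this layout.")
session.add_screenshot(b"png-bytes")
session.add_speech("Fix the flexbox.")
print(session.stop())
```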

Use cases where this beats typing:

Showing an error message — capture the stack trace instead of describing it
UI bugs — show what's broken while explaining the expected behavior
Reviewing a colleague's code — capture the diff while talking through your suggestion
Chart and diagram analysis — point Claude at a Grafana panel or architecture diagram and ask questions
Cross-app context — describe a Figma mockup while implementing it in your editor

This combines the three features that make Vext useful for AI coding: hands-free dictation, screenshot capture, and YOLO Mode auto-submission. Together they let you stay in flow with the AI without breaking to type or paste.

Per-tool tips

Claude Code (terminal)

Claude Code handles natural language well — no need to format your prompts carefully. For multi-step tasks, use voice for the initial description, then type short follow-ups ("yes", "try a different approach", "revert that").

Cursor

Use voice for the composer panel. Long prompts with full context work significantly better than short instructions. Cursor uses the prompt to search your codebase for relevant context, so more detail means better file selection.

ChatGPT / Claude.ai

Voice works especially well for chat-based interfaces where conversation flow matters. Use voice for the substantive messages and type for quick replies.

Common concerns

"Will the AI understand my messy speech?"

Yes. Large language models handle filler words, restarts, and conversational phrasing without issue. A rambling 100-word spoken prompt with full context outperforms a polished 15-word typed instruction almost every time.

"What about code snippets in prompts?"

Type or paste them separately. Voice is for the natural-language parts of a prompt: descriptions, context, and requirements.

"Is it awkward to talk to your computer?"

For about 30 minutes. After that, typing prompts starts to feel like the slow way.

Getting started

Download Vext — free trial, no account required. Enable YOLO Mode and try voice-prompting your next coding session.