Most meeting transcription tools send your audio to a server. Your conversation — confidential business discussions, personnel matters, client calls — gets processed and stored on third-party infrastructure.

If that concerns you, there's an alternative. Apple Silicon Macs can run speech recognition models locally that rival cloud services in accuracy. Here's how to set up local meeting transcription on macOS.

What you need

  • Apple Silicon Mac (M1, M2, M3, M4)
  • macOS 14 Sonoma or later (14.2 or later for system audio capture, which relies on the Core Audio process tap)
  • A local transcription app (this guide uses Vext)

How it works

When you start a meeting recording in Vext, two audio streams are captured simultaneously:

  1. Microphone — your voice, captured via the standard AVAudioEngine
  2. System audio — everyone else in the meeting, captured via macOS Core Audio process tap (available on macOS 14.2+)

This means Vext works with any meeting app — Zoom, Google Meet, FaceTime, Microsoft Teams, Discord, or any other application that produces audio output. No plugins, no bot joining your call, no meeting app integration needed.

When you stop the recording:

  1. Audio is segmented using Voice Activity Detection (VAD) — silent gaps are identified to split the audio into natural speech chunks
  2. Each chunk is transcribed locally using the Parakeet engine at 150x realtime
  3. Speaker labels are applied — "Me" for microphone audio, "Them" for system audio
  4. If Enhance is enabled, the transcript is cleaned up and optionally translated
  5. If Summarize is enabled, an AI summary with key points and action items is generated

Everything happens on your Mac. Nothing leaves the device.
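
To make the segmentation step concrete, here is a minimal Python sketch of energy-based voice activity detection. It is an illustration only, not Vext's implementation: the function name split_on_silence, the RMS energy threshold, and the frame and silence lengths are all assumptions chosen for readability, and production VAD systems typically use trained models rather than a fixed threshold.

```python
from typing import List, Tuple


def split_on_silence(
    samples: List[float],
    sample_rate: int,
    frame_ms: int = 30,              # assumed analysis frame length
    energy_threshold: float = 0.01,  # assumed RMS level that counts as speech
    min_silence_ms: int = 300,       # assumed gap length that ends a chunk
) -> List[Tuple[float, float]]:
    """Return (start_sec, end_sec) speech chunks separated by silent gaps."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    min_silence_frames = max(1, min_silence_ms // frame_ms)

    chunks = []
    chunk_start = None  # frame index where the current chunk began
    silent_run = 0      # consecutive silent frames seen inside the chunk

    n_frames = (len(samples) + frame_len - 1) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = (sum(s * s for s in frame) / len(frame)) ** 0.5
        if rms >= energy_threshold:
            if chunk_start is None:
                chunk_start = i
            silent_run = 0
        elif chunk_start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                # Close the chunk at the first frame of the silent gap.
                chunks.append((chunk_start, i - silent_run + 1))
                chunk_start = None
                silent_run = 0

    if chunk_start is not None:
        # Audio ended mid-chunk; drop any trailing silent frames.
        chunks.append((chunk_start, n_frames - silent_run))

    seconds_per_frame = frame_len / sample_rate
    return [(s * seconds_per_frame, e * seconds_per_frame) for s, e in chunks]
```

Each returned pair marks one speech chunk that could then be handed to the transcription engine. Gaps shorter than min_silence_ms stay inside a chunk, so a brief pause mid-sentence does not split it.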

Setting it up

Step 1: Install Vext

brew install muvon/tap/vext

Or download from getvext.app. The free trial includes 10 meeting recordings.

Step 2: Grant permissions

On first launch, Vext requests three permissions:

  • Microphone — to capture your voice
  • Accessibility — for the global hotkey system
  • Screen Recording — required by macOS for system audio capture (the process tap API requires this permission even though no screen content is recorded)

Step 3: Start a recording

Press the Fn key to toggle meeting recording. A pulsing red dot appears near your cursor and the menu bar icon blinks red to indicate recording is active.

Join your Zoom, Meet, or FaceTime call as usual. Vext captures both sides of the conversation in the background.

Step 4: Stop and review

Press Fn again to stop. Vext then processes the audio. At 150x realtime, transcribing a 30-minute meeting takes about 12 seconds.
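
The processing time follows from dividing the audio duration by the realtime factor. The short Python sketch below shows the arithmetic; the function name is hypothetical, and the result covers transcription only, since the source does not give timings for segmentation, Enhance, or Summarize.

```python
def transcription_seconds(audio_minutes: float, realtime_factor: float) -> float:
    """Estimate transcription time for a recording of the given length."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_minutes * 60 / realtime_factor


# A 30-minute meeting at the 150x realtime speed quoted in this guide.
print(transcription_seconds(30, 150))  # 12.0
```

By the same formula, a one-hour meeting would take about 24 seconds to transcribe at that speed.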

You get:

  • Full transcript with speaker labels and timestamps
  • AI summary with key points (if enabled)
  • Action items extracted from the discussion (if enabled)
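
As a rough picture of how two labelled streams can become a single transcript, the Python sketch below sorts segments from both sources by start time and prefixes each with "Me" or "Them". The Segment type and the function names are hypothetical, and the output format is an assumption; Vext's actual data model and transcript layout may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: float  # seconds from the start of the recording
    end: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS, truncating fractions of a second."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def merge_streams(mic: List[Segment], system: List[Segment]) -> List[str]:
    """Interleave microphone and system-audio segments in time order."""
    labelled = [("Me", s) for s in mic] + [("Them", s) for s in system]
    labelled.sort(key=lambda pair: pair[1].start)
    return [
        f"[{format_timestamp(seg.start)}] {label}: {seg.text}"
        for label, seg in labelled
    ]
```

Because the speaker label comes from which stream a segment was captured on, this approach separates you from everyone else but cannot tell remote participants apart from one another.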

Capturing screenshots during meetings

While recording a meeting, you can capture any area of your screen. Drag to select a region — the screenshot is automatically attached to your transcript.

This is useful for:

  • Slides from a presentation
  • Code or designs being discussed
  • Diagrams on a shared whiteboard
  • Any visual context that complements the spoken content

You can take multiple screenshots per meeting, and all of them are saved alongside the transcript.

Export options

Transcripts can be exported in several formats:

Format     Use case
TXT        Simple text, easy to paste anywhere
Markdown   Formatted with speaker labels and timestamps
SRT        Subtitles for video editing
VTT        Web subtitles (HTML5 video)
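
The two subtitle formats are close relatives. SRT numbers each cue and writes milliseconds after a comma, while WebVTT starts with a WEBVTT header line and writes milliseconds after a period. The Python sketch below shows the difference on a list of (start, end, text) cues; the function names and the cue tuple are illustrative assumptions, not Vext's export code.

```python
from typing import List, Tuple

Cue = Tuple[float, float, str]  # (start_sec, end_sec, text)


def _timestamp(seconds: float, decimal_mark: str) -> str:
    """Format seconds as HH:MM:SS<mark>mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_mark}{ms:03d}"


def to_srt(cues: List[Cue]) -> str:
    """SRT: numbered cues, comma before the milliseconds."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        timing = f"{_timestamp(start, ',')} --> {_timestamp(end, ',')}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    return "\n".join(blocks)


def to_vtt(cues: List[Cue]) -> str:
    """WebVTT: WEBVTT header, period before the milliseconds."""
    blocks = ["WEBVTT\n"]
    for start, end, text in cues:
        timing = f"{_timestamp(start, '.')} --> {_timestamp(end, '.')}"
        blocks.append(f"{timing}\n{text}\n")
    return "\n".join(blocks)
```

Calling to_srt([(0.0, 2.5, "Me: Hi")]) yields a cue numbered 1 with the timing line 00:00:00,000 --> 00:00:02,500, and to_vtt on the same input yields the same cue with periods in place of commas under a WEBVTT header.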

Tips for better transcription quality

Use a good microphone. The built-in Mac mic is adequate in quiet environments, but a headset or external mic significantly improves accuracy, especially when meeting audio is playing through speakers and could be picked up by the microphone.

Reduce background noise. Close windows, mute notifications, and avoid typing during important sections. The VAD system handles silence well, but continuous background noise degrades transcription accuracy.

Let people finish speaking. Overlapping speech is the hardest scenario for any transcription system. When speakers take turns clearly, accuracy improves significantly.

Check your system audio setup. If meeting audio is not appearing in the transcript, verify that the Screen Recording permission is granted and that your meeting app is sending audio to the default system output device.

Privacy comparison

Aspect                      Cloud transcription   Local transcription
Audio sent to server        Yes                   No
Stored on third party       Usually               Never
Works offline               No                    Yes
Third-party data policies   Apply                 N/A
Compliance (HIPAA, etc.)    Varies by vendor      Your device, your control

For organizations in regulated industries — healthcare, legal, finance — local transcription eliminates an entire category of compliance risk. The data never leaves the device, so there's no third-party data processing agreement to negotiate.

Accuracy

Using the Parakeet engine, Vext achieves a word error rate comparable to leading cloud services — approximately 4–5% on general English speech. Technical vocabulary and non-English languages may see higher error rates depending on the source material.
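
Word error rate is the number of word-level substitutions, deletions, and insertions needed to turn the transcript into a reference text, divided by the number of words in the reference. If you want to measure it on your own recordings, a minimal Python sketch looks like this. The function name and the lowercase, whitespace-split normalization are assumptions; published benchmarks usually apply more careful text normalization, so numbers from this sketch are not directly comparable to them.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference word count."""
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping one row of the table at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra word in transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, comparing the reference "please send the report by friday" with the transcript "please send a report friday" gives one substitution and one deletion over six words, a WER of about 0.33. A rate of 5% corresponds to roughly one wrong word in every twenty.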

For critical meetings where accuracy matters most, review the transcript after the meeting. The combination of local transcription speed (near-instant) and AI cleanup (Enhance) means the review process is fast — you're checking, not transcribing from scratch.

Download Vext: 10 free meeting recordings, no account, no credit card. Works with any meeting app on macOS 14.2+.