If you're picking a local speech recognition engine on a Mac, the choice usually comes down to two: OpenAI Whisper and NVIDIA Parakeet. Both run well on Apple Silicon, and both are openly released. They make different trade-offs, and the right pick depends on what you're transcribing.

This is a straight comparison based on benchmarks I've run on M2 and M3 Macs.

The short version

  • Parakeet is faster and uses less RAM, but English-only.
  • Whisper Large-v3 is more accurate on hard audio and handles 99+ languages.
  • For English dictation: Parakeet wins.
  • For meetings, files, or multilingual content: Whisper.

The gap is smaller than people think. Both are good enough that most users won't notice the accuracy difference on clean audio.

What each one is

OpenAI Whisper is an encoder-decoder transformer. The original release was trained on 680,000 hours of multilingual speech and published with open weights in 2022; v2 and v3 of the large model followed. Sizes range from Tiny (about 75 MB) to Large-v3 (about 3 GB).

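NVIDIA Parakeet is a transducer model in the RNN-T (recurrent neural network transducer) lineage, released through NVIDIA's NeMo toolkit. The TDT-1.1B variant benchmarked here uses a Token-and-Duration Transducer decoder. It's smaller, faster, and English-only by default (multilingual variants exist but are less mature).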

The architectural difference matters: Whisper processes 30-second windows with a transformer that's expensive but flexible. Parakeet pairs a Conformer-style encoder with a lightweight transducer decoder that produces text incrementally and cheaply.
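To illustrate the fixed-window idea, here is a minimal Python sketch that cuts a stream of samples into 30-second windows and zero-pads the last one. It assumes 16 kHz mono audio, which is what Whisper works with; the function name is my own, and real implementations are more involved (they may move the window boundary based on where the last transcribed segment ended).

```python
SAMPLE_RATE = 16_000   # samples per second (Whisper works on 16 kHz audio)
WINDOW_SECONDS = 30    # fixed window length

def split_into_windows(samples, sample_rate=SAMPLE_RATE, window_seconds=WINDOW_SECONDS):
    """Return equal-length windows; a short final window is zero-padded."""
    window_len = sample_rate * window_seconds
    windows = []
    for start in range(0, len(samples), window_len):
        chunk = list(samples[start:start + window_len])
        chunk.extend([0.0] * (window_len - len(chunk)))  # pad the final chunk
        windows.append(chunk)
    return windows

# 70 seconds of silence becomes three windows: 30 s, 30 s, and 10 s padded to 30 s.
audio = [0.0] * (70 * SAMPLE_RATE)
windows = split_into_windows(audio)
print(len(windows), len(windows[-1]))
```

The practical consequence is that even a 2-second utterance costs a full 30-second window of encoder work, which is part of why short dictation snippets feel slower on Whisper than their length suggests.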

Speed

Speed is measured as real-time factor (RTF), expressed here as a speed multiple: audio duration divided by processing time. 1x means the model takes as long as the audio itself. 10x means it processes a 10-minute file in 1 minute. Higher is faster.

Benchmarks on M2 (8-core GPU, 16 GB RAM) and M3 Pro, measured on the LibriSpeech test-clean set:

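| Engine   | Model    | RTF (M2) | RTF (M3 Pro) |
|----------|----------|----------|--------------|
| Whisper  | Tiny     | 30x      | 45x          |
| Whisper  | Base     | 20x      | 32x          |
| Whisper  | Small    | 10x      | 18x          |
| Whisper  | Medium   | 5x       | 9x           |
| Whisper  | Large-v3 | 2x       | 4x           |
| Parakeet | TDT-1.1B | 150x     | 220x         |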

Parakeet is roughly 25–75x faster than the Whisper models closest to it in accuracy (Medium and Large-v3). For dictation this is the difference between text appearing almost instantly and waiting a second or more.
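To make the RTF figures concrete, here is a small Python sketch that turns them into wall-clock time for a one-hour recording and computes speedup ratios. The numbers are the M2 column from the table above; the function names are my own.

```python
# Speed multiples on M2, copied from the table above.
RTF_M2 = {
    "Whisper Tiny": 30,
    "Whisper Base": 20,
    "Whisper Small": 10,
    "Whisper Medium": 5,
    "Whisper Large-v3": 2,
    "Parakeet TDT-1.1B": 150,
}

def processing_seconds(audio_seconds, rtf):
    """Time needed to transcribe audio of the given length at a given speed multiple."""
    return audio_seconds / rtf

def speedup(fast_rtf, slow_rtf):
    """How many times faster one engine is than another."""
    return fast_rtf / slow_rtf

one_hour = 60 * 60
for name, rtf in RTF_M2.items():
    minutes = processing_seconds(one_hour, rtf) / 60
    print(f"{name:<18} {minutes:5.1f} min")

print(speedup(RTF_M2["Parakeet TDT-1.1B"], RTF_M2["Whisper Medium"]))    # vs Medium
print(speedup(RTF_M2["Parakeet TDT-1.1B"], RTF_M2["Whisper Large-v3"]))  # vs Large-v3
```

On these figures an hour of audio takes Parakeet 24 seconds and Whisper Large-v3 30 minutes.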

Accuracy

Word error rate (WER) on standard English benchmarks. Lower is better. These numbers vary across test sets. The table reports LibriSpeech test-clean, a relatively clean read-speech corpus, and CommonVoice, which is crowd-sourced and more varied. On harder audio (noisy, accented, technical) the numbers go up for both engines.

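| Engine   | Model    | WER (LibriSpeech) | WER (CommonVoice) |
|----------|----------|-------------------|-------------------|
| Whisper  | Tiny     | 9.0%              | 14%               |
| Whisper  | Base     | 7.0%              | 11%               |
| Whisper  | Small    | 5.5%              | 8%                |
| Whisper  | Medium   | 4.8%              | 7%                |
| Whisper  | Large-v3 | 4.2%              | 5.5%              |
| Parakeet | TDT-1.1B | 4.5%              | 6.5%              |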

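On clean English, Parakeet edges out Whisper Medium (4.5% vs 4.8%) and comes within 0.3 points of Whisper Large-v3 (4.2%). The gap is small. On noisy or accented English, Whisper Large-v3 holds its lead more clearly; the CommonVoice column already shows a full point between them.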
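For reference, WER is the word-level edit distance between the reference transcript and the engine's output, divided by the number of reference words. Here is a minimal Python sketch of that calculation. It only lowercases and splits on whitespace, which is a simplifying assumption: published benchmarks usually apply heavier text normalization (punctuation, numbers, contractions), so numbers from this sketch won't match a leaderboard exactly.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)

# One substituted word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

A WER of 4.5% therefore means roughly one wrong, missing, or extra word in every 22 words of reference text.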

For multilingual content, Whisper is the only real option. Parakeet has multilingual variants, but I haven't seen them match Whisper Large on languages other than English.

RAM

Apple Silicon Macs have unified memory, and the model loads into the same pool as everything else. RAM use matters if you have 8 or 16 GB and want to keep using your machine while transcribing.

| Engine   | Model    | RAM (loaded) |
|----------|----------|--------------|
| Whisper  | Tiny     | ~400 MB      |
| Whisper  | Base     | ~500 MB      |
| Whisper  | Small    | ~1 GB        |
| Whisper  | Medium   | ~2.5 GB      |
| Whisper  | Large-v3 | ~5 GB        |
| Parakeet | TDT-1.1B | ~1.2 GB      |

If you're on 8 GB and want to keep VS Code, a browser, and Slack open, Whisper Large-v3 is rough. Parakeet at 1.2 GB or Whisper Small at 1 GB are the practical options at that memory tier.

On 16 GB you can run anything comfortably. On 32 GB and up you don't even think about it.
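The memory reasoning above can be sketched as a simple budget check in Python. The model sizes come from the table; `other_apps_gb` and `reserve_gb` are hypothetical inputs I've made up for illustration, so plug in what Activity Monitor shows on your own machine.

```python
# Approximate loaded size in GB, copied from the RAM table above.
MODEL_RAM_GB = {
    "Whisper Tiny": 0.4,
    "Whisper Base": 0.5,
    "Whisper Small": 1.0,
    "Whisper Medium": 2.5,
    "Whisper Large-v3": 5.0,
    "Parakeet TDT-1.1B": 1.2,
}

def models_that_fit(total_gb, other_apps_gb, reserve_gb=1.0):
    """List models whose loaded size fits in what is left of unified memory.

    other_apps_gb: memory already used by the system and open apps (assumed input).
    reserve_gb:    headroom kept free so the machine stays responsive (assumed default).
    """
    budget = total_gb - other_apps_gb - reserve_gb
    return [name for name, size in MODEL_RAM_GB.items() if size <= budget]

# An 8 GB Mac with about 5.5 GB already in use leaves a 1.5 GB budget.
print(models_that_fit(total_gb=8, other_apps_gb=5.5))
# A 16 GB Mac with 6 GB in use leaves 9 GB, enough for every model in the table.
print(models_that_fit(total_gb=16, other_apps_gb=6))
```

With those example inputs, the 8 GB case leaves Whisper Tiny, Base, and Small plus Parakeet, and rules out Medium and Large-v3.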

Latency for dictation

RTF tells you throughput on long files. For dictation, what matters is how quickly the first word appears after you stop talking.

Measured on M2, 5-second utterance, mic to text:

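| Engine   | Model    | First-token latency | Full result |
|----------|----------|---------------------|-------------|
| Whisper  | Tiny     | 180 ms              | 250 ms      |
| Whisper  | Small    | 350 ms              | 500 ms      |
| Whisper  | Medium   | 700 ms              | 1100 ms     |
| Whisper  | Large-v3 | 1400 ms             | 2200 ms     |
| Parakeet | TDT-1.1B | 80 ms               | 150 ms      |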

Parakeet's streaming output makes it feel instant. Whisper Tiny and Small are also fast enough to feel responsive. Anything Medium or larger introduces a noticeable wait — fine for files, less fine for dictation.
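If you want to take similar measurements yourself, the two numbers can be captured with a small timing harness. The Python sketch below is only an illustration: `fake_engine` is a stand-in with artificial delays, not a real Whisper or Parakeet binding, and the harness times the engine call only, so it leaves out microphone capture and any app overhead.

```python
import time

def measure_latency(stream_transcribe, audio):
    """Time a streaming transcriber.

    stream_transcribe(audio) must yield text pieces as they become available.
    Returns (first_token_ms, full_result_ms, text).
    """
    start = time.perf_counter()
    first_token_ms = None
    pieces = []
    for piece in stream_transcribe(audio):
        if first_token_ms is None:
            first_token_ms = (time.perf_counter() - start) * 1000
        pieces.append(piece)
    full_result_ms = (time.perf_counter() - start) * 1000
    return first_token_ms, full_result_ms, "".join(pieces)

def fake_engine(audio):
    """Hypothetical engine: emits three words, 20 ms apart."""
    for word in ("hello ", "from ", "dictation"):
        time.sleep(0.02)
        yield word

first_ms, full_ms, text = measure_latency(fake_engine, audio=None)
print(f"first token: {first_ms:.0f} ms, full result: {full_ms:.0f} ms, text: {text!r}")
```

To test a real engine, wrap its API in a generator that yields partial text. A non-streaming engine would yield once, and the two numbers collapse into one.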

When to pick which

Use Parakeet if:

  • You dictate primarily in English
  • You want the lowest possible latency
  • You're on a Mac with limited RAM
  • You're transcribing long files and want them done quickly

Use Whisper Small or Medium if:

  • You need multilingual support (99+ languages)
  • You want accuracy without the RAM hit of Large-v3
  • You're on 16 GB and want a balanced choice

Use Whisper Large-v3 if:

  • You're transcribing meetings or important files where every error costs you
  • You have 32 GB+ and don't care about RAM
  • You're working with noisy audio, heavy accents, or technical vocabulary
  • The job runs offline anyway, so RTF doesn't matter much
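The three lists above can be collapsed into a rough decision function. This Python sketch is my own simplification: the 16 GB and 32 GB thresholds and the order of the checks are assumptions drawn loosely from the bullets, not hard rules, and real choices involve more than three inputs.

```python
def pick_engine(english_only, ram_gb, hard_audio=False):
    """Suggest an engine from the rules of thumb above.

    english_only: True if everything you transcribe is in English.
    ram_gb:       total unified memory on the Mac.
    hard_audio:   noisy recordings, heavy accents, or technical vocabulary.
    """
    if not english_only or hard_audio:
        # Multilingual or difficult audio: Whisper, sized to the available memory.
        if ram_gb >= 32:
            return "Whisper Large-v3"
        return "Whisper Medium" if ram_gb >= 16 else "Whisper Small"
    # Clean English: the faster, lighter engine.
    return "Parakeet"

print(pick_engine(english_only=True, ram_gb=8))                    # English dictation
print(pick_engine(english_only=False, ram_gb=16))                  # multilingual, 16 GB
print(pick_engine(english_only=True, ram_gb=64, hard_audio=True))  # noisy meetings
```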

What about cloud-equivalent accuracy?

The cloud services (OpenAI Whisper API, Deepgram Nova-2, Google Speech-to-Text) usually report 3.5–4.5% WER on standard benchmarks. That's roughly Whisper Large-v3 territory.

The accuracy gap between local and cloud is real but small: usually 0.5–1 percentage points of WER on clean audio, more on hard audio. For most use cases (dictation, meetings, notes), it's not noticeable. Cloud services win on edge cases: heavy accents the local models cover poorly, rare technical vocabulary, very low-quality audio.

Apps and which engines they use

If you don't want to think about engines, here's what mainstream Mac apps default to:

  • Vext — Parakeet by default, Whisper available as an option
  • MacWhisper — Whisper, model selectable
  • Superwhisper — Whisper, model selectable
  • VoiceInk — Whisper
  • FluidVoice — Parakeet support
  • Apple Dictation — Apple's own foundation model (not Whisper or Parakeet)

The split between "Parakeet by default" and "Whisper by default" usually reflects whether the app is dictation-first (Parakeet) or file-transcription-first (Whisper).

The bottom line

For most people, on a current Mac, dictating in English: Parakeet. The latency feels different — text appears as you speak rather than after you finish.

For meetings, files, or multilingual work: Whisper Medium or Large-v3.

You can have both. Most apps let you pick per task.