Transcribe audio. Nothing leaves your tab.

Drop in an audio file. Whisper runs on your device: the model is downloaded once and cached in your browser, and inference runs locally, using WebGPU when available. Export as TXT, SRT, VTT, or JSON.
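As a rough illustration of the SRT export, here is a minimal sketch that turns timestamped segments into SRT text. The `Segment` shape and the function names are hypothetical, chosen for this example; the page does not describe its internal data structures.

```typescript
// Hypothetical segment shape: start/end in seconds plus the recognized text.
interface Segment {
  start: number;
  end: number;
  text: string;
}

// Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm.
function srtTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n: number, width: number): string =>
    String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Build an SRT document: numbered cues separated by blank lines.
function toSrt(segments: Segment[]): string {
  return segments
    .map(
      (seg, i) =>
        `${i + 1}\n${srtTimestamp(seg.start)} --> ${srtTimestamp(seg.end)}\n${seg.text.trim()}\n`
    )
    .join("\n");
}
```

WebVTT uses nearly the same cue layout, with a `WEBVTT` header line and a period instead of a comma before the milliseconds.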

100% local · No upload · No signup · WASM fallback

Drop an audio file or click to choose
MP3 · WAV · M4A · FLAC · OGG · WebM · MP4 (up to ~30 min with the Tiny model)

Model

How private is this, really?

The Whisper model is downloaded from Hugging Face's CDN the first time you transcribe. After that, it's cached in your browser's IndexedDB, so subsequent transcriptions skip the download and work offline. Your audio file is decoded by the browser, fed to the model running in your tab, and never sent anywhere.
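To make the "decoded by the browser, fed to the model" step concrete: Whisper models take 16 kHz mono audio, so a multi-channel file has to be mixed down before inference. The sketch below shows one way that step could look. The function name is hypothetical and this is an assumption about the pipeline, not the page's actual code.

```typescript
// Average N channels into one mono track.
// Assumption: in a browser, each channel would come from
// AudioBuffer.getChannelData(i) after AudioContext.decodeAudioData(),
// and an AudioContext created with { sampleRate: 16000 } would make
// decodeAudioData resample to 16 kHz.
function downmixToMono(channels: Float32Array[]): Float32Array {
  if (channels.length === 0) return new Float32Array(0);
  const length = channels[0].length;
  const mono = new Float32Array(length);
  for (const channel of channels) {
    for (let i = 0; i < length; i++) {
      // Each channel contributes an equal share of the mono sample.
      mono[i] += channel[i] / channels.length;
    }
  }
  return mono;
}
```

Everything here operates on in-memory arrays, which is consistent with the audio never leaving the tab.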

On Apple Silicon with Chrome or Safari Technology Preview, WebGPU acceleration runs the Tiny model at ~0.3–0.5× real-time, meaning processing takes about a third to a half of the audio's duration. Without WebGPU, the WASM fallback runs at ~1.5–3× real-time: slower than playback, so a 5-minute clip takes roughly 7.5–15 minutes.
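The real-time factors above can be turned into a wait-time estimate. This sketch assumes "× real-time" means processing time divided by audio duration; the function name is hypothetical.

```typescript
// Estimate the processing time range in seconds for a clip.
// Assumption: real-time factor = processing time / audio duration,
// so a factor below 1 is faster than playback and above 1 is slower.
function estimateProcessingSeconds(
  audioSeconds: number,
  minFactor: number,
  maxFactor: number
): [number, number] {
  return [audioSeconds * minFactor, audioSeconds * maxFactor];
}

// A 5-minute clip with the WebGPU figures quoted for the Tiny model.
const [fastest, slowest] = estimateProcessingSeconds(5 * 60, 0.3, 0.5);
```

With those figures, a 5-minute clip comes out at roughly 1.5 to 2.5 minutes of processing.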

Want this as a Mac app — not a tab?

Vext runs Whisper natively on Apple Silicon — ~5–10× faster than browser WASM, with no model download wait. Plus, it transcribes meetings live, ducks system audio, and types directly into any app. $49 once, all local.

Try Vext — $49