Skip to main content
Performance in HyperWhisper comes down to three choices: where the transcription runs (cloud vs. on-device), which model you pick for your hardware, and a handful of settings that affect startup time, audio processing, and storage. This page walks through each one.

Choose your transcription strategy

Cloud — fastest results, highest accuracy

HyperWhisper Cloud routes to best-in-class providers and requires no setup. Results come back from the server faster than most on-device models can load and process the same audio.
TierPowered byBest for
HighestElevenLabs Scribe v2Accents, noisy audio, technical vocabulary
HighGrok STT (xAI)High-accuracy multilingual transcription
MediumDeepgram Nova-3English accuracy with low latency
MediumGroq Whisper Large v3Sub-second latency for English and major European languages
Cost note: HyperWhisper Cloud detects silence and empty audio automatically. If a recording contains no detectable speech you are charged 0 credits — pauses, dead air, and accidentally triggered empty recordings don’t cost anything. Across a typical push-to-talk workday, you only pay for the minutes you actually spoke.

On-device — offline, private, no per-minute cost

Local models run entirely on your machine. Audio never leaves your device, and there is no per-minute charge once the model is downloaded. The trade-off is slightly lower accuracy than the top cloud tiers, and a one-time download of 350 MB–3.1 GB depending on the model.
On-device streaming transcription is available on both macOS and Windows. Parakeet V2 and Parakeet V3 stream on both platforms; Nemotron 3.5 Streaming (Multilingual) also runs as a streaming transducer on Windows. See the Models page for the full platform matrix.

Pick the right model size for your hardware

Apple Silicon Macs (M1 and later)

Local models run via Metal GPU + Neural Engine acceleration. The Small Whisper model (466 MB on macOS, ~2 GB VRAM) is comfortably realtime even on an M1 Air and is the recommended starting point for most users. For English-only work, Parakeet V2 (474 MB) is typically faster than equivalent Whisper sizes. For the broadest offline language coverage (Chinese, Japanese, Korean, Arabic), Nemotron 3.5 Multilingual (~1.3 GB) is the only local model that reaches beyond European languages.

Intel Macs

Local models work on Intel but use CPU only — no GPU or Neural Engine acceleration. Start with Whisper Tiny (~39 MB) or Whisper Base (148 MB). If those feel too slow or inaccurate, switch to HyperWhisper Cloud, which offloads all the compute to the server.

Windows x64

Windows local transcription uses DirectCompute (Whisper) and DirectML (Parakeet) for GPU acceleration. Any modern NVIDIA, AMD, or Intel GPU with DirectX 11 support qualifies. If no compatible GPU is detected, the model falls back to CPU automatically — it still works, just slower.
On ARM64 / Snapdragon Windows devices, Whisper is not supported yet. Use Parakeet V2 (English) or Parakeet V3 (25 European languages) for local transcription on ARM64 — both run with DirectML acceleration.

Whisper size ladder

ModelSizeRecommended VRAMCharacter
Tiny~39 MB (macOS) / ~78 MB (Windows)~1 GBLowest-end machines, quick drafts
Base148 MB~1 GBLight hardware, basic dictation
Small466 MB (macOS) / 488 MB (Windows)~2 GBBest balance for most users
Medium1.5 GB~5 GBHigher accuracy, mid-range GPUs
Large v3 Turbo809 MB (macOS) / 1.5 GB (Windows)~6 GBNear-Large accuracy, much faster
Large v2 / Large v33.1 GB~10 GBHighest Whisper accuracy
Smaller models are faster but less accurate; larger models are slower but handle difficult audio better. If your GPU doesn’t have enough VRAM, the model falls back to CPU automatically.
If you only ever dictate in one language, the English-only Whisper variants (.en) produce slightly better results at the same model size. You give up multilingual support in exchange.

Reduce push-to-talk startup latency

Enable “Keep Microphone Warm”

The biggest source of push-to-talk delay is the microphone starting up cold — this is especially noticeable with Bluetooth headsets, which can take a second or more to switch into call mode. Keep Microphone Warm keeps a quiet idle audio session open between recordings so the microphone is ready the moment you press your shortcut.
Enable it in Settings → Sound → Keep Microphone Warm.
While this setting is on, macOS shows the orange microphone indicator in the menu bar at all times — even when you are not recording. Bluetooth headsets may also stay in their lower-quality call audio profile rather than switching back to stereo. Turn this off if either trade-off matters to you.

Deepgram Fast Formatting (streaming)

When using Deepgram for streaming transcription, Fast Formatting is on by default. It returns smart-formatted results immediately without waiting for surrounding context, which minimises the delay before words appear on screen. Turning it off produces slightly more accurate punctuation and number formatting at the cost of extra latency. Leave it on unless formatting precision matters more than speed for your workflow.

Optimize file transcription

Enable VAD (Voice Activity Detection)

Remove silence before transcription (in Settings → Sound) analyzes the clip after you stop recording and strips leading and trailing silence using an AI voice-detection model (Silero VAD) before sending audio to the provider.Why it helps:
  • Reduces the amount of audio sent to cloud providers, which can lower API costs.
  • Speeds up transcription, especially for short clips with long pauses at the start or end.
  • May improve accuracy by removing noise-only segments.
Leave it off if your recordings consistently start and end with speech — VAD adds a small processing step with no benefit in that case.

Sample rate

The default audio sample rate for file transcription is 16 000 Hz, which is the standard input rate for Whisper-family models and what most cloud providers expect. There is no benefit to raising it for transcription purposes — 16 kHz is the sweet spot.
This setting applies to macOS file/push-to-talk recordings. Streaming transcription uses a fixed rate set by the streaming provider, not this setting.

Post-processing performance

Post-processing is an optional second step that cleans up filler words, fixes punctuation, and applies formatting after transcription. It adds latency — keep that in mind when speed is the priority.

Local Gemma models (offline, no extra cost)

Local Gemma post-processing runs via Metal GPU acceleration on Apple Silicon Macs (M1 and later). Intel Macs do not support local LLM post-processing — use a cloud post-processing provider instead.
ModelSizeRecommended RAMBest for
Gemma 4 E2B (Recommended)3.1 GB~4 GBBest balance of speed and quality for most Macs
Gemma 4 E4B5 GB~6 GBHigher quality cleanup
Gemma 4 12B7.1 GB~10 GBMid-size dense model; good for 16 GB Macs
Gemma 4 26B MoE16.9 GB~18 GBMixture-of-experts for capable machines
Gemma 4 31B Dense18.3 GB~20 GBHighest local quality, slowest
Start with Gemma 4 E2B — it fits comfortably in 4 GB and handles most cleanup tasks well.

Cloud post-processing

If local Gemma is too large for your machine, every cloud post-processing provider (HyperWhisper Cloud, OpenAI, Claude, Gemini, Groq, and others) is available as an alternative with no local storage requirement. Each is labeled with a speed and accuracy rating in the Model Library.

Storage and disk space

ItemApproximate size
App (macOS)~200 MB
App (Windows)~300 MB
Whisper Tiny~39 MB (macOS) / ~78 MB (Windows)
Whisper Small~466 MB (macOS) / ~488 MB (Windows)
Whisper Large v3~3.1 GB
Parakeet V2~474 MB
Nemotron 3.5 Multilingual~1.3 GB
Gemma 4 E2B (post-processing)~3.1 GB
Gemma 4 31B Dense (post-processing)~18.3 GB
Keep audio files is off by default — audio is discarded once transcription succeeds to save disk space. Turn it on only if you need playback from History or want to retry failed transcriptions. Max recording duration defaults to 300 seconds (5 minutes).

Quick-pick guide

Not sure where to start? Find your goal below.
GoalRecommended setup
Lowest latency (words appear as fast as possible)HyperWhisper Cloud Medium (Groq Whisper Large v3) — or on-device Parakeet V2 + Keep Microphone Warm
Best accuracy for difficult audioHyperWhisper Cloud Highest (ElevenLabs Scribe v2)
Fully offline, no networkNemotron 3.5 Multilingual or Whisper Small/Large (macOS + Windows) + on-device Gemma 4 E2B post-processing
Lowest costHyperWhisper Cloud Medium (Groq Whisper Large v3, ~$0.11/hr) or on-device models (free after download)
Older or low-end machineWhisper Tiny or Base locally, or HyperWhisper Cloud to offload compute
Best all-round on Apple SiliconWhisper Small + Gemma 4 E2B post-processing
ARM64 Windows deviceParakeet V2 (English) or Parakeet V3 (multilingual) — Whisper is x64-only

  • Models — full model library, VRAM requirements, and speed/accuracy ratings
  • System Requirements — platform-specific hardware specs
  • Providers — cloud tier pricing, silence-free billing, and cost examples
  • Best Practices — accuracy tips covering vocabulary, microphone hardware, and environment