Overview
When you finish recording, HyperWhisper can use voice activity detection (VAD) to find which portions of the audio contain speech. Silence is stripped out, producing a trimmed version of your audio that goes to the transcription provider instead of the full original. VAD is disabled by default on macOS; you opt in through Settings.
How It Works
VAD uses the Silero VAD model, bundled with the macOS app (~864 KB). Processing happens entirely on your device. VAD runs only when both of these are true:
- VAD is enabled in Settings
- The recording is at least 30 seconds long
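As a minimal sketch of the two conditions above, the check could look like the following. The function and constant names are hypothetical and are not taken from HyperWhisper's code; only the 30-second threshold comes from this page.

```python
# Hypothetical names; only the 30-second minimum is documented on this page.
MIN_RECORDING_SECONDS = 30.0


def should_run_vad(vad_enabled: bool, duration_seconds: float) -> bool:
    """Return True only when VAD is enabled and the recording is long enough."""
    return vad_enabled and duration_seconds >= MIN_RECORDING_SECONDS
```

Under this reading, a 45-second recording with the setting turned on is processed, while a 20-second recording is sent to the provider untouched.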
Detect speech segments
The Silero VAD model scans your audio and identifies which portions contain speech.
Remove silence
All detected speech segments are extracted and concatenated into a trimmed audio file. Any gap between segments larger than 200 ms is removed — this includes leading and trailing silence as well as mid-recording pauses.
Validate the result
HyperWhisper checks that the trimmed file meets minimum quality thresholds (see Validation & Quality Checks below). If it doesn’t, the original audio is used instead.
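The "Remove silence" step can be pictured with a short sketch. It assumes the detector returns speech segments as (start, end) pairs in seconds, and it reads the 200 ms rule as "segments separated by 200 ms or less are kept together, gap included". Both points are assumptions made for illustration, and the helper names are hypothetical rather than part of HyperWhisper or Silero VAD.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds), assumed format

# Gaps larger than this are dropped; shorter gaps are kept (assumed reading).
MAX_KEPT_GAP_SECONDS = 0.2


def merge_close_segments(
    segments: List[Segment], max_gap: float = MAX_KEPT_GAP_SECONDS
) -> List[Segment]:
    """Join segments whose separating gap is at most max_gap seconds."""
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] <= max_gap:
            # Small gap: extend the previous segment so the pause is kept.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def trim_samples(
    samples: List[float], sample_rate: int, segments: List[Segment]
) -> List[float]:
    """Concatenate the samples that fall inside the merged speech segments."""
    trimmed: List[float] = []
    for start, end in merge_close_segments(segments):
        trimmed.extend(samples[int(start * sample_rate):int(end * sample_rate)])
    return trimmed
```

Leading silence, trailing silence, and long mid-recording pauses all fall outside the merged segments, so they never reach the output.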
Benefits
- Lower API costs — you’re billed for less audio when silence is removed.
- Faster transcription — smaller files upload and process more quickly.
- Potentially better accuracy — less background noise and silence for the model to work through.
- Useful for any recording with pauses — interviews, dictation with thinking gaps, or recordings started early and stopped late.
Enabling VAD
On macOS:
- Open Settings (click the menu bar icon → Settings).
- Go to the Sound section.
- Turn on Remove silence before transcription.
On Windows there is no toggle, and on iOS trimming runs automatically (see Platform Support).
Viewing Original vs. Trimmed Audio
On macOS, when a recording was trimmed, the History detail view shows an Original / Trimmed toggle above the audio player. Select Trimmed to hear the version with silence removed, or Original to hear the full recording.
Both files are stored on disk. Switching the toggle changes only which one plays; nothing is deleted.
See Viewing Transcription History for more about the audio player.
Validation & Quality Checks
After trimming, HyperWhisper validates the result before using it. If any check fails, the original audio is used for transcription automatically, and you will not see an error.

| Check | Threshold | Why |
|---|---|---|
| Silence removed | More than 0.5 seconds | Avoids unnecessary file duplication when there is almost no silence |
| Trimmed duration | At least 0.3 seconds | Guards against over-aggressive trimming that would remove actual speech |
| Trimmed file size | More than 5 KB | Ensures the output file contains real audio content, not just a WAV header |
Failures during VAD processing are logged as breadcrumbs to Sentry to help diagnose edge cases. The original audio is always the fallback, so a VAD failure never blocks transcription.
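The three checks and the fallback to the original audio can be sketched as follows. The thresholds come from the table above. The function names, the path-based interface, and the choice of 1 KB = 1024 bytes are assumptions made for illustration, not HyperWhisper's actual implementation.

```python
# Thresholds from the table above; names and interface are hypothetical.
MIN_SILENCE_REMOVED_SECONDS = 0.5
MIN_TRIMMED_DURATION_SECONDS = 0.3
MIN_TRIMMED_FILE_BYTES = 5 * 1024  # assumes 1 KB = 1024 bytes


def trimmed_audio_is_usable(
    original_seconds: float, trimmed_seconds: float, trimmed_bytes: int
) -> bool:
    """Return True only if the trimmed file passes all three checks."""
    silence_removed = original_seconds - trimmed_seconds
    return (
        silence_removed > MIN_SILENCE_REMOVED_SECONDS
        and trimmed_seconds >= MIN_TRIMMED_DURATION_SECONDS
        and trimmed_bytes > MIN_TRIMMED_FILE_BYTES
    )


def choose_audio_for_transcription(
    original_path: str,
    trimmed_path: str,
    original_seconds: float,
    trimmed_seconds: float,
    trimmed_bytes: int,
) -> str:
    """Pick the trimmed file when it is valid, otherwise fall back silently."""
    if trimmed_audio_is_usable(original_seconds, trimmed_seconds, trimmed_bytes):
        return trimmed_path
    return original_path
```

For example, a 60-second recording trimmed to 59.8 seconds fails the first check, so the original file is the one sent for transcription.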
Platform Support
| Platform | Status |
|---|---|
| macOS | Fully supported — enable in Settings → Sound → Remove silence before transcription |
| Windows | No silence trimming; no user-facing toggle |
| iOS | Amplitude-based leading/trailing silence trimming runs automatically on every recording |
