Voice Input
Voice input lets you dictate messages into the chat input bar using a local speech recognition model. Everything runs on-device; no audio is sent to external servers.
Enabling voice input
Voice input is disabled by default. To enable it:
- Open Settings → Voice Input.
- Toggle Enable Voice Input. Magia will request microphone permission from the OS on first enable.
- Download at least one model (Parakeet TDT is recommended).
Once enabled, a microphone button appears in the chat input bar.
Recording a message
- Start recording: Click the microphone button or press the configured keyboard shortcut (customizable in Settings → Keybindings under chat.toggle-voice-recording).
- Stop and transcribe: Click the button again, press the shortcut, or press Enter. The recording stops and transcription begins.
- Send immediately: Press Enter while recording to stop and send the transcribed text in one step.
- Cancel: Press Escape to discard the recording without transcribing.
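
The recording controls above can be read as a small state machine. The following is a minimal sketch under stated assumptions: the state names, the action names (`toggle`, `send`, `cancel`, `transcribed`), and the `sendWhenReady` flag are hypothetical and chosen for illustration; they are not taken from Magia's code.

```typescript
// Hypothetical state and action names; Magia's internals may differ.
type RecorderState = "idle" | "recording" | "transcribing";
type RecorderAction = "toggle" | "send" | "cancel" | "transcribed";

interface Recorder {
  state: RecorderState;
  // True when the transcript should be sent as soon as it is ready.
  sendWhenReady: boolean;
}

function nextRecorder(current: Recorder, action: RecorderAction): Recorder {
  if (current.state === "idle" && action === "toggle") {
    // Microphone button or shortcut starts a recording.
    return { state: "recording", sendWhenReady: false };
  }
  if (current.state === "recording") {
    // Stop and transcribe, leaving the text in the input bar.
    if (action === "toggle") return { state: "transcribing", sendWhenReady: false };
    // Stop, transcribe, and send in one step.
    if (action === "send") return { state: "transcribing", sendWhenReady: true };
    // Discard the audio without transcribing.
    if (action === "cancel") return { state: "idle", sendWhenReady: false };
  }
  if (current.state === "transcribing" && action === "transcribed") {
    // Callers should read sendWhenReady before applying this transition.
    return { state: "idle", sendWhenReady: false };
  }
  // Any other combination is ignored.
  return current;
}
```

Keeping the transitions in one pure function makes it easy to check that, for example, a cancel while idle does nothing.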
While recording, a live AudioWaveform visualization animates in the input bar using a Web Audio AnalyserNode. A seconds counter shows how long you have been recording. Partial transcription results appear in real time as the engine processes audio.
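
As an illustration of how a waveform can be driven from an AnalyserNode, the sketch below computes a single loudness value per animation frame from the node's time-domain data. It is not Magia's AudioWaveform implementation: `AnalyserLike` and `readLevel` are hypothetical names, and using an RMS level is an assumption about what the visualization plots.

```typescript
// Minimal structural view of the AnalyserNode members used here, so the
// sketch also type-checks outside a browser. A real AnalyserNode fits it.
interface AnalyserLike {
  fftSize: number;
  getByteTimeDomainData(array: Uint8Array): void;
}

// Returns the RMS level of the current audio frame in the range 0..1.
function readLevel(analyser: AnalyserLike): number {
  const samples = new Uint8Array(analyser.fftSize);
  analyser.getByteTimeDomainData(samples);
  let sumOfSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    // Byte samples are unsigned; silence sits at 128.
    const centered = (samples[i] - 128) / 128;
    sumOfSquares += centered * centered;
  }
  return Math.sqrt(sumOfSquares / samples.length);
}
```

In a browser, `readLevel` would typically be called once per `requestAnimationFrame` tick with the analyser connected to the microphone stream.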
Transcribed text is inserted into the input bar at the cursor. If text was already present, the transcription is appended with a space.
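
One way to picture the insertion rule is the sketch below. `insertTranscription` is a hypothetical helper, and the exact spacing behavior (trimming the transcript, adding a space on either side only when the neighboring text is not already whitespace) is an assumption rather than documented behavior.

```typescript
interface InputState {
  text: string;
  cursor: number; // caret position after the insertion
}

// Inserts a transcript at the caret, separating it from existing text with spaces.
function insertTranscription(existing: string, cursor: number, transcript: string): InputState {
  const before = existing.slice(0, cursor);
  const after = existing.slice(cursor);
  // Add a separator only where the neighboring text does not already provide one.
  const leading = before.length > 0 && !/\s$/.test(before) ? " " : "";
  const trailing = after.length > 0 && !/^\s/.test(after) ? " " : "";
  const inserted = leading + transcript.trim() + trailing;
  return { text: before + inserted + after, cursor: before.length + inserted.length };
}
```

For example, dictating "how are you" with the caret at the end of "Hi there" yields "Hi there how are you".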
Speech engines
Magia ships two on-device speech-to-text engines. The active engine is shown in Settings → Voice Input → Active Engine.
Parakeet TDT (default)
Parakeet TDT is the primary engine. It is a fast, accurate local model (~670 MB). Magia uses it automatically when the model is downloaded.
- Download: Settings → Voice Input → Parakeet TDT → Download
- Delete: same panel → Delete
Download progress is shown as a percentage bar driven by stt-download-progress events from the backend.
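
The payload of `stt-download-progress` is not described here, so the following sketch assumes a hypothetical shape with `downloadedBytes` and `totalBytes` fields and shows how such a payload could be turned into the value for a percentage bar.

```typescript
// Assumed payload shape; the real event payload may use different fields.
interface DownloadProgress {
  downloadedBytes: number;
  totalBytes: number;
}

// Converts a progress payload into a whole-number percentage from 0 to 100.
function progressPercent(progress: DownloadProgress): number {
  // Unknown or zero total size: show an empty bar rather than dividing by zero.
  if (progress.totalBytes <= 0) return 0;
  const ratio = progress.downloadedBytes / progress.totalBytes;
  // Clamp so that a late or oversized event cannot push the bar past its ends.
  return Math.round(Math.min(1, Math.max(0, ratio)) * 100);
}
```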
Whisper (fallback)
Whisper (via whisper-rs) is used as a fallback when Parakeet is not available. Four model sizes are available:
| Model | Size | Notes |
|---|---|---|
| Tiny | ~75 MB | Fastest, lowest accuracy |
| Base | ~142 MB | Good balance (default selection) |
| Small | ~466 MB | Better accuracy |
| Medium | ~1.5 GB | Highest accuracy, slower |
Each model can be downloaded and deleted independently. Download progress is shown per-model.
Engine selection
In normal use, the engine is chosen automatically (Parakeet if installed, otherwise Whisper). In Developer Mode, a dropdown in Settings lets you force a specific engine (auto, parakeet, or whisper).
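
The selection rule can be sketched as a small function. `resolveEngine` is a hypothetical name, and the sketch leaves out cases this page does not cover, such as forcing an engine whose model has not been downloaded.

```typescript
type EngineOverride = "auto" | "parakeet" | "whisper";
type Engine = "parakeet" | "whisper";

// Picks the engine from the override setting and the installed models.
function resolveEngine(override: EngineOverride, parakeetInstalled: boolean): Engine {
  // A forced engine wins over automatic selection.
  if (override !== "auto") return override;
  // Automatic selection: Parakeet if installed, otherwise Whisper.
  return parakeetInstalled ? "parakeet" : "whisper";
}
```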
Language selection
The Language setting controls which language the engine expects. The default is Auto-detect, which lets the model infer the language from the audio. You can also pin a specific language from a list of 90+ options (Arabic, Chinese, French, German, Spanish, and many more).
The language setting is stored in whisperLanguage, either as auto or as a BCP 47 language code. Despite the key name, it applies to both engines.
Microphone selection
The Microphone picker lists all available audio input devices. Select the device you want to use for recording. The list refreshes automatically when a new device is connected. On first enable, Magia requests OS microphone permission and then re-enumerates devices so that full device labels are available.
The selected device ID is persisted in whisperDeviceId.
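
As a rough sketch of how a persisted device ID can be applied, the helpers below filter an enumerated device list down to microphones and build the `audio` constraint for `getUserMedia`. The helper names and the fallback to the system default when the saved device is no longer connected are assumptions, not documented Magia behavior.

```typescript
// Structural subset of MediaDeviceInfo, so the sketch runs outside a browser.
interface DeviceLike {
  kind: string;
  deviceId: string;
  label: string;
}

// Keeps only audio input devices from an enumerateDevices() result.
function listMicrophones(devices: DeviceLike[]): DeviceLike[] {
  return devices.filter((device) => device.kind === "audioinput");
}

// Builds the audio constraint: the saved device if still present, else the default.
function audioConstraint(
  devices: DeviceLike[],
  savedDeviceId: string | undefined,
): true | { deviceId: { exact: string } } {
  const microphones = listMicrophones(devices);
  const stillPresent = microphones.some((device) => device.deviceId === savedDeviceId);
  if (savedDeviceId !== undefined && stillPresent) {
    return { deviceId: { exact: savedDeviceId } };
  }
  // true asks the browser for the system default microphone.
  return true;
}
```

In a browser, the result would be passed as `navigator.mediaDevices.getUserMedia({ audio: audioConstraint(devices, savedId) })`, with `devices` coming from `navigator.mediaDevices.enumerateDevices()`.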
Settings summary
| Setting | Key | Default |
|---|---|---|
| Enable voice input | voiceInputEnabled | false |
| Whisper model size | whisperModel | base |
| Language | whisperLanguage | auto |
| Microphone device | whisperDeviceId | (system default) |
| Engine override | sttEngine | auto |
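
For reference, the table can be mirrored as a typed settings object. This is a sketch only: the interface name, the `withDefaults` helper, the lowercase model identifiers other than `base`, and the use of `null` for the system default device are assumptions.

```typescript
interface VoiceInputSettings {
  voiceInputEnabled: boolean;
  // Only "base" appears in the table; the other identifiers are assumed.
  whisperModel: "tiny" | "base" | "small" | "medium";
  whisperLanguage: string; // "auto" or a language code
  whisperDeviceId: string | null; // null stands in for the system default (assumed)
  sttEngine: "auto" | "parakeet" | "whisper";
}

// Defaults taken from the settings summary table.
const DEFAULT_VOICE_SETTINGS: VoiceInputSettings = {
  voiceInputEnabled: false,
  whisperModel: "base",
  whisperLanguage: "auto",
  whisperDeviceId: null,
  sttEngine: "auto",
};

// Fills in any keys missing from stored settings with the defaults.
function withDefaults(stored: Partial<VoiceInputSettings>): VoiceInputSettings {
  return { ...DEFAULT_VOICE_SETTINGS, ...stored };
}
```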