Voice Input

Voice input lets you dictate messages into the chat input bar using a local speech recognition model. Everything runs on-device — no audio is sent to external servers.

Voice input is disabled by default. To enable it:

  1. Open Settings → Voice Input.
  2. Toggle Enable Voice Input. Magia will request microphone permission from the OS on first enable.
  3. Download at least one model (Parakeet TDT is recommended).

Once enabled, a microphone button appears in the chat input bar.

  • Start recording: Click the microphone button or press the configured keyboard shortcut (customizable in Settings → Keybindings under chat.toggle-voice-recording).
  • Stop and transcribe: Click the button again or press the shortcut. The recording stops and transcription begins.
  • Send immediately: Press Enter while recording to stop, transcribe, and send the resulting text in one step.
  • Cancel: Press Escape to discard the recording without transcribing.
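The controls above can be read as a small state machine. The following is a minimal sketch, not Magia's actual implementation: the names `RecorderState`, `RecorderInput`, `Transition`, and `step` are hypothetical, and the sketch assumes that Enter always means "stop and send" and that input is ignored while a transcription is in progress.

```typescript
// Hypothetical types for illustration; Magia's real code may differ.
type RecorderState = "idle" | "recording" | "transcribing";
type RecorderInput = "toggle" | "enter" | "escape";

interface Transition {
  next: RecorderState;
  // What happens to the captured audio on this transition.
  effect: "none" | "start" | "transcribe" | "transcribe-and-send" | "discard";
}

function step(state: RecorderState, input: RecorderInput): Transition {
  if (state === "idle") {
    // Only the microphone button / shortcut starts a recording.
    return input === "toggle"
      ? { next: "recording", effect: "start" }
      : { next: "idle", effect: "none" };
  }
  if (state === "recording") {
    switch (input) {
      case "toggle":
        return { next: "transcribing", effect: "transcribe" };
      case "enter":
        return { next: "transcribing", effect: "transcribe-and-send" };
      case "escape":
        return { next: "idle", effect: "discard" };
    }
  }
  // Assumption: while transcribing, further input is ignored.
  return { next: state, effect: "none" };
}
```

Keeping the transition logic in a pure function like this makes each control easy to check in isolation, separate from the audio capture itself.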

While recording, a live AudioWaveform visualization animates in the input bar using a Web Audio AnalyserNode. A seconds counter shows how long you have been recording. Partial transcription results appear in real time as the engine processes audio.
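As a rough illustration of how a waveform display can be driven from an AnalyserNode, the sketch below computes a single loudness value from one frame of time-domain data. In a browser, `AnalyserNode.getByteTimeDomainData` fills a `Uint8Array` with values from 0 to 255, where 128 represents silence. The function name `rmsLevel` is hypothetical, and how Magia actually maps audio data to its visualization is not documented here.

```typescript
// Hypothetical helper: root-mean-square level of one analyser frame.
// Input is the byte time-domain format produced by
// AnalyserNode.getByteTimeDomainData (0..255, 128 = silence).
function rmsLevel(samples: Uint8Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  samples.forEach((s) => {
    // Re-center around zero and scale to roughly -1..1.
    const centered = (s - 128) / 128;
    sumSquares += centered * centered;
  });
  return Math.sqrt(sumSquares / samples.length);
}
```

A renderer could call this once per animation frame and use the result (0 for silence, approaching 1 for a full-scale signal) as the bar height.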

Transcribed text is inserted into the input bar at the cursor. If text was already present, the transcription is appended with a space.
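The append rule can be sketched as a pure function. This is an illustration only: `appendTranscript` is a hypothetical name, and skipping the extra space when the existing text already ends in whitespace is an assumption rather than documented behavior.

```typescript
// Hypothetical helper: join existing input text and a new transcription.
function appendTranscript(existing: string, transcript: string): string {
  if (existing === "") return transcript;
  // Assumption: avoid a double space if the text already ends in whitespace.
  const separator = /\s$/.test(existing) ? "" : " ";
  return existing + separator + transcript;
}
```

For example, `appendTranscript("Note:", "call Sam")` yields `"Note: call Sam"`.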

Magia ships two on-device speech-to-text engines. The active engine is shown in Settings → Voice Input → Active Engine.

Parakeet TDT is the primary engine. It is a fast, accurate local model (~670 MB). Magia uses it automatically when the model is downloaded.

  • Download: Settings → Voice Input → Parakeet TDT → Download
  • Delete: same panel → Delete

Download progress is shown as a percentage bar driven by stt-download-progress events from the backend.
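The payload of stt-download-progress is not described here, so the following sketch assumes a hypothetical shape with `downloadedBytes` and `totalBytes` fields and shows one way to turn it into a percentage for a progress bar. The `DownloadProgress` type and `progressPercent` function are illustrative names, not part of Magia's API.

```typescript
// Hypothetical payload shape; the real event payload may differ.
interface DownloadProgress {
  downloadedBytes: number;
  totalBytes: number;
}

// Convert a progress payload to an integer percentage in 0..100.
function progressPercent(p: DownloadProgress): number {
  if (p.totalBytes <= 0) return 0; // total size not known yet
  const ratio = p.downloadedBytes / p.totalBytes;
  // Floor so the bar only reads 100 once the download is complete.
  return Math.min(100, Math.max(0, Math.floor(ratio * 100)));
}
```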

Whisper (via whisper-rs) is used as a fallback when Parakeet is not available. Four model sizes are available:

| Model | Size | Notes |
| --- | --- | --- |
| Tiny | ~75 MB | Fastest, lowest accuracy |
| Base | ~142 MB | Good balance (default selection) |
| Small | ~466 MB | Better accuracy |
| Medium | ~1.5 GB | Highest accuracy, slower |

Each model can be downloaded and deleted independently. Download progress is shown per-model.

In normal use, the engine is chosen automatically (Parakeet if installed, otherwise Whisper). In Developer Mode, a dropdown in Settings lets you force a specific engine (auto, parakeet, or whisper).
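The selection rule can be sketched as follows. The function `resolveEngine` and its parameters are hypothetical names; the sketch also assumes that a forced engine is returned as-is, since what happens when a forced engine has no downloaded model is not specified here.

```typescript
type EngineOverride = "auto" | "parakeet" | "whisper";
type Engine = "parakeet" | "whisper";

// Hypothetical helper mirroring the rule described above.
function resolveEngine(
  override: EngineOverride,
  parakeetInstalled: boolean,
  developerMode: boolean,
): Engine {
  // The override is only honored in Developer Mode.
  if (developerMode && override !== "auto") {
    return override; // assumption: no check that the forced model exists
  }
  // Automatic choice: Parakeet if installed, otherwise Whisper.
  return parakeetInstalled ? "parakeet" : "whisper";
}
```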

The Language setting controls which language the engine expects. The default is Auto-detect, which lets the model infer the language from the audio. You can also pin a specific language from a list of 90+ options (Arabic, Chinese, French, German, Spanish, and many more).

The language setting is stored as a BCP 47 language code in whisperLanguage and applies to both engines.

The Microphone picker lists all available audio input devices. Select the device you want to use for recording. The list refreshes automatically when a new device is connected. On first enable, Magia requests OS microphone permission and then re-enumerates devices so that full device labels are available.
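The re-enumeration step matters because browsers return empty device labels from `navigator.mediaDevices.enumerateDevices()` until microphone permission has been granted. The sketch below shows how a picker list could be built from such a result. `DeviceInfoLike`, `MicrophoneOption`, and `listMicrophones` are hypothetical names, and the "Microphone N" fallback label is an assumption.

```typescript
// Minimal subset of the browser's MediaDeviceInfo used by this sketch.
interface DeviceInfoLike {
  deviceId: string;
  kind: string;
  label: string;
}

interface MicrophoneOption {
  id: string;
  label: string;
}

// Hypothetical helper: keep audio inputs and fill in missing labels.
function listMicrophones(devices: readonly DeviceInfoLike[]): MicrophoneOption[] {
  return devices
    .filter((d) => d.kind === "audioinput")
    .map((d, i) => ({
      id: d.deviceId,
      // Labels are empty before permission is granted.
      label: d.label !== "" ? d.label : `Microphone ${i + 1}`,
    }));
}
```

In a browser context, the same function could be re-run from a `devicechange` listener on `navigator.mediaDevices` to keep the list current.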

The selected device ID is persisted in whisperDeviceId.

| Setting | Key | Default |
| --- | --- | --- |
| Enable voice input | voiceInputEnabled | false |
| Whisper model size | whisperModel | base |
| Language | whisperLanguage | auto |
| Microphone device | whisperDeviceId | (system default) |
| Engine override | sttEngine | auto |
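For reference, the settings above could be modeled as a typed object. This is a sketch under assumptions: the key names and the defaults `false`, `base`, and `auto` come from the table, while the other `whisperModel` values (lowercased model names), the use of `null` for the system default device, and the names `VoiceInputSettings`, `DEFAULT_VOICE_SETTINGS`, and `withDefaults` are hypothetical.

```typescript
interface VoiceInputSettings {
  voiceInputEnabled: boolean;
  // Assumption: stored values are the lowercased model names.
  whisperModel: "tiny" | "base" | "small" | "medium";
  // A language code, or "auto" for auto-detect.
  whisperLanguage: string;
  // Assumption: null stands for the system default device.
  whisperDeviceId: string | null;
  sttEngine: "auto" | "parakeet" | "whisper";
}

const DEFAULT_VOICE_SETTINGS: VoiceInputSettings = {
  voiceInputEnabled: false,
  whisperModel: "base",
  whisperLanguage: "auto",
  whisperDeviceId: null,
  sttEngine: "auto",
};

// Fill any missing keys of a stored settings object with the defaults.
function withDefaults(stored: Partial<VoiceInputSettings>): VoiceInputSettings {
  return { ...DEFAULT_VOICE_SETTINGS, ...stored };
}
```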