Voice Input

Voice input lets you dictate messages into the chat input bar using a local speech recognition model. All processing runs on-device, and no audio is sent to external servers.

Enabling voice input

Voice input is disabled by default. To enable it:

Open Settings → Voice Input.
Turn on Enable Voice Input. The first time you do this, Magia requests microphone permission from the OS.
Download at least one model (Parakeet TDT is recommended).

Once enabled, a microphone button appears in the chat input bar.

Recording a message

Start recording: Click the microphone button or press the configured keyboard shortcut (customizable in Settings → Keybindings under chat.toggle-voice-recording).
Stop and transcribe: Click the button again or press the shortcut. The recording stops and transcription begins.
Send immediately: Press Enter while recording to stop and send the transcribed text in one step.
Cancel: Press Escape to discard the recording without transcribing.
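The controls above can be read as a small state machine. The sketch below is illustrative only: the state, input, and effect names are hypothetical and are not taken from Magia's source. It also assumes that Enter and Escape are left alone by the recorder when no recording is in progress.

```typescript
// Hypothetical names: these states, inputs, and effects are illustrative.
type RecorderState = "idle" | "recording";
type RecorderInput = "toggle" | "enter" | "escape";
type RecorderEffect =
  | "start"
  | "transcribe"
  | "transcribe-and-send"
  | "discard"
  | "none";

interface Transition {
  next: RecorderState;
  effect: RecorderEffect;
}

// What each input does while a recording is in progress.
const WHILE_RECORDING: Record<RecorderInput, RecorderEffect> = {
  toggle: "transcribe", // mic button or keyboard shortcut
  enter: "transcribe-and-send",
  escape: "discard",
};

function handleInput(state: RecorderState, input: RecorderInput): Transition {
  if (state === "idle") {
    // Assumption: only the mic button or shortcut starts a recording.
    return input === "toggle"
      ? { next: "recording", effect: "start" }
      : { next: "idle", effect: "none" };
  }
  // Every input ends the recording; only the resulting effect differs.
  return { next: "idle", effect: WHILE_RECORDING[input] };
}
```

For example, `handleInput("recording", "escape")` returns to the idle state with a `discard` effect, so nothing is transcribed.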

While recording, a live AudioWaveform visualization animates in the input bar using a Web Audio AnalyserNode. A seconds counter shows how long you have been recording. Partial transcription results appear in real time as the engine processes audio.
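To give a rough idea of how a waveform like this can be derived, the sketch below reduces one frame of time-domain samples to a handful of bar heights and computes the elapsed-seconds counter. It is not Magia's actual AudioWaveform code. The function names are hypothetical, and the input is assumed to be the byte buffer filled by `AnalyserNode.getByteTimeDomainData()`, where 128 represents silence.

```typescript
// `samples` is assumed to come from AnalyserNode.getByteTimeDomainData():
// unsigned bytes centered on 128 (silence).
function waveformBars(samples: Uint8Array, barCount: number): number[] {
  const bars: number[] = [];
  const chunk = Math.max(1, Math.floor(samples.length / barCount));
  for (let i = 0; i < barCount; i++) {
    const end = Math.min(samples.length, (i + 1) * chunk);
    let peak = 0;
    for (let j = i * chunk; j < end; j++) {
      // Distance from the silent midpoint, scaled to the range 0..1.
      peak = Math.max(peak, Math.abs(samples[j] - 128) / 128);
    }
    bars.push(peak);
  }
  return bars;
}

// Whole seconds elapsed since recording started (timestamps in milliseconds).
function elapsedSeconds(startedAtMs: number, nowMs: number): number {
  return Math.max(0, Math.floor((nowMs - startedAtMs) / 1000));
}
```

In a browser, a render loop would typically refill the buffer from the analyser on each animation frame and redraw the bars from the returned values.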

Transcribed text is inserted into the input bar at the cursor position. If the input bar already contains text, the transcription is separated from it by a space.
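A minimal sketch of the spacing rule, for the simple case where the cursor sits at the end of the existing text. The function name is hypothetical. Trimming the transcript and skipping the extra space when the draft already ends in whitespace are assumptions, not documented behavior.

```typescript
// Hypothetical helper: joins an existing draft and a new transcription.
function appendTranscription(existing: string, transcript: string): string {
  const text = transcript.trim(); // assumption: surrounding whitespace is dropped
  if (text === "") return existing; // nothing was recognized
  if (existing === "") return text;
  // Assumption: avoid a double space if the draft already ends in whitespace.
  return /\s$/.test(existing) ? existing + text : `${existing} ${text}`;
}
```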

Speech engines

Magia ships two on-device speech-to-text engines. The active engine is shown in Settings → Voice Input → Active Engine.

Parakeet TDT (default)

Parakeet TDT is the primary engine. It is a fast, accurate local model (~670 MB). Magia uses it automatically once the model has been downloaded.

Download: Settings → Voice Input → Parakeet TDT → Download
Delete: same panel → Delete

Download progress is shown as a percentage bar driven by stt-download-progress events from the backend.
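As an illustration of how a progress event could be turned into the percentage shown on the bar, here is a small sketch. The payload shape is an assumption: the fields actually carried by stt-download-progress are not documented here, and `DownloadProgress` and `progressPercent` are hypothetical names.

```typescript
// Hypothetical payload shape: the real stt-download-progress event may
// carry different fields.
interface DownloadProgress {
  model: string;
  downloadedBytes: number;
  totalBytes: number;
}

// Converts a progress event into a whole-number percentage from 0 to 100.
function progressPercent(p: DownloadProgress): number {
  if (p.totalBytes <= 0) return 0; // total size not known yet
  const pct = Math.round((p.downloadedBytes / p.totalBytes) * 100);
  return Math.min(100, Math.max(0, pct));
}
```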

Whisper (fallback)

Whisper (via whisper-rs) is used as a fallback when Parakeet is not available. Four model sizes are available:

Model	Size	Notes
Tiny	~75 MB	Fastest, lowest accuracy
Base	~142 MB	Good balance (default selection)
Small	~466 MB	Better accuracy
Medium	~1.5 GB	Highest accuracy, slower

Each model can be downloaded and deleted independently. Download progress is shown per-model.

Engine selection

In normal use, the engine is chosen automatically (Parakeet if installed, otherwise Whisper). In Developer Mode, a dropdown in Settings lets you force a specific engine (auto, parakeet, or whisper).
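The selection rule can be sketched as a single function. The function and parameter names are hypothetical. The sketch assumes the override is ignored outside Developer Mode, and it does not model what happens when a forced engine has no model installed.

```typescript
type EngineOverride = "auto" | "parakeet" | "whisper";
type Engine = "parakeet" | "whisper";

// Hypothetical helper mirroring the rule described above.
function resolveEngine(
  override: EngineOverride,
  parakeetInstalled: boolean,
  developerMode: boolean,
): Engine {
  // Assumption: a forced engine only applies in Developer Mode.
  if (developerMode && override !== "auto") return override;
  // Automatic choice: Parakeet if installed, otherwise Whisper.
  return parakeetInstalled ? "parakeet" : "whisper";
}
```

With these assumptions, `resolveEngine("whisper", true, false)` still picks Parakeet because the override is only honored in Developer Mode.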

Language selection

The Language setting controls which language the engine expects. The default is Auto-detect, which lets the model infer the language from the audio. You can also pin a specific language from a list of 90+ options (Arabic, Chinese, French, German, Spanish, and many more).

The language setting is stored in whisperLanguage as a BCP 47 language code, or as auto when Auto-detect is selected. It applies to both engines.

Microphone selection

The Microphone picker lists all available audio input devices. Select the device you want to use for recording. The list refreshes automatically when a new device is connected. The first time voice input is enabled, Magia requests OS microphone permission and then re-enumerates devices so that full device labels are available.

The selected device ID is persisted in whisperDeviceId.
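The sketch below shows one way a device list and a stored device ID could be handled with the standard browser media APIs. It is an assumption about the approach, not a description of Magia's code, and the function and type names are hypothetical. `DeviceInfo` is a structural subset of the browser's `MediaDeviceInfo`.

```typescript
// Structural subset of the browser's MediaDeviceInfo.
interface DeviceInfo {
  deviceId: string;
  kind: string;
  label: string;
}

// Shape accepted by navigator.mediaDevices.getUserMedia() for audio capture.
type AudioRequest = { audio: true | { deviceId: { exact: string } } };

// Keeps only microphones from a full device enumeration.
function listMicrophones(devices: DeviceInfo[]): DeviceInfo[] {
  return devices.filter((d) => d.kind === "audioinput");
}

// Builds capture constraints from a stored device ID.
// Assumption: an empty or missing ID means "use the system default".
function audioConstraints(deviceId: string | null): AudioRequest {
  return deviceId ? { audio: { deviceId: { exact: deviceId } } } : { audio: true };
}
```

In a browser context, `listMicrophones` would receive the result of `navigator.mediaDevices.enumerateDevices()`, and re-running it from a `devicechange` listener keeps the picker current. Browsers leave `label` empty until microphone permission has been granted, which is why devices are enumerated again after the permission prompt.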

Settings summary

Setting	Key	Default
Enable voice input	`voiceInputEnabled`	`false`
Whisper model size	`whisperModel`	`base`
Language	`whisperLanguage`	`auto`
Microphone device	`whisperDeviceId`	(system default)
Engine override	`sttEngine`	`auto`
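For reference, the table above can be expressed as a typed settings object with defaults. This is a sketch, not Magia's actual schema. The keys and defaults come from the table, but the value names other than `base` and `auto`, the use of `null` for the system default microphone, and the `withDefaults` helper are assumptions.

```typescript
interface VoiceSettings {
  voiceInputEnabled: boolean;
  // Only "base" is documented; the other value names are assumed.
  whisperModel: "tiny" | "base" | "small" | "medium";
  whisperLanguage: string; // "auto" or a BCP 47 language code
  whisperDeviceId: string | null; // assumption: null means system default
  sttEngine: "auto" | "parakeet" | "whisper";
}

const DEFAULT_VOICE_SETTINGS: VoiceSettings = {
  voiceInputEnabled: false,
  whisperModel: "base",
  whisperLanguage: "auto",
  whisperDeviceId: null,
  sttEngine: "auto",
};

// Fills in any keys missing from stored settings with the defaults.
function withDefaults(stored: Partial<VoiceSettings>): VoiceSettings {
  return { ...DEFAULT_VOICE_SETTINGS, ...stored };
}
```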