Omi Med STT v1: Fine-Tuned Parakeet 0.6B for Medical ASR Released with Open Weights and Local Runtime
The founder of Omi Health has released Omi Med STT v1, an open-weight (CC-BY-4.0) fine-tune of NVIDIA Parakeet TDT 0.6B v2 specialized for medical speech, with a local runtime that auto-selects a backend (MLX on Apple Silicon, NeMo on CUDA, GGUF on CPU). On a held-out benchmark of 1,513 medical clips (7.18 hours), it achieves a medical word error rate (M-WER) of 2.37% and an overall WER of 8.30% while running at 145× realtime on an A10, significantly outperforming the base model and most open local ASR options. Among open local models it trails only VibeVoice-ASR 9B on M-WER, while beating it on WER and speed. It approaches cloud-based medical transcription services such as ElevenLabs Scribe v2 (M-WER 1.39%) and AssemblyAI (M-WER 1.81%), which still score lower on M-WER, and it has the structural latency advantage of on-device processing. Training used 127 hours of audio (71% real, 29% synthetic), and the benchmark was verified to have zero overlap with the training data. The key weakness is drug-name accuracy (4.75% drug WER), which is targeted for improvement in v2.
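
For readers unfamiliar with the two headline metrics, the following is a minimal Python sketch of how a word error rate and a realtime-factor figure are typically computed. It is not the project's evaluation code: the function names `word_error_rate` and `processing_seconds` are hypothetical, the lowercase-and-split normalization is an assumption, and the summary does not define how M-WER selects medical terms, so only plain WER is shown.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: text is normalized by lowercasing and splitting on
    whitespace; a real benchmark may normalize differently.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


def processing_seconds(audio_hours: float, realtime_factor: float) -> float:
    """Wall-clock seconds needed to transcribe audio at a given speed-up."""
    return audio_hours * 3600 / realtime_factor


# One misspelled drug name in a four-word reference gives a WER of 0.25.
print(word_error_rate("patient takes metoprolol daily",
                      "patient takes metropolol daily"))

# 7.18 hours of audio at 145x realtime takes roughly 178 seconds.
print(round(processing_seconds(7.18, 145)))
```

The second call illustrates the reported speed: at 145× realtime, the full 7.18-hour benchmark would be transcribed in about three minutes on the same hardware.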