User builds a Jarvis-level voice assistant on four RTX 4090s, says the Gemma 4 31B QAT model performs best
Summary
A local AI enthusiast built a personal voice assistant with advanced features including voice verification, wake words, continuous conversation, Home Assistant control, Hermes Agent integration, and deep research. The system runs on a custom server with four modified RTX 4090s (48GB each, 192GB VRAM total), 128GB of DDR5 RAM, and a 3000W PSU fed from a 240V/30A dryer line. After testing larger models such as Qwen 397B, MiniMax M3, Nemotron 3 Ultra, and GLM 4.7/5.2, the user reports that Google's Gemma 4 31B QAT outperformed all of them and was surprisingly fast for its size. The assistant is deployed throughout the house via conference speaker-mics, and the server's heat is vented by a laundry room exhaust fan.
Key points
Custom hardware: four modded RTX 4090s with 48GB each (192GB VRAM total), 128GB of DDR5 RAM, and a 3000W PSU powered from a dryer line.
The assistant features voice verification, wake words, continuous conversation, long-term memory, a dynamic system prompt, Home Assistant and Hermes Agent integration, and deep research.
Among the models tested, the user found that Gemma 4 31B QAT outperformed Qwen 397B, MiniMax M3, Nemotron 3 Ultra, GLM 4.7, and a lobotomized GLM 5.2, while being shockingly fast for its size.
The setup is noisy and hot, but the heat is managed with an exhaust fan; at idle the cards draw about as much power as half a hand dryer, and at full load about as much as 2-3 hand dryers.
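The hardware figures quoted above can be sanity-checked with some simple arithmetic. The sketch below is only a back-of-the-envelope check, not anything taken from the user's setup. The GPU count, per-card VRAM, PSU rating, and circuit rating come from the post. The 80% continuous-load factor is a common rule of thumb assumed here, and the 4-bit weight width is a purely hypothetical value, since the post does not state how the QAT model is quantized. The function names are illustrative.

```python
"""Back-of-the-envelope check of the hardware figures quoted in the post."""

# Figures taken from the post.
GPU_COUNT = 4
VRAM_PER_GPU_GB = 48
PSU_WATTS = 3000
CIRCUIT_VOLTS = 240
CIRCUIT_AMPS = 30
MODEL_PARAMS_BILLION = 31  # from the model name "31B"

# Assumptions, NOT from the post.
CONTINUOUS_LOAD_PERCENT = 80  # assumed rule of thumb for continuous loads
ASSUMED_WEIGHT_BITS = 4       # hypothetical quantization width


def total_vram_gb(gpu_count: int, vram_per_gpu_gb: int) -> int:
    """Total VRAM across all cards, in GB."""
    return gpu_count * vram_per_gpu_gb


def circuit_watts(volts: int, amps: int, usable_percent: int = 100) -> int:
    """Power available from a circuit, optionally derated to a percentage."""
    return volts * amps * usable_percent // 100


def weights_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes).

    Ignores KV cache, activations, and runtime overhead.
    """
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    vram = total_vram_gb(GPU_COUNT, VRAM_PER_GPU_GB)
    peak = circuit_watts(CIRCUIT_VOLTS, CIRCUIT_AMPS)
    continuous = circuit_watts(CIRCUIT_VOLTS, CIRCUIT_AMPS, CONTINUOUS_LOAD_PERCENT)
    weights = weights_gb(MODEL_PARAMS_BILLION, ASSUMED_WEIGHT_BITS)

    print(f"Total VRAM: {vram} GB")
    print(f"Circuit peak: {peak} W, assumed continuous budget: {continuous} W")
    print(f"Headroom over the PSU rating: {continuous - PSU_WATTS} W")
    print(f"Weights at the assumed {ASSUMED_WEIGHT_BITS}-bit width: {weights} GB")
```

With these inputs the totals come to 192 GB of VRAM and 7200 W of nominal circuit capacity. Under the assumed 80% factor the continuous budget is 5760 W, which leaves 2760 W above the 3000W PSU rating. The weight estimate of 15.5 GB holds only if the hypothetical 4-bit width is correct.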