NVIDIA Releases NVFP4-Quantized DiffusionGemma 26B A4B IT Model
Summary
Google DeepMind's DiffusionGemma 26B A4B IT is an open-weights multimodal model that uses discrete diffusion to generate text from text, image, and video inputs. It is a mixture-of-experts (MoE) model with 25.2B total and 3.8B active parameters, supports a 256K-token context window, and generates over 1,100 tokens per second on NVIDIA H100 GPUs (FP8). NVIDIA has quantized the model to NVFP4 precision with its Model Optimizer and published the checkpoint on Hugging Face for commercial and non-commercial use. The model also offers a configurable thinking mode, native function calling, and multilingual support for 35+ languages.
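NVFP4 stores each weight as a 4-bit FP4 (E2M1) value and shares one FP8 (E4M3) scale factor across every 16-element block, with an additional per-tensor FP32 scale. The sketch below simulates that block scaling in plain Python and estimates the weight footprint for 25.2B parameters. It is illustrative only: the block scale is kept as a Python float rather than rounded to E4M3, the per-tensor scale is omitted, and the footprint assumes every parameter is quantized, which a released checkpoint may not do. `quantize_block`, `dequantize_block`, and `weight_bytes` are hypothetical helpers, not Model Optimizer APIs.

```python
import math

# Magnitudes representable by FP4 E2M1 (the NVFP4 element type); the sign is a separate bit.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK = 16  # NVFP4 shares one scale factor across each 16-element micro-block


def quantize_block(xs):
    """Simulated NVFP4 quantization of one 16-element block.

    Simplifications (assumptions): the block scale stays a Python float instead
    of being rounded to FP8 E4M3, and the per-tensor FP32 scale is omitted.
    """
    assert len(xs) == BLOCK
    amax = max(abs(x) for x in xs)
    # Map the block's largest magnitude onto 6.0, the largest E2M1 value.
    scale = amax / 6.0 if amax > 0 else 1.0
    codes = []
    for x in xs:
        mag = abs(x) / scale
        # Round to the nearest representable E2M1 magnitude, then restore the sign.
        q = min(E2M1_GRID, key=lambda g: abs(g - mag))
        codes.append(math.copysign(q, x))
    return scale, codes


def dequantize_block(scale, codes):
    """Recover approximate weights from the shared scale and the FP4 codes."""
    return [scale * c for c in codes]


def weight_bytes(num_params, bits_per_param):
    """Weight storage in bytes, ignoring any unquantized layers."""
    return num_params * bits_per_param / 8


if __name__ == "__main__":
    block = [0.12, -0.5, 0.33, 0.9, -1.2, 0.05, 0.0, 0.7,
             -0.25, 0.41, 1.8, -0.66, 0.2, -0.02, 0.15, 0.95]
    scale, codes = quantize_block(block)
    approx = dequantize_block(scale, codes)
    print("scale:", scale)
    print("max abs error:", max(abs(a - b) for a, b in zip(block, approx)))

    total = 25.2e9  # total parameters from the summary
    nvfp4_bits = 4 + 8 / BLOCK  # 4-bit element + 8-bit block scale amortized over 16
    print(f"BF16 weights:  {weight_bytes(total, 16) / 1e9:.1f} GB")
    print(f"NVFP4 weights: {weight_bytes(total, nvfp4_bits) / 1e9:.1f} GB")
```

With 4 bits per element plus an 8-bit scale amortized over 16 elements, NVFP4 costs about 4.5 bits per weight, so 25.2B parameters shrink from about 50.4 GB in BF16 to about 14.2 GB, before accounting for any layers kept in higher precision.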
Key Points
DiffusionGemma 26B A4B IT is a Google DeepMind multimodal model using discrete diffusion for text output from text, image, and video inputs.
NVIDIA released an NVFP4-quantized version on Hugging Face, quantized with its Model Optimizer and available for commercial use.
The model reaches over 1,100 tokens per second on H100 (FP8), supports a 256K context window, and handles 35+ languages.
It uses a mixture-of-experts (MoE) architecture with 3.8B active parameters and includes function calling, a thinking mode, and 256-token parallel generation.
Open-weights release allows both commercial and non-commercial usage.
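Parallel generation is what separates a diffusion LM from autoregressive decoding: instead of emitting one token per forward pass, the model fills many positions of a block per step. One common family of discrete-diffusion LMs (masked, or absorbing-state, diffusion) starts from a fully masked block and iteratively commits its most confident predictions. The toy sketch below shows that confidence-based unmasking loop, taking the 256-token figure as the block size. The source does not describe DiffusionGemma's actual sampler, so this is an assumption; `toy_denoiser`, `steps`, and `vocab_size` are hypothetical, and the random "model" stands in for a real network.

```python
import random

MASK = -1  # sentinel for a not-yet-generated position


def toy_denoiser(seq, vocab_size, rng):
    """Hypothetical stand-in for the diffusion model.

    Returns a (token, confidence) guess for every masked position. A real
    model would condition on the prompt and on already-unmasked tokens.
    """
    return {i: (rng.randrange(vocab_size), rng.random())
            for i, t in enumerate(seq) if t == MASK}


def decode_block(block_len=256, steps=8, vocab_size=1000, seed=0):
    """Confidence-based iterative unmasking over one block (toy sketch)."""
    rng = random.Random(seed)
    seq = [MASK] * block_len  # start from a fully masked block
    per_step = -(-block_len // steps)  # ceiling division: tokens committed per step
    for _ in range(steps):
        preds = toy_denoiser(seq, vocab_size, rng)
        if not preds:
            break  # everything is already unmasked
        # Commit the most confident predictions in parallel; the rest stay masked
        # and are re-predicted on the next step with more context.
        best = sorted(preds.items(), key=lambda kv: kv[1][1], reverse=True)[:per_step]
        for pos, (tok, _conf) in best:
            seq[pos] = tok
    return seq


if __name__ == "__main__":
    out = decode_block()
    print(out[:10], "masked left:", out.count(MASK))
```

With 8 steps over 256 positions, each forward pass commits 32 tokens; fewer steps raise throughput at the cost of committing tokens with less context, which is the trade-off a real sampler has to tune.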