Microsoft has released HARC-Qwen2.5-7B-Instruct, a fine-tuned version of Qwen2.5-7B-Instruct tuned for safety and alignment in conversational AI. The transformer-based text-generation model is available on Hugging Face under the Apache 2.0 license. It is distributed in safetensors format and is compatible with text-generation-inference and Hugging Face endpoints. The release is associated with the paper arXiv:2607.00572.
Microsoft released HARC-Llama-3.1-8B-Instruct on Hugging Face. It is a text-generation model built on Meta's Llama 3.1 8B Instruct. Repository tags indicate a focus on safety, alignment, and conversational use. The model card provides no benchmarks, training details, or specific capability claims. It is distributed under the Llama 3.1 license.
AGVBench is a benchmark that evaluates 30 data augmentation strategies on five public palm- and finger-vein datasets with seven backbone architectures, including CNNs, vision transformers, and vein-specific models. Multi-image mixing methods such as MixUp, PuzzleMix, and StarMixup achieve the highest recognition accuracy but exhibit poor calibration and high vulnerability to adversarial perturbations. Severe geometric transformations often degrade performance, likely due to feature misalignment or spatial cropping. The results indicate that accuracy-centric evaluation is insufficient for biometric data augmentation and that security and robustness need to be assessed alongside accuracy. AGVBench provides standardized protocols and open-source code to advance reproducible and secure vein recognition research.
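For readers unfamiliar with multi-image mixing, the following is a minimal sketch of standard MixUp, not AGVBench's own code: two images and their one-hot labels are blended with a weight drawn from a Beta distribution. The function name `mixup_pair`, the `alpha` value, and the toy array shapes are illustrative assumptions.

```python
import numpy as np

def mixup_pair(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Blend two samples and their one-hot labels (standard MixUp)."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)  # mixing weight in [0, 1]
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y, lam

# Toy stand-ins for two grayscale vein images from different identities.
rng = np.random.default_rng(0)
img_a = rng.random((64, 64))
img_b = rng.random((64, 64))
label_a = np.array([1.0, 0.0])
label_b = np.array([0.0, 1.0])

mixed_img, mixed_label, lam = mixup_pair(img_a, label_a, img_b, label_b, rng=rng)
```

Because the mixed label is a soft target rather than a single identity, a model trained this way sees blended supervision, which is one reason to measure calibration separately from accuracy.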
The paper adapts a mixture-of-experts discrete diffusion language model, DiffusionGemma-26B, and benchmarks it against the autoregressive Gemma-4-26B on medical visual question answering. Using the same LoRA fine-tuning recipe, the diffusion model matches or exceeds the autoregressive baseline, as scored by a verbosity-robust LLM judge, while decoding 3.5–4.4× faster. The fine-tuned model (3.8B active parameters) is competitive with frontier vision-language models. Crucially, the diffusion paradigm enables any-order infill: a radiologist can correct parts of a report and the model generates the text between the corrected parts, a capability inherent to diffusion that autoregressive models cannot easily replicate. This suits real-world radiology reports, which often vary in style and completeness across clinicians and institutions.
Nvidia has published a quantized variant of the Mistral-Medium-3.5-128B large language model on Hugging Face. The model uses NVFP4, a 4-bit floating-point format, to reduce memory footprint and potentially accelerate inference. It is tagged for conversational and text-generation use and is distributed in safetensors format. The repository indicates the model is based on the original Mistral-Medium-3.5-128B from Mistral AI and is shared under a custom license.
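To illustrate what a 4-bit floating-point format implies for weights, here is a simplified sketch of block-scaled rounding to the E2M1 value grid. It is not Nvidia's implementation: based on Nvidia's public description, NVFP4 shares a scale across each block of 16 elements and stores that scale in FP8, whereas this sketch keeps the scale in full precision for simplicity. The function name `fake_quantize_fp4` is hypothetical, and the input length is assumed to be a multiple of the block size.

```python
import numpy as np

# Magnitudes representable by a 4-bit E2M1 float (1 sign, 2 exponent, 1 mantissa bit).
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fake_quantize_fp4(x, block_size=16):
    """Round each block to the E2M1 grid after per-block scaling, then dequantize."""
    x = np.asarray(x, dtype=np.float64)
    blocks = x.reshape(-1, block_size)  # assumes x.size is a multiple of block_size
    # One scale per block maps the block's largest magnitude to 6.0, the grid maximum.
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    scale = np.where(amax > 0, amax / E2M1_GRID[-1], 1.0)
    scaled = np.abs(blocks) / scale
    # Pick the nearest grid point for every element.
    idx = np.abs(scaled[..., None] - E2M1_GRID).argmin(axis=-1)
    return (np.sign(blocks) * E2M1_GRID[idx] * scale).reshape(x.shape)

weights = np.random.default_rng(0).normal(size=64)
dequantized = fake_quantize_fp4(weights)
max_error = np.abs(weights - dequantized).max()
```

The widest gap on the grid lies between 4 and 6, so the rounding error within a block is at most one sixth of that block's largest magnitude. This is why a per-block scale matters more at 4 bits than at higher precisions.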
Microsoft has released GELab-Zero-4B-preview-Sico-Evolution, a 4-billion-parameter vision-language model specialized for GUI agent tasks. The model is built on Qwen3-VL using LoRA fine-tuning and targets mobile and general GUI agent use cases. It supports English and Chinese text input and is tagged for the image-text-to-text pipeline. The release is open-source under the Apache 2.0 license and is noted as an early preview version.
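As background on the LoRA fine-tuning mentioned above, the sketch below shows the general idea in NumPy: the pretrained weight stays frozen and only a small low-rank update is trained. The layer sizes, rank, and `alpha` are illustrative assumptions and say nothing about how this particular model was configured.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank, alpha = 32, 64, 4, 8  # illustrative sizes, not the model's

W = rng.normal(size=(d_out, d_in))        # frozen pretrained weight
A = rng.normal(size=(rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))               # trainable up-projection, zero at start

def lora_forward(x):
    """Frozen path plus the scaled low-rank update (alpha / rank) * B @ A."""
    return x @ W.T + (alpha / rank) * (x @ A.T) @ B.T

x = rng.normal(size=(2, d_in))
out = lora_forward(x)

trainable = A.size + B.size  # parameters updated during fine-tuning
frozen = W.size              # parameters left untouched
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly, and the trainable parameter count grows with the rank rather than with the full weight matrix.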