Microsoft has released HARC-Qwen2.5-7B-Instruct, a fine-tuned version of Qwen2.5-7B-Instruct optimized for safety and alignment in conversational AI. It is a transformer-based text-generation model, available on Hugging Face under the Apache 2.0 license, distributed in safetensors format, and compatible with text-generation-inference and Hugging Face endpoints. The release is associated with the paper arXiv:2607.00572.
Microsoft released HARC-Llama-3.1-8B-Instruct on Hugging Face. It is a text-generation model built on Meta's Llama 3.1 8B Instruct. Repository tags indicate a focus on safety, alignment, and conversational use. The model card provides no benchmarks, training details, or specific capability claims. It is distributed under the Llama 3.1 license.
AGVBench is a comprehensive benchmark evaluating 30 data augmentation strategies on five public palm- and finger-vein datasets with seven backbone architectures, including CNNs, vision transformers, and vein-specific models. Multi-image mixing methods such as MixUp, PuzzleMix, and StarMixup achieve the highest recognition accuracy but exhibit poor calibration and high vulnerability to adversarial perturbations. Severe geometric transformations often degrade performance, likely due to feature misalignment or spatial cropping. The results show that accuracy-centric evaluation is insufficient for biometric data augmentation and that security and robustness must be evaluated alongside accuracy. AGVBench provides standardized protocols and open-source code to advance reproducible and secure vein recognition research.
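The trade-off described above, where mixing-based augmentation raises accuracy but hurts calibration, can be made concrete with a minimal sketch. It shows standard MixUp on a single pair of samples and a common equal-width-bin estimate of expected calibration error (ECE). The function names, the `alpha` default, and the bin count are illustrative assumptions, not taken from AGVBench, whose calibration metric and protocol may differ.

```python
import random


def mixup(x_a, y_a, x_b, y_b, alpha=0.2, rng=None):
    """Blend two samples and their one-hot labels with a Beta(alpha, alpha) weight.

    x_a / x_b are flat lists of pixel values, y_a / y_b are one-hot label lists.
    alpha=0.2 is an illustrative default, not a value from the benchmark.
    """
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam


def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean gap between confidence and accuracy over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / total * abs(avg_conf - accuracy)
    return ece
```

A model that reports 95% confidence on predictions that are right only 75% of the time scores an ECE of about 0.2 here, which is the kind of gap an accuracy-only comparison would not surface.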
This paper reveals that dense on-policy self-distillation (SDPO) accelerates in-domain specialization under stable teacher signals, but causes severe forgetting and even complete collapse during continual post-training. In contrast, on-policy reinforcement learning methods like GRPO adapt more conservatively and better preserve prior capabilities. Denser self-distillation induces larger drift in parameter and response spaces, and amplifies high-frequency formatting artifacts through a self-reinforcing teacher-student loop. The findings caution that on-policy data alone is insufficient for continual learning, and dense self-distillation should not be treated as a default stabilizer.
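The contrast in signal density can be illustrated with a rough sketch. GRPO assigns one group-normalized advantage per sampled response, whereas a dense distillation signal yields a value at every token position. The per-token KL below is a generic stand-in for a dense teacher-student signal; the actual SDPO objective, including the KL direction, is an assumption here and may differ in the paper.

```python
import math


def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: one scalar per sampled response.

    Each reward is standardized against the mean and standard deviation
    of the rewards in its own sampling group.
    """
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


def per_token_kl(student_probs, teacher_probs):
    """KL(student || teacher) at each token position (assumed direction).

    Each element of student_probs / teacher_probs is a probability
    distribution over the vocabulary for one position. Teacher
    probabilities are assumed nonzero wherever the student's are.
    """
    return [
        sum(s * math.log(s / t) for s, t in zip(sp, tp) if s > 0)
        for sp, tp in zip(student_probs, teacher_probs)
    ]
```

For a group of four responses, `grpo_advantages` returns four numbers, while `per_token_kl` returns one number per token of a single response. That difference gives the dense signal many more places to push the parameters on each update.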
PACE constructs proxy benchmarks from a small, automatically selected subset of non-agentic evaluation instances to predict model scores on expensive agentic benchmarks. The resulting PACE-Bench is built from 19 non-agentic benchmarks by combining target-relevance and globally informative selection strategies. Evaluated across 14 models and 4 agentic benchmarks (including SWE-Bench and GAIA), it achieves leave-one-out cross-validation mean absolute error under 4%, Spearman correlation above 0.80, and pairwise model-ranking accuracy around 85%, all at less than 1% of the full agentic evaluation cost. The selected instances also reveal the distinct skill demands of each agentic benchmark. PACE enables practical performance estimation for model development, selection, and routing without full agent evaluation overhead.
SkillCoach proposes a self-evolving rubric framework that derives skill-grounded process rubrics from rollouts to evaluate agentic skill-use along four dimensions: skill selection, skill following, skill composition, and skill-grounded reflection. It separates process quality from task success by keeping an external verifier as a distinct outcome signal, enabling detection of hidden failures that final accuracy alone would miss. The evolved rubrics are then used as process supervision to select high-quality training trajectories, outperforming outcome-only filtering. Experiments show the approach improves evaluation quality and provides stronger supervision signals for enhancing agentic skill-use.
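The separation of process quality from task success can be sketched as a trajectory filter. Everything below is a hypothetical illustration: the `Trajectory` type, the equal weighting of the four dimensions, the 0.7 threshold, and the reading of a "hidden failure" as a verifier-passed run with a low process score are assumptions, not details from the paper.

```python
from dataclasses import dataclass

# The four rubric dimensions named in the summary above.
DIMENSIONS = ("selection", "following", "composition", "reflection")


@dataclass
class Trajectory:
    name: str
    passed: bool   # outcome signal from the external verifier
    rubric: dict   # dimension -> process score in [0, 1] (hypothetical scale)


def process_score(traj):
    """Unweighted mean over the four dimensions (an assumed aggregation)."""
    return sum(traj.rubric[d] for d in DIMENSIONS) / len(DIMENSIONS)


def outcome_only_filter(trajs):
    """Baseline: keep every trajectory the verifier accepted."""
    return [t for t in trajs if t.passed]


def process_aware_filter(trajs, threshold=0.7):
    """Keep trajectories that pass the verifier and clear the process threshold."""
    return [t for t in trajs if t.passed and process_score(t) >= threshold]


def hidden_failures(trajs, threshold=0.7):
    """Verifier-passed trajectories whose process quality is still low."""
    return [t for t in trajs if t.passed and process_score(t) < threshold]
```

Under these assumptions, a trajectory that reaches the right answer while ignoring the selected skill would survive `outcome_only_filter` but be dropped by `process_aware_filter` and surfaced by `hidden_failures`.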