AI intelligence feed

HUGGINGFACE | Jul 2, 2026

Microsoft Releases HARC-Qwen2.5-7B-Instruct: A Safety-Aligned Chat Model

Microsoft has released HARC-Qwen2.5-7B-Instruct, a fine-tuned version of Qwen2.5-7B-Instruct optimized for safety and alignment in conversational AI. The model is a transformer-based text-generation model, available on Hugging Face under the Apache 2.0 license. It is distributed in safetensors format and is compatible with text-generation-inference and Hugging Face endpoints. The release is associated with the paper arXiv:2607.00572.

HUGGINGFACE | Jul 2, 2026

Microsoft/HARC-Llama-3.1-8B-Instruct: Safety-Aligned Llama 3.1 Model

Microsoft released HARC-Llama-3.1-8B-Instruct on Hugging Face. It is a text-generation model built on Meta's Llama 3.1 8B Instruct. Repository tags indicate a focus on safety, alignment, and conversational use. The model card provides no benchmarks, training details, or specific capability claims. It is distributed under the Llama 3.1 license.

HUGGINGFACE | Jul 2, 2026 | Highlight

AGVBench: A Reliability-Oriented Benchmark of Data Augmentation for Vein Recognition

AGVBench is a comprehensive benchmark evaluating 30 data augmentation strategies on five public palm- and finger-vein datasets with seven backbone architectures, including CNNs, vision transformers, and vein-specific models. Multi-image mixing methods such as MixUp, PuzzleMix, and StarMixup achieve the highest recognition accuracy but exhibit poor calibration and high vulnerability to adversarial perturbations. Severe geometric transformations often degrade performance, likely due to feature misalignment or spatial cropping. The results demonstrate that accuracy-centric evaluation is insufficient for biometric data augmentation and that security and robustness need to be assessed alongside accuracy. AGVBench provides standardized protocols and open-source code to advance reproducible and secure vein recognition research.
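To illustrate why high accuracy and poor calibration can coexist, here is a minimal sketch of expected calibration error (ECE), a commonly used calibration measure. The summary does not say which calibration metric AGVBench reports, so the choice of ECE, the function name, the equal-width binning, and the toy inputs are all assumptions made for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin-weighted mean gap between average confidence and accuracy.

    Hypothetical helper: confidences are top-class probabilities in [0, 1],
    correct is a parallel list of booleans. Not taken from AGVBench.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    # Assign each prediction to an equal-width confidence bin.
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, hit))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, h in members if h) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


# Two toy models with identical accuracy (3 of 4 correct) but different confidence.
outcomes = [True, True, True, False]
well_calibrated = expected_calibration_error([0.75] * 4, outcomes)
overconfident = expected_calibration_error([0.99] * 4, outcomes)
```

In this toy case both models are equally accurate, yet the overconfident one has a much larger ECE, which is the kind of gap an accuracy-only evaluation would not reveal.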

HUGGINGFACE | Jul 1, 2026 | Highlight

Cross-Domain Generalization Failure in Lightweight Intrusion Detection Models for IIoT Networks

This study trains four lightweight architectures on one IIoT intrusion detection dataset and evaluates them without retraining on two structurally distinct datasets, using a shared feature set. The two top-performing models rely overwhelmingly on coarse port-category features; the most influential category appears 96 to 435 times more often in source-domain attack traffic than in the target domains, showing that coarsening port resolution relocates rather than removes a shortcut. Evaluation under natural class imbalance can reverse which target network seems harder to generalize to. Adversarial robustness is uncorrelated with cross-network generalization, and recovery through limited target-domain exposure varies widely by architecture. The results argue that deployment readiness should be judged by cross-network evaluation under realistic distributions, not within-domain accuracy alone.
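The claim that natural class imbalance can reverse which target network looks harder can be shown with a small worked example. The sketch below computes F1 for the attack class from a detector's true-positive and false-positive rates at a given attack prevalence. The two networks, their rates, and the 1% prevalence are hypothetical numbers chosen for illustration, not results from the study, and the study may use a different metric.

```python
def f1_at_prevalence(tpr, fpr, prevalence):
    """F1 for the attack class, given detection rates and attack prevalence.

    Works on expected fractions of traffic rather than raw counts.
    """
    tp = tpr * prevalence                  # attacks correctly flagged
    fp = fpr * (1.0 - prevalence)          # benign flows wrongly flagged
    fn = (1.0 - tpr) * prevalence          # attacks missed
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


# Hypothetical transfer behaviour of one model on two target networks.
networks = {
    "network_a": {"tpr": 0.90, "fpr": 0.10},   # catches more, many false alarms
    "network_b": {"tpr": 0.70, "fpr": 0.01},   # catches less, few false alarms
}

balanced = {name: f1_at_prevalence(r["tpr"], r["fpr"], 0.50)
            for name, r in networks.items()}
natural = {name: f1_at_prevalence(r["tpr"], r["fpr"], 0.01)
           for name, r in networks.items()}

# The network with the lower F1 is the one that "seems harder".
harder_balanced = min(balanced, key=balanced.get)
harder_natural = min(natural, key=natural.get)
```

With these assumed rates, network_b looks harder on a balanced test set, while network_a looks harder once attacks make up only 1% of traffic, because false alarms on the benign majority dominate its precision.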

HUGGINGFACE | Jul 1, 2026

MemSyco-Bench: Benchmarking Sycophancy in Agent Memory

The paper introduces MemSyco-Bench, a benchmark designed to evaluate memory-induced sycophancy in LLM-based agents. It addresses how retrieved memories can cause agents to over-align with users, sacrificing factual accuracy. The benchmark includes five tasks that test an agent's ability to reject memory as factual evidence, respect memory scope, resolve memory-evidence conflicts, track memory updates, and use valid memory for personalization. All resources are publicly available on GitHub.

HUGGINGFACE | Jun 29, 2026 | Highlight

SafePyramid: A Hierarchical Benchmark for In-context Policy Guardrailing

SafePyramid is a new benchmark for evaluating in-context policy guardrailing, comprising 1,000 multi-turn conversations, 3,000 application-specific policies, and 61,699 distinct natural-language rules across 10 domains. The benchmark structures evaluation into three difficulty levels: L0 (understanding individual rules), L1 (reasoning over rule dependencies), and L2 (adapting to full novel policy frameworks). Evaluation of 10 frontier LLMs and 5 policy-configurable guardrails reveals that even GPT-5.5 correctly identifies all violated rules in only 54.0% of L0 cases, 35.3% of L1 cases, and 12.9% of L2 cases. These results show that in-context policy guardrailing remains far from solved, particularly when it requires resolving rule dependencies and adapting to new policies.
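The reported percentages count a case as correct only when all violated rules are identified, which makes the score far stricter than per-rule accuracy. A minimal sketch of such an all-or-nothing score follows. The function name and rule identifiers are hypothetical, and the strict set-equality criterion (extra flagged rules also count as a miss) is an assumption, since the summary does not spell out SafePyramid's exact scoring.

```python
def exact_match_rate(predicted, gold):
    """Fraction of cases where the predicted violated-rule set equals the gold set.

    Hypothetical scorer: each element is an iterable of rule identifiers.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("at least one case is required")
    hits = sum(1 for p, g in zip(predicted, gold) if set(p) == set(g))
    return hits / len(gold)


# Toy cases with made-up rule identifiers.
gold = [{"r1", "r2"}, {"r1", "r2"}, set(), {"r3"}]
predicted = [{"r1", "r2"}, {"r1"}, set(), {"r3", "r4"}]

score = exact_match_rate(predicted, gold)
```

In the toy data the second case misses one rule and the fourth flags an extra one, so only two of the four cases count as correct even though most individual rules were judged correctly.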