This paper challenges the prevailing assumption that differential privacy (DP) inherently improves backdoor robustness in federated learning (FL). It reveals a masking effect in which DP undermines the detection of malicious updates by hiding their statistical signatures. The authors propose RING, a novel attack that deliberately exploits DP as a cloak: compromised clients collaboratively craft adversarial perturbations that reconstruct a strong backdoor signal during aggregation without triggering anomaly detection. RING is agnostic to the underlying backdoor technique and can be composed with existing attacks, amplifying their threat. Experiments across four image and text datasets under non-IID settings show that RING achieves an average attack success rate of 90.3% against six state-of-the-art defenses under moderate privacy budgets, an improvement of up to 26.08× over baselines. Potential countermeasures incur significant utility trade-offs, exposing a fundamental security gap in DP-FL deployments.
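The masking effect can be illustrated with a toy calculation. The sketch below is not RING and does not reproduce any defense evaluated in the paper; it assumes a hypothetical detector that flags updates whose L2 norm deviates from the benign mean, and a local Gaussian mechanism (clip to norm `c`, add noise with standard deviation `sigma * c`). All names and parameter values are illustrative assumptions.

```python
import math
import random

def l2(update):
    """L2 norm of a flat update vector."""
    return math.sqrt(sum(x * x for x in update))

def clip(update, c):
    """Scale the update so that its L2 norm is at most c."""
    norm = l2(update)
    scale = min(1.0, c / norm) if norm > 0 else 1.0
    return [x * scale for x in update]

def privatize(update, c, sigma, rng):
    """Clip, then add Gaussian noise with std sigma * c to every coordinate."""
    return [x + rng.gauss(0.0, sigma * c) for x in clip(update, c)]

def norm_gap(benign, suspect):
    """Relative gap between the suspect's norm and the mean benign norm
    (a stand-in for a norm-based anomaly score; an assumption, not the paper's)."""
    mean_benign = sum(l2(u) for u in benign) / len(benign)
    return abs(l2(suspect) - mean_benign) / mean_benign

rng = random.Random(0)
dim = 200
benign = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(20)]
malicious = [rng.gauss(0.0, 0.5) for _ in range(dim)]  # deliberately ~5x larger

# Without DP the malicious update stands out by its norm.
gap_raw = norm_gap(benign, malicious)

# With clipping and noise, every update's norm is dominated by the noise term.
benign_dp = [privatize(u, 1.0, 1.0, rng) for u in benign]
malicious_dp = privatize(malicious, 1.0, 1.0, rng)
gap_dp = norm_gap(benign_dp, malicious_dp)

print(f"norm gap without DP: {gap_raw:.2f}")
print(f"norm gap with DP:    {gap_dp:.2f}")
```

In this toy setting, clipping equalizes the update norms and the added noise then dominates them, so a norm-based statistic no longer separates the two groups. Detectors that rely on other statistics would need their own analysis.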
ClinHallu is a benchmark designed for stage-wise diagnosis of hallucinations in medical multimodal large language model (MLLM) reasoning. It contains 7,031 validated instances, each augmented with a structured reasoning trace that decomposes the process into visual recognition, knowledge recall, and reasoning integration. Stage-replacement interventions are used to measure how correcting a specific reasoning stage affects the final answer. The paper also shows that trace-supervised fine-tuning can reduce stage-wise hallucinations. The benchmark is publicly available on GitHub.
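A stage-replacement intervention can be sketched as follows. This is a minimal illustration under stated assumptions, not the benchmark's code: the stage keys, the `answer_fn` callable that maps a trace to a final answer, and the toy model are all hypothetical.

```python
from typing import Callable, Dict

# Stage names follow the three stages listed in the summary; the exact keys are assumed.
STAGES = ("visual_recognition", "knowledge_recall", "reasoning_integration")

def stage_replacement_effect(
    trace: Dict[str, str],
    reference: Dict[str, str],
    answer_fn: Callable[[Dict[str, str]], str],
    gold_answer: str,
) -> Dict[str, bool]:
    """For each stage, swap in the reference content for that stage only and
    report whether a previously wrong final answer becomes correct."""
    baseline_correct = answer_fn(trace) == gold_answer
    effects = {}
    for stage in STAGES:
        patched = dict(trace)              # copy so the original trace is untouched
        patched[stage] = reference[stage]  # correct exactly one stage
        fixed = answer_fn(patched) == gold_answer
        effects[stage] = fixed and not baseline_correct
    return effects

def toy_answer(trace: Dict[str, str]) -> str:
    """Hypothetical stand-in for a model: the answer depends only on what was seen."""
    return "pneumonia" if "consolidation" in trace["visual_recognition"] else "normal"

model_trace = {
    "visual_recognition": "clear lung fields",
    "knowledge_recall": "consolidation suggests pneumonia",
    "reasoning_integration": "no abnormal finding, so the image is normal",
}
reference_trace = {
    "visual_recognition": "right lower lobe consolidation",
    "knowledge_recall": "consolidation suggests pneumonia",
    "reasoning_integration": "consolidation is present, so pneumonia is likely",
}

effects = stage_replacement_effect(model_trace, reference_trace, toy_answer, "pneumonia")
print(effects)
```

In this toy case only the visual-recognition replacement flips the answer, which would localize the hallucination to that stage.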
The paper investigates acoustic adversarial attacks on AI-based computer vision systems using audible frequencies (<20 kHz). Unlike prior ultrasonic attacks, which are limited to short range, this work demonstrates that lower-frequency sound can induce resonance in commercially available cameras, causing physical motion that introduces image artifacts. In physical experiments against an off-the-shelf object detection model (YOLO11), the attack produced misclassifications, missed detections, and object hallucinations. The study analyzes how various image and object features influence attack effectiveness and provides insights into vulnerability factors to inform future mitigation strategies.
A new method is proposed to measure the degree of templated versus holistic cultural localization in AI-generated stories by identifying lexical tokens that distinguish narratives across nationalities and then measuring narrative similarity after their removal. Evaluating stories from five models across 125 topics and 193 nationalities, the method finds that only 9–17% of the vocabulary accounts for cross-national variation, with the remaining text exhibiting repeated multi-word sequences, indicating a shared culturally-agnostic template. The study further characterizes the identified cultural markers for stereotypicality and offensiveness, revealing that markers from 19 countries, predominantly in the Global South, are on average offensive.
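The remove-then-compare idea can be sketched on two toy stories. The marker rule (a token that occurs for exactly one nationality) and the Jaccard similarity used here are simplifying assumptions for illustration; the summary does not specify the paper's actual marker-selection criterion or similarity measure.

```python
from collections import defaultdict
from typing import Dict, List, Set

def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenization (a deliberately crude assumption)."""
    return text.lower().split()

def find_markers(stories: Dict[str, str]) -> Set[str]:
    """Treat a token as a cultural marker if it occurs for exactly one nationality."""
    seen_in = defaultdict(set)
    for nationality, text in stories.items():
        for token in set(tokenize(text)):
            seen_in[token].add(nationality)
    return {token for token, nats in seen_in.items() if len(nats) == 1}

def strip_markers(text: str, markers: Set[str]) -> List[str]:
    """Drop marker tokens, keeping the rest of the story in order."""
    return [t for t in tokenize(text) if t not in markers]

def jaccard(a: List[str], b: List[str]) -> float:
    """Vocabulary overlap between two token lists."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 1.0

# Invented example stories, not data from the study.
stories = {
    "Japan": "Yuki walked to the market in Kyoto and bought rice for the festival",
    "Brazil": "Ana walked to the market in Salvador and bought beans for the festival",
}

markers = find_markers(stories)
before = jaccard(tokenize(stories["Japan"]), tokenize(stories["Brazil"]))
after = jaccard(strip_markers(stories["Japan"], markers),
                strip_markers(stories["Brazil"], markers))

print(sorted(markers))
print(f"similarity before removal: {before:.2f}, after removal: {after:.2f}")
```

When similarity approaches 1 after removing a small marker set, the stories share a template and differ only by swapped-in tokens, which is the templated-localization pattern the method is designed to expose.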
Ion Matei et al. present a framework for aerial wildfire suppression planning that integrates a hybrid neural-cellular automaton fire spread model with gradient-based optimization. The model predicts spatially varying fire behavior from terrain, fuel, and wind inputs, while the intervention module decides binary drop actions with continuous location and orientation parameters. Water and retardant are represented distinctly: water reduces active burning immediately, while retardant persistently lowers future spread. Aleatoric uncertainty is captured via Monte Carlo sampling of daily fire states, and epistemic uncertainty via spatially correlated prediction-error perturbations. A case study on the 2020 Bear Fire demonstrates the framework's ability to generate coherent suppression schedules and support uncertainty-aware strategy analysis.
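The distinction between the two suppressants can be illustrated with a toy deterministic cellular automaton. This sketch is far simpler than the hybrid neural-cellular automaton described above and includes neither the learned spread model nor the gradient-based optimizer; the grid, the `threshold` rule, and the `factor` value are assumptions, as is the reading that water acts immediately while retardant acts persistently.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]

def step(burning: Set[Cell], burnt: Set[Cell],
         flammability: List[List[float]], threshold: float = 0.5):
    """One update: burning cells ignite flammable 4-neighbours, then burn out."""
    rows, cols = len(flammability), len(flammability[0])
    ignited = set()
    for r, c in burning:
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            cell = (nr, nc)
            if cell in burning or cell in burnt:
                continue
            if flammability[nr][nc] >= threshold:
                ignited.add(cell)
    return ignited, burnt | burning

def drop_water(burning: Set[Cell], footprint: Set[Cell]) -> Set[Cell]:
    """Water: extinguish active fire in the footprint now; fuel stays flammable."""
    return burning - footprint

def drop_retardant(flammability: List[List[float]], footprint: Set[Cell],
                   factor: float = 0.2) -> List[List[float]]:
    """Retardant: persistently scale down flammability inside the footprint."""
    return [[v * factor if (r, c) in footprint else v for c, v in enumerate(row)]
            for r, row in enumerate(flammability)]

def burned_area(flammability: List[List[float]], ignition: Cell, steps: int) -> int:
    """Number of cells that burned or are burning after the given number of steps."""
    burning, burnt = {ignition}, set()
    for _ in range(steps):
        burning, burnt = step(burning, burnt, flammability)
    return len(burnt | burning)

strip = [[1.0] * 5]                           # a 1 x 5 strip of uniform fuel
treated = drop_retardant(strip, {(0, 2)})     # retardant line in the middle

area_untreated = burned_area(strip, (0, 0), steps=10)
area_treated = burned_area(treated, (0, 0), steps=10)
print(area_untreated, area_treated)
```

In this toy strip the retardant line stops the fire after two cells, while a water drop on the ignition cell would remove the active fire but leave the fuel able to reignite from a neighbour.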
The paper reframes shield synthesis in reinforcement learning from a runtime safety mechanism into a design-time analytical tool for assessing network defensibility. It instantiates this via a constrained two-player safety game for network defense, which yields a binary defensibility verdict, the winning region, a shield, and topology-level metrics derived from attractor computation. These formal measures are combined with post-convergence behavior from adversarial multi-agent reinforcement learning to form a defensibility fingerprint. A what-if analysis demonstrates that formal defensibility and operational effectiveness capture distinct aspects of security, with small architectural changes causing large shifts in operational outcomes while leaving formal safety margins nearly unchanged. The work concludes that shield synthesis is most valuable as a framework for answering architectural questions about whether, where, and how a system can be defended.
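The attractor computation behind the defensibility verdict can be sketched with the textbook fixed-point algorithm for two-player safety games. The sketch below assumes a turn-based game graph in which every state has at least one successor; the state names and the tiny example network are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Set, Tuple

def attacker_attractor(succ: Dict[str, List[str]],
                       defender_states: Set[str],
                       unsafe: Set[str]) -> Set[str]:
    """States from which the attacker can force a visit to an unsafe state."""
    attractor = set(unsafe)
    changed = True
    while changed:
        changed = False
        for state, successors in succ.items():
            if state in attractor:
                continue
            if state in defender_states:
                # The defender loses only if every available move leads into the attractor.
                forced = all(t in attractor for t in successors)
            else:
                # The attacker needs just one move into the attractor.
                forced = any(t in attractor for t in successors)
            if forced:
                attractor.add(state)
                changed = True
    return attractor

def synthesize_shield(succ: Dict[str, List[str]],
                      defender_states: Set[str],
                      unsafe: Set[str]) -> Tuple[Set[str], Dict[str, List[str]]]:
    """Winning region plus, for each winning defender state, the moves that stay inside it."""
    winning = set(succ) - attacker_attractor(succ, defender_states, unsafe)
    shield = {s: [t for t in succ[s] if t in winning]
              for s in winning if s in defender_states}
    return winning, shield

# Hypothetical four-state game: the defender moves at d0, the attacker at a1 and a2.
succ = {
    "d0": ["a1", "a2"],
    "a1": ["d0", "compromised"],
    "a2": ["d0"],
    "compromised": ["compromised"],
}
winning, shield = synthesize_shield(succ, {"d0"}, {"compromised"})
defensible = "d0" in winning  # binary verdict for the initial state d0

print(sorted(winning), shield, defensible)
```

The size of the winning region relative to the full state space is one example of a topology-level quantity that falls out of this computation. The specific metrics used in the paper are not given in the summary.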