The paper identifies three fundamental flaws in constrained reinforcement learning: unknown stochastic consequence delay yields provably incorrect TD targets; the agent's causal effect is conflated with consequences already in the pipeline, causing systematic over- or under-penalization; and embedding the multiplier into a single Q-function renders Bellman targets non-stationary under multiplier updates. CCPL introduces a delay-corrected Bellman operator that learns the full delay distribution and computes an adaptive effective discount, with a novel contraction proof. It proves that a state-conditioned λ(s) strictly dominates any scalar λ, closing a prior theoretical gap, and replaces the cost estimate with the marginal causal contribution learned via an Interventional Consequence Net pretrained on environment structural causal model labels. CCPL maintains separate reward and constraint Q-functions, keeping targets stationary and combining them only at inference. Empirically, compared against eight baselines, CCPL is the only agent to achieve both high reward (+4.84) and full constraint satisfaction (100%) across six environments including adversarial settings, and its core theorems are machine-verified at every training run.
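The two mechanisms described above (an effective discount derived from a delay distribution, and separate reward and constraint Q-functions combined only at inference) can be illustrated with a minimal sketch. It is not the paper's implementation: the function names are hypothetical, the effective discount is assumed to be the expectation of γ^d over the delay distribution, and the inference rule is assumed to be the usual Lagrangian score Q_r − λ(s)·Q_c.

```python
from typing import Sequence


def effective_discount(delay_probs: Sequence[float], gamma: float) -> float:
    """Assumed form: E[gamma ** d], where delay_probs[d] = P(delay = d)."""
    return sum(p * gamma ** d for d, p in enumerate(delay_probs))


def select_action(q_reward: Sequence[float],
                  q_cost: Sequence[float],
                  lam: float) -> int:
    """Combine separately learned Q-values only at decision time.

    q_reward[a] and q_cost[a] are the two critics' values for action a in
    the current state; lam stands in for the state-conditioned multiplier
    lambda(s). The scoring rule is an illustrative assumption.
    """
    scores = [qr - lam * qc for qr, qc in zip(q_reward, q_cost)]
    return max(range(len(scores)), key=scores.__getitem__)


# Consequences arrive immediately or one step late with equal probability.
gamma_eff = effective_discount([0.5, 0.5], gamma=0.9)

# With lam = 0 the high-reward action wins; a larger lam penalizes its cost.
greedy = select_action([1.0, 2.0], [0.0, 5.0], lam=0.0)
cautious = select_action([1.0, 2.0], [0.0, 5.0], lam=1.0)
```

Because the two critics are never merged during training in this sketch, changing `lam` alters only the action choice, not either critic's learning target.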
The paper proposes a unified Dirichlet framework for spatial-temporal risk assessment, proving that a single Dirichlet posterior per cell with an additive evidence-update rule is the unique update–predictor pair satisfying four axioms and is limit-equivalent to seven classical methods (AHP, Dempster–Shafer, Hawkes, kernel density estimation, etc.). The framework simultaneously yields a severity score and threat characterization from the posterior. On a large-scale benchmark of 41 regions × 10,000 cells × 365 days, it achieves a one-vs-rest AUROC of 0.666 and a severity AUROC of 0.725, statistically significantly outperforming 15 structured baselines (Holm-corrected p < 10⁻²⁶), while delivering threat characterization accuracy of 79.1%, compared to only 0–26% for competitors with comparable AUROC. Real-world transfer to 1.69M London and 119K Chicago crime events preserves the dual-output advantage, and a pre-registered specialization experiment confirms the operational configuration beats the matched specialist. The method requires 3.6× less memory than seven independent models (128 vs. 464 bytes/cell) at 41K signals/sec throughput.
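The additive evidence update on a per-cell Dirichlet posterior can be sketched in a few lines. This is a generic Dirichlet update under stated assumptions, not the paper's code: the three categories, the evidence weights, and the use of the arg-max category as the threat characterization are all hypothetical choices made for illustration.

```python
from typing import List


def update(alpha: List[float], evidence: List[float]) -> List[float]:
    """Additive rule: add non-negative evidence to the concentration parameters."""
    if len(alpha) != len(evidence):
        raise ValueError("alpha and evidence must have the same length")
    return [a + e for a, e in zip(alpha, evidence)]


def predictive(alpha: List[float]) -> List[float]:
    """Posterior predictive probabilities: alpha_k / sum(alpha)."""
    total = sum(alpha)
    return [a / total for a in alpha]


# One cell with a uniform prior over three hypothetical threat categories.
alpha = [1.0, 1.0, 1.0]
alpha = update(alpha, [0.0, 2.0, 0.0])  # two units of evidence for category 1
probs = predictive(alpha)               # roughly [0.2, 0.6, 0.2]

# Assumed characterization rule: report the most probable category.
top_category = max(range(len(probs)), key=probs.__getitem__)
```

Because the update is a plain vector addition, evidence from several signals can be applied in any order and yields the same posterior.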
This paper identifies that object hallucinations in large vision-language models (LVLMs) originate from visual encoders, uncovering three core issues: statistical bias, inherent bias, and vulnerability. To address these, SHIELD is introduced as a training-free framework that applies three strategies: re-weighting visual tokens to reduce statistical bias, injecting noise-derived tokens to counteract inherent bias, and employing adversarial attacks with contrastive decoding to mitigate vulnerability. Experiments across multiple benchmarks and LVLM families demonstrate SHIELD effectively reduces object hallucinations while maintaining strong general performance, and the code is publicly available.
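The contrastive decoding step mentioned above can be illustrated with a minimal sketch. It assumes the commonly used form in which logits from the clean input are amplified and logits from a perturbed input are subtracted, (1 + α)·clean − α·perturbed. Whether SHIELD uses exactly this form is not stated in the summary, and the names and numbers below are hypothetical.

```python
import math
from typing import List


def contrastive_logits(clean: List[float],
                       perturbed: List[float],
                       alpha: float = 1.0) -> List[float]:
    """Assumed rule: (1 + alpha) * clean - alpha * perturbed, per token."""
    return [(1 + alpha) * c - alpha * p for c, p in zip(clean, perturbed)]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Token 0 is favored on the clean image and even more on the attacked one,
# which suggests it is driven by encoder vulnerability rather than content.
clean = [2.0, 1.5]
perturbed = [2.5, 0.5]
adjusted = contrastive_logits(clean, perturbed, alpha=1.0)  # [1.5, 2.5]
probs = softmax(adjusted)
```

In this toy case the preferred token flips from index 0 to index 1 once the perturbed logits are subtracted.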
AI skin cancer triage systems generate about 106 excess unnecessary referrals per 1,000 darker-skin patients due to over-prediction, not missed cancers. This over-prediction stems from melanin concentration reducing lesion-background optical contrast, causing class overlap. The authors formalize this as a signal-to-noise ratio (SNR) framework, predicting a 5.2× SNR reduction from lighter to darker skin tones. Experiments on the HAM10000 dataset with a high-confidence ITA subset show dark skin achieves slightly higher sensitivity (0.848 vs. 0.821) but substantially lower specificity (0.720 vs. 0.831, Δ=−11.1pp). An ablation study compares ITA-based tone conditioning (feature calibration) and dark-skin augmentation (decision boundary adjustment), revealing their distinct effects. Zero-shot transfer to the DDI dataset (n=656) confirms the AUC gap. Code and trained weights are publicly released.
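The relationship between the specificity gap and excess referrals can be checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and `benign_fraction` (the share of patients without cancer) is an assumed parameter that the summary does not report, so the code is not claimed to reproduce the figure of about 106 per 1,000.

```python
def specificity_gap_pp(spec_ref: float, spec_group: float) -> float:
    """Specificity difference (group minus reference) in percentage points."""
    return (spec_group - spec_ref) * 100.0


def excess_false_referrals_per_1000(spec_ref: float,
                                    spec_group: float,
                                    benign_fraction: float) -> float:
    """Extra false-positive referrals per 1,000 patients.

    The false-positive rate is 1 - specificity; the difference in that rate
    is scaled by the assumed fraction of patients whose lesions are benign.
    """
    extra_fpr = (1.0 - spec_group) - (1.0 - spec_ref)
    return 1000.0 * benign_fraction * extra_fpr


# Reported specificities: 0.831 (lighter skin) vs. 0.720 (darker skin).
gap = specificity_gap_pp(0.831, 0.720)

# Upper bound when every patient is benign (benign_fraction = 1.0).
upper = excess_false_referrals_per_1000(0.831, 0.720, benign_fraction=1.0)
```

With the reported specificities the gap is −11.1 percentage points, and the excess is at most about 111 per 1,000 patients, shrinking as the benign fraction falls below 1.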
This paper addresses robustness in cooperative multi-agent reinforcement learning (c-MARL) against deployment-time adversaries with unknown objectives. The authors propose a Bayesian Dec-POMDP game model with a continuum of adversarial types, each corresponding to a distinct attack objective. To make the problem tractable, they introduce a partitioning scheme that groups adversarial policies based on their performance against a reference c-MARL policy, reducing it to a finite-type Bayesian game. They develop a provably convergent externally constrained reinforcement learning algorithm to compute adversarial policies and use a simultaneous gradient update scheme to obtain robust Bayesian c-MARL policies. The resulting approach, BATPAL, is shown in experiments to outperform state-of-the-art baselines across diverse benchmarks and attack strategies.
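The partitioning idea, grouping adversarial policies by how well the reference policy performs against them, can be sketched as a simple binning step. This is an assumption-laden illustration rather than BATPAL's actual procedure: the threshold values, adversary names, and the use of team return as the performance measure are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence


def assign_type(reference_return: float, thresholds: Sequence[float]) -> int:
    """Map a return to a type index; thresholds must be sorted ascending."""
    return bisect_right(thresholds, reference_return)


def partition(returns_by_adversary: Dict[str, float],
              thresholds: Sequence[float]) -> Dict[int, List[str]]:
    """Group adversaries whose effect on the reference policy falls in the same bin."""
    groups: Dict[int, List[str]] = {}
    for name, ret in returns_by_adversary.items():
        groups.setdefault(assign_type(ret, thresholds), []).append(name)
    return groups


# Hypothetical team returns of the reference policy under four adversaries.
returns = {"a": 1.0, "b": 5.0, "c": 5.5, "d": 9.0}
types = partition(returns, thresholds=[3.0, 7.0])
```

Two thresholds yield at most three types, so a continuum of attack objectives collapses to a finite set that a Bayesian game solver can enumerate.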