This paper traces object hallucinations in large vision-language models (LVLMs) to their visual encoders and identifies three underlying issues: statistical bias, inherent bias, and vulnerability. To address them, the authors introduce SHIELD, a training-free framework that applies one strategy per issue: re-weighting visual tokens to reduce statistical bias, injecting noise-derived tokens to counteract inherent bias, and combining adversarial attacks with contrastive decoding to mitigate vulnerability. Experiments across multiple benchmarks and LVLM families show that SHIELD reduces object hallucinations while maintaining strong general performance. The code is publicly available.
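The contrastive-decoding step can be illustrated with a minimal sketch. It assumes the widely used formulation in which next-token logits computed from the clean image are contrasted against logits computed from a perturbed (for example, adversarially attacked) image. The function names, the weight `alpha`, and the toy logits are illustrative assumptions, not SHIELD's actual implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def contrastive_logits(clean, perturbed, alpha=1.0):
    """Amplify what the clean view supports and penalize what the
    perturbed view supports: (1 + alpha) * clean - alpha * perturbed.
    `alpha` is a hypothetical contrast weight."""
    if len(clean) != len(perturbed):
        raise ValueError("logit vectors must have the same length")
    return [(1.0 + alpha) * c - alpha * p for c, p in zip(clean, perturbed)]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


# Toy vocabulary of three tokens; token 1 plays the role of a hallucinated object.
clean_logits = [1.8, 2.0, 0.1]      # clean image slightly prefers token 1
perturbed_logits = [1.0, 2.5, 0.1]  # perturbed image prefers token 1 even more

adjusted = contrastive_logits(clean_logits, perturbed_logits, alpha=1.0)
probs = softmax(adjusted)
```

In this toy case greedy decoding on the clean logits would pick token 1, whereas the contrasted logits favor token 0, because the token that gains support under perturbation is penalized.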
AI skin cancer triage systems generate about 106 excess referrals per 1,000 darker-skin patients, and these stem from over-prediction rather than missed cancers. The over-prediction arises because higher melanin concentration reduces the optical contrast between lesion and background, which causes class overlap. The authors formalize this as a signal-to-noise ratio (SNR) framework that predicts a 5.2× SNR reduction from lighter to darker skin tones. In experiments on the HAM10000 dataset with a high-confidence ITA subset, the model achieves slightly higher sensitivity on dark skin (0.848 vs. 0.821) but substantially lower specificity (0.720 vs. 0.831, Δ = −11.1 pp). An ablation study compares ITA-based tone conditioning (feature calibration) with dark-skin augmentation (decision boundary adjustment) and shows that the two have distinct effects. Zero-shot transfer to the DDI dataset (n=656) confirms the AUC gap. Code and trained weights are publicly released.
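The quantities in this summary can be reproduced with a short sketch. It uses the standard Individual Typology Angle definition, ITA = arctan((L* − 50) / b*) × 180/π, and shows how a specificity gap translates into excess false-positive referrals. The function names and the `prevalence` parameter are assumptions for illustration; the summary does not state the prevalence or the exact calculation behind the reported referral figure.

```python
import math


def ita_degrees(l_star, b_star):
    """Individual Typology Angle in degrees from CIELAB L* and b*.
    atan2 matches arctan((L* - 50) / b*) for b* > 0 and avoids
    division by zero."""
    return math.degrees(math.atan2(l_star - 50.0, b_star))


def specificity_gap_pp(spec_group, spec_reference):
    """Specificity difference in percentage points (group minus reference)."""
    return (spec_group - spec_reference) * 100.0


def excess_referrals_per_1000(spec_group, spec_reference, prevalence):
    """Extra false-positive referrals per 1,000 patients in the group
    relative to the reference, given a hypothetical cancer prevalence."""
    benign_per_1000 = 1000.0 * (1.0 - prevalence)
    return benign_per_1000 * (spec_reference - spec_group)


# Specificities reported in the summary: dark skin 0.720, lighter skin 0.831.
gap = specificity_gap_pp(0.720, 0.831)

# Upper bound when every patient is benign (prevalence = 0); any real
# prevalence above zero gives a smaller number.
upper_bound = excess_referrals_per_1000(0.720, 0.831, prevalence=0.0)
```

With a prevalence of zero the specificity gap alone gives about 111 excess referrals per 1,000 patients, so the per-patient figure depends on how many patients in the cohort are actually benign.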