Contrast-Induced Class Overlap as a Fairness Bottleneck in Dermatological AI: Evidence from HAM10000
Abstract
AI skin cancer triage systems generate about 106 excess unnecessary referrals per 1,000 darker-skin patients, driven by over-prediction rather than missed cancers. The over-prediction is attributed to melanin concentration, which reduces lesion-background optical contrast and causes class overlap. The authors formalize this mechanism as a signal-to-noise ratio (SNR) framework that predicts a 5.2× SNR reduction from lighter to darker skin tones. In experiments on a high-confidence ITA (Individual Typology Angle) subset of the HAM10000 dataset, dark skin shows slightly higher sensitivity (0.848 vs. 0.821) but substantially lower specificity (0.720 vs. 0.831, Δ = −11.1 pp). An ablation study compares ITA-based tone conditioning (feature calibration) with dark-skin augmentation (decision boundary adjustment) and finds that the two have distinct effects. Zero-shot transfer to the DDI dataset (n = 656) confirms the AUC gap. Code and trained weights are publicly released.
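The link between the specificity gap and the excess-referral figure can be illustrated with simple arithmetic. The sketch below assumes that each false positive on a benign lesion becomes one unnecessary referral and takes the benign fraction of the patient population as a hypothetical input; the summary does not state the prevalence used in the paper, and the function name is illustrative only.

```python
def excess_referrals_per_1000(spec_reference, spec_comparison, benign_fraction):
    """Extra false-positive referrals per 1,000 patients in the comparison group.

    benign_fraction is a hypothetical input: the share of patients whose
    lesions are benign (the source does not state the value it used).
    """
    benign_per_1000 = 1000.0 * benign_fraction
    false_pos_reference = benign_per_1000 * (1.0 - spec_reference)
    false_pos_comparison = benign_per_1000 * (1.0 - spec_comparison)
    return false_pos_comparison - false_pos_reference


# Specificities reported in the abstract (light vs. dark skin).
SPEC_LIGHT = 0.831
SPEC_DARK = 0.720

if __name__ == "__main__":
    for benign_fraction in (0.90, 0.95, 1.00):
        excess = excess_referrals_per_1000(SPEC_LIGHT, SPEC_DARK, benign_fraction)
        print(f"benign fraction {benign_fraction:.2f}: {excess:.1f} excess referrals per 1,000")
```

Under this reading, a benign fraction in the mid-90% range gives values close to the reported figure of about 106. Whether the paper derives the number this way is not stated in the summary.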
Key Points
AI triage systems generate ~106 excess unnecessary referrals per 1,000 darker-skin patients due to over-prediction, not under-detection.
Melanin concentration systematically reduces lesion-background optical contrast, leading to class overlap and a predicted 5.2× SNR reduction from light to dark skin.
On a high-confidence ITA subset of HAM10000, dark skin specificity is 11.1pp lower than light skin (0.720 vs 0.831), while sensitivity is similar.
An ablation study contrasts ITA-based tone conditioning (feature calibration) with dark-skin augmentation (decision boundary placement), showing that the two have independent effects.
Zero-shot transfer to the DDI dataset confirms the AUC gap and score suppression. All code and weights are publicly released.
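For readers unfamiliar with ITA, the sketch below shows how the Individual Typology Angle is conventionally computed from CIELAB L* and b* values and mapped to a skin-tone category. It assumes the standard definition and commonly cited category thresholds. The summary does not say which thresholds the paper uses or how it defines the "high-confidence" subset, so the function names and the category table here are illustrative assumptions.

```python
import math


def ita_degrees(l_star, b_star):
    """Individual Typology Angle in degrees from CIELAB L* and b*.

    Conventional definition: ITA = arctan((L* - 50) / b*) * 180 / pi.
    atan2 is used so that b* = 0 does not raise a division error; for
    b* > 0 (typical of skin) it equals the arctan form.
    """
    return math.degrees(math.atan2(l_star - 50.0, b_star))


# Commonly cited ITA category lower bounds in degrees (assumed here; the
# source does not list its own thresholds).
ITA_CATEGORIES = [
    (55.0, "very light"),
    (41.0, "light"),
    (28.0, "intermediate"),
    (10.0, "tan"),
    (-30.0, "brown"),
]


def ita_category(ita):
    """Map an ITA value to a skin-tone category label."""
    for lower_bound, label in ITA_CATEGORIES:
        if ita > lower_bound:
            return label
    return "dark"


if __name__ == "__main__":
    # Hypothetical L*, b* values chosen only to exercise the functions.
    for l_star, b_star in [(70.0, 12.0), (60.0, 10.0), (30.0, 15.0)]:
        ita = ita_degrees(l_star, b_star)
        print(f"L*={l_star}, b*={b_star}: ITA={ita:.1f} deg -> {ita_category(ita)}")
```

In practice L* and b* would be estimated from non-lesion skin pixels after converting the image to CIELAB. That step is omitted here because the summary does not describe how the paper performs it.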