Consensus-based Agentic Large Language Model Framework for Harmonized Tariff Schedule Code Classification
English summary
This paper proposes an agentic large language model (LLM) framework for Canadian 10-digit Harmonized Tariff Schedule (HTS) code classification in maritime logistics. The framework combines multi-agent retrieval over official tariff documents, evidence-grounded reasoning, consensus-based validation with element-wise voting across hierarchical code components, confidence estimation, and human-in-the-loop escalation. Evaluation on a private dataset of 3,300 expert-labeled product records reveals that exact 10-digit classification remains difficult, with accuracy sharply declining from coarse chapter level to fine-grained tariff and statistical suffix levels. The results underscore the necessity of interpretable, uncertainty-aware, and human-centered classification workflows over fully autonomous single-step prediction. The code is publicly available.
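The consensus step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the component split (chapter, heading, subheading, tariff item, statistical suffix), the function name `consensus_vote`, the 0.6 threshold, and the sample codes are all assumptions made for illustration. Each agent proposes one 10-digit code, voting proceeds level by level, and the share of agents backing the winner at each level serves as a simple confidence signal for escalation.

```python
from collections import Counter

# Assumed split of a 10-digit code into hierarchical components,
# given as (level name, prefix length). Not taken from the paper.
LEVELS = [
    ("chapter", 2),
    ("heading", 4),
    ("subheading", 6),
    ("tariff_item", 8),
    ("statistical_suffix", 10),
]


def consensus_vote(candidates, threshold=0.6):
    """Vote level by level over the codes proposed by the agents.

    At each level only candidates that agree with the prefix chosen so far
    take part, so the final code is always one that some agent proposed.
    Agreement is the winner's share of all candidates, not of the survivors.
    Ties are broken in favour of the prefix that appears first in the input.
    """
    if not candidates or any(len(c) != 10 or not c.isdigit() for c in candidates):
        raise ValueError("expected a non-empty list of 10-digit strings")
    total = len(candidates)
    survivors = list(candidates)
    agreement = {}
    for name, end in LEVELS:
        prefix, votes = Counter(c[:end] for c in survivors).most_common(1)[0]
        agreement[name] = votes / total
        survivors = [c for c in survivors if c[:end] == prefix]
    # Levels whose agreement falls below the (hypothetical) threshold.
    uncertain = [name for name, _ in LEVELS if agreement[name] < threshold]
    return {
        "code": survivors[0],
        "agreement": agreement,
        "confidence": min(agreement.values()),
        "escalate": bool(uncertain),  # True means: hand over to a human reviewer
        "first_uncertain_level": uncertain[0] if uncertain else None,
    }


# Made-up digit strings standing in for four agents' proposals.
proposals = ["8471300010", "8471300010", "8471300090", "8471410010"]
result = consensus_vote(proposals)
print(result)
```

With these four made-up proposals, agreement is 1.0 at the chapter and heading levels, 0.75 at the subheading and tariff-item levels, and 0.5 at the statistical suffix, so the record is flagged for human review at the suffix level. Reporting agreement per level rather than a single score is what lets a reviewer see where the agents diverge.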
Key points
Proposes a consensus-based multi-agent LLM framework for HTS code classification, integrating retrieval, reasoning, voting, and human-in-the-loop escalation.
Evaluated on 3,300 expert-labeled Canadian 10-digit HTS product records; exact classification remains challenging, with large performance drops from chapter to fine-grained levels.
Demonstrates that element-wise voting and confidence estimation improve interpretability and support human oversight in place of fully autonomous prediction.
Code publicly released to facilitate further research in interpretable, compliance-oriented classification for smart ports.
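The drop in performance from chapter level to fine-grained levels, noted in the key points, can be measured by prefix matching. The sketch below is an assumption about how such an evaluation might look, not the paper's evaluation code: the level names, prefix lengths, the function name `accuracy_by_level`, and the sample codes are hypothetical.

```python
# Assumed prefix length for each level of a 10-digit code (not from the paper).
LEVEL_PREFIX = {
    "chapter": 2,
    "heading": 4,
    "subheading": 6,
    "tariff_item": 8,
    "full_code": 10,
}


def accuracy_by_level(predicted, gold):
    """Share of records whose predicted code matches the gold code
    on the first n digits, for each level's prefix length n."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and equally long")
    return {
        level: sum(p[:n] == g[:n] for p, g in zip(predicted, gold)) / len(gold)
        for level, n in LEVEL_PREFIX.items()
    }


# Made-up digit strings, chosen only to show accuracy shrinking with depth.
gold = ["1234567890"] * 4
predicted = ["1234567890", "1234567800", "1234560000", "1299999999"]
for level, acc in accuracy_by_level(predicted, gold).items():
    print(f"{level:12s} {acc:.2f}")
```

Because a match on a longer prefix implies a match on every shorter one, accuracy computed this way can only stay level or fall as the prefix grows. On the made-up data it goes from 1.00 at chapter level to 0.25 on the full 10-digit code.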