Operads for compositional reasoning in LLMs
English summary
This paper introduces operads, mathematical structures that model many-in, one-out operations and their composition, as a rigorous framework for question decomposition in LLMs. The authors define the questions operad Q, in which operations are question templates and composition is substitution of sub-answers, and they interpret QA models as algebras over Q. A key contribution is operadic consistency, a metric that measures how well a model's answers agree across different partial collapses of a decomposition tree. Companion empirical work finds that operadic consistency correlates strongly with accuracy across twelve LLMs and four multi-hop QA datasets and outperforms temperature-based self-consistency baselines. The operadic perspective suggests new ways to analyze and improve multi-step reasoning.
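The construction described above can be made concrete with a small sketch. It is an illustration under stated assumptions, not the paper's formal definition of Q: the names `Template`, `Tree`, `evaluate`, and `toy_qa` are hypothetical, slots are written as `{0}`, `{1}`, and so on, and a lookup table stands in for a real QA model. A template with n slots plays the role of an n-ary operation, a tree of templates plays the role of a composite operation, and evaluating a tree with a QA function mirrors how an algebra over Q would turn operations into answers.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Template:
    """A question with numbered slots, e.g. 'Where was {0} born?' has arity 1."""
    text: str
    arity: int = 0


@dataclass(frozen=True)
class Tree:
    """A composite operation: a template whose slots are fed by sub-questions."""
    template: Template
    children: Tuple["Tree", ...] = ()

    def __post_init__(self):
        if len(self.children) != self.template.arity:
            raise ValueError("one sub-question is needed per slot")


def evaluate(tree: Tree, qa: Callable[[str], str]) -> str:
    """Answer the sub-questions first, substitute their answers, then ask."""
    sub_answers = [evaluate(child, qa) for child in tree.children]
    return qa(tree.template.text.format(*sub_answers))


# Toy stand-in for a QA model: a lookup table (purely illustrative).
TOY_ANSWERS = {
    "Who directed Inception?": "Christopher Nolan",
    "Where was Christopher Nolan born?": "London",
}


def toy_qa(question: str) -> str:
    return TOY_ANSWERS.get(question, "unknown")


director = Tree(Template("Who directed Inception?"))
birthplace = Tree(Template("Where was {0} born?", arity=1), (director,))
print(evaluate(birthplace, toy_qa))  # London
```

Representing composites as trees rather than as merged strings keeps the substitution step explicit: the answer to the inner question, not the inner question's text, is what fills the outer slot.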
Key points
Operads are proposed as the natural mathematical framework for question decomposition, with question templates as operations and sub-answer substitution as composition.
QA models are formalized as algebras over the questions operad Q, providing a principled structure for the decomposition–answering pipeline.
Operadic consistency, a new metric, quantifies answer agreement across decomposition variants; a companion paper shows it correlates strongly with accuracy and outperforms standard self-consistency.
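One way to picture the consistency idea in the key points is sketched here. The scoring rule is an assumption made for illustration (the share of variants that agree with the most common answer); the exact definition of operadic consistency in the paper may differ. The names `Node`, `variant_answers`, `consistency`, and `make_qa` are hypothetical, and each node is assumed to carry a hand-written single-shot phrasing (`direct`) of its whole subtree so that a "collapse" can be asked as one question.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Node:
    direct: str                       # single-shot phrasing of the whole subtree
    template: str = ""                # e.g. "Where was {0} born?"; unused for leaves
    children: Tuple["Node", ...] = ()


def variant_answers(node: Node, qa: Callable[[str], str]) -> List[str]:
    """Collect the answer produced by every partial collapse of this subtree."""
    answers = [qa(node.direct)]       # collapse everything: ask in one shot
    if node.children:
        # Expand this node and try every combination of the children's variants.
        child_variants = [variant_answers(child, qa) for child in node.children]
        for combo in product(*child_variants):
            answers.append(qa(node.template.format(*combo)))
    return answers


def consistency(node: Node, qa: Callable[[str], str]) -> float:
    """Share of variants agreeing with the most common answer (assumed rule)."""
    answers = variant_answers(node, qa)
    return Counter(answers).most_common(1)[0][1] / len(answers)


def make_qa(table: Dict[str, str]) -> Callable[[str], str]:
    """Toy QA model backed by a lookup table (purely illustrative)."""
    return lambda question: table.get(question, "unknown")


leaf = Node(direct="Who directed Inception?")
root = Node(
    direct="Where was the director of Inception born?",
    template="Where was {0} born?",
    children=(leaf,),
)

stepwise = {
    "Who directed Inception?": "Christopher Nolan",
    "Where was Christopher Nolan born?": "London",
}
consistent_qa = make_qa({**stepwise, root.direct: "London"})
inconsistent_qa = make_qa({**stepwise, root.direct: "Los Angeles"})

print(consistency(root, consistent_qa))    # 1.0
print(consistency(root, inconsistent_qa))  # 0.5
```

In this toy setup the second model answers the one-shot question differently from the stepwise route, so only half of its variants agree. The number of variants grows quickly with tree size, so a practical implementation would likely sample collapses rather than enumerate them all.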