Operads for compositional reasoning in LLMs

Summary

This paper introduces operads, mathematical structures that model many-in, one-out operations and their composition, as a rigorous framework for question decomposition in LLMs. The authors define the questions operad Q, in which operations are question templates and composition is substitution of sub-answers, and they interpret QA models as algebras over Q. A key contribution is operadic consistency, a metric that measures how well a model's answers agree across different partial collapses of a decomposition tree. Companion empirical work finds that operadic consistency correlates strongly with accuracy across twelve LLMs and four multi-hop QA datasets and outperforms temperature-based self-consistency baselines. The operadic perspective opens new directions for analyzing and improving multi-step reasoning.

Key points

The paper proposes operads as the natural mathematical framework for question decomposition: question templates are the operations, and substitution of sub-answers is the composition.
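
To make the "templates as operations, substitution as composition" idea concrete, here is a minimal Python sketch. It is not the paper's formal definition of Q: the names `Slot`, `Op`, `arity`, and `compose` and the `{0}`-style placeholder encoding are assumptions made for illustration. An operation is a question template with some open slots, and partial composition plugs another operation into one of those slots.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Slot:
    """An open input: a place where a sub-answer is still to be supplied."""


@dataclass(frozen=True)
class Op:
    """A question template whose i-th placeholder {i} is fed by its i-th child.

    Hypothetical encoding, chosen for this sketch only.
    """
    template: str
    children: Tuple[Union["Op", Slot], ...] = ()


def arity(node: Union[Op, Slot]) -> int:
    """Number of open slots at or below `node`, counted left to right."""
    if isinstance(node, Slot):
        return 1
    return sum(arity(child) for child in node.children)


def compose(outer: Op, i: int, inner: Op) -> Op:
    """Partial composition: plug `inner` into the i-th open slot of `outer` (0-indexed)."""
    if not 0 <= i < arity(outer):
        raise IndexError(f"slot {i} out of range for arity {arity(outer)}")

    def go(node, offset):
        # `offset` is the index of the first open slot at or below `node`.
        if isinstance(node, Slot):
            return inner if offset == i else node
        rebuilt = []
        for child in node.children:
            rebuilt.append(go(child, offset))
            offset += arity(child)
        return Op(node.template, tuple(rebuilt))

    return go(outer, 0)


born_in = Op("In which city was {0} born?", (Slot(),))  # one open slot
director = Op("Who directed Inception?")                 # no open slots
composed = compose(born_in, 0, director)                 # a closed two-step question
```

Composing a many-slot operation with an operation of arity m at one slot changes the total arity by m - 1, which is the bookkeeping an operad's partial composition is expected to satisfy.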

QA models are formalized as algebras over the questions operad Q, which gives the decomposition–answering pipeline a principled structure.
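
One way to read "a QA model as an algebra over Q" is that the model turns every composite question into an answer by answering the sub-questions first and substituting their answers into the parent template. The Python sketch below illustrates that reading for closed decomposition trees only. The `Question` type, the `evaluate` function, and the lookup-table `toy_model` are assumptions introduced here, standing in for a real LLM call and for the paper's formal construction.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Question:
    """A node of a closed decomposition tree (hypothetical encoding).

    `template` uses {0}, {1}, ... for the answers of `subquestions`.
    """
    template: str
    subquestions: Tuple["Question", ...] = ()


def evaluate(q: Question, model: Callable[[str], str]) -> str:
    """Answer sub-questions bottom-up, substitute them, then ask the model."""
    sub_answers = [evaluate(sub, model) for sub in q.subquestions]
    return model(q.template.format(*sub_answers))


# Toy stand-in for an LLM: a fixed lookup table, for illustration only.
_LOOKUP = {
    "Who directed Inception?": "Christopher Nolan",
    "In which city was Christopher Nolan born?": "London",
}


def toy_model(question: str) -> str:
    return _LOOKUP[question]


tree = Question(
    "In which city was {0} born?",
    (Question("Who directed Inception?"),),
)
answer = evaluate(tree, toy_model)  # "London"
```

In this sketch the model is only ever asked single-hop questions; the tree structure, not the model, carries the multi-hop reasoning.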

Operadic consistency, a new metric, quantifies answer agreement across decomposition variants; a companion paper shows it correlates strongly with accuracy and outperforms standard self-consistency.
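
The summary does not give the formula for operadic consistency, so the following Python sketch is only one plausible reading, under stated assumptions. Each node carries a `direct` phrasing that asks its whole subtree as a single question (a "collapsed" subtree); a partial collapse picks, for each subtree, whether to ask it directly or to decompose it further; and the score is the fraction of collapses whose final answer matches the most common one. The `Node` type, the `direct` field, the majority-agreement scoring rule, and the toy models are all assumptions, not the paper's definitions.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Node:
    direct: str                        # whole subtree asked as one question
    template: str = ""                 # uses {0}, {1}, ... for child answers
    children: Tuple["Node", ...] = ()


def collapse_answers(node: Node, model: Callable[[str], str]) -> List[str]:
    """Final answers to `node` under every partial collapse of its subtree."""
    answers = [model(node.direct)]     # collapse the whole subtree
    if node.children:
        per_child = [collapse_answers(c, model) for c in node.children]
        for combo in product(*per_child):
            answers.append(model(node.template.format(*combo)))
    return answers


def consistency(node: Node, model: Callable[[str], str]) -> float:
    """Share of collapses agreeing with the most common final answer (assumed rule)."""
    answers = collapse_answers(node, model)
    return Counter(answers).most_common(1)[0][1] / len(answers)


root = Node(
    direct="In which city was the director of Inception born?",
    template="In which city was {0} born?",
    children=(Node("Who directed Inception?"),),
)

# Two toy lookup-table "models": one agrees with itself, one does not.
base = {
    "Who directed Inception?": "Christopher Nolan",
    "In which city was Christopher Nolan born?": "London",
}
consistent = {**base, root.direct: "London"}
inconsistent = {**base, root.direct: "Los Angeles"}

score_ok = consistency(root, consistent.__getitem__)      # 1.0
score_bad = consistency(root, inconsistent.__getitem__)   # 0.5
```

Note that this score needs no gold answers: it compares the model only with itself, which is what makes a consistency metric usable as a label-free signal.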
