Why I stopped using semantic embeddings for tool selection and switched back to BM25 [D] | thinkgap

Social | Source: Reddit MachineLearning | June 8, 2026 | Importance: 4/5

English summary

A developer shares production experience building an agent with 140 MCP tools. Semantic embeddings for tool selection reached only 64% top-1 accuracy and, when wrong, were wrong with high confidence. BM25 over tool metadata reached 81%, ahead of a hybrid approach (0.7 semantic + 0.3 BM25) at 78%. The key insight is that tool descriptions are short and keyword-dependent, which favors BM25 over embeddings. Indexing schema fields such as property names improved performance further. The author recommends testing on your own corpus rather than assuming document-RAG defaults transfer to tool selection.
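A minimal sketch of what "BM25 over tool metadata" could look like, using only the Python standard library. This is not the author's implementation: the two tool definitions, the `inputSchema` field layout, the tokenizer, and the `k1`/`b` values are all assumptions made for illustration. The two tools have near-identical descriptions, so the only query term that separates them is one that comes from a schema property name.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and split on anything non-alphanumeric, so "repo_id" -> ["repo", "id"].
    return re.findall(r"[a-z0-9]+", text.lower())

def walk_schema(schema):
    """Yield property names and descriptions from a JSON-Schema-like dict, recursively."""
    if not isinstance(schema, dict):
        return
    if isinstance(schema.get("description"), str):
        yield schema["description"]
    for name, sub in schema.get("properties", {}).items():
        yield name
        yield from walk_schema(sub)
    if "items" in schema:
        yield from walk_schema(schema["items"])

def tool_document(tool):
    # One "document" per tool: name + description + everything found in the schema walk.
    parts = [tool["name"], tool.get("description", "")]
    parts.extend(walk_schema(tool.get("inputSchema", {})))
    return tokenize(" ".join(parts))

class BM25:
    def __init__(self, docs, k1=1.5, b=0.75):  # k1 and b are common defaults, assumed here
        self.tf = [Counter(d) for d in docs]
        self.lengths = [len(d) for d in docs]
        self.avgdl = sum(self.lengths) / len(docs)
        self.k1, self.b = k1, b
        n = len(docs)
        df = Counter(term for counts in self.tf for term in counts)
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query_tokens, i):
        tf, dl = self.tf[i], self.lengths[i]
        total = 0.0
        for t in query_tokens:
            if t in tf:
                norm = tf[t] + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                total += self.idf[t] * tf[t] * (self.k1 + 1) / norm
        return total

def select_tool(query, tools):
    index = BM25([tool_document(t) for t in tools])
    q = tokenize(query)
    best = max(range(len(tools)), key=lambda i: index.score(q, i))
    return tools[best]["name"]

# Hypothetical tools, invented for this example.
TOOLS = [
    {"name": "jira_list_issues", "description": "List issues in a project.",
     "inputSchema": {"type": "object",
                     "properties": {"project_key": {"type": "string"},
                                    "status": {"type": "string"}}}},
    {"name": "github_list_issues", "description": "List issues in a repository.",
     "inputSchema": {"type": "object",
                     "properties": {"repo_id": {"type": "string"},
                                    "state": {"type": "string"}}}},
]

print(select_tool("list open issues for repo acme/api", TOOLS))  # github_list_issues
```

In this toy corpus "list" and "issues" score identically for both tools; the match on "repo", which only enters the index through the `repo_id` property name, decides the ranking.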


Key points

Semantic embeddings reached only 64% top-1 accuracy for tool selection in production.
BM25 over tool metadata (name, description, schema walk) reached 81% top-1 accuracy.
A hybrid approach (0.7 semantic + 0.3 BM25) scored 78%, worse than BM25 alone.
Tool descriptions are short and keyword-discriminative, so BM25 suits them better than embeddings.
Indexing schema property names (e.g., repo_id) is crucial for discriminating between similar tools.
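The hybrid weighting and the top-1 accuracy metric mentioned above can be sketched as follows. The 0.7/0.3 weights come from the post; everything else is an assumption, in particular the min-max normalization (the summary does not say how the two score scales were combined) and the example score values, which are made up.

```python
def min_max(scores):
    # Rescale scores to [0, 1] so cosine similarities and BM25 scores are comparable.
    # Assumption: the source does not state which normalization, if any, was used.
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def hybrid_scores(semantic, bm25, w_semantic=0.7, w_bm25=0.3):
    # Weighted sum of the normalized per-tool scores.
    s, b = min_max(semantic), min_max(bm25)
    return [w_semantic * x + w_bm25 * y for x, y in zip(s, b)]

def top1_accuracy(select, labeled_queries):
    # Fraction of queries for which the selector's first choice is the expected tool.
    hits = sum(1 for query, expected in labeled_queries if select(query) == expected)
    return hits / len(labeled_queries)

# Made-up scores for two tools: embeddings strongly prefer tool 0, BM25 prefers tool 1.
semantic = [0.9, 0.3]
bm25 = [1.0, 4.0]
combined = hybrid_scores(semantic, bm25)
print(combined.index(max(combined)))  # 0
```

With these weights a wide semantic margin overrides the BM25 signal, so a confidently wrong embedding score can still win the hybrid vote. Whether that is what happened in the author's setup is not stated; measuring `top1_accuracy` on a labeled set of real queries for each selector is the direct way to find out.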
