Why I stopped using semantic embeddings for tool selection and switched back to BM25 [D]
English summary
A developer shares production experience building an agent with 140 MCP tools. Semantic embeddings for tool selection reached only 64% top-1 accuracy and were highly confident even when wrong. BM25 over tool metadata reached 81%, beating a hybrid approach that scored 78%. The key insight is that tool descriptions are short and distinguished mainly by keywords, which favors BM25 over embeddings. Indexing schema fields such as property names improved performance further. The author recommends testing on your own tool corpus rather than assuming that document-RAG defaults transfer to tool selection.
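The summary describes BM25 over tool metadata, including schema fields such as property names. Below is a minimal sketch of that idea, not the author's implementation: the two tools, the tokenizer, the schema walk, and the BM25 parameters (k1=1.5, b=0.75) are all assumptions made for illustration. The tools are deliberately given identical descriptions so that only the schema property names can separate them.

```python
import math
import re
from collections import Counter

# Hypothetical tool definitions for illustration only; they are not taken
# from the original post. "inputSchema" follows the MCP tool definition shape.
TOOLS = [
    {
        "name": "github_list_issues",
        "description": "List issues",
        "inputSchema": {
            "type": "object",
            "properties": {
                "repo_id": {"type": "string", "description": "Repository identifier"},
                "state": {"type": "string"},
            },
        },
    },
    {
        "name": "jira_list_issues",
        "description": "List issues",
        "inputSchema": {
            "type": "object",
            "properties": {
                "project_key": {"type": "string", "description": "Project key"},
                "status": {"type": "string"},
            },
        },
    },
]


def tokenize(text):
    # Lowercase and split on non-alphanumerics, so "repo_id" -> ["repo", "id"].
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def walk_schema(schema):
    # Recursively collect property names and descriptions from a JSON Schema.
    parts = []
    if isinstance(schema, dict):
        for key, value in schema.get("properties", {}).items():
            parts.append(key)
            parts.extend(walk_schema(value))
        if isinstance(schema.get("description"), str):
            parts.append(schema["description"])
        if "items" in schema:
            parts.extend(walk_schema(schema["items"]))
    return parts


def tool_document(tool):
    # One token list per tool: name + description + everything from the schema walk.
    fields = [tool["name"], tool["description"], *walk_schema(tool.get("inputSchema", {}))]
    return tokenize(" ".join(fields))


def bm25_rank(query, docs, k1=1.5, b=0.75):
    # Okapi BM25 with the common "+1 inside the log" IDF variant.
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))
    query_terms = tokenize(query)
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    ranking = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return ranking, scores


docs = [tool_document(t) for t in TOOLS]
ranking, scores = bm25_rank("list open issues for repo_id 42", docs)
print(TOOLS[ranking[0]]["name"])  # github_list_issues
```

In this toy corpus the terms "list" and "issues" score identically for both tools, so the ranking is decided entirely by the "repo" and "id" tokens that come from the schema walk.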
Key points
Semantic embeddings achieved only 64% top-1 accuracy for tool selection in production.
BM25 over tool metadata (name, description, schema walk) achieved 81% top-1 accuracy.
A hybrid approach (0.7 semantic + 0.3 BM25) scored 78% top-1 accuracy, worse than BM25 alone.
Tool descriptions are short and keyword-discriminative; BM25 is better suited than embeddings.
Indexing schema property names (e.g., repo_id) is crucial for discriminating between similar tools.
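The 0.7 semantic + 0.3 BM25 hybrid from the key points can be sketched as a weighted sum of normalized scores. The post's summary does not say how the two score scales were combined, so the min-max normalization here is an assumption, and the scores are made-up numbers chosen to show one way a semantic-heavy blend can override a correct BM25 pick.

```python
def min_max(scores):
    # Rescale scores to [0, 1]; a constant list maps to all zeros.
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_scores(semantic, bm25, w_semantic=0.7, w_bm25=0.3):
    # Weighted sum of min-max normalized scores (normalization is an assumption).
    sem_n = min_max(semantic)
    bm_n = min_max(bm25)
    return [w_semantic * s + w_bm25 * b for s, b in zip(sem_n, bm_n)]


def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])


# Made-up scores for three candidate tools; tool 0 is assumed to be correct.
semantic = [0.82, 0.88, 0.40]  # embedding similarity slightly prefers tool 1
bm25 = [6.1, 4.9, 0.3]         # BM25 prefers tool 0

print(argmax(bm25))                                   # 0
print(argmax(hybrid_scores(semantic, bm25)))          # 1
print(argmax(hybrid_scores(semantic, bm25, 0.3, 0.7)))  # 0
```

With these numbers the 0.7 weight lets a small embedding preference outvote a clear BM25 preference, while flipping the weights restores the BM25 choice. Whether that happens on a real corpus depends on the score distributions, which is in line with the author's advice to measure on your own tools.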