MemSyco-Bench: Benchmarking Sycophancy in Agent Memory

Abstract

The paper introduces MemSyco-Bench, a benchmark for evaluating memory-induced sycophancy in LLM-based agents: cases where retrieved memories lead an agent to over-align with the user at the expense of factual accuracy. The benchmark comprises five tasks, which test whether an agent can refuse to treat memory as factual evidence, respect the scope in which a memory applies, resolve conflicts between memory and objective evidence, track memory updates, and use valid memory for personalization. All resources are publicly available on GitHub.
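To make the five-task structure concrete, the sketch below shows one way per-task results could be tallied. It is a minimal sketch under stated assumptions, not MemSyco-Bench's actual data format or scoring: the `Task` labels, the `Item` fields, and `per_task_accuracy` are hypothetical, and exact string matching stands in for whatever grading the benchmark really uses.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Task(Enum):
    # Hypothetical labels for the five task families named in the abstract.
    REJECT_MEMORY_AS_EVIDENCE = "reject_memory_as_evidence"
    RESPECT_SCOPE = "respect_scope"
    RESOLVE_CONFLICT = "resolve_conflict"
    TRACK_UPDATES = "track_updates"
    PERSONALIZE = "personalize"


@dataclass
class Item:
    task: Task
    expected: str   # hypothetical gold answer
    predicted: str  # the agent's answer


def per_task_accuracy(items):
    """Exact-match accuracy grouped by task (a simplifying assumption)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in items:
        total[item.task] += 1
        if item.predicted.strip().lower() == item.expected.strip().lower():
            correct[item.task] += 1
    # Only tasks that actually appear in `items` get a score.
    return {task: correct[task] / total[task] for task in total}


items = [
    Item(Task.RESOLVE_CONFLICT, "Paris", "Paris"),
    Item(Task.RESOLVE_CONFLICT, "Paris", "Lyon"),
    Item(Task.PERSONALIZE, "vegetarian", "Vegetarian"),
]
print(per_task_accuracy(items))
```

Reporting accuracy per task rather than as one pooled number keeps a weakness in, say, conflict resolution from being masked by strong personalization scores.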

Key Takeaways

MemSyco-Bench is the first benchmark to focus on memory-induced sycophancy in LLM agents, rather than only on the correctness of memory storage and retrieval.
The benchmark covers five tasks: not treating memory as factual evidence, respecting memory scope, resolving conflicts between memory and evidence, tracking memory updates, and using valid memory for personalization.
It evaluates whether an agent can judge when memory should influence a decision and how valid memory should be used.
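The central question, whether a memory should influence a given decision, can be illustrated with a toy rule-based gate. This is a hypothetical sketch, not the paper's method: the `Memory` fields (`key`, `scope`, `timestamp`) and the functions `latest_memory` and `should_use_memory` are invented for illustration, and a real agent would need a model rather than exact string comparisons to judge scope and whether a query is factual.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Memory:
    key: str        # hypothetical: what the memory is about, e.g. "diet"
    value: str      # e.g. "vegetarian"
    scope: str      # hypothetical: context where it applies, e.g. "cooking"
    timestamp: int  # larger value = more recent


def latest_memory(memories: List[Memory], key: str) -> Optional[Memory]:
    """Return only the most recent memory for a key, so outdated versions are ignored."""
    candidates = [m for m in memories if m.key == key]
    if not candidates:
        return None
    return max(candidates, key=lambda m: m.timestamp)


def should_use_memory(memory: Optional[Memory], query_is_factual: bool,
                      query_scope: str) -> bool:
    """Toy gate deciding whether a retrieved memory may shape the answer."""
    if memory is None:
        return False
    # A remembered user belief is not evidence about the world: factual
    # questions should be answered from objective evidence instead.
    if query_is_factual:
        return False
    # A memory recorded for one context should not leak into unrelated ones.
    if memory.scope != query_scope:
        return False
    # Otherwise the latest, in-scope memory may be used for personalization.
    return True


memories = [
    Memory("diet", "vegetarian", "cooking", timestamp=1),
    Memory("diet", "vegan", "cooking", timestamp=2),  # later update
]
m = latest_memory(memories, "diet")
print(m.value)                                 # vegan
print(should_use_memory(m, False, "cooking"))  # True
print(should_use_memory(m, False, "travel"))   # False
```

Ordering the checks this way means a factual query never consults memory at all, which is the simplest way to keep a stored user belief from overriding evidence.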