SkillCoach: Self-Evolving Rubrics for Evaluating and Enhancing Agentic Skill-Use
Chinese title (translated): SkillCoach: A Self-Evolving Rubric System for Evaluating and Enhancing Agentic Skill-Use
English abstract
SkillCoach proposes a self-evolving rubric framework that derives skill-grounded process rubrics from rollouts to evaluate agentic skill-use along four dimensions: skill selection, skill following, skill composition, and skill-grounded reflection. It separates process quality from task success by keeping an external verifier as a distinct outcome signal, enabling detection of hidden failures that final accuracy alone would miss. The evolved rubrics are then used as process supervision to select high-quality training trajectories, outperforming outcome-only filtering. Experiments show the approach improves evaluation quality and provides stronger supervision signals for enhancing agentic skill-use.
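Chinese abstract (translated)
SkillCoach proposes a self-evolving rubric framework that automatically induces skill-grounded process rubrics from rollout traces and evaluates an agent's skill-use along four dimensions: skill selection, skill following, skill composition, and skill-grounded reflection. The framework keeps the external verifier as an independent signal of final success, which distinguishes process quality from incidental success and reveals failures that final accuracy alone cannot detect. The evolved rubrics then serve as process supervision for selecting high-quality training trajectories, which outperforms filtering on final outcomes alone. Experiments show that the method improves evaluation quality and provides a stronger supervision signal for enhancing agentic skill-use.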
Key points
Introduces self-evolving rubrics from rollouts to evaluate skill-use in four dimensions: selection, following, composition, and reflection.
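Rubrics evolve automatically from rollout records and assess agentic skill-use along the four dimensions of selection, following, composition, and reflection.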
Separates process evaluation from outcome verification, exposing hidden failures that final accuracy overlooks.
Process evaluation is decoupled from outcome verification, revealing hidden failures that final accuracy masks.
Uses evolved rubrics as process supervision to select training trajectories, outperforming outcome-only filtering.
With the evolved rubrics as process supervision, trajectory selection for training works better than filtering on final outcomes alone.
Demonstrates improved evaluation quality and stronger supervision signals for enhancing agentic skill-use.
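Experiments confirm that the method improves evaluation quality and supplies a stronger supervision signal for enhancing agentic skill-use.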
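
The separation of process quality from task outcome described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Trajectory` record, the unweighted mean over the four dimensions, the 0.5 threshold, and the rule that selected trajectories must also pass the verifier are all assumptions made for illustration. The sketch only shows how a rubric-based process score kept apart from the verifier's outcome signal can flag hidden failures and filter training trajectories.

```python
from dataclasses import dataclass

# The four rubric dimensions named in the summary.
DIMENSIONS = ("selection", "following", "composition", "reflection")


@dataclass
class Trajectory:
    """Hypothetical record for one rollout (all field names are assumptions)."""
    task_id: str
    verifier_passed: bool   # outcome signal from the external verifier
    rubric_scores: dict     # per-dimension process scores in [0, 1]


def process_score(traj: Trajectory) -> float:
    # Assumption: an unweighted mean over the four dimensions.
    return sum(traj.rubric_scores[d] for d in DIMENSIONS) / len(DIMENSIONS)


def outcome_only_filter(trajs):
    # Baseline: keep every trajectory the verifier accepts.
    return [t for t in trajs if t.verifier_passed]


def hidden_failures(trajs, threshold=0.5):
    # Trajectories that succeed on the task but score poorly on process.
    return [t for t in trajs if t.verifier_passed and process_score(t) < threshold]


def select_for_training(trajs, threshold=0.5):
    # Assumption: require both a verifier pass and a process score at or above threshold.
    return [t for t in trajs if t.verifier_passed and process_score(t) >= threshold]


def uniform(score):
    # Helper that assigns the same score to every dimension.
    return {d: score for d in DIMENSIONS}


trajs = [
    Trajectory("a", True, uniform(0.9)),
    Trajectory("b", True, {"selection": 0.2, "following": 0.3,
                           "composition": 0.1, "reflection": 0.4}),
    Trajectory("c", False, uniform(0.8)),
]

print([t.task_id for t in outcome_only_filter(trajs)])
print([t.task_id for t in hidden_failures(trajs)])
print([t.task_id for t in select_for_training(trajs)])
```

In this toy data, trajectory `b` passes the verifier despite weak process scores, so an outcome-only filter keeps it while the process-aware filter does not.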