Paper source: HUGGINGFACE | July 2, 2026 | Importance: 4/5

SkillCoach: Self-Evolving Rubrics for Evaluating and Enhancing Agentic Skill-Use

Chinese title: SkillCoach：自演化评分体系用于评估和增强智能体技能使用

English abstract

SkillCoach proposes a self-evolving rubric framework that derives skill-grounded process rubrics from rollouts to evaluate agentic skill-use along four dimensions: skill selection, skill following, skill composition, and skill-grounded reflection. It separates process quality from task success by keeping an external verifier as a distinct outcome signal, enabling detection of hidden failures that final accuracy alone would miss. The evolved rubrics are then used as process supervision to select high-quality training trajectories, outperforming outcome-only filtering. Experiments show the approach improves evaluation quality and provides stronger supervision signals for enhancing agentic skill-use.

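Abstract (translated from the Chinese)

SkillCoach proposes a self-evolving rubric framework that automatically induces skill-grounded process rubrics from rollout traces and evaluates an agent's skill use along four dimensions: skill selection, skill following, skill composition, and skill-grounded reflection. The framework keeps an external verifier as an independent final-success signal, which distinguishes process quality from accidental success and reveals failures that final accuracy alone cannot detect. The evolved rubrics then serve as process supervision for selecting high-quality training trajectories, outperforming filtering that relies only on final outcomes. Experiments show that the method improves evaluation quality and provides a stronger supervision signal for enhancing agentic skill use.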

Key points

Derives self-evolving rubrics from rollouts to evaluate agentic skill use along four dimensions: selection, following, composition, and reflection.
Separates process evaluation from outcome verification, exposing hidden failures that final accuracy overlooks.
Uses the evolved rubrics as process supervision to select training trajectories, outperforming outcome-only filtering.
Reports improved evaluation quality and stronger supervision signals for enhancing agentic skill use.
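
The separation of process score and outcome signal described in the key points can be illustrated with a small sketch. This is not the paper's implementation: the `Trajectory` type, the per-dimension scores in [0, 1], the unweighted mean, the 0.7 threshold, and the rule of keeping only trajectories that pass both checks are all assumptions made for illustration. The summary does not say how SkillCoach aggregates rubric scores or combines them with the verifier.

```python
from dataclasses import dataclass
from statistics import mean

# The four dimensions named in the summary; the short keys are hypothetical.
DIMENSIONS = ("selection", "following", "composition", "reflection")


@dataclass(frozen=True)
class Trajectory:
    """Hypothetical record for one agent rollout."""
    name: str
    rubric_scores: dict      # dimension -> process score in [0, 1] (assumed scale)
    verifier_passed: bool    # outcome signal from the external verifier


def process_score(t: Trajectory) -> float:
    # Assumption: an unweighted mean over the four dimensions.
    return mean(t.rubric_scores[d] for d in DIMENSIONS)


def outcome_only_filter(trajs):
    # Baseline: keep every trajectory the verifier accepts.
    return [t for t in trajs if t.verifier_passed]


def process_supervised_filter(trajs, threshold=0.7):
    # Assumption: keep trajectories that pass the verifier AND score well on process.
    return [t for t in trajs if t.verifier_passed and process_score(t) >= threshold]


def hidden_failures(trajs, threshold=0.7):
    # Verifier-accepted trajectories with poor process quality ("accidental success").
    return [t for t in trajs if t.verifier_passed and process_score(t) < threshold]


# Made-up example data, not results from the paper.
trajectories = [
    Trajectory("clean", dict.fromkeys(DIMENSIONS, 0.9), True),
    Trajectory("lucky", {"selection": 0.2, "following": 0.4,
                         "composition": 0.3, "reflection": 0.3}, True),
    Trajectory("failed", dict.fromkeys(DIMENSIONS, 0.8), False),
]

if __name__ == "__main__":
    print("outcome-only:", [t.name for t in outcome_only_filter(trajectories)])
    print("process-supervised:", [t.name for t in process_supervised_filter(trajectories)])
    print("hidden failures:", [t.name for t in hidden_failures(trajectories)])
```

In this toy data the "lucky" rollout passes the verifier despite weak rubric scores, so outcome-only filtering keeps it while the process-aware filter drops it. Keeping the verifier result as a separate field, instead of folding it into the rubric score, is what makes that case visible.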
