Autonomous agent 'Aiden' tops OpenAI's Parameter Golf competition in merged PRs, with more than twice as many as any human competitor
Summary
OpenAI's Parameter Golf competition challenged 1,016 researchers to train small language models under a strict budget. Over 44 days and 2,048 pull requests, only 47 entries made the official leaderboard. The autonomous agent Aiden, built by Weco, contributed 7 of those 47 records, more than double the 3 held by the next-best human, while running autonomously for 22 days on a single GPU with under 4% of the community's compute. Its pull requests became the most cited in the contest, and human researchers built directly on its work. After Aiden had plateaued for 5 days, a human contributed a novel tokenizer on top of Aiden's last PR, and Aiden fused that tokenizer with its own locally accumulated improvements to deliver the competition's largest single score jump. Aiden led only in the number of merged records, not in peak performance: it ranked 8th by best score.
Key points
Aiden contributed 7 of the 47 leaderboard records, more than double the 3 records of the next-best human, making it the most prolific contributor.
The agent ran fully autonomously for 22 days on a single GPU, using less than 4% of the total compute consumed by the human community.
Aiden's pull requests became the most cited in the competition, with human competitors using its work as the base for their own submissions.
A human and Aiden collaborated asynchronously: the human's tokenizer improvement was fused with Aiden's local optimizations, yielding the biggest score jump of the contest.
Aiden led in the number of merged records but ranked only 8th by best single score; the top score was achieved by a human (codemath3000).