Social source: Reddit MachineLearning | June 16, 2026 | Importance: 3/5

FeynRL: A Transparent Open-Source Framework for RL Post-Training of LLMs, VLMs, and Agents

Summary

Reddit user /u/summerday10 released FeynRL, an open-source framework designed to make reinforcement learning post-training for large language models, vision-language models, and agents fully transparent and modifiable. The framework exposes the entire training loop (data loading, rollout generation, reward computation, loss construction, optimization, and evaluation) so that researchers can develop new algorithms without fighting hidden systems. It currently includes examples for supervised fine-tuning, DPO, and RL-style training, and it supports single-GPU, multi-GPU, and cluster setups. The project was motivated by the belief that open weights alone are insufficient: open training codebases that keep algorithms explicit and systems separate are necessary for advancing open ML/AI research.
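To make the six stages named in the summary concrete, here is a minimal sketch of an explicit training loop on a toy problem. It is not FeynRL's API: every name in it (load_data, rollout, reward_fn, policy_gradient, sgd_step, evaluate, train) is hypothetical, the "policy" is just a table of logits over four candidate answers, and the update is plain REINFORCE without a baseline, chosen only because it fits in a few lines.

```python
import math
import random

# Hypothetical toy setup, NOT FeynRL's API: the policy is a table of logits
# over a fixed set of candidate answers, one row per prompt.
ANSWERS = ["2", "3", "4", "5"]


def load_data():
    # 1. Data loading: prompts paired with reference answers.
    return [("1+1", "2"), ("2+2", "4"), ("2+3", "5")]


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rollout(policy, prompt, rng):
    # 2. Rollout generation: sample one answer index from the current policy.
    probs = softmax(policy[prompt])
    idx = rng.choices(range(len(ANSWERS)), weights=probs, k=1)[0]
    return idx, probs


def reward_fn(answer, reference):
    # 3. Reward computation: exact-match reward in {0, 1}.
    return 1.0 if answer == reference else 0.0


def policy_gradient(idx, probs, reward):
    # 4. Loss construction: gradient of the REINFORCE loss
    #    -reward * log pi(idx) with respect to the logits.
    return [-reward * ((1.0 if j == idx else 0.0) - p)
            for j, p in enumerate(probs)]


def sgd_step(logits, grad, lr):
    # 5. Optimization: one step of plain gradient descent on the loss.
    return [w - lr * g for w, g in zip(logits, grad)]


def evaluate(policy, data):
    # 6. Evaluation: accuracy of the greedy (highest-logit) answer.
    correct = 0
    for prompt, ref in data:
        logits = policy[prompt]
        best = max(range(len(ANSWERS)), key=lambda j: logits[j])
        if ANSWERS[best] == ref:
            correct += 1
    return correct / len(data)


def train(steps=500, lr=0.5, seed=0):
    rng = random.Random(seed)
    data = load_data()
    policy = {prompt: [0.0] * len(ANSWERS) for prompt, _ in data}
    for _ in range(steps):
        prompt, ref = rng.choice(data)
        idx, probs = rollout(policy, prompt, rng)
        reward = reward_fn(ANSWERS[idx], ref)
        grad = policy_gradient(idx, probs, reward)
        policy[prompt] = sgd_step(policy[prompt], grad, lr)
    return policy, evaluate(policy, data)


if __name__ == "__main__":
    _, accuracy = train()
    print("greedy accuracy:", accuracy)
```

Because each stage is an ordinary function, swapping in a different reward or loss means editing one function rather than working around a framework, which is the kind of separation the post argues for.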

Key takeaways

Created by Reddit user /u/summerday10 and released on GitHub.
FeynRL is an open-source framework that exposes the full RL post-training loop for LLMs, VLMs, and agents, making every step visible and modifiable.
It separates algorithms from systems so that researchers can focus on new algorithms, reward designs, and optimization methods rather than debugging hidden infrastructure.
The framework currently supports supervised fine-tuning (SFT), DPO, and RL-style training, and it scales from a single GPU to multi-GPU and cluster environments.
The motivation is that open weights alone do not enable new algorithmic research; open training codebases are also needed to advance the field.
