claude_converter:将 Claude Code 会话转化为本地模型微调数据
英文摘要
A developer released claude_converter, an open-source tool that converts Claude Code session .jsonl files into the messages format accepted by fine-tuning frameworks like TRL/SFTTrainer, Axolotl, and LLaMA-Factory (ShareGPT format). It includes a clean_messages() helper to strip tool-use blocks and an inspect_session() function for token counts and breakdowns. The tool has zero dependencies and can be installed via `uv pip install claude-converter`. Users are advised to filter sessions to only those where the final assistant turn solved the problem before training.
中文摘要
开发者发布了 claude_converter,一款开源工具,可将 Claude Code 会话的 .jsonl 文件转换为 TRL/SFTTrainer、Axolotl 和 LLaMA-Factory(ShareGPT 格式)等框架可接受的 messages 格式。该工具提供 clean_messages() 辅助函数以去除工具使用块,以及 inspect_session() 函数用于显示 token 计数和结构分解。工具无外部依赖,可通过 `uv pip install claude-converter` 安装。建议用户仅筛选出最终助手回复确实解决了问题的会话再进行训练。
关键要点
Converts Claude Code session .jsonl files (from ~/.claude/projects/) into a standard messages format for fine-tuning.
将 Claude Code 会话 .jsonl 文件(位于 ~/.claude/projects/)转换为标准的微调用 messages 格式。
Outputs compatible with TRL/SFTTrainer, Axolotl, and LLaMA-Factory (ShareGPT format).
输出兼容 TRL/SFTTrainer、Axolotl 和 LLaMA-Factory(ShareGPT 格式)。
Includes clean_messages() to remove tool-use blocks, and inspect_session() for token count and session structure overview.
包含 clean_messages() 去除工具使用块,以及 inspect_session() 查看 token 计数和会话结构概览。
Zero external dependencies; install via `uv pip install claude-converter`.
零外部依赖;通过 `uv pip install claude-converter` 安装。
Authors caution users to filter sessions, keeping only those where the final assistant turn successfully solved the task.
作者提醒用户需筛选会话,仅保留最终助手回复成功解决了任务的会话。
Repository: https://github.com/FredyRivera-dev/claude_converter
仓库地址:https://github.com/FredyRivera-dev/claude_converter