Thinkgap feed

AI signal, minus the noise.

Curated items are read from the processed items table and served as a bilingual feed.

6 items

REDDIT MACHINELEARNING · Jun 15, 2026

Cleo: Finetuning Qwen3.5-2B-Base into a Full Text-to-SQL Analyst with a Unified Harness

Cleo is an open-source text-to-SQL model built by finetuning Qwen3.5-2B-Base, designed to pack full analyst behavior into a 2B-parameter model. The system uses the same structured harness for training, evaluation, and inference, implementing a gather-repair-answer contract that feeds live execution evidence into the search over candidate queries. Key design choices include co-optimizing the model contract, SQL safety layer, dialect handling, timeouts, and clarification behavior. The model, harness, and datasets are open-source on GitHub and Hugging Face. The project demonstrates how tightly coupling training and inference in a single harness can let small models handle complex SQL generation and interactive debugging.
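The gather-repair-answer idea can be illustrated with a minimal sketch. This is not Cleo's harness: the `propose` callable stands in for the model, and `is_read_only`, `gather_repair_answer`, and the retry limit are hypothetical names and values chosen for illustration. It assumes a SQLite database and shows only how execution errors might be fed back into the next candidate query.

```python
import sqlite3
from typing import Callable, Optional

def is_read_only(sql: str) -> bool:
    """Crude stand-in for a SQL safety layer: one SELECT/WITH statement only."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:
        return False  # reject stacked statements
    return stripped.lower().startswith(("select", "with"))

def gather_repair_answer(
    conn: sqlite3.Connection,
    question: str,
    propose: Callable[[str, Optional[str]], str],  # hypothetical model interface
    max_attempts: int = 3,
) -> dict:
    """Propose a query, execute it, and feed any error back for repair."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        sql = propose(question, feedback)
        if not is_read_only(sql):
            feedback = "rejected: only a single read-only statement is allowed"
            continue
        try:
            rows = conn.execute(sql).fetchall()  # live execution evidence
        except sqlite3.Error as exc:
            feedback = f"execution error: {exc}"  # repair signal for the next try
            continue
        return {"sql": sql, "rows": rows, "error": None}
    return {"sql": None, "rows": None, "error": feedback}
```

A stub `propose` that returns a query with a wrong column name first, then a corrected one once it sees the error message, is enough to exercise the repair path.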

REDDIT MACHINELEARNING · Jun 15, 2026

FeynRL: An Open-Source Framework for Transparent RL Post-Training of LLMs, VLMs, and Agents

Reddit user /u/summerday10 released FeynRL, an open-source framework designed to make reinforcement learning post-training for large language models, vision-language models, and agents fully transparent and modifiable. The framework exposes the entire training loop—data loading, rollout generation, reward computation, loss construction, optimization, and evaluation—so researchers can develop new algorithms without fighting hidden systems. It currently includes examples for supervised fine-tuning, DPO, and RL-style training and supports single-GPU, multi-GPU, and cluster setups. The project was motivated by the belief that open weights alone are insufficient; open training codebases that keep algorithms explicit and systems separate are necessary for advancing open ML/AI research.
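To make "exposing the entire training loop" concrete, here is a toy sketch in which each stage named above appears as an explicit line of code. It does not use FeynRL's API. It is a pure-Python REINFORCE loop on a two-action problem, and every name in it (`train`, the reward rule, the baseline update rate) is an assumption made for illustration.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train(steps=500, lr=0.1, seed=0):
    rng = random.Random(seed)
    logits = [0.0, 0.0]                      # policy parameters
    prompts = ["placeholder"] * steps        # data loading (toy stand-in)
    baseline = 0.0                           # running average of rewards
    for _prompt in prompts:
        probs = softmax(logits)
        # Rollout generation: sample an action from the current policy.
        action = 0 if rng.random() < probs[0] else 1
        # Reward computation: toy rule, action 1 is the "correct" one.
        reward = 1.0 if action == 1 else 0.0
        # Loss construction: loss = -advantage * log pi(action).
        advantage = reward - baseline
        grads = [
            -advantage * ((1.0 if k == action else 0.0) - probs[k])
            for k in range(2)
        ]
        # Optimization: plain gradient descent on the loss.
        logits = [w - lr * g for w, g in zip(logits, grads)]
        baseline += 0.05 * (reward - baseline)
    # Evaluation: report the final action probabilities.
    return softmax(logits)
```

Because every stage is a visible line rather than a hidden callback, swapping the reward rule or the loss is a local edit, which is the property the post argues for.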

REDDIT MACHINELEARNING · Jun 14, 2026 · Highlight

The Verifier Tax: Horizon-Dependent Safety–Success Tradeoffs in Tool-Using LLM Agents [R]

This paper, presented at ACM CAIS 2026, studies safety evaluation in tool-using LLM agents. It sorts outcomes into safe success, unsafe success, and failure, and proposes a two-tier verification architecture: deterministic policy/tool checks followed by an LLM-based verifier. Using τ-bench tool-use scenarios, the authors find that verification can reduce unsafe success but also lowers task completion as the task horizon increases. They call this horizon-dependent tradeoff between safety and successful task completion the 'Verifier Tax'. The work argues that unsafe completion should be treated as a category distinct from safe success.
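The two-tier structure and the three-way outcome split can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the tool names, the refund limit, and the `llm_verifier` callable (a stand-in for a model call) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass(frozen=True)
class ToolCall:
    tool: str
    args: dict = field(default_factory=dict)

# Hypothetical policy values, chosen only for illustration.
ALLOWED_TOOLS = {"lookup_order", "issue_refund"}
MAX_REFUND = 100.0

def deterministic_check(call: ToolCall) -> bool:
    """Tier 1: cheap, rule-based policy and tool checks."""
    if call.tool not in ALLOWED_TOOLS:
        return False
    if call.tool == "issue_refund" and call.args.get("amount", 0.0) > MAX_REFUND:
        return False
    return True

def verify(call: ToolCall, llm_verifier: Callable[[ToolCall], bool]) -> bool:
    """Tier 2 runs only if tier 1 passes; llm_verifier stands in for a model."""
    if not deterministic_check(call):
        return False
    return llm_verifier(call)

def classify_outcome(task_completed: bool, policy_violated: bool) -> str:
    """Keep unsafe success separate from safe success and from failure."""
    if not task_completed:
        return "failure"
    return "unsafe success" if policy_violated else "safe success"
```

Running the deterministic tier first means the model-based verifier is only consulted on calls that already satisfy the hard rules.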

REDDIT MACHINELEARNING · Jun 9, 2026

Phinite — multi-agent OS with first-class agent identity, composable skills, behavioral evaluation

Phinite launched a multi-agent operating system that provides a registry for first-class agent identity (ID, version, owner, skill graph). It replaces traditional unit tests with behavioral evaluation, using compound reliability scoring and behavioral regression to handle non-deterministic agent execution. Skills are versioned, reusable, and agent-inheritable, enabling composability without rebuilding. The platform is cloud-agnostic, model-agnostic, and includes built-in observability (traces, cost attribution, drift detection). It is SOC 2 Type II compliant and offers free credits for testing.
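The summary does not define "compound reliability scoring" or "behavioral regression", so the following is only one plausible reading, not Phinite's method. It assumes reliability compounds multiplicatively across the steps of a workflow, and that a regression is a drop in pass rate over repeated runs beyond a tolerance. All function names and the 0.05 tolerance are hypothetical.

```python
from math import prod

def pass_rate(outcomes):
    """Fraction of repeated runs that met the behavioral criterion."""
    return sum(outcomes) / len(outcomes)

def compound_reliability(step_rates):
    """Assumed definition: end-to-end reliability as a product of step rates."""
    return prod(step_rates)

def has_regressed(baseline_runs, candidate_runs, tolerance=0.05):
    """Flag a new version whose pass rate drops by more than the tolerance."""
    return pass_rate(candidate_runs) < pass_rate(baseline_runs) - tolerance
```

Under this reading, two steps that each succeed 90% of the time give roughly 81% end to end, which is why per-step scores alone can overstate how reliable a multi-step agent is.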

REDDIT MACHINELEARNING · Jun 8, 2026 · Highlight

Why I stopped using semantic embeddings for tool selection and switched back to BM25 [D]

A developer shares production experience building an agent with 140 MCP tools, finding that semantic embeddings for tool selection gave only 64% top-1 accuracy and were confidently wrong. BM25 over tool metadata achieved 81% accuracy, outperforming a hybrid approach that scored 78%. The key insight is that tool descriptions are short and keyword-dependent, making BM25 more effective than embeddings. Indexing schema fields like property names further improved performance. The author recommends testing specific corpora rather than assuming document-RAG defaults transfer to tool selection.
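As a rough sketch of the approach described, the following pure-Python BM25 index ranks tools by keyword overlap with the request and includes schema property names in the indexed text. The three tools and their fields are made up for illustration, and the `k1`/`b` values are common defaults rather than the author's settings.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Splitting on non-alphanumerics also breaks "due_date" into "due", "date".
    return re.findall(r"[a-z0-9]+", text.lower())

def tool_text(name, spec):
    """Index the name, description, and schema property names together."""
    return " ".join([name, spec["description"], *spec["properties"]])

class BM25Index:
    def __init__(self, tools, k1=1.5, b=0.75):
        self.names = list(tools)
        self.docs = [tokenize(tool_text(n, tools[n])) for n in self.names]
        self.k1, self.b = k1, b
        self.avg_len = sum(len(d) for d in self.docs) / len(self.docs)
        self.doc_freq = Counter(t for d in self.docs for t in set(d))

    def idf(self, term):
        n, df = len(self.docs), self.doc_freq.get(term, 0)
        return math.log(1 + (n - df + 0.5) / (df + 0.5))

    def score(self, query_terms, doc):
        tf = Counter(doc)
        norm = self.k1 * (1 - self.b + self.b * len(doc) / self.avg_len)
        return sum(
            self.idf(t) * tf[t] * (self.k1 + 1) / (tf[t] + norm)
            for t in query_terms
            if t in tf
        )

    def top(self, query, k=1):
        q = tokenize(query)
        scored = [(self.score(q, d), n) for n, d in zip(self.names, self.docs)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [name for _, name in scored[:k]]

# Hypothetical tool registry for illustration only.
TOOLS = {
    "create_invoice": {
        "description": "Create an invoice for a customer",
        "properties": ["customer_id", "amount", "due_date"],
    },
    "send_email": {
        "description": "Send an email message",
        "properties": ["recipient", "subject", "body"],
    },
    "search_tickets": {
        "description": "Search support tickets",
        "properties": ["query", "status", "assignee"],
    },
}
```

With this registry, `BM25Index(TOOLS).top("set the due_date on an invoice")` selects `create_invoice`, partly because the property name `due_date` is in the indexed text.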

REDDIT MACHINELEARNING · Jun 6, 2026

Building a Custom Drones MuJoCo Environment [P]

This Reddit post announces a new open-source package of Multi-Agent Reinforcement Learning (MARL) drone environments built on MuJoCo. The package, available on GitHub, aims to unify various drone objectives for the RL community. The author is seeking feedback and contributions to improve the package and fix issues. The repository also lists the author's RL-related research publications.