Thinkgap feed

AI signal, minus the noise.

6 items · 9 sources · Updated daily

X · Jul 31, 2026

protip: if you can distil models, you can also distil agent harnesses

Swyx shared a protip on X suggesting that model distillation techniques could be extended to distill agent harnesses, implying that agent harnesses might be compressed in much the same way as large language models. No further details or concrete examples were provided.

HACKER NEWS · Jul 31, 2026 · Highlight

CTGT Finds Distilling DeepSeek into GPT-OSS Does Not Transfer Censorship, Releases LineageEval Framework

CTGT used DeepSeek V4 Flash as a teacher to distill GPT-OSS-120B for finance tasks, reaching 83.61% on FinanceReasoning at an 8k-token budget and outperforming Kimi K3 and Inkling. To measure censorship transfer, they scored 152 matched pairs of political prompts with four LLM judges. The teacher showed a +45.45-point gap (7 SD from chance) in avoiding China-sensitive topics, but every distilled student stayed within 1 point of its American base model: the Chinese teacher's censorship did not transfer. CTGT released the LineageEval open evaluation framework, an open-weight 20B finance model, and a playground for side-by-side testing. They plan to extend the study to Chinese-lineage base models such as Qwen.
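A figure like "+45.45 points, 7 SD from chance" over matched pairs can be computed in several ways, and the summary does not say which one CTGT used. As a hedged sketch, assume each prompt pair yields one avoidance score per side (for instance, averaged across the judges), take the gap as the mean paired difference, and measure its distance from chance with a sign-flip permutation test. The function names `paired_gap` and `sign_flip_z` are hypothetical.

```python
import random
from statistics import mean, pstdev


def paired_gap(sensitive: list[float], control: list[float]) -> float:
    """Mean of (sensitive - control) over matched prompt pairs (hypothetical helper)."""
    if len(sensitive) != len(control):
        raise ValueError("prompt pairs must be matched one-to-one")
    return mean(s - c for s, c in zip(sensitive, control))


def sign_flip_z(sensitive: list[float], control: list[float],
                n_perm: int = 10_000, seed: int = 0) -> float:
    """Distance of the observed gap from chance, in null-distribution SDs.

    Assumed null hypothesis: topic sensitivity has no effect, so each pair's
    difference is equally likely to carry either sign. The null distribution
    is built by randomly flipping the sign of every paired difference.
    """
    observed = paired_gap(sensitive, control)
    diffs = [s - c for s, c in zip(sensitive, control)]
    rng = random.Random(seed)  # seeded so results are reproducible
    null = [
        sum(d if rng.random() < 0.5 else -d for d in diffs) / len(diffs)
        for _ in range(n_perm)
    ]
    sd = pstdev(null)
    if sd == 0.0:
        raise ValueError("all paired differences are zero; z is undefined")
    return (observed - mean(null)) / sd
```

Flipping signs within pairs, rather than shuffling all scores together, keeps each sensitive prompt tied to its matched control, which is the point of using matched pairs.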

X · Jul 31, 2026

CoRT: Counterfactual Replay for Token-Level Rubric-Guided Policy Optimization

A research paper introduces CoRT, a method that employs counterfactual replay to enable token-level, rubric-guided policy optimization. The approach aims to align language model outputs with granular, token-wise scoring rubrics. No implementation details or benchmark results were provided in the shared content.

X · Jul 30, 2026

CoRT: token-level credit assignment for rubric-guided RL

CoRT introduces a token-level credit assignment method for rubric-guided reinforcement learning. It addresses a limitation of GRPO (Group Relative Policy Optimization), in which rubric feedback is collapsed into a single scalar reward per response; CoRT instead assigns fine-grained credit to individual tokens, so models can learn from detailed rubric evaluations on a per-token basis.
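For intuition about the limitation being addressed, here is a minimal sketch contrasting GRPO-style sequence-level advantages (each response's scalar reward normalized against its sampled group, using the population standard deviation here, then applied to every token) with a hypothetical token-level redistribution. The per-token weights are an assumed input for illustration only; the sketch does not implement CoRT's counterfactual replay, and all function names are hypothetical.

```python
from statistics import mean, pstdev


def grpo_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Group-relative advantage (r_i - mean) / (std + eps) for each response in one group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def broadcast_to_tokens(advantage: float, num_tokens: int) -> list[float]:
    """Sequence-level credit: every token in the response gets the same scalar advantage."""
    return [advantage] * num_tokens


def token_level_credit(advantage: float, token_weights: list[float]) -> list[float]:
    """Hypothetical token-level credit: spread the response's advantage by per-token weights.

    The weights stand in for "how much each token affected the rubric items";
    how to obtain them (CoRT uses counterfactual replay) is NOT modeled here.
    Credits are scaled so their sum matches the broadcast total, advantage * n.
    """
    n = len(token_weights)
    total = sum(token_weights)
    if total == 0:
        # No signal about individual tokens: fall back to uniform credit.
        return broadcast_to_tokens(advantage, n)
    return [advantage * n * w / total for w in token_weights]


# Rubric scores for four sampled responses, already collapsed to scalars.
advantages = grpo_advantages([0.9, 0.5, 0.1, 0.5])
uniform = broadcast_to_tokens(advantages[0], 3)
weighted = token_level_credit(advantages[0], [0.0, 1.0, 3.0])
```

Both variants give a response the same total credit; they differ only in how it is spread across tokens, which is where rubric detail lost in the scalar reward could matter.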

X · Jul 30, 2026

Andrew Ho Leaves OpenAI to Launch Startup Producing High-Quality RL Datasets for Scientific Reasoning

Andrew Ho announced his last day at OpenAI after eight months there and revealed that he is starting a company to produce high-quality reinforcement learning datasets. He argues that LLMs generalize poorly and that economically productive capabilities are underrepresented in existing data, and he predicts that frontier labs will need to spend over $100B on targeted data acquisition. The company's first products target biology and statistical reasoning. The first is a set of long-horizon scientific reasoning datasets built on GeneBench-Pro, where GPT-5.6 Sol's pass rate is barely above 30%, with the aim of pushing reliability above 90%. The second is a set of multimodal datasets covering day-to-day scientific tasks such as analyzing cell culture plates or Western blots. Beyond biology, the company plans to expand into chemistry, materials science, healthcare, and white-collar office work.

X · Jul 16, 2026 · Highlight

Thinking Machines Releases New Open-Weight Multimodal MoE Model with 975B Parameters

Thinking Machines has released a new multimodal model with fully open weights and native reasoning across text, image, and audio. It uses a mixture-of-experts architecture with 975 billion total parameters, of which 41 billion are active per token, and supports up to 1 million tokens of context. Reasoning effort is controllable, which lowers token usage at similar performance. Fine-tuning is supported from day one on Tinker. The model shows strong agentic coding and tool-use capabilities and is supported across major inference platforms. A smaller model, Inkling-Small, is planned for a future release.