FastContext is a system that decouples repository exploration from code solving in LLM coding agents to reduce token waste from irrelevant snippets. It deploys specialized exploration models as a dedicated subagent, issuing parallel tool calls and delivering focused context via concise file paths and line ranges. The approach cuts token consumption by up to 60% and improves resolution rates by up to 5.5% relative to baseline agents.
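To make the "concise file paths and line ranges" idea concrete, here is a minimal sketch of how an exploration subagent's findings might be compacted before being handed to the solver. The names `ContextRef`, `merge_refs`, and `render` are hypothetical and do not come from FastContext itself; the merging rule is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextRef:
    """A pointer to a span of source lines (hypothetical structure)."""
    path: str
    start: int  # first line, 1-indexed, inclusive
    end: int    # last line, inclusive


def merge_refs(refs):
    """Merge overlapping or adjacent line ranges within the same file."""
    merged = []
    for ref in sorted(refs, key=lambda r: (r.path, r.start, r.end)):
        last = merged[-1] if merged else None
        if last and last.path == ref.path and ref.start <= last.end + 1:
            # Extend the previous range instead of emitting a duplicate span.
            merged[-1] = ContextRef(last.path, last.start, max(last.end, ref.end))
        else:
            merged.append(ref)
    return merged


def render(refs):
    """Render references as compact `path:start-end` lines."""
    return "\n".join(f"{r.path}:{r.start}-{r.end}" for r in merge_refs(refs))
```

Passing references instead of full snippets keeps the solver's prompt short; the solver can then read only the ranges it decides it needs.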
APPO is an agentic reinforcement learning method that improves multi-turn tool use in large language model agents. It refines branching and credit assignment by operating on fine-grained token-level decision points rather than coarse, heuristic interaction units. The method selects branching locations using token uncertainty and policy-induced likelihood gains, leading to more precise exploration and better credit distribution across branched rollouts. Experiments across 13 benchmarks show that APPO consistently outperforms existing agentic RL methods by approximately 4 points. The approach also keeps tool calls efficient and maintains behavioral interpretability.
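As a rough illustration of choosing branching locations at the token level, the sketch below scores each position by its next-token entropy plus a likelihood-gain term and keeps the top-k positions. The summary does not say how APPO combines the two signals, so the additive score, the `alpha` weight, and the reading of "likelihood gain" as a log-probability difference between the current and a reference policy are all assumptions; the function names are hypothetical.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def pick_branch_points(dists, logp_new, logp_old, k=2, alpha=1.0):
    """Return the k token positions with the highest branching score.

    dists    : per-position next-token probability distributions
    logp_new : log-prob of each sampled token under the current policy
    logp_old : log-prob of each sampled token under a reference policy
    Score = entropy + alpha * (logp_new - logp_old)  # assumed combination
    """
    scores = [
        token_entropy(d) + alpha * (n - o)
        for d, n, o in zip(dists, logp_new, logp_old)
    ]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Return positions in sequence order so rollouts can branch left to right.
    return sorted(ranked[:k])
```

Branching at a handful of high-scoring tokens, rather than at every tool call or turn boundary, is what makes the exploration fine-grained in this sketch.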
Researchers propose MRAgent, a framework that improves long-horizon memory reasoning for LLM agents. It uses a Cue-Tag-Content graph representation and an active reconstruction mechanism to dynamically retrieve and prune memory paths during inference. This moves beyond the static retrieve-then-reason paradigm, adapting memory access to intermediate reasoning evidence. Experiments show up to 23% performance improvement over baselines and reduced token and runtime costs. The work demonstrates efficient memory reconstruction for complex agent tasks.
Researchers propose FORT, a framework for synthesizing training data for deep search agents that resists shortcut learning. It identifies and mitigates four types of shortcut risks: evidence co-coverage, single-clue selectivity, exposed constants, and prior-knowledge binding. The framework uses trajectory signatures to measure and control shortcut risks during data generation. Experiments show that FORT-generated data leads to improved search agent performance on deep search benchmarks. The accompanying tool, FORT-Searcher, outperforms comparable agents on challenging tasks. Code is available on GitHub.
MiniMax Sparse Attention (MSA) is a new method for efficient processing of ultra-long contexts (hundreds of thousands to millions of tokens) in large language models. It uses blockwise sparsity and an optimized GPU execution path to achieve significant speedups in both training and inference while maintaining performance. The method is built on Grouped Query Attention (GQA), introducing a lightweight Index Branch for group-specific sparse token retrieval and a Main Branch for exact block-sparse attention. MSA is co-designed with GPU kernels for cross-GPU scalability and has been deployed in a production-grade multimodal model, reducing per-token attention compute. Its inference kernel and model are openly available online.
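The two-branch structure can be sketched in plain Python for a single query vector: a cheap index step ranks key blocks, and an exact attention step runs only over the selected blocks. This is not MSA's kernel or its actual Index Branch; scoring each block by its mean key vector is an assumption, and the function names are hypothetical. In a GQA setting the selected blocks would be shared by the query heads of one group, which this single-query sketch does not model.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(q, keys, values):
    """Exact softmax attention of one query over the given keys/values."""
    scale = 1.0 / math.sqrt(len(q))
    logits = [dot(q, k) * scale for k in keys]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]


def block_sparse_attend(q, keys, values, block_size, top_k):
    """Two-stage attention: cheap block selection, then exact attention."""
    starts = range(0, len(keys), block_size)
    # Index step (assumed): score each block by its mean key vector.
    block_scores = []
    for s in starts:
        block = keys[s:s + block_size]
        mean_key = [sum(col) / len(block) for col in zip(*block)]
        block_scores.append((dot(q, mean_key), s))
    # Keep the top_k highest-scoring blocks, restored to sequence order.
    chosen = sorted(s for _, s in sorted(block_scores, reverse=True)[:top_k])
    idx = [i for s in chosen for i in range(s, min(s + block_size, len(keys)))]
    # Main step: exact attention restricted to the selected blocks.
    return attend(q, [keys[i] for i in idx], [values[i] for i in idx])
```

With `top_k` equal to the number of blocks the result matches dense attention; lowering `top_k` reduces the number of keys each query touches, which is where the per-token compute saving comes from.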
The paper introduces EvoArena, a benchmark designed to simulate real-world dynamic changes for LLM agents, and EvoMem, a memory paradigm that models progressive updates and structured memory evolution. Current LLM agents show significant difficulty on EvoArena's evolving tasks. EvoMem consistently improves agent performance on EvoArena and also increases accuracy on existing benchmarks like GAIA and LoCoMo. By recording memory evolution and update histories, EvoMem enables better reasoning about environmental shifts. The work demonstrates the importance of incorporating evolution modeling into both evaluation and memory for effective agent deployment.
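The idea of recording update histories rather than overwriting facts can be illustrated with a small versioned store. This is only a sketch of the general pattern, not EvoMem's design: the class `VersionedMemory`, its methods, and the integer `step` timestamps are hypothetical.

```python
from collections import defaultdict


class VersionedMemory:
    """Keeps every update to a fact instead of overwriting it (hypothetical)."""

    def __init__(self):
        self._history = defaultdict(list)  # key -> [(step, value), ...]

    def update(self, key, value, step):
        """Record a new value observed at a given step."""
        self._history[key].append((step, value))
        self._history[key].sort(key=lambda e: e[0])

    def current(self, key):
        """Latest known value, or None if the key was never recorded."""
        entries = self._history.get(key)
        return entries[-1][1] if entries else None

    def value_at(self, key, step):
        """Value that was in effect at `step`, or None if not yet recorded."""
        result = None
        for s, v in self._history.get(key, []):
            if s > step:
                break
            result = v
        return result

    def history(self, key):
        """Full list of (step, value) updates for the key."""
        return list(self._history.get(key, []))
```

Keeping the full history lets an agent answer both "what is true now" and "what changed, and when", which is the kind of question an evolving environment raises.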