The paper proposes a statistical framework for using synthetic data in scientific research with provable validity guarantees. It introduces a new technical condition called task exchangeability, which requires the current task to be exchangeable with historical tasks for which real data exist. The authors develop inference methods that are guaranteed to be valid under task exchangeability and extend these guarantees beyond it. The framework is demonstrated on public opinion surveys with LLM-generated silicon samples and on AI evaluation using autoraters. The work addresses fundamental concerns about bias, noise, and misspecification in synthetic data.
Papers | Source: ARXIV | Importance: 4/5
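To make "validity under task exchangeability" concrete, here is a minimal split-conformal sketch. It is not presented as the paper's method, and it assumes that each historical task provides both a real-data estimate and a synthetic-data estimate. Under the assumption that the (real, synthetic) pairs of all tasks, including the current one, are exchangeable, the interval covers the current task's real-data estimate with marginal probability at least 1 - alpha. The function name `task_conformal_interval` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def task_conformal_interval(
    real_estimates: Sequence[float],
    synthetic_estimates: Sequence[float],
    new_synthetic_estimate: float,
    alpha: float = 0.1,
) -> Tuple[float, float]:
    """Interval for the current task's real-data estimate.

    Assumes the (real, synthetic) estimate pairs of the K historical tasks and
    of the current task are exchangeable; this is a split-conformal sketch.
    """
    if len(real_estimates) != len(synthetic_estimates):
        raise ValueError("need one synthetic estimate per historical task")
    # Nonconformity score per historical task: absolute real-vs-synthetic gap.
    scores = sorted(abs(r - s) for r, s in zip(real_estimates, synthetic_estimates))
    k = len(scores)
    # Conformal rank ceil((K + 1)(1 - alpha)); too few tasks gives an infinite interval.
    rank = math.ceil((k + 1) * (1 - alpha))
    if rank > k:
        return (-math.inf, math.inf)
    q = scores[rank - 1]
    return (new_synthetic_estimate - q, new_synthetic_estimate + q)


# Five hypothetical historical tasks with both real and synthetic (e.g. LLM-sampled) estimates.
lo, hi = task_conformal_interval(
    [0.50, 0.60, 0.40, 0.55, 0.45], [0.52, 0.57, 0.44, 0.55, 0.40], 0.50, alpha=0.2
)
```

With only five historical tasks, alpha = 0.2 is the smallest level that yields a finite interval here. This illustrates why the number of exchangeable historical tasks, and not just the amount of synthetic data, limits how tight such guarantees can be.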
The paper studies token efficiency in time series (TS) language models from an asymmetric-token perspective, revealing that TS tokens contain highly redundant frequency patterns while only a small subset carries critical temporal evidence, and that prompt token influence attenuates with model depth. The authors propose an adaptive token budgeting framework that compresses TS tokens via frequency-domain structure and progressively reduces prompt tokens across layers. Evaluated on forecasting, classification, imputation, and anomaly detection, the method achieves up to 7.68× inference acceleration and performance gains in 78% of settings, demonstrating the effectiveness of asymmetric token compression for scalable TS foundation models.
Papers | Source: ARXIV | Importance: 4/5
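The two halves of asymmetric token budgeting can be sketched with NumPy, under stated assumptions. The sketch assumes the time-series tokens are an `(n_tokens, d)` embedding matrix, and it compresses them by keeping the highest-energy frequency bins along the token axis. Prompt tokens are pruned with a simple linear per-layer budget and a per-token importance score (for example, attention mass). All function names, the linear schedule, and `min_frac` are hypothetical choices, not the paper's specific design.

```python
import numpy as np


def compress_ts_tokens(tokens: np.ndarray, n_bins: int) -> np.ndarray:
    """Compress (n_tokens, d) time-series token embeddings to (2 * n_bins, d).

    Keeps the n_bins frequency bins (FFT along the token axis) with the most
    energy and returns their real and imaginary parts as compressed tokens.
    """
    spectrum = np.fft.rfft(tokens, axis=0)            # (n_tokens // 2 + 1, d), complex
    energy = np.sum(np.abs(spectrum) ** 2, axis=1)    # total energy per frequency bin
    top = np.sort(np.argsort(energy)[::-1][:n_bins])  # strongest bins, in frequency order
    kept = spectrum[top]
    return np.concatenate([kept.real, kept.imag], axis=0)


def prompt_budget(n_prompt: int, layer: int, n_layers: int, min_frac: float = 0.25) -> int:
    """Prompt tokens kept at `layer`: linear decay from all of them to min_frac."""
    frac = 1.0 - (1.0 - min_frac) * layer / max(n_layers - 1, 1)
    return max(1, round(n_prompt * frac))


def keep_prompt_tokens(importance: np.ndarray, budget: int) -> np.ndarray:
    """Indices of the `budget` most important prompt tokens, in original order."""
    return np.sort(np.argsort(importance)[::-1][:budget])


# A pure sinusoid over 64 tokens concentrates its energy in frequency bin 5.
t = np.arange(64)
tokens = np.tile(np.sin(2 * np.pi * 5 * t / 64)[:, None], (1, 8))
compressed = compress_ts_tokens(tokens, n_bins=1)  # shape (2, 8)
```

The sinusoid example shows the redundancy argument in miniature: 64 tokens collapse into a single frequency bin with no loss, because all of their information sits at one frequency.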
The paper probes the causal influence of chain-of-thought (CoT) reasoning steps in large reasoning models. It identifies a 'commitment boundary': a sharp transition from transient guesses to a stable, high-confidence answer, often occurring in a single step and well before the reasoning block ends. Steps after this boundary are epiphenomenal, leaving the final answer probability unchanged. Attention probes can linearly decode answer-formation stages from intermediate steps with high accuracy and generalize to unseen tasks. Exploiting this signal, early exiting at the commitment boundary reduces average CoT length by up to 55% with negligible performance loss.
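The commitment boundary can be sketched under a simplifying assumption: after each reasoning step there is a per-step probability of the final answer, for example from a probe. In hindsight, the boundary is the first step from which that probability never drops back below a threshold. At inference time the future is unknown, so an early-exit rule has to decide online. The version below exits after `patience` consecutive confident steps. Both functions and their thresholds are hypothetical illustrations, not the paper's probe.

```python
from typing import Iterable, Optional, Sequence


def commitment_boundary(answer_probs: Sequence[float], threshold: float = 0.9) -> Optional[int]:
    """Offline: first step index from which the answer probability stays >= threshold.

    Returns None if the trajectory never commits.
    """
    boundary = None
    for i, p in enumerate(answer_probs):
        if p >= threshold:
            if boundary is None:
                boundary = i
        else:
            boundary = None  # a later dip means the earlier confident guess was transient
    return boundary


def early_exit(step_probs: Iterable[float], threshold: float = 0.9, patience: int = 2) -> int:
    """Online: number of steps consumed before exiting after `patience` confident steps."""
    streak = 0
    n = 0
    for n, p in enumerate(step_probs, start=1):
        streak = streak + 1 if p >= threshold else 0
        if streak >= patience:
            return n
    return n  # never committed confidently: the full trace is used


probs = [0.2, 0.95, 0.3, 0.5, 0.92, 0.97, 0.99, 0.99]
```

For `probs`, the offline boundary falls at step index 4 (the confident guess at index 1 is transient). The online rule stops after 6 of 8 steps, and increasing `patience` trades savings for robustness against transient guesses.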
The paper proposes Context-Driven Incremental Compression (C-DIC), which structures dialogue history as interleaved contextual threads and maintains revisable per-thread compression states in a compact dialogue memory. At each turn, a retrieve-revise-write-back loop shares information across turns and updates stale memories. It also adapts truncated backpropagation-through-time (TBPTT) to learn cross-turn dependencies without full-history backpropagation. Experiments on long-form dialogue benchmarks show C-DIC achieves stable inference latency and perplexity over hundreds of turns, outperforming existing context compression methods.
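The retrieve-revise-write-back loop can be sketched as a per-thread memory in which each turn touches only the threads it belongs to. For readability, the sketch stores thread states as strings instead of learned compression states. It assumes a routing function that maps a turn to thread ids and a revision function that updates a stale state. Both functions, and the class `DialogueMemory`, are hypothetical stand-ins for learned components. The TBPTT training procedure is not shown.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DialogueMemory:
    """Per-thread compression states (strings here; learned states in a real model)."""
    states: Dict[str, str] = field(default_factory=dict)

    def retrieve(self, thread_ids: List[str]) -> Dict[str, str]:
        return {t: self.states.get(t, "") for t in thread_ids}

    def write_back(self, updates: Dict[str, str]) -> None:
        self.states.update(updates)


def dialogue_turn(
    memory: DialogueMemory,
    turn: str,
    route: Callable[[str], List[str]],
    revise: Callable[[str, str], str],
) -> Dict[str, str]:
    """One retrieve-revise-write-back step; only the touched threads are updated."""
    old = memory.retrieve(route(turn))                  # retrieve
    new = {t: revise(s, turn) for t, s in old.items()}  # revise stale states
    memory.write_back(new)                              # write back
    return new


def route(turn: str) -> List[str]:
    """Hypothetical keyword router standing in for learned thread assignment."""
    return ["travel"] if "flight" in turn else ["food"]


def revise(state: str, turn: str) -> str:
    """Keep the two most recent utterances per thread (stand-in for learned compression)."""
    recent = state.split(" / ") if state else []
    return " / ".join(recent[-1:] + [turn])


memory = DialogueMemory()
for turn in ["book a flight to Oslo", "any vegan places?", "flight at 9am", "flight with aisle seat"]:
    dialogue_turn(memory, turn, route, revise)
```

Because each thread's state has a bounded size and a turn rewrites only its own threads, the per-turn cost does not grow with the number of past turns. This is the property behind the stable latency reported above.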
DIRECT is a routing framework that dynamically allocates test-time compute per prompt in embodied Vision-Language Model (VLM) planners by analyzing multimodal scene context. It examines three scaling axes—chain-of-thought depth, model size, and memory history—and reveals that naively scaling test-time compute yields uneven and often diminishing returns. Experiments on VLABench and RoboMME demonstrate that DIRECT significantly improves the success–cost Pareto frontier over fixed model selection. Validation on a physical Franka arm in a DROID setup shows that the router matches or exceeds a stronger model's success rate while cutting average latency by up to 65%. The results confirm that intelligent compute allocation enables frontier-level embodied planning at a fraction of the cost.
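Per-prompt routing can be sketched as choosing, among candidate compute configurations along the three scaling axes, the one that maximizes predicted success minus a weighted cost. The sketch assumes a success predictor conditioned on scene features. The predictor, the config values, and `cost_weight` are all hypothetical, and DIRECT's actual router may differ.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


@dataclass(frozen=True)
class ComputeConfig:
    model: str      # hypothetical model identifier
    cot_steps: int  # chain-of-thought depth
    history: int    # number of past observations kept in context
    cost: float     # e.g. expected latency in seconds


def route(
    scene: Dict[str, float],
    configs: Sequence[ComputeConfig],
    predict_success: Callable[[Dict[str, float], ComputeConfig], float],
    cost_weight: float = 0.1,
) -> ComputeConfig:
    """Pick the configuration with the best predicted success minus weighted cost."""
    return max(configs, key=lambda c: predict_success(scene, c) - cost_weight * c.cost)


small = ComputeConfig("small-vlm", cot_steps=0, history=1, cost=1.0)
large = ComputeConfig("large-vlm", cot_steps=256, history=8, cost=5.0)


def predict_success(scene: Dict[str, float], cfg: ComputeConfig) -> float:
    """Toy predictor: clutter hurts the small model far more than the large one."""
    clutter = scene["clutter"]
    return 0.9 - 0.6 * clutter if cfg.model == "small-vlm" else 0.95 - 0.1 * clutter
```

With these toy numbers, a tidy scene (`clutter=0.1`) is routed to the cheap configuration and a cluttered one (`clutter=1.0`) to the large model. Sweeping `cost_weight` traces out a success-cost trade-off curve of the kind the Pareto comparison refers to.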
Researchers propose Doc-to-Atom (Doc2Atom), a parametric memory framework that compresses long documents into semantically typed knowledge atoms. Each atom is compiled into an independent micro-LoRA adapter and a provenance retrieval key. At inference, a lightweight query router assembles only relevant atoms into a query-specific adapter, which is injected into a frozen base model. The system is trained end-to-end via multi-objective distillation. Experiments on six QA benchmarks show Doc2Atom outperforms Doc-to-LoRA baselines while reducing the memory cost of document internalization.
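The query-time assembly step can be sketched with plain LoRA algebra, assuming each atom stores a retrieval key and low-rank factors `A_i` (shape `r x d_in`) and `B_i` (shape `d_out x r`). Routing by cosine similarity, top-k selection, and softmax-weighted merging of the `B_i @ A_i` deltas are assumptions chosen for illustration, not the paper's confirmed router. The function names are hypothetical.

```python
from typing import Tuple

import numpy as np


def assemble_adapter(
    query_emb: np.ndarray,   # (e,)
    atom_keys: np.ndarray,   # (n, e) provenance/retrieval keys
    atom_A: np.ndarray,      # (n, r, d_in) LoRA down-projections
    atom_B: np.ndarray,      # (n, d_out, r) LoRA up-projections
    top_k: int = 2,
    temperature: float = 1.0,
) -> Tuple[np.ndarray, np.ndarray]:
    """Merge the top-k matching atoms into one weight delta: sum_i w_i * B_i @ A_i."""
    q = query_emb / np.linalg.norm(query_emb)
    keys = atom_keys / np.linalg.norm(atom_keys, axis=1, keepdims=True)
    sims = keys @ q                          # cosine similarity per atom
    idx = np.argsort(sims)[::-1][:top_k]     # best-matching atoms
    logits = sims[idx] / temperature
    w = np.exp(logits - logits.max())
    w /= w.sum()                             # softmax over the selected atoms only
    delta = sum(wi * atom_B[i] @ atom_A[i] for wi, i in zip(w, idx))
    return delta, idx


def adapted_forward(x: np.ndarray, W_frozen: np.ndarray, delta_W: np.ndarray, scale: float = 1.0) -> np.ndarray:
    """Frozen base layer with the injected adapter: y = x @ (W + scale * delta_W)^T."""
    return x @ (W_frozen + scale * delta_W).T


rng = np.random.default_rng(0)
A = rng.normal(size=(3, 2, 5))
B = rng.normal(size=(3, 6, 2))
delta, idx = assemble_adapter(np.array([1.0, 0.0, 0.0, 0.0]), np.eye(3, 4), A, B, top_k=1)
```

Only the selected atoms contribute, so the base weights stay frozen and unrelated document knowledge is never loaded for a query. That is the sense in which per-atom adapters can lower the cost of internalizing a document compared with one monolithic adapter.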