This paper proposes Asymmetric Mutual Variational Learning (AMVL), a framework that addresses the train-inference mismatch in continuous latent reasoning for multimodal large language models (MLLMs). The mismatch arises because standard variational training forces the inference-time prior to mimic a posterior conditioned on ground-truth answers, which causes answer leakage. AMVL uses a forward KL divergence to align the prior with the posterior and a novel reverse KL term to regularize the posterior, preventing it from collapsing into regions that are incompatible with inference. The method is instantiated in a latent-integrated MLLM and evaluated on the BLINK benchmark, where it improves the average score by 10.83 points and yields gains of up to 32.00 points on individual reasoning tasks; further analyses show improved latent-space stability.
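To make the asymmetry between the two KL terms concrete, here is a minimal sketch assuming diagonal-Gaussian prior and posterior latents. Which direction counts as "forward" and which as "reverse" is our assumption (forward = KL(posterior || prior), reverse = KL(prior || posterior)), and the function name `asymmetric_kl_terms` is hypothetical; it is not the authors' implementation. In an autodiff framework, each term would additionally stop gradients to the distribution it is not meant to update.

```python
import math
from typing import Sequence, Tuple


def gaussian_kl(mu_a: Sequence[float], var_a: Sequence[float],
                mu_b: Sequence[float], var_b: Sequence[float]) -> float:
    """Closed-form KL(N(mu_a, diag(var_a)) || N(mu_b, diag(var_b))), summed over dimensions."""
    total = 0.0
    for ma, va, mb, vb in zip(mu_a, var_a, mu_b, var_b):
        total += 0.5 * (math.log(vb / va) + (va + (ma - mb) ** 2) / vb - 1.0)
    return total


def asymmetric_kl_terms(prior: Tuple[Sequence[float], Sequence[float]],
                        posterior: Tuple[Sequence[float], Sequence[float]],
                        beta: float = 1.0) -> Tuple[float, float]:
    """Hypothetical AMVL-style pair of terms; prior and posterior are (mu, var) pairs.

    The direction assigned to each term is an assumption. With autodiff, the
    posterior would be detached in the first term and the prior in the second.
    """
    prior_mu, prior_var = prior
    post_mu, post_var = posterior
    # Assumed "forward" term: pulls the prior toward the answer-conditioned posterior.
    align_prior = gaussian_kl(post_mu, post_var, prior_mu, prior_var)
    # Assumed "reverse" term: penalizes the posterior for drifting from the prior.
    regularize_post = gaussian_kl(prior_mu, prior_var, post_mu, post_var)
    return align_prior, beta * regularize_post
```

For a wide prior N(0, 4) and a sharp posterior N(0, 1), the two directions give different penalties (about 0.32 versus 0.81 nats), which is why the choice of direction for each term matters.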
ELDR is a decode router for prefill-decode disaggregated serving of mixture-of-experts (MoE) models. It targets decode-latency variation caused by differences in which experts each batch activates. It constructs an expert signature from a request's prefill activations to predict which experts will be used during generation, then uses offline balanced K-means to partition the signature space across decode workers, together with a locality-band online policy that routes each request to the least-loaded worker among those that best match its signature. A signature cache, co-indexed with the KV cache at KV-block granularity, keeps signatures exact under prefix caching. Implemented in vLLM and evaluated on up to 40 GPUs across three MoE models and two workloads, ELDR reduces median time-per-output-token (TPOT) by 5.9–13.9% relative to the strongest of four load-balancing baselines, without changing model outputs.
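The signature-plus-locality-band idea can be sketched as follows. This is a hypothetical illustration, not ELDR's code: the signature here is assumed to be a normalized histogram of the experts chosen during prefill, similarity is assumed to be cosine, the worker centroids stand in for the output of the offline balanced K-means step, and `band` is an assumed tolerance parameter.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


def expert_signature(prefill_topk: Sequence[Sequence[int]], num_experts: int) -> List[float]:
    """Hypothetical signature: normalized histogram of experts selected during prefill."""
    counts = [0.0] * num_experts
    for token_experts in prefill_topk:
        for expert in token_experts:
            counts[expert] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total > 0 else counts


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


@dataclass
class DecodeWorker:
    name: str
    centroid: List[float]  # assumed to come from an offline balanced K-means step
    load: int = 0          # e.g. number of in-flight requests on this worker


def route(signature: Sequence[float], workers: List[DecodeWorker],
          band: float = 0.05) -> DecodeWorker:
    """Pick the least-loaded worker whose similarity is within `band` of the best match."""
    scores = [cosine(signature, w.centroid) for w in workers]
    best = max(scores)
    candidates = [w for w, s in zip(workers, scores) if s >= best - band]
    chosen = min(candidates, key=lambda w: w.load)
    chosen.load += 1
    return chosen
```

For example, a request whose prefill tokens pick experts `[[0, 1], [0, 2]]` gets the signature `[0.5, 0.25, 0.25, 0.0]`. Given workers A and B that share a centroid on expert 0 (loads 3 and 1) and an idle worker C centered on expert 3, `route` picks B: C is excluded by the locality band despite being idle, and B wins on load within the band.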
Nvidia has published a quantized variant of the Mistral-Medium-3.5-128B large language model on Hugging Face. The model uses NVFP4, a 4-bit floating-point format, to reduce its memory footprint and potentially accelerate inference. It is tagged for conversational use and text generation, and its weights are stored in the safetensors format. The repository indicates that the model is based on the original Mistral-Medium-3.5-128B from Mistral AI and is shared under a custom license.
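As a rough illustration of how NVFP4 saves memory, the sketch below quantizes one 16-value block to the FP4 (E2M1) grid with a shared per-block scale, and then estimates weight storage. Several simplifications are assumptions: the block scale is kept in full precision (NVFP4 stores it in FP8 E4M3 and adds a per-tensor FP32 scale), ties round toward the smaller magnitude, and the memory estimate treats "128B" as exactly 128 billion parameters while ignoring any tensors the checkpoint may keep at higher precision.

```python
import math
from typing import List, Sequence, Tuple

# Magnitudes representable by FP4 E2M1; the sign is handled separately.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16  # values sharing one scale factor in NVFP4


def quantize_block(block: Sequence[float]) -> Tuple[float, List[float]]:
    """Scale the block so its largest magnitude maps to 6.0, then round to the E2M1 grid.

    Simplified: the scale stays in full precision instead of FP8, and ties
    resolve toward the smaller magnitude rather than round-to-nearest-even.
    """
    if len(block) != BLOCK_SIZE:
        raise ValueError(f"expected {BLOCK_SIZE} values, got {len(block)}")
    amax = max(abs(x) for x in block)
    scale = amax / 6.0 if amax > 0 else 1.0
    codes = []
    for x in block:
        magnitude = min(E2M1_MAGNITUDES, key=lambda g: abs(abs(x) / scale - g))
        codes.append(math.copysign(magnitude, x))
    return scale, codes


def dequantize_block(scale: float, codes: Sequence[float]) -> List[float]:
    return [scale * c for c in codes]


def weight_bytes(num_params: float, bits_per_weight: float) -> float:
    """Storage for the weights alone, in bytes."""
    return num_params * bits_per_weight / 8
```

With 4-bit elements plus one 8-bit scale per 16 values (4.5 bits per weight), `weight_bytes(128e9, 4.5)` gives about 72 GB, versus 256 GB for `weight_bytes(128e9, 16)` in BF16. Small values such as 0.2 in a block whose maximum is 6.0 round to zero, which shows the precision traded away for that saving.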
The paper introduces SpheRoPE, a zero-shot, training-free, and optimization-free framework for generating 360° panoramic images and videos by injecting spherical priors directly into pre-trained diffusion transformers. Its Spherical RoPE replaces standard rotary position embeddings (RoPE): low-frequency channels are re-parameterized as 3D Cartesian coordinates to natively encode the spherical manifold, while high-frequency channels are harmonically quantized to enforce exact periodicity. A complementary Semantic Distortion classifier-free guidance (CFG) scheme steers the generated geometry. The method is demonstrated on text-to-panorama generation with Flux.1, Flux.2, and LTX-Video backbones, achieving competitive results without any fine-tuning or inference-time optimization.
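One way to read "harmonic quantization for exact periodicity" is sketched below; it is an interpretation under stated assumptions, not the paper's formulation. We assume positions are equirectangular column indices, so column c maps to longitude 2πc/W. A RoPE channel rotating by f radians per column then rotates by fW/(2π) per radian of longitude, and rounding that to an integer harmonic k makes the angle k·lon exactly 2π-periodic, so the left and right panorama edges get matching embeddings. Channels whose harmonic rounds to 0 are the low-frequency ones, which we assume would instead use the 3D coordinates from `pixel_to_sphere` (a hypothetical helper with an assumed axis convention).

```python
import math
from typing import List, Tuple


def rope_freqs(dim: int, base: float = 10000.0) -> List[float]:
    """Standard RoPE frequencies theta_i = base^(-2i/dim), one per channel pair."""
    return [base ** (-2 * i / dim) for i in range(dim // 2)]


def harmonic_quantize(freqs: List[float], width: int) -> List[int]:
    """Hypothetical: convert per-column frequencies to integer harmonics of longitude.

    Column c sits at longitude 2*pi*c/width, so a per-column frequency f becomes
    f*width/(2*pi) per radian of longitude; rounding gives an exactly periodic k.
    """
    return [round(f * width / (2 * math.pi)) for f in freqs]


def pixel_to_sphere(col: float, row: float, width: int, height: int) -> Tuple[float, float, float]:
    """Map an equirectangular pixel center to a 3D unit vector (assumed axis convention)."""
    lon = (col + 0.5) / width * 2 * math.pi - math.pi
    lat = math.pi / 2 - (row + 0.5) / height * math.pi
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))
```

For an 8-dimensional head and a 1024-pixel-wide panorama, `harmonic_quantize(rope_freqs(8), 1024)` returns `[163, 16, 2, 0]`: the three high-frequency pairs become integer harmonics, and the last pair rounds to 0, marking it as a candidate for the 3D-coordinate treatment.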
This paper proposes Multi-Block Diffusion Language Models (MBD-LMs), which extend block diffusion LMs to decode multiple consecutive blocks in parallel, adding inter-block parallelism. To align training with multi-block inference, the authors introduce Multi-block Teacher Forcing (MultiTF), which trains on bounded noise groups conditioned on clean prefixes, using randomized noise schedulers. A Block Buffer decoding algorithm preserves KV-cache reuse and static input shapes, so the added parallelism translates into wall-clock speedup. On MBD-LLaDA2-Mini, average tokens per forward pass (TPF) increase from 3.47 to 6.19, while accuracy rises from 79.95% to 81.03%. Combined with DMax, the model reaches 9.34 TPF with only a 1.02% accuracy drop on math and code benchmarks.
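The link between inter-block parallelism and tokens per forward pass (TPF) can be illustrated with a toy simulation. This is a hypothetical sketch, not the paper's Block Buffer algorithm: it assumes every block in the buffer gains a fixed number of decoded tokens per forward pass (a stand-in for confidence-based unmasking), commits leading blocks once they are complete, and refills the buffer from pending blocks. It does not model KV-cache reuse, static-shape padding, or the lower confidence a real model has on later blocks, so its TPF values are idealized.

```python
from collections import deque
from typing import List


def simulate_block_buffer(total_blocks: int, block_len: int,
                          buffer_size: int, reveal_per_pass: int) -> float:
    """Toy model of multi-block decoding; returns tokens per forward pass (TPF)."""
    if reveal_per_pass <= 0 or buffer_size <= 0:
        raise ValueError("reveal_per_pass and buffer_size must be positive")
    pending = deque(range(total_blocks))
    buffer: List[List[int]] = []  # each slot is [block_id, decoded_token_count]
    committed = 0
    passes = 0
    while pending or buffer:
        # Refill the buffer so up to buffer_size blocks are decoded together.
        while len(buffer) < buffer_size and pending:
            buffer.append([pending.popleft(), 0])
        passes += 1  # one forward pass advances every block in the buffer
        for slot in buffer:
            slot[1] = min(block_len, slot[1] + reveal_per_pass)
        # Commit completed blocks from the front, preserving left-to-right order.
        while buffer and buffer[0][1] == block_len:
            buffer.pop(0)
            committed += block_len
    return committed / passes if passes else 0.0
```

With three 4-token blocks and two tokens revealed per block per pass, a single-block buffer needs six passes (TPF 2.0), a two-block buffer needs four (TPF 3.0), and a three-block buffer needs two (TPF 6.0), showing why decoding several blocks at once raises TPF when the model can fill later blocks reliably.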
Jackrong has uploaded a GGUF-quantized model file for Qwopus3.6-35B-A3B-Coder to Hugging Face. The base model is a multimodal mixture-of-experts model based on Qwen3.6, designed for coding, tool use, and function calling, and it supports an image-text-to-text pipeline. This GGUF version enables efficient local inference with llama.cpp. The repository is released under the Apache 2.0 license. At the time of posting, the repository had 62 likes and 0 downloads, and no performance benchmarks were provided.