Moonshot AI released Kimi K2.7-Code, an open-weight, coding-specialized agentic model under Modified MIT license. It is a Mixture-of-Experts architecture with 1T total parameters, 32B active per token, 384 experts with 8 selected, MLA attention, SwiGLU feed-forward, and a 400M-parameter MoonViT vision encoder. The model supports a 256K-token context window, ships with native INT4 quantization, and enforces mandatory thinking mode with fixed sampling parameters (temperature 1.0, top_p 0.95, n 1). In company-reported benchmarks, K2.7-Code achieves 62.0 on Kimi Code Bench v2 (+21.8% over K2.6), 81.1 on MCP Mark Verified (beating Claude Opus 4.8’s 76.4), and demonstrates approximately 30% lower reasoning-token usage than K2.6, reducing cost and latency in agentic workflows. The 595 GB model weights are available on Hugging Face and can be self-hosted via vLLM, SGLang, or KTransformers; API access uses the kimi-k2.7-code model name with OpenAI-compatible endpoints.
This tutorial builds an end-to-end spatial graph learning pipeline using the city2graph library. It collects real POI and street network data from OpenStreetMap around Shibuya, Tokyo (with a synthetic clustered fallback to ensure reliability), engineers spatial features like local density and street distance, and constructs six proximity graph families (KNN, Delaunay, Gabriel, RNG, EMST, Waxman) to compare graph topologies. A two-layer GraphSAGE model is trained on a homogeneous KNN graph to predict urban function categories (food, retail, education, health) from spatial structure and node features, achieving test accuracy and macro-F1. The pipeline also demonstrates heterogeneous graph construction using bridge edges between node types and a heterogeneous GNN forward pass via PyTorch Geometric's to_hetero, along with PCA visualization of learned embeddings and a geographic prediction map.
Zyphra has released Zamba2-VL, a family of open vision-language models in three sizes: 1.2B, 2.7B, and 7B parameters. Each model uses a hybrid Mamba2 state-space model combined with a small number of shared transformer blocks, replacing dense attention to achieve near-linear inference scaling. The models pair a Qwen2.5-VL vision encoder with this backbone, supporting single- and multi-image understanding and grounding. On 14 benchmarks, Zamba2-VL shows strong visual counting and document understanding (e.g., 90.9 DocVQA for the 2.7B model) but lags larger baselines on knowledge-heavy reasoning like MMMU and MathVista. Its main advantage is an order-of-magnitude lower time-to-first-token compared to comparable Transformer VLMs, particularly beneficial for long multimodal inputs and on-device deployment. Weights are released under Apache 2.0 license on HuggingFace with inference code available.
This tutorial walks through setting up NVIDIA cuTile Python in a Colab notebook, checking GPU/CUDA/driver compatibility, and implementing tiled kernels for vector addition, matrix addition, and matrix multiplication using direct load/store, gather/scatter, and matrix multiply accumulate. It provides wrapper functions that fall back to PyTorch when cuTile is unavailable, and validates outputs against PyTorch operations with correctness checks. The workflow includes benchmarking kernel performance against PyTorch equivalents and visualizing median runtimes, then suggests further experiments such as tile size tuning, precision comparison, and operation fusion. The notebook remains fully executable in Colab even without the required cuTile runtime.
Xiaomi's MiMo-V2.5-Pro-UltraSpeed achieves over 1000 tokens per second on a 1-trillion-parameter MoE model using commodity GPUs, a milestone at this scale. The speedup comes from three coordinated techniques: FP4 quantization applied only to MoE experts, DFlash speculative decoding that predicts entire token blocks in parallel, and the TileRT runtime optimized for microsecond-scale operations. Rejection sampling ensures lossless decoding while maintaining output quality. The system runs on a single 8-GPU node and is available through a limited API trial from June 9-23, 2026.
Microsoft AI has announced MAI-Transcribe-1.5, an updated automatic speech recognition model supporting 43 languages with a single system. It achieves a 2.4% word error rate on the Artificial Analysis leaderboard and claims best-in-class accuracy on the FLEURS benchmark. The model offers up to 5x faster transcription for long audio, transcribing an hour of audio in under 15 seconds. A new keyword biasing feature reduces errors on domain-specific terms by up to 30%. MAI-Transcribe-1.5 is integrated into Microsoft products like Copilot, Teams, and Dynamics 365 and is available through Azure AI Foundry.