Moonshot AI released Kimi K2.7-Code, an open-weight, coding-specialized agentic model under Modified MIT license. It is a Mixture-of-Experts architecture with 1T total parameters, 32B active per token, 384 experts with 8 selected, MLA attention, SwiGLU feed-forward, and a 400M-parameter MoonViT vision encoder. The model supports a 256K-token context window, ships with native INT4 quantization, and enforces mandatory thinking mode with fixed sampling parameters (temperature 1.0, top_p 0.95, n 1). In company-reported benchmarks, K2.7-Code achieves 62.0 on Kimi Code Bench v2 (+21.8% over K2.6), 81.1 on MCP Mark Verified (beating Claude Opus 4.8’s 76.4), and demonstrates approximately 30% lower reasoning-token usage than K2.6, reducing cost and latency in agentic workflows. The 595 GB model weights are available on Hugging Face and can be self-hosted via vLLM, SGLang, or KTransformers; API access uses the kimi-k2.7-code model name with OpenAI-compatible endpoints.
This tutorial provides a complete coding implementation of Microsoft SkillOpt’s instrumented prompt optimization pipeline. It sets up the environment with OpenAI-compatible model access, using GPT-4o as the optimizer and GPT-4o-mini as the target model. A baseline evaluation is performed on the SearchQA validation set before running the optimization loop, which includes rollout, reflection, aggregation, selection, slow update, and meta-skill mechanisms. The training process is visualized with accuracy curves, edit-budget scheduling, and cumulative token usage. Finally, the evolved best skill is evaluated against the unseen split, demonstrating a measurable hard-match accuracy lift over the seed baseline.
A joint study by Harvard and Perplexity analyzed 10,000 matched session pairs from Perplexity Search and the AI agent Perplexity Computer over a 90-day window. Computer performed 26 minutes of autonomous work per session (median 9 minutes), a 48× increase over Search's 33 seconds (median 14 seconds). On matched tasks, Computer plus human reduced estimated time by 87% and cost by 94% versus Search plus human, with a meaningful dissatisfaction rate of 1.3% compared to 2.9% for Search. Computer queries also expanded task scope: cross-occupation share rose to 59% (vs 50%), higher-order Bloom's cognition was required in 76% of queries (vs 55%), and 23% of queries addressed task statements never submitted to Search.
Google Research has introduced a new Agentic RAG framework integrated into the Gemini Enterprise Agent Platform. The framework features a Sufficient Context Agent that iteratively searches until it gathers complete context before generating a response. This multi-agent architecture breaks down complex queries into subtasks, improving accuracy by up to 34% on factuality datasets compared to standard RAG. Tested on the FramesQA benchmark, the system achieved 90.1% accuracy in cross-corpus retrieval while maintaining low latency. The feature, called Cross-Corpus Retrieval, is now in public preview.
This tutorial demonstrates how to use the GEPA framework for reflective prompt optimization on arithmetic word problems. It covers setting up a deterministic benchmark, defining a structured evaluator with scoring and feedback, and evolving multi-component prompts (instructions and format rules) using a reflection model. The process begins with a weak seed prompt and iteratively improves it based on actionable feedback. The optimized prompt is compared on a held-out validation set to assess generalization. The tutorial provides a complete workflow with code, highlighting the shift from manual trial and error to automated prompt evolution.