The Qwen team released Qwen-RobotSuite, a suite of three independent embodied AI foundation models for robotics. Qwen-RobotManip is a Vision-Language-Action model based on Qwen3.5-4B that aligns heterogeneous manipulation data into a unified 80-dimensional action vector, achieving 1st place on RoboChallenge Table30-v1 and strong cross-embodiment transfer. Qwen-RobotWorld is a language-conditioned video world model using a 60-layer dual-stream MMDiT and a frozen Qwen2.5-VL encoder, ranking 1st overall on EWMBench and DreamGen Bench. Qwen-RobotNav is a scalable navigation model built on Qwen3-VL with a parameterized observation interface, reaching 76.5% success rate on VLN-CE RxR and enabling agentic planning. RobotManip and RobotNav have public GitHub repositories; RobotWorld is presented as a research paper.
A hands-on tutorial streams 3,000 documents from the FineWeb sample-10BT subset without downloading the full multi-terabyte corpus. It reproduces quality filters (Gopher, C4, custom) and finds that most documents already pass them because the corpus was pre-filtered. MinHash-based deduplication with 128 permutations and a 0.7 threshold identifies few near-duplicate pairs, consistent with per-crawl deduplication. GPT-2 token counts are verified against the stored field and match almost perfectly (mean absolute difference ~0). Analytics cover token distribution, language scores, characters per token, and top domains, providing practical insights for scaling corpus preprocessing pipelines.
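The MinHash step can be illustrated with a minimal, standard-library-only sketch. It is not the tutorial's code: the 5-word shingle size, the use of seeded BLAKE2b hashes as stand-ins for permutations, the brute-force pairwise comparison, and all function names are assumptions made for illustration. Only the 128 permutations and the 0.7 threshold come from the summary above.

```python
import hashlib
import re
from itertools import combinations

NUM_PERM = 128    # number of hash "permutations" (from the summary)
THRESHOLD = 0.7   # estimated-Jaccard cutoff (from the summary)


def shingles(text, k=5):
    """Return the set of word k-grams; k=5 is an illustrative assumption."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash_signature(items, num_perm=NUM_PERM):
    """For each seeded hash function, keep the minimum hash over all items."""
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")  # a distinct salt per hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for s in items
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching signature slots estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def near_duplicate_pairs(docs, threshold=THRESHOLD):
    """Compare every pair of documents; fine for a small sample, O(n^2) overall."""
    sigs = {}
    for doc_id, text in docs.items():
        items = shingles(text)
        if items:  # skip empty documents, which have no shingles to hash
            sigs[doc_id] = minhash_signature(items)
    return [
        (a, b)
        for a, b in combinations(sorted(sigs), 2)
        if estimated_jaccard(sigs[a], sigs[b]) >= threshold
    ]


if __name__ == "__main__":
    sample = {
        "a": "the quick brown fox jumps over the lazy dog near the river bank",
        "b": "the quick brown fox jumps over the lazy dog near the river bank",
        "c": "streaming a small sample avoids downloading the entire corpus to disk",
    }
    print(near_duplicate_pairs(sample))
```

At larger scale, the pairwise loop would normally be replaced by locality-sensitive hashing over signature bands so that only likely candidates are compared.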
This tutorial builds an end-to-end spatial graph learning pipeline using the city2graph library. It collects real POI and street network data from OpenStreetMap around Shibuya, Tokyo (with a synthetic clustered fallback to ensure reliability), engineers spatial features such as local density and street distance, and constructs six proximity graph families (KNN, Delaunay, Gabriel, RNG, EMST, Waxman) to compare graph topologies. A two-layer GraphSAGE model is trained on a homogeneous KNN graph to predict urban function categories (food, retail, education, health) from spatial structure and node features, and is evaluated by test accuracy and macro-F1. The pipeline also demonstrates heterogeneous graph construction using bridge edges between node types and a heterogeneous GNN forward pass via PyTorch Geometric's to_hetero, along with PCA visualization of learned embeddings and a geographic prediction map.
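As a rough illustration of the simplest of these graph families, the sketch below builds a KNN graph over planar points with the standard library only. It does not use city2graph's API; the function name `knn_edges`, the value of `k`, and the assumption that coordinates are already projected to a planar system (so Euclidean distance is meaningful) are all illustrative.

```python
import math


def knn_edges(points, k=3):
    """Return directed edges (i, j) linking each point to its k nearest neighbours.

    `points` is a list of (x, y) tuples in a projected (planar) coordinate system.
    """
    edges = []
    for i, p in enumerate(points):
        # Distance from point i to every other point, nearest first.
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        edges.extend((i, j) for _, j in dists[:k])
    return edges


if __name__ == "__main__":
    pois = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
    print(knn_edges(pois, k=1))
```

The result is a directed edge list; a KNN graph is not symmetric in general, so an undirected version needs the reverse edges added explicitly.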
Perplexity integrated its Deep Research mode into Computer, the company’s multi-model orchestration system. The upgraded feature automatically breaks complex questions into subtasks and routes them across more than 20 frontier models. It uses Search as Code to generate code that runs thousands of parallel retrieval steps, dramatically improving agentic browsing: the BrowseComp benchmark score rose from 40.7% to 83.8%, and Humanity’s Last Exam rose from 36.4% to 50.5%. The system reads user-uploaded files alongside live web sources, cites every claim inline, and delivers finished reports, slide decks, and interactive dashboards. Developers can access the same search stack via the pay-as-you-go Perplexity Agent API with a deep-research preset.
A practical tutorial demonstrates how to stream NVIDIA's Nemotron-Pretraining-Code-v3 metadata index without downloading the full multi-gigabyte dataset. It creates a shuffled 30,000-record sample, derives features such as file extension and directory depth, and visualizes top languages, extensions, repositories, and directory nesting. The workflow reconstructs raw GitHub URLs from metadata fields (repo, commit_id, rel_path) and attempts to fetch the actual source files, handling missing or deleted repos gracefully. A Python-file filter is applied, and token counts are estimated using tiktoken. The full dataset's scale is noted as approximately 173 billion tokens across 146 million files. Processed outputs are saved as Parquet and JSON for reuse.
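The URL reconstruction step might look like the sketch below. It assumes that `repo` holds an `owner/name` string and that `rel_path` is relative to the repository root; neither detail is confirmed by the summary, and the function names are hypothetical. The `raw.githubusercontent.com/<owner>/<repo>/<commit>/<path>` layout is GitHub's standard raw-file URL form.

```python
import urllib.request
from urllib.parse import quote


def raw_github_url(repo, commit_id, rel_path):
    """Build a raw-file URL from the three metadata fields.

    Assumes `repo` looks like "owner/name" (an assumption, not confirmed).
    """
    # quote() keeps "/" intact and escapes spaces and other unsafe characters.
    path = quote(rel_path.lstrip("/"))
    return f"https://raw.githubusercontent.com/{repo.strip('/')}/{commit_id}/{path}"


def fetch_source(url, timeout=10):
    """Return the file text, or None if the repo or file is gone or unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except OSError:
        # HTTPError and URLError subclass OSError, as do socket timeouts,
        # so deleted repos and network failures all land here.
        return None


if __name__ == "__main__":
    # The record values below are made up for illustration.
    print(raw_github_url("octocat/Hello-World", "abc123", "src/my file.py"))
```

Returning `None` instead of raising lets a batch loop count failures and continue, which is one way to handle missing repositories gracefully.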
A joint study by Harvard and Perplexity analyzed 10,000 matched session pairs from Perplexity Search and the AI agent Perplexity Computer over a 90-day window. Computer performed 26 minutes of autonomous work per session (median 9 minutes), a 48× increase over Search's 33 seconds (median 14 seconds). On matched tasks, Computer plus human reduced estimated time by 87% and cost by 94% versus Search plus human, and Computer's rate of meaningful dissatisfaction was 1.3%, compared with 2.9% for Search. Computer queries also expanded task scope: the cross-occupation share rose to 59% (vs 50%), higher-order Bloom's cognition was required in 76% of queries (vs 55%), and 23% of queries addressed task statements never submitted to Search.