Thinkgap feed

AI signal, minus the noise.

Curated items are read from the processed items table and served as a bilingual feed.

11 items

MARKTECHPOST | Jun 16, 2026 | Highlight

Meet Qwen-RobotSuite: Three Embodied AI Models for VLA Manipulation, Video World Modeling, and Navigation

The Qwen team released Qwen-RobotSuite, a suite of three independent embodied AI foundation models for robotics. Qwen-RobotManip is a Vision-Language-Action model based on Qwen3.5-4B that aligns heterogeneous manipulation data into a unified 80-dimensional action vector, achieving 1st place on RoboChallenge Table30-v1 and strong cross-embodiment transfer. Qwen-RobotWorld is a language-conditioned video world model using a 60-layer dual-stream MMDiT and a frozen Qwen2.5-VL encoder, ranking 1st overall on EWMBench and DreamGen Bench. Qwen-RobotNav is a scalable navigation model built on Qwen3-VL with a parameterized observation interface, reaching a 76.5% success rate on VLN-CE RxR and enabling agentic planning. RobotManip and RobotNav have public GitHub repositories; RobotWorld is presented as a research paper.

MARKTECHPOST | Jun 14, 2026

A Coding Hands-On on FineWeb: Streaming, Filtering, Deduplication, Tokenization, and Large-Scale Web Corpus Analytics

A hands-on tutorial streams 3,000 documents from the FineWeb sample-10BT subset without downloading the full multi-terabyte corpus. It reproduces the Gopher, C4, and custom quality filters and finds that most documents already pass them, because the corpus was pre-filtered. MinHash-based deduplication with 128 permutations and a 0.7 threshold identifies few near-duplicate pairs, consistent with per-crawl deduplication. GPT-2 token counts are verified against the stored field and match almost exactly (mean absolute difference of approximately 0). Analytics cover token distribution, language scores, characters per token, and top domains, providing practical insights for scaling corpus preprocessing pipelines.
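The MinHash step can be illustrated with a minimal standard-library sketch. The 128 hash functions and the 0.7 threshold come from the summary above; the word 3-gram shingling, the BLAKE2b-based hashing, the all-pairs comparison, and every function name are assumptions made for illustration, and the tutorial's own implementation may differ.

```python
import hashlib
import re

NUM_PERM = 128   # number of hash functions, as stated in the summary
THRESHOLD = 0.7  # estimated-Jaccard cutoff, as stated in the summary


def shingles(text, n=3):
    """Return the set of lowercase word n-grams (shingle size is an assumption)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < n:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def minhash_signature(text, num_perm=NUM_PERM):
    """Keep the minimum 64-bit hash of the shingle set under each salted hash."""
    items = shingles(text)
    if not items:
        return None  # empty documents have no signature
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")  # a different salt acts as a different hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "little",
            )
            for s in items
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots, which estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def near_duplicate_pairs(docs, threshold=THRESHOLD):
    """Compare all pairs and return index pairs at or above the threshold."""
    sigs = [minhash_signature(d) for d in docs]
    pairs = []
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if sigs[i] is None or sigs[j] is None:
                continue
            if estimated_jaccard(sigs[i], sigs[j]) >= threshold:
                pairs.append((i, j))
    return pairs
```

The all-pairs loop is quadratic in the number of documents, which is tolerable for a sample of a few thousand documents; larger corpora typically add locality-sensitive hashing so that only candidate pairs are compared.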

MARKTECHPOST | Jun 13, 2026

A Coding Implementation on Spatial Graph Neural Networks for Urban Function Inference Using city2graph, OSMnx, and PyTorch Geometric

This tutorial builds an end-to-end spatial graph learning pipeline using the city2graph library. It collects real POI and street network data from OpenStreetMap around Shibuya, Tokyo (with a synthetic clustered fallback to ensure reliability), engineers spatial features such as local density and street distance, and constructs six proximity graph families (KNN, Delaunay, Gabriel, RNG, EMST, Waxman) to compare graph topologies. A two-layer GraphSAGE model is trained on a homogeneous KNN graph to predict urban function categories (food, retail, education, health) from spatial structure and node features, and is evaluated on test accuracy and macro-F1. The pipeline also demonstrates heterogeneous graph construction using bridge edges between node types and a heterogeneous GNN forward pass via PyTorch Geometric's to_hetero, along with PCA visualization of learned embeddings and a geographic prediction map.
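To make the KNN proximity graph concrete, here is a minimal pure-Python sketch of the idea. It does not use the city2graph API; the function names, the brute-force neighbour search, and the toy coordinates are all assumptions for illustration, and the points are assumed to be in a projected (planar) coordinate system so that Euclidean distance is meaningful.

```python
import math


def knn_edges(points, k):
    """Return directed edges (i, j) linking each point to its k nearest neighbours."""
    edges = []
    for i, p in enumerate(points):
        # Sort the other points by distance; ties are broken by index for determinism.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (math.dist(p, points[j]), j),
        )
        edges.extend((i, j) for j in others[:k])
    return edges


def undirected(edges):
    """Collapse directed edges into a sorted list of unique undirected edges."""
    return sorted({tuple(sorted(e)) for e in edges})


# Hypothetical POI coordinates forming two small clusters.
pois = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
print(knn_edges(pois, k=1))
print(undirected(knn_edges(pois, k=1)))
```

A KNN graph is directed by construction, because "j is among the nearest neighbours of i" does not imply the reverse; symmetrising it, as `undirected` does, is a common choice before message passing.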

MARKTECHPOST | Jun 11, 2026 | Highlight

Perplexity Moves Deep Research Into Computer, Routing Research Subtasks Across 20+ Frontier Models For Reports, Decks, And Dashboards

Perplexity integrated its Deep Research mode into Computer, the company’s multi-model orchestration system. The upgraded feature automatically breaks complex questions into subtasks and routes them across more than 20 frontier models. It uses Search as Code to generate code that runs thousands of parallel retrieval steps, dramatically improving agentic browsing: the BrowseComp benchmark score rose from 40.7% to 83.8%, and Humanity’s Last Exam rose from 36.4% to 50.5%. The system reads user-uploaded files alongside live web sources, cites every claim inline, and delivers finished reports, slide decks, and interactive dashboards. Developers can access the same search stack via the pay-as-you-go Perplexity Agent API with a deep-research preset.

MARKTECHPOST | Jun 10, 2026

Building a Code Dataset Pipeline from NVIDIA Nemotron-Pretraining-Code-v3 Metadata with Streaming, Pandas, and tiktoken

A practical tutorial demonstrates how to stream NVIDIA's Nemotron-Pretraining-Code-v3 metadata index without downloading the full multi-gigabyte dataset. It creates a shuffled 30,000-record sample, derives features like file extension and directory depth, and visualizes top languages, extensions, repositories, and directory nesting. The workflow reconstructs raw GitHub URLs from metadata fields (repo, commit_id, rel_path) and attempts to fetch actual source files, handling missing/deleted repos gracefully. A Python-file filter is applied, and token counts are estimated using tiktoken, while the full dataset's scale is noted at approximately 173 billion tokens across 146 million files. Processed outputs are saved as Parquet and JSON for reuse.
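The URL reconstruction step might look like the sketch below. The field names repo, commit_id, and rel_path come from the summary; the assumption that `repo` is stored as "owner/name", the function names, and the choice to return `None` on any fetch failure are illustrative and not taken from the tutorial.

```python
import urllib.error
import urllib.request
from urllib.parse import quote

RAW_HOST = "https://raw.githubusercontent.com"


def raw_github_url(repo, commit_id, rel_path):
    """Build a raw-content URL; assumes `repo` is in 'owner/name' form."""
    owner, _, name = repo.partition("/")
    if not owner or not name:
        raise ValueError(f"expected 'owner/name', got {repo!r}")
    # Percent-encode the path but keep '/' separators intact.
    path = quote(rel_path.lstrip("/"))
    return f"{RAW_HOST}/{owner}/{name}/{commit_id}/{path}"


def fetch_or_none(url, timeout=10):
    """Return the file body as text, or None when the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, TimeoutError):
        # URLError also covers HTTPError, such as a 404 for a deleted repository.
        return None
```

Pinning the URL to a commit ID rather than a branch name means the fetched file matches the indexed version, as long as the repository and commit still exist.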

MARKTECHPOST | Jun 9, 2026 | Highlight

A New Study from Harvard and Perplexity Finds AI Agents Perform 26 Minutes of Autonomous Work per Session vs 33 Seconds for Search

A joint study by Harvard and Perplexity analyzed 10,000 matched session pairs from Perplexity Search and the AI agent Perplexity Computer over a 90-day window. Computer performed 26 minutes of autonomous work per session (median 9 minutes), a 48× increase over Search's 33 seconds (median 14 seconds). On matched tasks, Computer plus human reduced estimated time by 87% and cost by 94% versus Search plus human, with a meaningful dissatisfaction rate of 1.3% compared to 2.9% for Search. Computer queries also expanded task scope: cross-occupation share rose to 59% (vs 50%), higher-order Bloom's cognition was required in 76% of queries (vs 55%), and 23% of queries addressed task statements never submitted to Search.
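As a quick check on the duration figures, the ratios can be recomputed from the rounded values quoted above. The function name is hypothetical. The rounded inputs give a mean ratio of roughly 47×, so the reported 48× presumably reflects unrounded durations; that reading is an assumption, not something the summary states.

```python
def speedup(agent_minutes, search_seconds):
    """Ratio of agent working time to search time, both converted to seconds."""
    return agent_minutes * 60 / search_seconds


mean_ratio = speedup(26, 33)    # mean: 26 minutes vs 33 seconds
median_ratio = speedup(9, 14)   # median: 9 minutes vs 14 seconds

print(f"mean ratio:   {mean_ratio:.1f}x")
print(f"median ratio: {median_ratio:.1f}x")
```

The median ratio comes out lower than the mean ratio, which suggests that a tail of long agent sessions pulls the mean upward.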