Yann LeCun bet a billion dollars that a machine can think without language, arguing that today's chatbots are a dead end and real intelligence requires world models that learn physics. The post raises two concerns: current AI tests rely on language, so world models may not be measured properly, and it is unclear whether purely physical understanding without language can truly be called intelligence. The author suggests that neither pure chatbots nor pure world models are sufficient and that true intelligence may require a combination of both.
Anthropic released Claude Fable 5, its most capable publicly available model, and Claude Mythos 5, which is restricted to cyberdefense partners. Fable 5 demonstrated remarkable performance: it migrated a 50-million-line Ruby codebase in a day, beat Pokémon FireRed using raw screenshots, and scored highest on the FrontierCode eval. Mythos 5 autonomously conducted genomics research across 138 species, outperforming a published Science paper with a 100x smaller model. The safety approach uses classifiers that silently fall back to Opus 4.8 on sensitive queries, with zero universal jailbreaks found in over 1,000 hours of testing. Pricing is $10/M input and $50/M output tokens, with free access through June 22 for limited plans.
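To make the Fable 5 pricing quoted above concrete, here is a minimal sketch of the per-request arithmetic. It assumes flat per-token billing at the two stated rates and ignores the free-access period and any discounts; the function name and the example token counts are hypothetical, chosen only for illustration.

```python
# Rates quoted in the summary above: $10 per million input tokens,
# $50 per million output tokens. Flat billing is an assumption.
INPUT_USD_PER_MILLION = 10.0
OUTPUT_USD_PER_MILLION = 50.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in US dollars for one request."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 200,000 input tokens and 20,000 output tokens.
    print(f"${estimate_cost_usd(200_000, 20_000):.2f}")  # prints $3.00
```

At these rates an output token costs five times as much as an input token, so long generations dominate the bill even when the prompt is large.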
OpenAI's Parameter Golf competition challenged 1,016 researchers to train small language models under a strict budget. Over 44 days and 2,048 pull requests, only 47 entries made the official leaderboard. The autonomous agent Aiden, built by Weco, submitted 7 of those 47 records, more than double the 3 from the next-best human, while running for 22 days on a single GPU with under 4% of the community's compute. Its pull requests became the most-cited in the contest, with human researchers building directly on Aiden's work. After a 5-day plateau, a human contributed a novel tokenizer on top of Aiden's last PR, and Aiden fused that tokenizer with its local improvements to deliver the competition's largest single score jump. By best score, Aiden ranked 8th; it led in the volume of merged records, not in peak performance.
arXiv, the preprint repository, is implementing a one-year ban for researchers who submit low-quality AI-generated papers, often referred to as 'AI slop'. The move aims to maintain the quality and integrity of scholarly submissions. The policy targets submissions that are automatically generated without meaningful scientific contribution, and it reflects growing concerns about the misuse of AI in academic publishing.
The user built a semantic search engine specifically for arXiv papers. It provides AI-generated TL;DR summaries for quickly understanding a paper's content. The tool also classifies the claims made in papers and enables direct comparison between multiple papers. The goal is to help researchers navigate and evaluate academic literature efficiently.
The post argues that, despite AI-driven automation, the physical limits of mining increasingly degraded ore grades prevent true abundance. It claims that resource inflation is rampant as consumption rises, while materials science breakthroughs remain elusive despite massive investment. The author rejects as 'gaslighting' the claim that automation alone will solve resource constraints. The tone is critical of optimistic narratives about AI solving material scarcity.