Google DeepMind’s DiffusionGemma 26B A4B IT is an open-weights multimodal model that uses discrete diffusion to generate text from text, image, and video inputs. It is a Mixture of Experts (MoE) model with 25.2B total parameters, of which 3.8B are active, supports a 256K context window, and achieves over 1,100 tokens per second on NVIDIA H100 GPUs. NVIDIA has quantized the model to NVFP4 precision using its Model Optimizer and made it available on Hugging Face for commercial and non-commercial use. The model also features a configurable thinking mode, native function calling, and multilingual support across 35+ languages.
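For a sense of scale, the parameter counts above can be turned into a rough active-parameter share and weight-storage estimate. This is a back-of-the-envelope sketch only: the 4.5 bits per weight figure assumes NVFP4 stores 4-bit values plus one 8-bit scale per 16-value block, and the estimate ignores activations, the KV cache, and any layers kept at higher precision. The function names are illustrative and not taken from the post.

```python
# Figures quoted in the summary above.
TOTAL_PARAMS = 25.2e9   # total parameters
ACTIVE_PARAMS = 3.8e9   # parameters active per token

# ASSUMPTION: BF16 uses 16 bits per weight; NVFP4 is modeled as 4-bit values
# plus one 8-bit scale shared by each block of 16 values (about 4.5 bits/weight).
BITS_BF16 = 16.0
BITS_NVFP4 = 4.0 + 8.0 / 16.0


def active_fraction(active: float, total: float) -> float:
    """Share of the parameters that take part in each forward pass."""
    return active / total


def weight_gigabytes(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    print(f"active fraction: {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.1%}")
    print(f"BF16 weights:  ~{weight_gigabytes(TOTAL_PARAMS, BITS_BF16):.1f} GB")
    print(f"NVFP4 weights: ~{weight_gigabytes(TOTAL_PARAMS, BITS_NVFP4):.1f} GB")
```

Under these assumptions, roughly 15% of the parameters are active per token, and the weights shrink from about 50 GB in BF16 to about 14 GB in NVFP4.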
AMD emphasized that its Unified Memory Architecture (UMA) will influence future chip roadmaps. The company specifically referenced the Ryzen AI MAX 400 series, previously known as Gorgon Halo, as a UMA-enabled product. The Reddit post links to a Wccftech article and earlier community discussions on UMA for local AI. No technical specifications or release dates were provided.
A user attempted to benchmark Google's new Eloquent local dictation app but found that it dropped about half of the dictations, returning only a small fraction of the spoken words. In 15 of 50 tests, the app provided a complete transcript with a word error rate of ~24%, comparable to Qwen3-ASR's ~21%. In most attempts, however, the output was severely incomplete, with clips of 20+ words often yielding just 5-10 words. The user suspects that the underlying chat-style AI model sometimes refuses to transcribe and responds with an apology instead, a behavior observed when running Gemma 3n directly on the same audio. The issue points to a fundamental usability problem with the chat-based transcription approach.
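The post does not say how the word error rate was computed. As a minimal sketch, assuming the standard definition (word-level edit distance divided by the number of reference words) and simple lowercase, whitespace-based tokenization, it could look like this. The function name and the normalization are assumptions, not the user's actual script.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the quick brown fox jumps", "the quick fox"))
```

With this definition, a 20-word clip that comes back as only its first 5 words scores a WER of 0.75 from the missing words alone, which is why truncated outputs have to be treated separately from complete transcripts.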
Apple has released Core AI, a suite of local, private, and no-cost on-device AI models and tools, accompanied by a GitHub repository with benchmarks. Simultaneously, Microsoft introduced the Surface Laptop Ultra featuring local-first AI capabilities powered by NVIDIA's RTX Spark chip. These announcements underscore a major industry shift toward on-device AI processing that reduces reliance on cloud services. A Reddit discussion argues that the stock market has so far failed to price in the strategic implications of these simultaneous moves by two tech giants.
Apple revealed CoreAI at WWDC as a future replacement for CoreML, designed for optimized on-device inference on Apple Silicon devices including phones and tablets. The engine supports larger models than CoreML, with Apple demonstrating a 20-billion-parameter lazily loaded Mixture of Experts model deployable on device. Supported models are listed on GitHub, currently limited to mid-2025 releases, and require Python-based weight conversion similar to CoreML. CoreAI implies a major update to Apple Neural Engine operations, though no performance benchmarks have been released yet. It positions itself as an alternative to MLX, llama.cpp, and PyTorch for on-device deployment.
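The summary gives no detail on how the lazily loaded Mixture of Experts model works. As a generic illustration of the idea only, and not Apple's API or implementation, the sketch below loads an expert's weights the first time the router selects it and keeps a bounded number resident. The class name, the loader callback, and the least-recently-used eviction policy are all assumptions made for the example.

```python
from collections import OrderedDict
from typing import Callable, List


class LazyExpertCache:
    """Hypothetical cache: loads expert weights on first use, keeps `capacity` resident."""

    def __init__(self, load_expert: Callable[[int], List[float]], capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._load_expert = load_expert  # stand-in for reading weights from storage
        self._capacity = capacity
        self._resident: "OrderedDict[int, List[float]]" = OrderedDict()
        self.loads = 0  # how many times weights were actually loaded

    def get(self, expert_id: int) -> List[float]:
        if expert_id in self._resident:
            self._resident.move_to_end(expert_id)  # mark as most recently used
            return self._resident[expert_id]
        weights = self._load_expert(expert_id)
        self.loads += 1
        self._resident[expert_id] = weights
        if len(self._resident) > self._capacity:
            self._resident.popitem(last=False)  # ASSUMPTION: evict least recently used
        return weights


def fake_loader(expert_id: int) -> List[float]:
    # Placeholder data standing in for one expert's weights.
    return [float(expert_id)] * 4


if __name__ == "__main__":
    cache = LazyExpertCache(fake_loader, capacity=2)
    for expert_id in (0, 1, 0, 2, 1):
        cache.get(expert_id)
    print(cache.loads)
```

The point of the design is that memory use is bounded by the number of resident experts rather than by the total parameter count, at the cost of a load whenever the router picks an expert that is not currently in memory.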
A new language model named Quasar-Preview from silx-ai has been published on Hugging Face. The release claims support for a context length of up to 5 million tokens. No further details about architecture, training, or evaluation are provided in the source. It is described as a preview version.