Thinkgap feed

AI signal, minus the noise.

Curated items are read from the processed items table and served as a bilingual feed.

5 items

REDDIT LOCALLLAMA · Jun 11, 2026 · Highlight

NVIDIA Releases NVFP4-Quantized DiffusionGemma 26B A4B IT Model on Hugging Face

Google DeepMind’s DiffusionGemma 26B A4B IT is an open-weights multimodal model that uses discrete diffusion to generate text from text, image, and video inputs. It has 25.2B total parameters and 3.8B active parameters (MoE), supports a 256K context window, and achieves over 1,100 tokens per second on NVIDIA H100 GPUs. NVIDIA has quantized the model to NVFP4 precision using its Model Optimizer, making it available on Hugging Face for commercial and non-commercial use. The model also features configurable thinking mode, native function calling, and multilingual support across 35+ languages.

REDDIT LOCALLLAMA · Jun 10, 2026

Reddit User Seeks Local LLM Recommendations for Handwriting OCR

A Reddit user reports using Qwen3-VL 8B via Ollama for OCR of handwritten letters, achieving decent results. They ask the LocalLLaMA community for other local models that might perform better for handwriting OCR.

REDDIT LOCALLLAMA · Jun 10, 2026

Lemonade v10.7 release and project organization update

Lemonade v10.7 introduces local omni-modal chat that supports image generation and editing by combining multiple backends and models; its LMX-Omni virtual models now work with Open WebUI and other OpenAI-compatible clients. The release adds a `lemonade bench` CLI tool that collects standardized LLM performance data across llama.cpp, FastFlowLM, and vLLM. Cross-vendor support expands with CUDA backends for llama.cpp and stable-diffusion.cpp and a Vulkan backend for sd-cpp, enabling GPU acceleration on AMD, Apple Silicon, NVIDIA, and Intel systems. The project is now organized into six working groups, four of them led by non-AMD contributors, and 19 contributors took part in this release.

REDDIT LOCALLLAMA · Jun 9, 2026

SCAIL-2: An Open-Source End-to-End Character Animation Model with Cross-Identity Replacement and Animal Driving

SCAIL-2 is an open-source model for end-to-end controlled character animation that removes dependence on intermediate pose representations. It was trained on 60K synthetic motion pairs using several teacher models (SCAIL-Preview, Wan-Animate, MoCha) and a Unified Motion Transfer Interface. The model enables animating a reference character from a driving video, supports cross-identity character replacement and multi-character scenarios, and extends to animal-driving. Additionally, it offers zero-shot support for advanced control intermediates like SAM3D-Body mesh rendering.

REDDIT LOCALLLAMA · Jun 9, 2026 · Highlight

Omi Med STT v1: Fine-Tuned Parakeet 0.6B for Medical ASR Released with Open Weights and Local Runtime

The founder of Omi Health released Omi Med STT v1, an open-weight (CC-BY-4.0) fine-tune of NVIDIA Parakeet TDT 0.6B v2 specialized for medical speech, together with a local runtime that auto-selects a backend (MLX on Apple Silicon, NeMo on CUDA, GGUF on CPU). On a held-out benchmark of 1,513 medical clips (7.18 hours), it achieves a medical word error rate (M-WER) of 2.37% and an overall WER of 8.30% while running at 145× realtime on an A10, significantly outperforming the base model and most open local ASR options. Among open models it trails only VibeVoice-ASR 9B on M-WER, and it beats that model on overall WER and speed. Cloud medical transcription services such as ElevenLabs Scribe v2 (M-WER 1.39%) and AssemblyAI (1.81%) still post lower M-WER, but Omi Med STT has the structural latency advantage of on-device processing. Training used 127 hours of audio (71% real, 29% synthetic), and the benchmark was confirmed to have zero overlap with the training data. The main weakness is drug-name accuracy (4.75% drug WER), which is targeted for improvement in v2.
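For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the model output, divided by the number of reference words. The sketch below computes plain WER only; the post does not say exactly how M-WER or drug WER restrict the word set, or what text normalization was applied, so the function name and the whitespace-only tokenization here are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Plain WER: word-level Levenshtein distance / number of reference words.

    Assumption: tokens are split on whitespace with no case or punctuation
    normalization; real ASR benchmarks usually normalize text first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = minimum edits to turn the first i-1 reference words into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # turning ref[:i] into nothing = i deletions
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete ref[i-1]
                curr[j - 1] + 1,     # insert hyp[j-1]
                prev[j - 1] + cost,  # substitute (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    ref = "patient takes metformin twice daily"
    hyp = "patient takes met forming twice daily"
    # One substitution plus one insertion over 5 reference words
    print(word_error_rate(ref, hyp))  # 0.4
```

A drug name split into two words, as in the usage line, costs two errors, which is one reason drug names tend to have a higher error rate than general vocabulary.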