A Reddit user reports that, after extensive testing on three low-end laptops (Intel i3, 8GB RAM, integrated GPU), Qwen3-VL-2B in Q4_K_M GGUF quantization reliably extracts data from images into JSON, outperforming both Qwen3-VL-4B and Qwen3.5 2B. The user notes that the 2B model is absent from major leaderboards such as Artificial Analysis and the Open LLM Leaderboard, which list the 4B version instead. The post asks why the 2B model is overlooked and whether any other model can handle the task on similarly constrained devices such as phones or Raspberry Pis. No quantitative benchmarks or replication details are provided.
Clark Labs has compressed the Sana 1.6B text-to-image transformer with ternary quantization (~1.85 bits per weight), an 8.6× size reduction from 3.21 GB (FP16) to 374 MB, while retaining near-FP16 image quality. The scheme uses group-wise scales and keeps a small high-precision tail (~5% of parameters, in the conditioning and projection layers) to preserve important details. The packed ternary weights ship alongside an unpacked bf16 version that works as a drop-in replacement in diffusers. Released under the Apache-2.0 license, the compressed model makes local deployment of Sana 1.6B practical on resource-constrained hardware.
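To make "ternary with group-wise scales" concrete, here is a minimal sketch of the general technique, assuming absmean per-group scaling in the style of BitNet b1.58. Clark Labs' actual scale computation, group size, packing layout, and handling of the high-precision tail are not described in the post; `ternarize_groupwise`, `pack_trits`, and the five-trits-per-byte layout are illustrative choices, and the tail is omitted entirely.

```python
from typing import List, Tuple

def ternarize_groupwise(weights: List[float], group_size: int = 64) -> Tuple[List[int], List[float]]:
    """Map each weight to {-1, 0, +1} with one float scale per group of `group_size` weights."""
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # Assumed absmean scale (BitNet b1.58 style); the real recipe may differ.
        scale = sum(abs(w) for w in group) / len(group)
        if scale == 0.0:
            scale = 1.0  # all-zero group: any scale works, avoid division by zero
        scales.append(scale)
        for w in group:
            # Round to the nearest integer, then clip into the ternary range.
            codes.append(max(-1, min(1, round(w / scale))))
    return codes, scales

def dequantize(codes: List[int], scales: List[float], group_size: int = 64) -> List[float]:
    """Reconstruct approximate weights as code * group scale."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]

def pack_trits(codes: List[int]) -> bytes:
    """Pack five ternary codes per byte (3**5 = 243 <= 256), i.e. 1.6 bits per weight."""
    packed = bytearray()
    for start in range(0, len(codes), 5):
        value = 0
        # Base-3 encoding: the first code in the chunk becomes the least significant digit.
        for c in reversed(codes[start:start + 5]):
            value = value * 3 + (c + 1)
        packed.append(value)
    return bytes(packed)

def unpack_trits(packed: bytes, n: int) -> List[int]:
    """Inverse of pack_trits; `n` trims padding digits decoded from the final byte."""
    codes: List[int] = []
    for value in packed:
        for _ in range(5):
            codes.append(value % 3 - 1)
            value //= 3
    return codes[:n]

weights = [0.9, -0.05, -1.2, 0.3, 0.0, 0.6, -0.4, 0.02]
codes, scales = ternarize_groupwise(weights, group_size=4)
packed = pack_trits(codes)
print(codes, len(packed))
```

Packing five trits per byte costs 1.6 bits per weight before scales; storing one FP16 scale per group of g weights adds 16/g bits per weight on top of that, so larger groups trade accuracy for size.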
A user proposes an experiment to test whether a large language model can extract a reusable 'procedural scaffold' from its stronger performance on a Three.js task and hand it to a small model, improving the depth of the small model's outputs without fine-tuning. The design is cross-domain: the large model improves a complex scene (domain 1) and produces the scaffold, which is then applied to the small model on an unrelated Three.js task (domain 2, a low-poly turret). A third large model, blind to condition, judges the small model's rendered outputs with and without the scaffold on visual quality and structural coherence. The experiment has not been run yet; the core hypothesis is that if the scaffolded small model outperforms the baseline on the unseen domain, this demonstrates genuinely transferable procedural knowledge.
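The blind judging step could be sketched as follows, assuming pairwise comparison (the proposal does not say whether the judge scores outputs individually or compares them). The `judge` callable is hypothetical and stands in for the third large model; the per-pair random A/B ordering and the exact one-sided sign test are additions not described in the proposal, included as one simple way to check that "outperforms" is more than noise.

```python
import random
from math import comb
from typing import Callable, List, Tuple

# Hypothetical judge interface: given two outputs labelled "A" and "B",
# return "A" or "B". In the proposal this would be a third large model viewing renders.
Judge = Callable[[str, str], str]

def blind_pairwise_wins(pairs: List[Tuple[str, str]], judge: Judge, seed: int = 0) -> int:
    """Count wins for the scaffolded output over (baseline, scaffolded) pairs.

    Slot order is randomized per pair so the judge cannot rely on position
    to identify the scaffolded condition.
    """
    rng = random.Random(seed)
    wins = 0
    for baseline, scaffolded in pairs:
        if rng.random() < 0.5:
            if judge(scaffolded, baseline) == "A":
                wins += 1
        else:
            if judge(baseline, scaffolded) == "B":
                wins += 1
    return wins

def sign_test_p_value(wins: int, n: int) -> float:
    """Exact one-sided binomial test of H0: P(scaffold wins) = 0.5 against > 0.5."""
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n

# Toy stand-in judge for illustration only: prefers the longer output.
def longer_wins(a: str, b: str) -> str:
    return "A" if len(a) >= len(b) else "B"

pairs = [("box()", "box(); addBarrel(); addTurretBase()")] * 10
wins = blind_pairwise_wins(pairs, longer_wins)
print(wins, sign_test_p_value(wins, len(pairs)))
```

Randomizing the slot per pair matters because LLM judges are known to show position preferences; without it, a consistent A-slot bias could masquerade as a scaffold effect.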
A LocalLLaMA community member completed a multi-GPU build pairing an existing RTX 5090 with a newly acquired RTX PRO 5000, for 80GB of total VRAM. The Ryzen 9 9950X3D system also has 192GB of RAM and 17TB of storage, powered by a 1300W PSU. The user originally planned to buy an RTX PRO 6000 for $8.5K, hoping for an NVIDIA Inception discount, but after a three-month wait the Inception application was rejected, and by then the card's price had risen to $13.5K. They instead used the money set aside to buy the last RTX PRO 5000 available in their country. The rig now runs large LLMs at Q8 quantization and multi-GPU ComfyUI workflows.
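As a rough sense of what 80GB holds at Q8, the sketch below assumes GGUF's Q8_0 type (the post says only "Q8"), which stores blocks of 32 int8 weights plus one FP16 scale. It counts weights only; real GGUF files keep some tensors in other types, and KV cache, activations, and per-GPU runtime overhead are not modelled. The function names are illustrative.

```python
# Q8_0: blocks of 32 int8 weights plus one fp16 scale,
# so each weight costs (32 * 8 + 16) / 32 = 8.5 bits.
Q8_0_BITS_PER_WEIGHT = (32 * 8 + 16) / 32

def q8_0_weights_gb(params_billions: float) -> float:
    """Approximate size of Q8_0 weights in GB (10**9 bytes)."""
    return params_billions * Q8_0_BITS_PER_WEIGHT / 8

def headroom_gb(params_billions: float, total_vram_gb: float = 80.0) -> float:
    """VRAM left for KV cache, activations and runtime overhead (none of which is modelled)."""
    return total_vram_gb - q8_0_weights_gb(params_billions)

for size in (32, 70, 120):
    print(f"{size}B at Q8_0: {q8_0_weights_gb(size):.1f} GB weights, "
          f"{headroom_gb(size):.1f} GB headroom")
```

By this estimate a ~70B dense model fits at Q8_0 with only a few GB left for context, while a 120B-class dense model does not; in practice the weights must also be split across the two cards.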
An open evaluation pitted 55 LLMs from 11 developer families against 198 hand-written prompts; the models then blind-graded each other across 22,254 judgments, excluding self-ratings. All eight families with sufficient data showed statistically significant same-family rating bias: Qwen judges favored other Qwen models by +0.91 points, while Mistral judges penalized other Mistral models by −1.02 points, the largest absolute bias. Other families ranged from xAI (+0.75) to Meta (−0.68). Aggregate leaderboards obscured category-level variation, with six different models topping the nine categories, and code tasks provoked the highest judge disagreement. The full dataset, code, and prompts are MIT-licensed, and the author outlines next steps including anchoring scores to ground truth and isolating judge bias from response quality.
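The post does not state exactly how the per-family bias figures are computed. A minimal sketch of one plausible definition, assuming each judgment records the judge, the target, both families, and a numeric score (the `Judgment` record and `same_family_bias` function are hypothetical): the mean score a family's judges give to other models in the same family, minus the mean score judges outside the family give to that family's models.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, NamedTuple

class Judgment(NamedTuple):
    judge: str           # model that assigned the score
    judge_family: str
    target: str          # model whose response was scored
    target_family: str
    score: float

def same_family_bias(judgments: List[Judgment]) -> Dict[str, float]:
    """Per family F: mean score F judges give other F models minus the mean
    score judges outside F give to F models. Self-ratings are skipped."""
    in_family: Dict[str, List[float]] = defaultdict(list)
    out_family: Dict[str, List[float]] = defaultdict(list)
    for j in judgments:
        if j.judge == j.target:
            continue  # exclude self-ratings, mirroring the evaluation
        bucket = in_family if j.judge_family == j.target_family else out_family
        bucket[j.target_family].append(j.score)
    return {
        fam: mean(scores) - mean(out_family[fam])
        for fam, scores in in_family.items()
        if fam in out_family  # need an outside baseline to compare against
    }

data = [
    Judgment("x1", "X", "x2", "X", 8.0),
    Judgment("x2", "X", "x1", "X", 8.0),
    Judgment("x1", "X", "x1", "X", 10.0),  # self-rating, ignored
    Judgment("y1", "Y", "x1", "X", 6.0),
    Judgment("y1", "Y", "x2", "X", 6.0),
    Judgment("y2", "Y", "y1", "Y", 5.0),
    Judgment("x1", "X", "y1", "Y", 7.0),
]
print(same_family_bias(data))
```

This simple difference cannot distinguish in-group favoritism from a family's judges being stricter or more lenient with every model, and it includes no significance test; a per-judge baseline would be needed to separate those effects.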