The city of Rio de Janeiro has post-trained and released Rio 3.5 Open, a 397-billion-parameter language model. It is built on a Qwen base model (referred to as Qwen 7/2) and uses SwiGLU activations and rotary positional embeddings (RoPE). The model is openly accessible, a rare case of a public-sector body releasing a large-scale open LLM.
Social | Source: X | Importance: 3/5
A viral study tested the medical AI products UpToDate and OpenEvidence, rather than their underlying models, on benchmarks such as MedQA and HealthBench, and found them worse than frontier general-purpose models. The post's author argues that this does not prove domain-specific models are inherently inferior: the author's own comprehensive benchmark shows that fine-tuning a frontier model for medicine yields a noticeable boost. Current domain-specific models often lag because they are built on older or weaker open-source base models, not because specialization fails. Baichuan-M4 is cited as an example of a medical-specific model that claims to outperform frontier models. The main takeaway is that quickly adapting strong frontier models into medical tools would produce superior domain-specific systems, but the pace of open-source base model progress and the speed of adaptation remain challenges.
Social | Source: X | Importance: 3/5
Trajectory Labs announced that it has achieved frontier-model performance with an open model that was post-trained in under 24 hours. The training infrastructure was powered by Together Compute and NVIDIA. The social media post gave no specific model name, benchmark metrics, or dataset details. The announcement highlights the potential of combining open models with efficient training infrastructure.
Pyrecall is a new open-source tool built to address the lack of practical tooling for continual learning research. It snapshots skill scores before and after fine-tuning, flags performance regressions, and supports rolling back LoRA adapters by name. The tool runs fully locally, is released under the MIT license at v0.1.0, and can be installed via pip. The developer is seeking community feedback on the benchmark design.
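The snapshot-and-compare idea described above can be sketched in a few lines of Python. This is a minimal illustration only, not Pyrecall's API: the `Regression` class, the `find_regressions` function, the `tolerance` parameter, and the skill names and scores are all hypothetical and chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Regression:
    """One skill whose score dropped after fine-tuning (hypothetical type)."""
    skill: str
    before: float
    after: float

    @property
    def drop(self) -> float:
        return self.before - self.after


def find_regressions(before, after, tolerance=0.02):
    """Compare two score snapshots and return skills that got worse.

    `before` and `after` map skill name -> score in [0, 1].
    A skill is flagged only if its score fell by more than `tolerance`,
    so small evaluation noise is not reported as a regression.
    """
    regressions = []
    for skill, old_score in before.items():
        new_score = after.get(skill)
        if new_score is None:
            continue  # skill was not re-evaluated; nothing to compare
        if old_score - new_score > tolerance:
            regressions.append(Regression(skill, old_score, new_score))
    # Largest drops first, so the worst regressions are seen first.
    return sorted(regressions, key=lambda r: r.drop, reverse=True)


# Made-up scores for illustration only.
before = {"arithmetic": 0.81, "summarization": 0.74, "code": 0.66}
after = {"arithmetic": 0.70, "summarization": 0.75, "code": 0.65}

flagged = find_regressions(before, after)
should_roll_back = bool(flagged)  # e.g. revert to the previous adapter
```

In a workflow like the one described, a non-empty `flagged` list would be the signal to roll back to the previously named LoRA adapter; how the rollback itself is performed depends on the tool and is not shown here.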
Deploying an initial AI model is rarely the hard part; real users introduce internal terminology, incomplete queries, and messy documents that benchmarks never capture. Most production systems do not connect inference logs, dataset curation, fine-tuning, and evaluation in a single loop, so every model improvement becomes a separate one-off project. The core bottleneck is model iteration: the ability to convert production traffic into failure patterns, create or curate datasets, retrain or fine-tune, and redeploy consistently. The post describes an insurance chatbot use case in which a continuous feedback loop from production logs to post-training and redeployment improved the model, and notes that platforms like Data Lab treat logs, datasets, post-training, and deployment as parts of the same iteration cycle.
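The first step of that loop, turning production logs into failure patterns and dataset candidates, might look roughly like the sketch below. Everything in it is an assumption made for illustration: the log schema (`query`, `resolved`, `failure_tag`), the `curate_from_logs` function, the `min_count` threshold, and the sample entries are hypothetical and do not come from the post or from any specific platform.

```python
from collections import defaultdict


def curate_from_logs(logs, min_count=2):
    """Group unresolved production queries by failure pattern.

    Each log entry is assumed to be a dict with the keys
    'query', 'resolved', and 'failure_tag' (hypothetical schema).
    Only patterns seen at least `min_count` times are kept, so
    one-off failures do not drive a fine-tuning run.
    """
    by_pattern = defaultdict(list)
    for entry in logs:
        if entry["resolved"]:
            continue  # successful interactions are not failure examples
        by_pattern[entry["failure_tag"]].append(entry["query"])
    return {
        tag: queries
        for tag, queries in by_pattern.items()
        if len(queries) >= min_count
    }


# Made-up insurance-chatbot logs for illustration only.
logs = [
    {"query": "What does my HO-3 cover?", "resolved": False,
     "failure_tag": "internal_terminology"},
    {"query": "claim status", "resolved": False,
     "failure_tag": "incomplete_query"},
    {"query": "Is rider 12B active?", "resolved": False,
     "failure_tag": "internal_terminology"},
    {"query": "How do I file a claim?", "resolved": True,
     "failure_tag": None},
]

dataset_candidates = curate_from_logs(logs)
```

The recurring patterns in `dataset_candidates` would then feed dataset curation, fine-tuning, evaluation, and redeployment, which is the part of the cycle the post argues should live in one loop rather than in separate projects.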
On the NVIDIA AI Podcast, Mistral AI CTO and co-founder Timothée Lacroix discussed the company's open-model philosophy, its Forge customization framework, and its collaboration with NVIDIA through the Nemotron Coalition, a partnership aimed at advancing AI capabilities. The conversation focused on bringing open models to enterprise environments and on Mistral's approach to openness and model adaptation.