The GitHub tag for the PyTorch Inductor CI flow (ciflow/inductor/184166) contains only the marker '[ghstack-poisoned]'. ghstack writes this marker into commits on the branches it manages so that they are not merged directly; on its own it does not indicate that a CI workflow failed. No code changes, new features, or performance results are reported.
Repos | Source: GITHUB | Importance: 3/5
Release b9637 of llama.cpp introduces a dedicated chat parser for the Cohere2MoE model architecture, referred to as North Code. The parser is implemented via PR #24615 to ensure correct conversation formatting for Cohere's mixture-of-experts variant. The release ships pre-built binaries for macOS, Linux, Windows, and Android across CPU, CUDA, Vulkan, ROCm, SYCL, and other backends. No other functional changes are noted in the release notes beyond this parser addition and some internal renames.
Repos | Source: GITHUB | Importance: 2/5
llama.cpp b9631 addresses a command-line interface bug where preserved tokens were not correctly copied, as tracked in issue #24258. The release includes pre-compiled binaries for macOS (Apple Silicon and Intel), Linux (x64, arm64, s390x, Vulkan, ROCm, OpenVINO, SYCL), Android (arm64), Windows (CPU, CUDA, Vulkan, SYCL, HIP), and openEuler platforms. This is a routine patch release primarily focused on a single CLI fix.
Repos | Source: GITHUB | Importance: 2/5
This release of llama.cpp adds the cohere2moe tokenizer to llama-vocab, enabling inference with the TINY_AYA model. The change was contributed via pull request #24601. Build artifacts are provided for macOS, Linux, Windows, and Android across various backends.
The b9628 release of llama.cpp integrates SYCL backend validation into the continuous integration and release testing pipeline. The new check-release workflow now covers SYCL FP32 and FP16 builds on Ubuntu x64 and SYCL on Windows x64, ensuring Intel GPU acceleration is regularly tested. The release also maintains existing test matrices for macOS, Linux (CPU, Vulkan, ROCm, OpenVINO), Android, and Windows (CUDA, Vulkan, HIP).
Release b9626 of llama.cpp introduces support for the Cohere2 Mixture of Experts (MoE) architecture under the new arch name "cohere2moe". It fixes sliding window attention pattern handling, resolves MTP failures by switching to iSWA, and changes how the shared expert output is combined with the routed expert output to (routed+shared)*0.5. Redundant gating function checks, lmhead tensor checks, and tokenizer type definitions were removed; the tokenizer is kept as tiny_aya. Platform builds are provided for macOS (Apple Silicon/Intel), Linux (x64/arm64 with Vulkan, ROCm, OpenVINO, SYCL), Android, and Windows (CPU/CUDA/Vulkan/SYCL/HIP), along with UI support.
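The combination rule (routed+shared)*0.5 mentioned above amounts to an element-wise average of the two expert outputs. The following is a minimal sketch of that arithmetic only, not the llama.cpp implementation; the function name `combine_expert_outputs` and the sample vectors are hypothetical and chosen for illustration.

```python
def combine_expert_outputs(routed, shared):
    """Element-wise (routed + shared) * 0.5 over two equal-length vectors.

    Hypothetical helper for illustration; llama.cpp builds this as part of
    its compute graph rather than with Python lists.
    """
    if len(routed) != len(shared):
        raise ValueError("routed and shared outputs must have the same length")
    return [(r + s) * 0.5 for r, s in zip(routed, shared)]


# Made-up activations for a single token with hidden size 3.
routed_out = [1.0, 2.0, -4.0]
shared_out = [3.0, 0.0, 4.0]

print(combine_expert_outputs(routed_out, shared_out))  # [2.0, 1.0, 0.0]
```

Scaling by 0.5 keeps the combined activation at roughly the same magnitude as either input, whereas a plain sum would double it when the two outputs are similar.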