A Reddit user found the file `modeling_gpt_oss.py` in Hugging Face’s Transformers repository and asked whether it is the full implementation of GPT-OSS or merely a boilerplate skeleton for experimentation. The user also asked whether the other model implementations in the `transformers/models` directory are complete open-source codebases and, if not, where the full implementations are publicly available.
quicktok is a new open-source C++ BPE tokenizer that produces token IDs byte-identical to tiktoken while running significantly faster. On an Apple M1, it encodes 2–3.6× faster than bpe-openai and 4–11× faster than tiktoken across The Pile, code, and web text benchmarks. To cut memory accesses, the implementation uses a 2-byte trie, dense caches, and a hand-compiled pretokenizer in place of regex. It ships with prebuilt vocabularies for cl100k, o200k, GPT-OSS, Llama-3, and Qwen2.5/3. The library is installable via `pip install quicktok-v1`, and the code is available on GitHub.
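For readers unfamiliar with what a BPE encoder has to reproduce, here is a minimal sketch of the reference-style merge loop in Python. It is not quicktok's code and does not use its API. It assumes the tiktoken-style convention in which a byte sequence's merge rank doubles as its token ID, and the toy vocabulary `RANKS` is invented for illustration. The trie and caches described above would be ways of reaching the same output with fewer lookups.

```python
def bpe_encode(piece: bytes, ranks: dict[bytes, int]) -> list[int]:
    """Encode one pretokenized piece by repeatedly merging the
    adjacent pair with the lowest rank (naive O(n^2) version)."""
    parts = [bytes([b]) for b in piece]
    while len(parts) > 1:
        best_rank, best_i = None, None
        for i in range(len(parts) - 1):
            rank = ranks.get(parts[i] + parts[i + 1])
            if rank is not None and (best_rank is None or rank < best_rank):
                best_rank, best_i = rank, i
        if best_i is None:
            break  # no adjacent pair is in the vocabulary
        parts[best_i:best_i + 2] = [parts[best_i] + parts[best_i + 1]]
    return [ranks[p] for p in parts]


# Toy vocabulary (an assumption for this example): all 256 single
# bytes, plus three merges.
RANKS = {bytes([i]): i for i in range(256)}
RANKS.update({b"lo": 256, b"low": 257, b"er": 258})

print(bpe_encode(b"lower", RANKS))  # [257, 258]
```

Any faster implementation that claims byte-identical output must return exactly the IDs this loop would produce for the same vocabulary and pretokenization.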
Cleo is an open-source text-to-SQL model built by finetuning Qwen3.5-2B-Base, designed to capture full analyst behavior within a 2B-parameter model. The system uses the same structured harness for training, evaluation, and inference, implementing a gather-repair-answer contract that feeds live execution evidence into the search over candidate queries. Key design choices include co-optimizing the model contract, SQL safety layer, dialect handling, timeouts, and clarification behavior. The model, harness, and datasets are fully open-source on GitHub and Hugging Face. The project demonstrates how tightly coupling training and inference in a single harness can enable small models to handle complex SQL generation and interactive debugging.
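To make the gather-repair-answer idea concrete, the following is a rough sketch using Python's standard `sqlite3` module. It is not Cleo's harness: the function names, the evidence format, the naive `is_safe` check, and the `toy_repair` callback (standing in for the model) are all assumptions made for illustration.

```python
import sqlite3


def is_safe(sql: str) -> bool:
    """Hypothetical safety layer: accept only a single SELECT statement.
    Deliberately conservative (any inner semicolon is rejected)."""
    stripped = sql.strip().rstrip(";")
    return stripped.lower().startswith("select") and ";" not in stripped


def gather(conn, candidates):
    """Execute each candidate and record evidence: rows or error text."""
    evidence = []
    for sql in candidates:
        if not is_safe(sql):
            evidence.append({"sql": sql, "ok": False, "error": "rejected by safety check"})
            continue
        try:
            rows = conn.execute(sql).fetchall()
            evidence.append({"sql": sql, "ok": True, "rows": rows})
        except sqlite3.Error as exc:
            evidence.append({"sql": sql, "ok": False, "error": str(exc)})
    return evidence


def gather_repair_answer(conn, candidates, repair, max_repairs=2):
    """Return the first candidate that executes, repairing failures in between."""
    for attempt in range(max_repairs + 1):
        evidence = gather(conn, candidates)
        winner = next((e for e in evidence if e["ok"]), None)
        if winner is not None:
            return winner
        if attempt < max_repairs:
            # In a real system the model would rewrite the query from the error.
            candidates = [repair(e["sql"], e["error"]) for e in evidence]
    return None


# Toy database and a toy repair function (both invented for this example).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'ada')")


def toy_repair(sql, error):
    return sql.replace("nmae", "name")


result = gather_repair_answer(conn, ["SELECT nmae FROM users"], toy_repair)
print(result["sql"], result["rows"])
```

The first candidate fails with a column error, the error text is passed to the repair step, and the repaired query succeeds on the second round. Timeouts and dialect handling, which the summary also mentions, are left out of this sketch.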
Reddit user /u/summerday10 released FeynRL, an open-source framework designed to make reinforcement learning post-training for large language models, vision-language models, and agents fully transparent and modifiable. The framework exposes the entire training loop—data loading, rollout generation, reward computation, loss construction, optimization, and evaluation—so researchers can develop new algorithms without fighting hidden systems. It currently includes examples for supervised fine-tuning, DPO, and RL-style training and supports single-GPU, multi-GPU, and cluster setups. The project was motivated by the belief that open weights alone are insufficient; open training codebases that keep algorithms explicit and systems separate are necessary for advancing open ML/AI research.
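As an illustration of what keeping loss construction explicit can look like, here is the standard DPO loss for a single preference pair in plain Python. This is the textbook formula, not code from FeynRL. The function and argument names are hypothetical, and a real training loop would compute it over batched tensors of summed token log-probabilities.

```python
import math


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response
    under the policy or the frozen reference model.
    """
    # How much more the policy prefers "chosen" than the reference does.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    # -log(sigmoid(beta * margin)) equals softplus(-beta * margin).
    return softplus(-beta * margin)


# When the policy equals the reference, the margin is 0 and the loss is log 2.
print(dpo_loss(-1.0, -2.0, -1.0, -2.0))
```

The loss falls below log 2 as the policy raises the chosen response's likelihood relative to the reference, and rises above it in the opposite case.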
PrintGuard 2.0, an open-source FDM failure detector, reuses the same ShuffleNetV2 encoder with nearest-prototype classification but completely rewrites the runtime. The model is exported as a ~5 MB TFLite file via LiteRT, enabling deployment on CPython (hub mode) and in the browser (Pyodide + LiteRT.js WASM) from a single codebase. A Platform abstraction layer isolates all non-portable operations (inference, camera discovery, image encoding), so the Python engine runs unchanged in both environments. The system introduces a dynamic fairness-aware inference scheduler that uses smoothed latency estimates and max-min fairness to allocate inference capacity across cameras. A fail-safe design gates inference on printer state, stopping only when the printer is positively known not to be printing, and a watchdog monitors camera feeds and printer services for dropouts.
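The scheduler idea can be sketched with two textbook building blocks: an exponential moving average for the latency estimate and a water-filling routine for max-min fairness. This is not PrintGuard's scheduler. The function names, the smoothing factor, and the notion of per-camera demand in inferences per second are assumptions made for this example.

```python
def ema(prev, sample, alpha=0.2):
    """Exponentially smoothed estimate; the first sample seeds it."""
    return sample if prev is None else (1 - alpha) * prev + alpha * sample


def max_min_allocate(capacity, demands):
    """Water-filling max-min fair split of `capacity` across `demands`.

    Cameras that want less than an equal share are fully served, and
    what they leave unused is shared among the remaining cameras.
    """
    alloc = {cam: 0.0 for cam in demands}
    remaining = float(capacity)
    active = {cam for cam, d in demands.items() if d > 0}
    while active and remaining > 1e-12:
        share = remaining / len(active)
        satisfied = {cam for cam in active if demands[cam] - alloc[cam] <= share}
        if not satisfied:
            # Nobody can be fully served: split what is left equally.
            for cam in active:
                alloc[cam] += share
            remaining = 0.0
            break
        for cam in satisfied:
            remaining -= demands[cam] - alloc[cam]
            alloc[cam] = demands[cam]
        active -= satisfied
    return alloc


# Hypothetical numbers: a smoothed latency of 0.1 s per inference gives
# a capacity of about 10 inferences per second to divide among cameras.
latency = ema(None, 0.1)
capacity = 1.0 / latency
print(max_min_allocate(capacity, {"cam_a": 2, "cam_b": 4, "cam_c": 10}))
```

With these inputs the two low-demand cameras get everything they ask for, and the third receives the remaining capacity rather than starving the others.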
An independent researcher demonstrates that a coherent target context can shift large language models into latent states where safety rules are reinterpreted, without triggering output-based filters. Measurements on open models (primarily Gemma-3-12B-IT) using hidden-state geometry, residual stream trajectories, SAE readouts, and causal interventions show regime changes before the final output is produced. Because current RLHF training and output classifiers operate only on surface-level outputs, they miss these internal shifts. Code, data, and scripts are released on GitHub and Zenodo.