A Hands-On Tutorial for Building a Local Coding Agent Stack with Qwen3.6 and Open-Weight Models
Sebastian Raschka provides a detailed guide to setting up a fully local coding agent environment. The tutorial uses Ollama to serve open-weight LLMs such as Qwen3.6 35B-A3B and Cohere North Mini Code, and connects them to agent harnesses like Qwen-Code, Codex, and Claude Code. In performance testing, both Qwen3.6 and North Mini Code generate roughly 30–40 tokens per second on a Mac Mini or DGX Spark and solve 4–5 out of 5 tasks on a custom agent problem pack. The article also includes an audit checklist for agent codebases and notes that Claude Code consumes substantially more input tokens than Codex for comparable task outcomes. Setup instructions cover model serving, harness configuration, and an SSH tunnel for offloading model execution to a dedicated machine.
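
To make the serving and throughput figures more concrete, here is a minimal Python sketch of how a locally served model could be queried and its generation speed estimated. It is not taken from the tutorial. It assumes Ollama is listening on its default port 11434 and uses Ollama's `/api/generate` endpoint, whose non-streaming response reports `eval_count` (generated tokens) and `eval_duration` (nanoseconds). The model tag `your-model-tag` and the helper names are hypothetical placeholders; substitute whatever `ollama list` shows on your machine.

```python
import json
import urllib.request

# Ollama's default local address. If the model runs on a separate machine,
# a standard SSH local port forward (for example
# `ssh -L 11434:localhost:11434 user@remote-host`, with a hypothetical
# user and host) would make the remote server reachable at this same URL.
OLLAMA_URL = "http://localhost:11434"

# Hypothetical placeholder: replace with a tag reported by `ollama list`.
MODEL_TAG = "your-model-tag"


def build_generate_request(prompt, model=MODEL_TAG, base_url=OLLAMA_URL):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def tokens_per_second(response):
    """Estimate generation speed from an Ollama response dict.

    eval_count is the number of generated tokens and eval_duration is the
    generation time in nanoseconds.
    """
    return response["eval_count"] / (response["eval_duration"] / 1e9)


if __name__ == "__main__":
    request = build_generate_request(
        "Write a Python function that reverses a string."
    )
    with urllib.request.urlopen(request, timeout=300) as resp:
        body = json.load(resp)
    print(body["response"])
    print(f"{tokens_per_second(body):.1f} tokens/s")
```

Running the script against a served model prints the completion followed by a tokens-per-second estimate, which can be compared with the 30–40 tokens per second range reported above. A single prompt is only a rough check; the numbers in the article come from the author's own test setup.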