Morph Reflexes: Multi-head classifiers for agent traces
English summary
Morph Reflexes is an API that extracts fast, cheap semantic signals from agent traces using a small LLM with multi-head inference. Its custom inference engine, forked from vLLM, reuses the KV cache across multiple classifier heads, achieving sub-30 ms inference with less than 0.1% overhead per additional reflex. This makes it practical to track behavioral issues such as looping, reasoning leakage, and user frustration at scale, as a cost-effective replacement for LLM-as-judge. The system is API-first: developers can define custom reflexes or use built-in ones, and a dashboard supports training new classifiers. The architecture adapts older multi-task NLP techniques to modern LLMs.
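The shared-backbone idea can be illustrated with a toy sketch. This is not the Morph engine or its API: `SharedBackbone`, `make_head`, `run_head`, and `classify` are hypothetical names, the "backbone" is a stand-in that produces fake features, and the heads are plain logistic classifiers. The only point it shows is that the expensive pass over the trace runs once, while each extra head adds just a small dot product.

```python
import math
import random

HIDDEN = 8  # toy hidden size, chosen only for illustration


class SharedBackbone:
    """Stand-in for the small LLM: one expensive pass per trace."""

    def __init__(self) -> None:
        self.calls = 0  # counts how many times the expensive pass ran

    def encode(self, trace: str) -> list[float]:
        self.calls += 1
        # Toy deterministic features seeded by the text. A real engine would
        # return a hidden state computed once over the cached trace.
        rng = random.Random(trace)
        return [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN)]


def make_head(seed: int) -> tuple[list[float], float]:
    """Create a toy classifier head: a weight vector and a bias."""
    rng = random.Random(seed)
    weights = [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN)]
    return weights, rng.uniform(-1.0, 1.0)


def run_head(head: tuple[list[float], float], hidden: list[float]) -> float:
    """Cheap per-head work: dot product plus sigmoid."""
    weights, bias = head
    logit = sum(w * h for w, h in zip(weights, hidden)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def classify(backbone: SharedBackbone, heads: dict, trace: str) -> dict:
    hidden = backbone.encode(trace)  # computed once, shared by every head
    return {name: run_head(head, hidden) for name, head in heads.items()}


heads = {
    "looping": make_head(1),
    "reasoning_leakage": make_head(2),
    "user_frustration": make_head(3),
}
backbone = SharedBackbone()
scores = classify(backbone, heads, "agent retried the same tool call five times")
print(scores)
print("backbone passes:", backbone.calls)
```

Adding a fourth entry to `heads` leaves the number of backbone passes at one, which is the property the summary attributes to multi-head inference.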
Key points
The multi-head inference engine reuses the KV cache across many classifier heads, enabling fast, concurrent classification.
Inference latency is under 30 ms and end-to-end request latency is under 90 ms, with less than 0.1% overhead per additional reflex.
Tracks behavioral failures like looping, reasoning leakage, and user frustration in agent traces.
Designed as a scalable, cost-effective replacement for frontier LLM-as-judge in production agent monitoring.
API-first, with a dashboard for defining custom reflexes and letting them self-improve.
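To put the latency figures in perspective, here is a back-of-the-envelope calculation. It assumes the quoted bounds (30 ms inference, 0.1% per additional reflex) are taken at their upper limits, that the overhead is relative to the base inference time and accumulates linearly, and that the alternative is one sequential model pass per classifier. None of these assumptions is confirmed by the source, and the function names are illustrative.

```python
BASE_INFERENCE_MS = 30.0     # upper bound on inference latency quoted above
OVERHEAD_PER_REFLEX = 0.001  # 0.1% upper bound per additional reflex


def multi_head_latency_ms(num_reflexes: int) -> float:
    """One shared pass; each reflex after the first adds a fraction of it."""
    extra = max(num_reflexes - 1, 0)
    return BASE_INFERENCE_MS * (1.0 + OVERHEAD_PER_REFLEX * extra)


def separate_passes_latency_ms(num_reflexes: int) -> float:
    """Naive baseline: one full sequential pass per classifier."""
    return BASE_INFERENCE_MS * num_reflexes


for n in (1, 21, 101):
    shared = multi_head_latency_ms(n)
    naive = separate_passes_latency_ms(n)
    print(f"{n:>3} reflexes: shared ~{shared:.1f} ms, separate ~{naive:.0f} ms")
```

Under these assumptions, 21 reflexes cost about 30.6 ms with a shared pass versus about 630 ms with separate sequential passes. The baseline ignores batching and parallelism, so the gap is an upper-end illustration.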