Morph Reflexes: Multi-head classifiers for agent traces


English summary

Morph Reflexes is an API that extracts fast, cheap semantic signals from agent traces using a small language model with multi-head inference. Its custom inference engine, forked from vLLM, reuses the KV cache across multiple classifier heads, achieving inference latency under 30 ms with less than 0.1% added overhead per additional reflex. This makes it practical to track behavioral issues such as looping, reasoning leakage, and user frustration at scale, and positions it as a cost-effective replacement for LLM-as-judge. The system is API-first: developers can define custom reflexes or use built-in ones, and a dashboard supports training new classifiers. The architecture is inspired by older multi-task NLP techniques, adapted to modern LLMs.
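The summary does not say which multi-task techniques inspired the design. One classic candidate is hard parameter sharing, where several task-specific heads sit on a single shared encoder and are trained with a summed loss. The sketch below assumes that pattern. The head names, the plain list of floats standing in for the backbone's hidden state, and the optional per-head loss weights are illustrative assumptions, not Morph's implementation.

```python
import math
from typing import Dict, List, Optional, Tuple

# Assumption: each reflex head is a linear scorer (weights, bias) on a shared
# feature vector; the vector stands in for the LLM backbone's hidden state.
Head = Tuple[List[float], float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def bce(prob: float, label: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one prediction, clamped to avoid log(0)."""
    p = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def multitask_loss(
    features: List[float],
    heads: Dict[str, Head],
    labels: Dict[str, int],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted sum of per-head losses, all computed from one shared representation."""
    total = 0.0
    for name, (w, b) in heads.items():
        logit = sum(wi * xi for wi, xi in zip(w, features)) + b
        loss_weight = 1.0 if weights is None else weights.get(name, 1.0)
        total += loss_weight * bce(sigmoid(logit), labels[name])
    return total
```

Because every head reads the same features, gradients from all reflexes would flow into one shared encoder during training, which is the property that lets many classifiers share one forward pass at inference time.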

Key points

The multi-head inference engine reuses the KV cache across many classifier heads, enabling fast, concurrent classification.
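A minimal sketch of the idea, assuming the expensive part (the LLM prefill that fills the KV cache) runs once per trace and each reflex head is a cheap linear scorer on top of it. Here a memoized feature vector stands in for the KV cache; `encode`, `MultiHeadClassifier`, and the head format are hypothetical names for illustration.

```python
import math
from typing import Callable, Dict, List, Tuple

Head = Tuple[List[float], float]  # (weights, bias) for one binary reflex head


class MultiHeadClassifier:
    """Toy model: one shared encoding pass per trace, then cheap per-head scoring.

    `encode` stands in for the expensive prefill; the cached result stands in
    for the KV cache that all heads reuse.
    """

    def __init__(self, encode: Callable[[str], List[float]], heads: Dict[str, Head]):
        self.encode = encode
        self.heads = heads
        self._cache: Dict[str, List[float]] = {}  # trace -> shared representation
        self.encode_calls = 0  # counts how often the expensive pass actually runs

    def _shared(self, trace: str) -> List[float]:
        # Run the expensive pass only on a cache miss; otherwise reuse it.
        if trace not in self._cache:
            self.encode_calls += 1
            self._cache[trace] = self.encode(trace)
        return self._cache[trace]

    def classify(self, trace: str) -> Dict[str, float]:
        """Score every head against the same shared representation."""
        h = self._shared(trace)
        scores: Dict[str, float] = {}
        for name, (w, b) in self.heads.items():
            logit = sum(wi * x for wi, x in zip(w, h)) + b
            scores[name] = 1.0 / (1.0 + math.exp(-logit))
        return scores
```

For example, with `encode = lambda t: [float(len(t)), float(t.count("retry"))]` and a `looping` head that weights the retry count, calling `classify` twice on the same trace scores every head both times while `encode_calls` stays at 1. Real KV-cache reuse happens inside the attention layers rather than on a single final vector, but the intended cost structure is similar: one shared pass, many cheap heads.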

Inference latency under 30 ms and end-to-end request latency under 90 ms, with less than 0.1% overhead per additional reflex.
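To see why the per-reflex overhead figure matters, here is the back-of-the-envelope arithmetic under an assumed linear model: latency(N) = base x (1 + r x (N - 1)), with r = 0.001 taken from the stated upper bound. Using 30 ms as the base is illustrative, since the source gives it only as an upper bound, and the linear growth is an assumption.

```python
def shared_latency_ms(base_ms: float, n_reflexes: int, overhead_per_reflex: float = 0.001) -> float:
    """Latency when all reflexes share one pass; each extra reflex adds a fixed fraction.

    Assumes overhead grows linearly with the number of reflexes beyond the first.
    """
    if n_reflexes < 1:
        raise ValueError("need at least one reflex")
    return base_ms * (1.0 + overhead_per_reflex * (n_reflexes - 1))


def separate_latency_ms(base_ms: float, n_reflexes: int) -> float:
    """Total compute time if every reflex ran as its own independent model call."""
    return base_ms * n_reflexes
```

With `base_ms=30` and 100 reflexes, the shared pass comes to about 32.97 ms, versus 3,000 ms of total compute if each reflex ran as its own 30 ms call. Separate calls could run in parallel, so this compares compute cost rather than wall-clock time.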

Tracks behavioral failures like looping, reasoning leakage, and user frustration in agent traces.
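Tracking these failures at scale largely comes down to aggregating per-trace reflex scores. The sketch below assumes each trace has already been scored into a mapping from reflex name to probability and uses a hypothetical 0.5 firing threshold; the score format, threshold, and reflex names are assumptions, not a documented output of the API.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def failure_rates(scored_traces: Iterable[Mapping[str, float]], threshold: float = 0.5) -> Dict[str, float]:
    """Fraction of scored traces in which each reflex fired (score >= threshold)."""
    fired: Counter = Counter()  # reflex -> number of traces where it fired
    seen: Counter = Counter()   # reflex -> number of traces where it was scored
    for scores in scored_traces:
        for reflex, prob in scores.items():
            seen[reflex] += 1
            if prob >= threshold:
                fired[reflex] += 1
    return {reflex: fired[reflex] / seen[reflex] for reflex in seen}
```

For four traces in which `looping` fires twice and `user_frustration` once, `failure_rates` returns `{"looping": 0.5, "user_frustration": 0.25}`. Rates like these can then be charted over time or broken down by agent version.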

Designed as a scalable, cost-effective replacement for frontier-model LLM-as-judge evaluation in production agent monitoring.

API-first, with a dashboard for defining custom reflexes and having them self-improve.
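The source does not document the request schema, so the following is only a hypothetical client-side sketch of mixing built-in and custom reflexes in one call. `CustomReflex`, `build_request`, the JSON field names, and the set of built-in reflex names are all invented for illustration; treating the three tracked behaviors as built-ins is an assumption.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List

# Hypothetical: the real built-in reflex names and payload schema are not given in the source.
BUILTIN_REFLEXES = {"looping", "reasoning_leakage", "user_frustration"}


@dataclass
class CustomReflex:
    name: str
    description: str  # natural-language definition of the behavior to flag
    examples: List[str] = field(default_factory=list)  # optional labeled positive examples


def build_request(trace: str, builtins: List[str], custom: List[CustomReflex]) -> str:
    """Validate reflex names and serialize a hypothetical JSON request body."""
    unknown = [b for b in builtins if b not in BUILTIN_REFLEXES]
    if unknown:
        raise ValueError(f"unknown built-in reflexes: {unknown}")
    clash = [c.name for c in custom if c.name in BUILTIN_REFLEXES]
    if clash:
        raise ValueError(f"custom reflex names shadow built-ins: {clash}")
    payload = {
        "trace": trace,
        "reflexes": sorted(builtins),
        "custom_reflexes": [asdict(c) for c in custom],
    }
    return json.dumps(payload)
```

Validating names on the client keeps a custom reflex from silently shadowing a built-in one; whether the real API enforces this is unknown.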

API 优先，提供仪表板用于定义和自我改进自定义反射。