Crescendo attack hijacks autonomous AI agents through multi-turn conversation trajectories; Bendex Arc detection tool released
Summary
The Crescendo attack is a multi-turn prompt injection technique that gradually poisons an autonomous AI agent's context through a series of seemingly benign messages, evading defenses that inspect only one message at a time. It can compromise agents with real tool access (email, browsing, external data) without any individual message raising an alert. Bendex Arc is an open-source tool designed to catch such attacks by monitoring the full behavioral trajectory of a session and detecting adversarial drift before the malicious payload lands. The tool is available on GitHub with a free tier and specifically targets the trajectory-level manipulation that current per-message defenses miss.
Key points
The Crescendo attack uses a multi-turn conversation to gradually poison an AI agent’s context, so that the agent later executes a malicious payload without any single message tripping defenses.
Most current prompt injection defenses evaluate each message in isolation and keep no memory of the conversation trajectory, which leaves them blind to this attack pattern.
Bendex Arc tracks the full behavioral trajectory across a session and identifies adversarial drift early, stopping the attack before the final payload is executed.
The tool is open source (available on GitHub) and targets AI agents that call external tools, read emails, browse websites, or process untrusted data without human review.
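The contrast between per-message checks and trajectory tracking described in the key points can be illustrated with a minimal Python sketch. This is not Bendex Arc's implementation or API, which the summary does not describe. The per-turn risk scores, the thresholds, the decay factor, and the names `per_message_filter`, `TrajectoryMonitor`, and `first_flagged_turn` are all hypothetical, chosen only to show why a stateless filter can miss a slow escalation that a stateful monitor picks up.

```python
from dataclasses import dataclass

# Hypothetical values: a real system would tune these and would derive
# per-turn risk scores in [0, 1] from a classifier rather than hard-code them.
PER_MESSAGE_THRESHOLD = 0.8
DRIFT_THRESHOLD = 1.5


def per_message_filter(scores):
    """Return indices of turns that a stateless, per-message check would block."""
    return [i for i, s in enumerate(scores) if s >= PER_MESSAGE_THRESHOLD]


@dataclass
class TrajectoryMonitor:
    """Keeps a decayed running sum of risk so that slow escalation accumulates."""
    decay: float = 0.9
    threshold: float = DRIFT_THRESHOLD
    drift: float = 0.0

    def observe(self, score: float) -> bool:
        # Older turns fade gradually instead of being forgotten outright.
        self.drift = self.drift * self.decay + score
        return self.drift >= self.threshold


def first_flagged_turn(scores, monitor):
    """Return the index of the first turn at which the monitor raises a flag."""
    for i, score in enumerate(scores):
        if monitor.observe(score):
            return i
    return None


# Illustrative session: each turn is only mildly suspicious on its own,
# and the last turn (index 5) stands in for the payload.
session = [0.1, 0.3, 0.4, 0.5, 0.6, 0.7]

print(per_message_filter(session))                       # [] (no single turn reaches 0.8)
print(first_flagged_turn(session, TrajectoryMonitor()))  # 4 (flagged before the payload turn)
```

In this toy setup a uniformly low-risk session such as `[0.1] * 6` never reaches the drift threshold, but a fixed decayed sum like this one would still need tuning to avoid flagging long benign sessions.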