Anthropic Details Security Containment for Claude Agents

Anthropic
Friday, June 5, 2026
  • Anthropic uses multi-layered containment to manage security risks as agent capabilities and system access grow.
  • Engineering defenses focus on three pillars: environmental isolation, model-layer safeguards, and strict control of external content.
  • Claude Code's new auto mode cuts permission prompts by 84%, easing approval fatigue, alongside hardened OS-level sandboxing.

Anthropic details its approach to agentic security, which centers on containing the blast radius as models gain more access to systems. The engineering strategy addresses three primary risk categories: user misuse, model misbehavior, and external attacks. Defenses are applied across three components: the agent's environment, the model layer, and external content sources.

For claude.ai, the team runs code in ephemeral gVisor containers on isolated infrastructure, with no persistent filesystem access. This setup protects against cross-tenant vulnerabilities rather than user-side risks.

Claude Code employs a human-in-the-loop strategy that balances utility with oversight. To combat approval fatigue, where users grow less diligent after seeing frequent prompts, Anthropic introduced auto mode, which reduced permission requests by 84%. Even so, vulnerabilities emerged in which project-local settings were parsed before user trust had been established. Direct prompt injection attacks also highlighted the need for hard egress controls to prevent data exfiltration, because model-layer defenses alone are inherently probabilistic and fallible.
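
The idea of a hard egress control can be illustrated with a minimal sketch: a deterministic allowlist check that runs before any outbound request, regardless of what the model decides. The function name `egress_allowed` and the hosts in `ALLOWED_HOSTS` are hypothetical and chosen only for illustration; they are not Anthropic's actual configuration or implementation.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist for illustration only (an assumption, not from the article).
ALLOWED_HOSTS = frozenset({"pypi.org", "files.pythonhosted.org"})


def egress_allowed(url: str, allowed_hosts: frozenset = ALLOWED_HOSTS) -> bool:
    """Return True only for https URLs whose host is exactly on the allowlist."""
    try:
        parts = urlsplit(url)
        host = (parts.hostname or "").rstrip(".")
    except ValueError:
        # Malformed URLs (for example a broken IPv6 literal) are denied.
        return False
    if parts.scheme != "https":
        return False
    # Exact match only: "pypi.org.evil.example" must not pass as "pypi.org".
    return host in allowed_hosts


if __name__ == "__main__":
    for candidate in (
        "https://pypi.org/simple/requests/",
        "https://pypi.org.evil.example/steal",
        "http://pypi.org/",
    ):
        print(candidate, egress_allowed(candidate))
```

Unlike a model-layer defense, a check like this gives the same answer every time for the same URL. In practice such a rule would be enforced outside the agent process, for example at a proxy or firewall, so that the agent cannot bypass it.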

For Claude Cowork, designed for general knowledge workers rather than developers, Anthropic uses full virtual machine (VM) isolation. Only the user's workspace is mounted, which keeps host credentials entirely outside the guest machine. Because the boundary is enforced by the environment, damage stays confined to the isolated workspace even in cases of model misalignment. These layered defenses, combining environmental constraints with model-layer safeguards, remain essential as agents become capable of performing tasks that previously required human teams.
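
The "only the workspace is reachable" principle can be sketched at the application level. This is a loose analogue under stated assumptions, not VM isolation and not Anthropic's implementation: the helper `resolve_in_workspace` is a hypothetical name, and a real VM boundary is enforced by the hypervisor rather than by a path check.

```python
import tempfile
from pathlib import Path


def resolve_in_workspace(workspace: Path, requested: str) -> Path:
    """Resolve `requested` against `workspace`, refusing anything that escapes it."""
    root = workspace.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (root / requested).resolve()
    if not candidate.is_relative_to(root):  # requires Python 3.9+
        raise PermissionError(f"{requested!r} is outside the workspace")
    return candidate


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ws = Path(tmp)
        print(resolve_in_workspace(ws, "notes/todo.txt"))
        for bad in ("../outside.txt", "/etc/passwd"):
            try:
                resolve_in_workspace(ws, bad)
            except PermissionError as exc:
                print("denied:", exc)
```

A check like this can be wrong or bypassed in ways a VM mount cannot, which is the reason for enforcing the boundary in the environment itself: paths outside the mounted workspace do not exist from the guest's point of view.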

Read original (English) · May 25, 2026