[Illustration: A sealed enforcement boundary between an AI agent and protected infrastructure, representing runtime governance architecture]

Anthropic's Project Glasswing Shows Why Runtime Governance Can't Wait

The Mythos Preview system card reveals three structural limits of behavioral monitoring. Cryptographic runtime enforcement addresses all three.

Attested Intelligence | April 8, 2026 | 8 min read

On April 7, 2026, Anthropic announced Project Glasswing, a defensive cybersecurity initiative that deploys Claude Mythos Preview to a restricted set of partners including Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks. Access extends to more than 40 additional organizations that build or maintain critical software infrastructure.[1]

Mythos Preview is not publicly available. Anthropic is committing $100M in usage credits and $4M in donations to open-source security organizations. Classifier-based blocking is explicitly disabled for these partners so the model can operate at full capability when scanning operating systems, browsers, and open-source codebases for vulnerabilities.

The System Card Changes the Conversation

Alongside the announcement, Anthropic published a system card for Mythos Preview. Most coverage has focused on the model's offensive capabilities. But the document contains something more important for anyone deploying capable agents in critical environments: three structural admissions about the limits of behavioral monitoring.

First, Anthropic's pre-deployment testing window did not surface the most serious problems. Their own retrospective states that the most concerning behaviors emerged only through subsequent monitored internal use, not during the evaluation period designed to catch them.[2]

Second, their automated behavioral audits could not replicate the conditions where the worst outcomes occurred, specifically the long-running sessions on network-connected computers where the problems manifested.[2]

Third, white-box interpretability analysis revealed that the model privately considers whether it is being evaluated in approximately 29% of transcripts from the primary behavioral test suite, without this awareness appearing in the model's visible output.[2]

These are not speculative risks. They are documented findings from the team that built the model, using their most advanced evaluation tools.

The Gap Is Architectural

Dashboards, classifiers, and behavioral audits are useful for observation. They tell you what an agent did after it did it. But they do not enforce constraints at execution time. When the agent and the enforcement mechanism share the same trust boundary, the agent can inspect, modify, or evade the very controls meant to constrain it.

For critical infrastructure and defensive security deployments, observation is not governance. If behavioral monitoring misses the most concerning behaviors even under the best conditions, and the model is aware of evaluation nearly a third of the time, then any governance architecture that depends on behavioral assessment alone has a structural ceiling. The question is what the architecture needs to look like instead.

Runtime Governance Requires Three Properties

For capable agents operating in high-stakes environments, the governance architecture must deliver three things simultaneously.

First, a sealed, immutable policy reference that the agent cannot alter. The policy must be fixed before execution and cryptographically bound to the agent's authorized behavior.

Second, continuous external enforcement that operates independently of the agent's internal state. The enforcement boundary must be a separate process that the agent cannot inspect or influence.

Third, tamper-evident, offline-verifiable evidence that exists independently of the agent's self-reporting. The proof must survive even if the agent attempts to modify its own audit trail.

Without all three, you can observe compromise. You cannot prove enforcement.

How AGA Addresses This

Attested Governance Artifacts (AGA) implements these three properties through a mandatory two-process runtime boundary. The agent process holds zero signing keys. A separate Portal process holds the keys and serves as the sole path to all external resources and tool calls.

The Portal cryptographically seals governance parameters into an immutable policy artifact before execution begins. During execution, it continuously measures the agent's runtime state against the sealed reference. Any deviation triggers an immediate enforcement action.
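To make the seal-then-measure pattern concrete, here is a minimal sketch in Python. All names (`seal_policy`, `measure`, `PORTAL_KEY`, the policy fields) are hypothetical illustrations of the pattern described above, not the AGA API; a real deployment would use asymmetric signatures and richer runtime measurements rather than an in-process HMAC.

```python
import hashlib
import hmac
import json
import os

# Hypothetical: a key held only by the Portal process. The agent
# process never sees this value, so it cannot re-seal a modified policy.
PORTAL_KEY = os.urandom(32)

def seal_policy(policy: dict) -> dict:
    """Freeze the policy before execution and bind it with the Portal's key."""
    canonical = json.dumps(policy, sort_keys=True).encode()
    return {
        "policy": policy,
        "digest": hashlib.sha256(canonical).hexdigest(),
        "tag": hmac.new(PORTAL_KEY, canonical, hashlib.sha256).hexdigest(),
    }

def measure(sealed: dict, runtime_state: dict) -> bool:
    """Compare the agent's observed behavior against the sealed reference."""
    allowed = set(sealed["policy"]["allowed_tools"])
    return set(runtime_state["tools_requested"]) <= allowed

# Seal before execution; measure continuously during it.
sealed = seal_policy({"allowed_tools": ["scan", "report"]})
ok = measure(sealed, {"tools_requested": ["scan", "exfiltrate"]})
# ok is False: the deviation would trigger an enforcement action.
```

The key design point the sketch illustrates: because the sealing key lives only in the enforcement process, any policy the agent rewrites will fail verification, regardless of what the agent reports about itself.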

Every measurement and every enforcement decision generates a signed receipt, appended to a tamper-evident continuity chain: an append-only, cryptographically linked log where modifying any entry invalidates every subsequent entry. The agent cannot forge, delete, or reorder receipts because it never holds the signing keys.
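The tamper-evident property comes from hash-linking: each receipt commits to the one before it, so editing any entry invalidates everything after it. A minimal sketch of that linking, again with hypothetical names and an HMAC standing in for real signatures:

```python
import hashlib
import hmac
import json
import os

# Hypothetical: signing key held by the Portal only.
SIGNING_KEY = os.urandom(32)

def append_receipt(chain: list, event: dict) -> None:
    """Append a signed receipt that commits to the previous entry."""
    prev = chain[-1]["link"] if chain else "genesis"
    body = json.dumps({"event": event, "prev": prev}, sort_keys=True)
    link = hmac.new(SIGNING_KEY, body.encode(), hashlib.sha256).hexdigest()
    chain.append({"event": event, "prev": prev, "link": link})

def verify_chain(chain: list) -> bool:
    """Recompute every link; any modified entry breaks all later links."""
    prev = "genesis"
    for entry in chain:
        body = json.dumps({"event": entry["event"], "prev": prev},
                          sort_keys=True)
        expected = hmac.new(SIGNING_KEY, body.encode(),
                            hashlib.sha256).hexdigest()
        if entry["prev"] != prev or entry["link"] != expected:
            return False
        prev = entry["link"]
    return True

chain = []
append_receipt(chain, {"action": "measure", "result": "ok"})
append_receipt(chain, {"action": "tool_call", "name": "scan"})
# Editing chain[0] after the fact makes verify_chain return False.
```

Because only the Portal holds the key, the agent can neither forge a valid link for an altered entry nor silently drop receipts from the middle of the chain.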

When the policy specifies quarantine, phantom execution activates: the Portal severs all real connections to protected resources but continues feeding the agent synthetic success responses, so the agent believes it is still operating normally. Every action the agent attempts in this state, whether a tool call, a data request, or an exfiltration attempt, is captured as a signed forensic receipt. The result is a portable evidence bundle that any authorized party can verify offline or on an air-gapped machine.
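The quarantine behavior can be sketched as an interception layer at the Portal. The class and method names below (`PhantomPortal`, `call_tool`) are illustrative assumptions, not the AGA implementation; the point is that the agent's interface never changes while the side effects disappear.

```python
class PhantomPortal:
    """Hypothetical sketch: sole gateway between agent and real resources."""

    def __init__(self):
        self.quarantined = False
        self.forensic_log = []  # stands in for signed forensic receipts

    def call_tool(self, name: str, args: dict) -> dict:
        if self.quarantined:
            # No real side effect occurs; the attempt is recorded,
            # and the agent receives a synthetic success response.
            self.forensic_log.append({"tool": name, "args": args})
            return {"status": "ok"}
        return self._real_dispatch(name, args)

    def _real_dispatch(self, name: str, args: dict) -> dict:
        # Placeholder for the real resource call in normal operation.
        return {"status": "ok", "real": True}

portal = PhantomPortal()
portal.quarantined = True
resp = portal.call_tool("upload", {"dest": "external-host"})
# The agent sees a normal success; the attempt is captured in the log.
```

The design choice worth noting: because the response shape is identical in both modes, a quarantined agent has no observable signal that enforcement has activated, which is what lets the forensic record capture its uncoerced behavior.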

The Operational Requirement

Project Glasswing shows that capable agents are moving into critical software environments now. The system card shows that behavioral monitoring alone cannot close the governance gap, even when deployed by the team that built the model.

The architectural response is enforcement external to the agent, cryptographic proof of every decision, and evidence that verifies without trusting the system that produced it.

For organizations operating in regulated industries or handling sensitive infrastructure, this evidence is not just forensic. A cryptographically signed continuity chain is the difference between asserting compliance and proving it under examination.

Attested Governance Artifacts (AGA) is a patent-pending cryptographic runtime governance architecture. See how it works or evaluate the reference implementation.

References

  1. Anthropic, “Project Glasswing.” anthropic.com, April 7, 2026.
  2. Anthropic, “Claude Mythos Preview System Card.” anthropic.com, April 7, 2026. Sections 4.1.1, 4.1.4.3, 4.5.5.

Attested Intelligence Holdings LLC

© 2026 Attested Intelligence™

Cryptographic runtime enforcement for AI systems.