This week Bloomberg reported that a small group of unauthorized users gained access to Claude Mythos Preview, Anthropic's restricted cybersecurity model, the same day it was announced under Project Glasswing.[1] They have been using it regularly since then. The evidence they provided to Bloomberg? Screenshots and a live demonstration from a member of a private Discord channel. Anthropic confirmed the access occurred “through one of our third-party vendor environments.” None of Anthropic's own systems were affected.
The most revealing part of this story is not the breach. It is what serves as evidence of the breach.
A Discord screenshot is not forensic evidence. Neither is an anonymous source's live demonstration of unauthorized access to a restricted model. Yet these are now the artifacts through which the public is learning what actually happened inside one of the most tightly controlled AI deployments of 2026. In a properly instrumented system, an incident like this would generate its own forensic record: cryptographically signed evidence of every credentialed access, a tamper-evident chain of who used what and when, and an offline-verifiable audit trail that an investigator could trust without trusting the vendor whose breach they were investigating. Instead, the story is reaching the world through reporters and screenshots.
This is not an Anthropic problem. It is an industry problem, and it is about to become a regulatory one.
The Investment Was in the Wrong Layer
Two years of AI safety work has focused on model behavior. Alignment research, red-teaming, jailbreak resistance, capability evaluation, refusal training. All of it necessary, all of it addressing whether a model will do the right thing when it is used. None of it addresses what happened this week. The Mythos model behaved exactly as designed. The failure was at a layer the model itself cannot see: credentialed access flowing through a vendor ecosystem, without cryptographic accountability for who actually used it.
The missing capability is easy to state. The industry has no standard way to prove identity, authority, and sequence of access across organizational boundaries. Every symptom reported this week follows from its absence.
Frontier models are increasingly deployed through multi-tenant vendor environments, with access gated by credentials that can be leaked, shared, transferred, or misused. Glasswing granted Mythos access to roughly forty organizations, only a dozen of which Anthropic has publicly named.[5] Most of those organizations employ thousands of people with varying credential hygiene and contractor oversight. Many maintain their own third-party vendor relationships. The trust boundary cascades through at least four organizational layers (provider, customer, contractor, and end user) before anyone actually touches the model.
Consider how this failure mode plays out operationally. A contractor at an authorized Glasswing recipient organization has credentials for a vendor-hosted environment that brokers access to the model. The contractor shares or loses those credentials: a phishing attack, a poorly secured laptop, a reseller arrangement the original organization never fully audited. Unauthorized use begins. Each party in the chain captures only fragments in its own mutable log files, and none can independently verify the others. When the incident surfaces weeks later through an outside channel, investigators face a reconstruction problem with evidence scattered across systems controlled by the same parties whose security failed in the first place. This is not theoretical: it is the same multi-party vendor access path Anthropic has publicly confirmed in this week's incident.
Adjacent problems have been partially solved. Cloud providers have experimented with workload identity through SPIFFE. Supply-chain attestation exists through Sigstore and SLSA. Signed audit logs, WORM storage, and TPM-backed attestations are deployed in specific regulated industries. What is missing is a widely deployed, cross-organization, offline-verifiable evidence layer purpose-built for AI access governance: the specific application of these primitives to the problem of who actually used which model and when.
At each trust boundary, the industry's default practice remains logging, and the logs live on infrastructure controlled by the party whose behavior is being logged. Mutable logs cannot survive hostile scrutiny, and that scrutiny arrives when a regulator, an investigator, or a litigant needs to know what actually happened and has reason to distrust every party in the chain.
When Regulators and Intelligence Collide
The Mythos incident is not an isolated case of governance strain. The same week Bloomberg's report ran, Axios reported that the NSA is actively using Mythos,[2] despite the Department of Defense having labeled Anthropic a “supply chain risk” in February, a designation that prompted the company to sue the Pentagon.[3] The UK's AI Security Institute has published its own technical evaluation of the model.[4] The federal government is at once declaring Anthropic a supply-chain risk, deploying the company's models through its intelligence agencies, and now investigating unauthorized access to those same models.
The structural problem this exposes is not about any one organization. It is that the United States government currently has no mechanism to answer its own question about what is actually happening inside the AI deployments it both uses and regulates. DoD cannot verify whether NSA's Mythos usage conforms to any particular standard, because no such verification mechanism exists. Anthropic cannot demonstrate to the Pentagon, on technical grounds, that its deployments are auditable in a way that addresses supply-chain concerns. The evidence layer does not yet exist. The litigation and the covert use and the unauthorized access are expressions of the same underlying absence.
The regulatory direction is clear. The EU AI Act requires demonstrable audit trails for high-risk systems. NIST's AI Risk Management Framework extends that requirement to provenance documentation across the AI lifecycle, and CISA's Secure by Design principles apply it to runtime integrity. Each framework assumes that organizations deploying AI can produce tamper-evident evidence of how those systems were used. As enforcement mechanisms mature and insurance markets price AI risk, that assumption will harden into requirement. Enterprises in defense, healthcare, finance, critical infrastructure, and intelligence are going to face pressure for exactly the kind of evidence this week's incident demonstrates they cannot currently produce.
What the Evidence Layer Actually Requires
The architectural requirements already appear in the cryptographic literature, in adjacent deployed systems, and in standards documents including NIST AI RMF and the draft requirements from CAISI. Two of them are load-bearing.
Active enforcement at the access boundary, not passive logging. A passive logger records what happened after the fact, and can be tampered with before anyone reads it. An active enforcer prevents the access from completing until the record is durably committed. If the record cannot be written, the access does not happen. This eliminates the most common forensic failure mode, in which the logs have been modified, truncated, or lost by the time anyone thinks to look at them. The cryptographic overhead for this enforcement is measured in single-digit milliseconds, a rounding error compared to the multi-second latency of frontier model inference.
Offline verifiability. The evidence must be verifiable without access to the infrastructure that produced it. An auditor investigating a breach cannot be required to trust the vendor whose breach they are investigating. Verification must work on an air-gapped laptop, with the evidence bundle alone, using public keys pinned in advance. Logs are trusted because the infrastructure producing them is trusted. Evidence is trusted because it can be verified without trusting anyone.
Three supporting requirements complete the architecture. Cryptographic signatures on every individual access event, rather than hashed logs or signed batches. Hash-linked chains in which modification of any single event invalidates all subsequent events. And structural metadata separation that allows third-party verification without requiring payload disclosure, a property that matters for healthcare, finance, and any deployment involving sensitive customer data. Canonical JSON serialization (RFC 8785) makes these signatures deterministic across implementations, allowing an evidence bundle produced by one system to be verified by entirely different software months later.
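The hash-linking property is simple enough to demonstrate directly. The sketch below (illustrative types, not AGA's wire format) links each event to the SHA-256 of its predecessor, so editing any one event breaks every link after it; in a real deployment the chain head would also be signed, so even the final event cannot be silently rewritten.

```go
// Sketch of a hash-linked event chain: each record embeds the SHA-256 of
// the previous record, so altering any event invalidates all later links.
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

type Event struct {
	Data     string
	PrevHash string // hex SHA-256 of the previous event, "" for the first
}

func hash(e Event) string {
	sum := sha256.Sum256([]byte(e.PrevHash + "|" + e.Data))
	return hex.EncodeToString(sum[:])
}

// verifyChain recomputes every link and reports the first broken one.
func verifyChain(events []Event) (ok bool, firstBad int) {
	prev := ""
	for i, e := range events {
		if e.PrevHash != prev {
			return false, i
		}
		prev = hash(e)
	}
	return true, -1
}

func main() {
	var chain []Event
	prev := ""
	for _, d := range []string{"access A", "access B", "access C"} {
		e := Event{Data: d, PrevHash: prev}
		chain = append(chain, e)
		prev = hash(e)
	}
	ok, _ := verifyChain(chain)
	fmt.Println(ok) // true: intact chain verifies

	chain[1].Data = "access B (edited)" // tamper with one event
	ok, bad := verifyChain(chain)
	fmt.Println(ok, bad) // false 2: the link after the edit breaks
}
```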
None of these primitives are exotic. Ed25519 for signatures, SHA-256 for hashing, Merkle trees for inclusion proofs. The post-quantum migration path is known: hybrid composite signatures combining classical and lattice-based schemes. What the industry has not done is assemble these components into a runtime governance layer purpose-built for AI access.
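A Merkle inclusion proof, for instance, fits in a few lines. This sketch hashes four events into a tree and proves that one of them is included, using only the root plus two sibling hashes; the leaf/node domain-separation prefixes follow the Certificate Transparency convention (RFC 6962), which AGA may or may not use.

```go
// Sketch of a Merkle inclusion proof: a verifier holding only the (signed)
// root can confirm one event is in the set without seeing the others.
package main

import (
	"crypto/sha256"
	"fmt"
)

type h32 = [32]byte

// Domain-separated hashing in the RFC 6962 style:
// 0x00 prefixes a leaf, 0x01 an interior node.
func leafHash(data []byte) h32 { return sha256.Sum256(append([]byte{0x00}, data...)) }
func nodeHash(l, r h32) h32 {
	return sha256.Sum256(append(append([]byte{0x01}, l[:]...), r[:]...))
}

func main() {
	// Four access events; the only signed artifact is the root.
	l0 := leafHash([]byte("e0"))
	l1 := leafHash([]byte("e1"))
	l2 := leafHash([]byte("e2"))
	l3 := leafHash([]byte("e3"))
	p01, p23 := nodeHash(l0, l1), nodeHash(l2, l3)
	root := nodeHash(p01, p23)

	// Inclusion proof for "e2": its sibling leaf hash (l3) and the
	// aunt node (p01). The verifier recomputes upward to the root
	// without ever seeing events e0, e1, or e3 themselves.
	recomputed := nodeHash(p01, nodeHash(leafHash([]byte("e2")), l3))
	fmt.Println(recomputed == root) // true
}
```

The same structure is what lets metadata be verified without payload disclosure: the verifier checks hashes, not contents.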
The obvious objection is operational complexity. This architecture adds key management burden and storage costs that linear logging does not. Offline verification is not strictly necessary for every deployment; a customer-facing chatbot with no regulatory exposure may never need it. The argument is not that every AI deployment requires cryptographic governance. It is that an increasing share of enterprise deployments will eventually face this question under subpoena, regulatory audit, insurance claim, or incident response. Those deployments need to start building the evidence layer now, because retroactive reconstruction is not available to them.
Disclosure
I have been working on this problem for eighteen months. Attested Governance Artifacts (AGA) is a cryptographic runtime governance system that implements the architecture described above: active enforcement at the access boundary, signed receipts for every governed event, hash-linked continuity chains, offline-verifiable evidence bundles, and structural metadata separation. The patent application was filed with the USPTO in December 2025. A working Go implementation is complete, with post-quantum hybrid signatures deployed ahead of NIST's migration timeline.
AGA does not address model-behavior risks. Those are the correct focus of current alignment research. AGA addresses the adjacent problem: when a model does something consequential, can anyone prove what happened and under whose authority? If your organization is deploying AI into environments where that question will eventually be asked under subpoena, regulatory audit, insurance claim, or incident response, this is the conversation to have now, not after the fact.
The Pattern Ahead
The Mythos incident will not be the last one, because the conditions that produced it are structural, not incidental. Forty organizations now hold access to a model Anthropic describes as capable of enabling dangerous cyberattacks. Each sits atop its own cascade of vendor relationships, contractors, and credentials that none of the parties above them can fully audit. The industry has no mechanism to prove, after the fact, which credentials were actually used and for what purpose.
The group that accessed Mythos this week said they were interested in “playing around with new models, not wreaking havoc with them.” That sounds almost reassuring. The framing won't last. Every future incident of this kind will involve either a more consequential actor or a more consequential use, and the question asked afterward will be the same one that cannot currently be answered: who actually did this, through which credentials, and how do we know?
Organizations that build the evidence layer in advance will answer that question with a signed evidence bundle verifiable on an air-gapped laptop. Organizations relying on vendor-side logging will answer it with a press statement.
The model-safety investment of the last two years was necessary. It was never going to be sufficient on its own. The governance question the industry now has to answer is not whether models behave correctly, but whether anyone can prove, to a standard that survives hostile scrutiny, what those models were actually used for, and by whom.
References
[1] “Anthropic's Mythos AI Model Is Being Accessed by Unauthorized Users.” Bloomberg, April 21, 2026. Accessible mirror: TechCrunch.
[2] “Scoop: NSA using Anthropic's Mythos despite Defense Department blacklist.” Axios, April 19, 2026.
[3] “Anthropic sues the Trump administration after it was designated a supply chain risk.” CNN, March 9, 2026. Formal DoD designation February 27, 2026; lawsuit filed March 9, 2026.
[4] “Our evaluation of Claude Mythos Preview's cyber capabilities.” AI Security Institute (UK).
[5] Anthropic, “Project Glasswing: Securing critical software for the AI era.” anthropic.com, April 7, 2026.
Attested Intelligence builds cryptographic runtime governance for AI systems. The architecture described here is implemented in @attested-intelligence/aga-mcp-server, available on npm. Evaluation path, CLI, and verifier at attestedintelligence.com/evaluate. Prior analysis at attestedintelligence.com/blog.
USPTO App. No. 19/433,835 · Patent Pending · Attested Intelligence Holdings LLC
