This week Bloomberg reported that a small group of unauthorized users gained access to Claude Mythos Preview, Anthropic's restricted cybersecurity model, the same day it was announced under Project Glasswing.[1] They have been using it regularly since then. The evidence they provided to Bloomberg? Screenshots and a live demonstration from a member of a private Discord channel. Anthropic confirmed the access occurred “through one of our third-party vendor environments.” None of Anthropic's own systems were affected.
The most revealing part of this story is not the breach. It is what serves as evidence of the breach.
A Discord screenshot is not forensic evidence. Neither is an anonymous source's live demonstration of unauthorized access to a restricted model. Yet these are now the artifacts through which the public is learning what actually happened inside one of the most tightly controlled AI deployments of 2026. In a properly instrumented system, an incident like this would generate its own forensic record: cryptographically signed evidence of every credentialed access, a tamper-evident chain of who used what and when, and an offline-verifiable audit trail that an investigator could trust without trusting the vendor whose breach they were investigating. Instead, the story is reaching the world through reporters and screenshots.
This is not an Anthropic problem. It is an industry problem, and it is about to become a regulatory one.
The Investment Was in the Wrong Layer
Two years of AI safety work has focused on model behavior. Alignment research, red-teaming, jailbreak resistance, capability evaluation, refusal training. All of it necessary, all of it addressing whether a model will do the right thing when it is used. None of it addresses what happened this week. The Mythos model behaved exactly as designed. The failure was at a layer the model itself cannot see: credentialed access flowing through a vendor ecosystem, without cryptographic accountability for who actually used it.
The missing capability is easy to state. The industry has no standard way to prove identity, authority, and sequence of access across organizational boundaries. Every symptom reported this week follows from its absence.
Frontier models are increasingly deployed through multi-tenant vendor environments, with access gated by credentials that can be leaked, shared, transferred, or misused. Glasswing granted Mythos access to roughly forty organizations, only a dozen of which Anthropic has publicly named.[5] Most of those organizations employ thousands of people with varying credential hygiene and contractor oversight. Many maintain their own third-party vendor relationships. The trust boundary cascades through at least four organizational layers (provider, customer, contractor, and end user) before anyone actually touches the model.
Consider how this failure mode plays out operationally. A contractor at an authorized Glasswing recipient organization has credentials for a vendor-hosted environment that brokers access to the model. The contractor shares or loses those credentials: a phishing attack, a poorly secured laptop, a reseller arrangement the original organization never fully audited. Unauthorized use begins. Each party in the chain captures only fragments in its own mutable log files, and none can independently verify the others. When the incident surfaces weeks later through an outside channel, investigators face a reconstruction problem with evidence scattered across systems controlled by the same parties whose security failed in the first place. This is not theoretical: it is the same multi-party vendor access path Anthropic has publicly confirmed in this week's incident.
Adjacent problems have been partially solved. Cloud providers have experimented with workload identity through SPIFFE. Supply-chain attestation exists through Sigstore and SLSA. Signed audit logs, WORM storage, and TPM-backed attestations are deployed in specific regulated industries. What is missing is a widely deployed, cross-organization, offline-verifiable evidence layer purpose-built for AI access governance: the specific application of these primitives to the problem of who actually used which model and when.
At each trust boundary, the industry's default practice remains logging, and the logs live on infrastructure controlled by the party whose behavior is being logged. Mutable logs cannot survive hostile scrutiny, and that scrutiny arrives when a regulator, an investigator, or a litigant needs to know what actually happened and has reason to distrust every party in the chain.
When Regulators and Intelligence Collide
The Mythos incident is not an isolated case of governance strain. The same week Bloomberg's report ran, Axios reported that the NSA is actively using Mythos,[2] despite the Department of Defense having labeled Anthropic a “supply chain risk” in February, a designation that prompted the company to sue the Pentagon.[3] The UK's AI Security Institute has published its own technical evaluation of the model.[4] The federal government is at once declaring Anthropic a supply-chain risk, deploying the company's models through its intelligence agencies, and now investigating unauthorized access to those same models.
The structural problem this exposes is not about any one organization. It is that the United States government currently has no mechanism to answer its own question about what is actually happening inside the AI deployments it both uses and regulates. DoD cannot verify whether NSA's Mythos usage conforms to any particular standard, because no such verification mechanism exists. Anthropic cannot demonstrate to the Pentagon, on technical grounds, that its deployments are auditable in a way that addresses supply-chain concerns. The evidence layer does not yet exist. The litigation and the covert use and the unauthorized access are expressions of the same underlying absence.
The regulatory direction is clear. The EU AI Act requires demonstrable audit trails for high-risk systems. NIST's AI Risk Management Framework extends that requirement to provenance documentation across the AI lifecycle, and CISA's Secure by Design principles apply it to runtime integrity. Each framework assumes that organizations deploying AI can produce tamper-evident evidence of how those systems were used. As enforcement mechanisms mature and insurance markets price AI risk, that assumption will harden into requirement. Enterprises in defense, healthcare, finance, critical infrastructure, and intelligence are going to face pressure for exactly the kind of evidence this week's incident demonstrates they cannot currently produce.
What the Evidence Layer Actually Requires
The architectural requirements already appear in the cryptographic literature, in adjacent deployed systems, and in standards documents including NIST AI RMF and the draft requirements from CAISI. Two of them are load-bearing.
Active enforcement at the access boundary, not passive logging. A passive logger records what happened after the fact, and can be tampered with before anyone reads it. An active enforcer prevents the access from completing until the record is durably committed. If the record cannot be written, the access does not happen. This eliminates the most common forensic failure mode, in which the logs have been modified, truncated, or lost by the time anyone thinks to look at them. The cryptographic overhead for this enforcement is measured in single-digit milliseconds, a rounding error compared to the multi-second latency of frontier model inference.
Offline verifiability. The evidence must be verifiable without access to the infrastructure that produced it. An auditor investigating a breach cannot be required to trust the vendor whose breach they are investigating. Verification must work on an air-gapped laptop, with the evidence bundle alone, using public keys pinned in advance. Logs are trusted because the infrastructure producing them is trusted. Evidence is trusted because it can be verified without trusting anyone.
Three supporting requirements complete the architecture. Cryptographic signatures on every individual access event, rather than hashed logs or signed batches. Hash-linked chains in which modification of any single event invalidates all subsequent events. And structural metadata separation that allows third-party verification without requiring payload disclosure, a property that matters for healthcare, finance, and any deployment involving sensitive customer data. Canonical JSON serialization (RFC 8785) makes these signatures deterministic across implementations, allowing an evidence bundle produced by one system to be verified by entirely different software months later.
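The hash-linking property is simple enough to demonstrate directly. The sketch below (illustrative types, not AGA's wire format) links each event to the SHA-256 of its predecessor, so editing any one event breaks every link after it; in a real deployment the chain head would also be signed, so even the final event cannot be silently rewritten.

```go
// Sketch of a hash-linked event chain: each record embeds the SHA-256 of
// the previous record, so altering any event invalidates all later links.
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

type Event struct {
	Data     string
	PrevHash string // hex SHA-256 of the previous event, "" for the first
}

func hash(e Event) string {
	sum := sha256.Sum256([]byte(e.PrevHash + "|" + e.Data))
	return hex.EncodeToString(sum[:])
}

// verifyChain recomputes every link and reports the first broken one.
func verifyChain(events []Event) (ok bool, firstBad int) {
	prev := ""
	for i, e := range events {
		if e.PrevHash != prev {
			return false, i
		}
		prev = hash(e)
	}
	return true, -1
}

func main() {
	var chain []Event
	prev := ""
	for _, d := range []string{"access A", "access B", "access C"} {
		e := Event{Data: d, PrevHash: prev}
		chain = append(chain, e)
		prev = hash(e)
	}
	ok, _ := verifyChain(chain)
	fmt.Println(ok) // true: intact chain verifies

	chain[1].Data = "access B (edited)" // tamper with one event
	ok, bad := verifyChain(chain)
	fmt.Println(ok, bad) // false 2: the link after the edit breaks
}
```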
None of these primitives are exotic. Ed25519 for signatures, SHA-256 for hashing, Merkle trees for inclusion proofs. The post-quantum migration path is known: hybrid composite signatures combining classical and lattice-based schemes. What the industry has not done is assemble these components into a runtime governance layer purpose-built for AI access.
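A Merkle inclusion proof, for instance, fits in a few lines. This sketch hashes four events into a tree and proves that one of them is included, using only the root plus two sibling hashes; the leaf/node domain-separation prefixes follow the Certificate Transparency convention (RFC 6962), which AGA may or may not use.

```go
// Sketch of a Merkle inclusion proof: a verifier holding only the (signed)
// root can confirm one event is in the set without seeing the others.
package main

import (
	"crypto/sha256"
	"fmt"
)

type h32 = [32]byte

// Domain-separated hashing in the RFC 6962 style:
// 0x00 prefixes a leaf, 0x01 an interior node.
func leafHash(data []byte) h32 { return sha256.Sum256(append([]byte{0x00}, data...)) }
func nodeHash(l, r h32) h32 {
	return sha256.Sum256(append(append([]byte{0x01}, l[:]...), r[:]...))
}

func main() {
	// Four access events; the only signed artifact is the root.
	l0 := leafHash([]byte("e0"))
	l1 := leafHash([]byte("e1"))
	l2 := leafHash([]byte("e2"))
	l3 := leafHash([]byte("e3"))
	p01, p23 := nodeHash(l0, l1), nodeHash(l2, l3)
	root := nodeHash(p01, p23)

	// Inclusion proof for "e2": its sibling leaf hash (l3) and the
	// aunt node (p01). The verifier recomputes upward to the root
	// without ever seeing events e0, e1, or e3 themselves.
	recomputed := nodeHash(p01, nodeHash(leafHash([]byte("e2")), l3))
	fmt.Println(recomputed == root) // true
}
```

The same structure is what lets metadata be verified without payload disclosure: the verifier checks hashes, not contents.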
The obvious objection is operational complexity. This architecture adds key management burden and storage costs that linear logging does not. Offline verification is not strictly necessary for every deployment; a customer-facing chatbot with no regulatory exposure may never need it. The argument is not that every AI deployment requires cryptographic governance. It is that an increasing share of enterprise deployments will eventually face this question under subpoena, regulatory audit, insurance claim, or incident response. Those deployments need to start building the evidence layer now, because retroactive reconstruction is not available to them.
Disclosure
I have been working on this problem for eighteen months. Attested Governance Artifacts (AGA) is a cryptographic runtime governance system that implements the architecture described above: active enforcement at the access boundary, signed receipts for every governed event, hash-linked continuity chains, offline-verifiable evidence bundles, and structural metadata separation. The patent application was filed with the USPTO in December 2025. A working Go implementation is complete, with post-quantum hybrid signatures deployed ahead of NIST's migration timeline.
AGA does not address model-behavior risks. Those are the correct focus of current alignment research. AGA addresses the adjacent problem: when a model does something consequential, can anyone prove what happened and under whose authority? If your organization is deploying AI into environments where that question will eventually be asked under subpoena, regulatory audit, insurance claim, or incident response, this is the conversation to have now, not after the fact.
The Pattern Ahead
The Mythos incident will not be the last one, because the conditions that produced it are structural, not incidental. Forty organizations now hold access to a model Anthropic describes as capable of enabling dangerous cyberattacks. Each sits atop its own cascade of vendor relationships, contractors, and credentials that none of the parties above them can fully audit. The industry has no mechanism to prove, after the fact, which credentials were actually used and for what purpose.
The group that accessed Mythos this week said they were interested in “playing around with new models, not wreaking havoc with them.” That sounds almost reassuring. The framing won't last. Every future incident of this kind will involve either a more consequential actor or a more consequential use, and the question asked afterward will be the same one that cannot currently be answered: who actually did this, through which credentials, and how do we know?
Organizations that build the evidence layer in advance will answer that question with a signed evidence bundle verifiable on an air-gapped laptop. Organizations relying on vendor-side logging will answer it with a press statement.
The model-safety investment of the last two years was necessary. It was never going to be sufficient on its own. The governance question the industry now has to answer is not whether models behave correctly, but whether anyone can prove, to a standard that survives hostile scrutiny, what those models were actually used for, and by whom.
References
[1] “Anthropic's Mythos AI Model Is Being Accessed by Unauthorized Users.” Bloomberg, April 21, 2026. Accessible mirror: TechCrunch.
[2] “Scoop: NSA using Anthropic's Mythos despite Defense Department blacklist.” Axios, April 19, 2026.
[3] “Anthropic sues the Trump administration after it was designated a supply chain risk.” CNN, March 9, 2026. Formal DoD designation February 27, 2026; lawsuit filed March 9, 2026.
[4] “Our evaluation of Claude Mythos Preview's cyber capabilities.” AI Security Institute (UK).
[5] Anthropic, “Project Glasswing: Securing critical software for the AI era.” anthropic.com, April 7, 2026.
Attested Intelligence builds cryptographic runtime governance for AI systems. The architecture described here is implemented in @attested-intelligence/aga-mcp-server, available on npm. Evaluation path, CLI, and verifier at attestedintelligence.com/evaluate. Prior analysis at attestedintelligence.com/blog.
USPTO App. No. 19/433,835 · Patent Pending · Attested Intelligence Holdings LLC
