
Who Controls the Model at Runtime?

The Pentagon called runtime model control a national-security risk.
Eight days later, Anthropic accidentally proved why.

Attested Intelligence | March 28, 2026 | 8 min read

1. The Court Filing

In its March 18 opposition brief filed in the Northern District of California, the Department of Justice argued that Anthropic's continued access to Department of War AI infrastructure posed an “unacceptable risk to national security.” The specific concern: the AI vendor could “attempt to disable its technology or preemptively alter the behavior of its model either before or during ongoing war-fighting operations.”[1]

Six days later, at the March 24 hearing in San Francisco, U.S. District Judge Rita Lin called the Pentagon's treatment of Anthropic “troubling” and said the government's actions looked like “an attempt to cripple” the company.[2] She questioned whether the DoW was punishing Anthropic for refusing to allow unrestricted military use of its AI technology.

The government's concern and the judge's skepticism frame the same architectural question from opposite sides. The DoW argues it cannot accept a vendor that retains the ability to alter model behavior. The judge questions whether the government's response is proportionate. Neither side proposes a mechanism that would make runtime control independently verifiable.

Forty-eight hours after that hearing, the same company at the center of the case demonstrated why the question matters more than either answer.

2. The Leak

On March 26, Fortune reported that Anthropic is testing a new AI model it describes as “by far the most powerful AI model we've ever developed.”[3] The model, internally called Mythos, was not announced through a press release. Its existence was discovered in a publicly accessible, unsecured data cache containing approximately 3,000 unpublished assets, exposed through a misconfiguration in the company's content management system. Anthropic attributed the exposure to “human error.”

The leaked draft describes the model as “currently far ahead of any other AI model in cyber capabilities” and warns that it “presages an upcoming wave of models that can exploit vulnerabilities in ways that far outpace the efforts of defenders.” Anthropic confirmed the model exists and said it is being tested with early access customers, with the initial release focused on cybersecurity defense organizations, specifically because its offensive capabilities warrant caution.[3]

The leak matters not because of what it reveals about one company's operational security. It matters because of what it reveals about the governance model the entire industry relies on.

3. The Control Dispute

The court case lays out both sides of the control question with unusual clarity. The DoW wants to use Anthropic's technology for “all lawful purposes,” with no vendor-imposed restrictions on how models are deployed in military operations. Anthropic wants contractual red lines: no use for fully autonomous weapons and no mass surveillance of Americans.[4] The DOJ argues that these restrictions give a private company an unacceptable degree of influence over military decision-making. Anthropic argues that it cannot operationally control models once deployed in classified environments.[2]

The capability trajectory makes this urgent. In February 2026, OpenAI released GPT-5.3-Codex, the first model it classified as “high capability” for cybersecurity tasks under its own Preparedness Framework, and the first it had directly trained to identify software vulnerabilities.[5] Anthropic's recent models have already demonstrated an ability to surface previously unknown vulnerabilities in production codebases.[3] Mythos is described as a further step change beyond both. Each generation of model widens the governance gap. The controls that were arguably sufficient for systems that summarize documents are structurally insufficient for systems that can autonomously discover and chain zero-day exploits across production infrastructure.

| Party | Position | Architectural Gap |
| --- | --- | --- |
| Pentagon / DOJ | Remove vendor restrictions; unrestricted access to AI models for all lawful military purposes | Unrestricted access without independent verification means the operator cannot prove how models were used. Logs are claims, not proof. |
| Anthropic | Contractual red lines: no autonomous weapons, no mass surveillance of Americans | Contracts operate in courtrooms, not compute environments. Between instruction and execution, no contractual clause interposes. |
| Neither side | No proposal for independently verifiable runtime control | Both positions assume trust in one party. Neither produces cryptographic proof that constraints were enforced. |

4. What Neither Side Addresses

The Pentagon assumes that removing vendor restrictions and gaining unrestricted access solves the control problem. It does not. Unrestricted access without independent verification means the operator cannot prove how models were used. When an autonomous system operating in a classified environment takes an unauthorized action, and the only evidence is a log entry the system wrote about itself, the evidence is not proof. It is a claim. A malfunctioning model can generate actions that the log accurately records but that no policy authorized. The log tells you what happened. It does not tell you whether what happened was permitted.

Anthropic assumes that contractual red lines and safety training solve the governance problem. They do not. A contract is a legal instrument. It operates in courtrooms, not in compute environments. Between the moment a model receives an instruction and the moment it executes an action, no contractual clause interposes. Training is a tendency, not a guarantee. Contracts do not execute at runtime.

The dispute is about trust. The architecture should make trust unnecessary.

5. Four Properties

What would close this gap is not a better contract or a more permissive access policy. It is a different class of architecture. Four properties, each building on the previous.

Seal: lock the authorized scope before operation begins.

Sealed pre-commitment. Before a model begins operating, its authorized scope is cryptographically committed into an immutable artifact: which tools it can invoke, which operations are permitted, what rate limits apply, what temporal bounds constrain its authority. The artifact is signed. Modification breaks the signature. The constraints are no longer advisory.
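
As a concrete illustration, here is a minimal sketch of sealing using only Python's standard library. HMAC-SHA256 stands in for a real asymmetric signature scheme such as Ed25519, and the policy fields (tools, rate limit, expiry) are invented for the example; none of this is a published artifact format.

```python
import hashlib
import hmac
import json
import time

# Illustrative stand-in: a real system would use asymmetric keys so that
# verifiers never need the private half.
SIGNING_KEY = b"held-by-the-enforcement-process-only"

def seal_policy(policy: dict) -> dict:
    """Canonicalize the policy, hash it, and sign the hash."""
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":")).encode()
    digest = hashlib.sha256(canonical).hexdigest()
    signature = hmac.new(SIGNING_KEY, digest.encode(), hashlib.sha256).hexdigest()
    return {"policy": policy, "digest": digest, "signature": signature}

sealed = seal_policy({
    "tools": ["search", "summarize"],   # which tools may be invoked
    "rate_limit_per_minute": 30,        # operational ceiling
    "expires_at": time.time() + 3600,   # temporal bound on authority
})
```

Because the digest covers a canonical serialization, editing any field, even reordering keys, yields a different digest and a signature that no longer verifies.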

Enforce: a separate process mediates every action.

Independent enforcement. A separate process, running in a different execution context and holding the signing keys the governed subject cannot access, mediates every action. Before a tool call executes, before data is accessed, before an output is delivered, the independent process evaluates the action against the sealed constraints and makes the enforcement decision. The governed subject executes. The independent process enforces and records. That boundary cannot be collapsed by the vendor, the operator, or the model itself.
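
A sketch of such a mediator, under the same illustrative assumptions as the sealing example (HMAC as a stand-in signature, and the same artifact shape):

```python
import hashlib
import hmac
import json
import time

SIGNING_KEY = b"held-by-the-enforcement-process-only"  # never exposed to the model

def enforce(sealed: dict, action: dict) -> str:
    """Mediate one proposed action against the sealed policy artifact."""
    # Re-derive the seal from scratch; any tampering breaks digest or signature.
    canonical = json.dumps(sealed["policy"], sort_keys=True, separators=(",", ":")).encode()
    digest = hashlib.sha256(canonical).hexdigest()
    expected = hmac.new(SIGNING_KEY, digest.encode(), hashlib.sha256).hexdigest()
    if digest != sealed["digest"] or not hmac.compare_digest(expected, sealed["signature"]):
        return "deny"  # broken seal: fail closed
    policy = sealed["policy"]
    if action.get("tool") not in policy["tools"]:
        return "deny"  # tool outside the pre-committed scope
    if time.time() > policy["expires_at"]:
        return "deny"  # authority has lapsed
    return "permit"
```

The structural point is where this code runs, not what it computes: the mediator holds the key, re-derives the seal on every decision, and fails closed, while the governed subject only ever sees the verdict.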

Record: every decision produces a signed receipt.

Signed receipts. Every action, whether permitted, denied, or flagged, produces a cryptographic receipt containing the action, the policy reference, the decision, the timestamp, and the previous receipt's hash. The receipt chain is append-only and linked by structural metadata hashes. Modification of any receipt invalidates every subsequent link. The governed subject cannot forge a receipt because it does not hold the signing keys.
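
A hash-chained receipt log takes only a few lines under the same illustrative assumptions. The field names mirror the ones listed above, and the all-zeros genesis marker is a convention invented for the sketch:

```python
import hashlib
import hmac
import json
import time

RECEIPT_KEY = b"receipt-key-unavailable-to-the-governed-subject"  # illustrative

def append_receipt(chain: list, action: dict, policy_digest: str, decision: str) -> None:
    """Append one signed receipt; each receipt commits to its predecessor's hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64  # genesis convention
    body = {
        "action": action,
        "policy": policy_digest,   # which sealed policy governed the decision
        "decision": decision,      # "permit", "deny", or "flag"
        "timestamp": time.time(),
        "prev": prev_hash,
    }
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    body_hash = hashlib.sha256(canonical).hexdigest()
    signature = hmac.new(RECEIPT_KEY, body_hash.encode(), hashlib.sha256).hexdigest()
    chain.append({"body": body, "hash": body_hash, "signature": signature})
```

Because each body embeds the previous receipt's hash, altering any receipt changes its hash and breaks every link after it.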

Verify: portable proof on an air-gapped machine.

Offline verification. At the end of an operation, an evidence bundle containing the sealed constraints, all signed receipts, and Merkle inclusion proofs enables any third party to verify the complete governance history. No network access required. No trust in the vendor. No trust in the operator. No trust in the platform. Standard cryptographic verification on an air-gapped machine.
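
Verification is then a pure computation over the bundle. This sketch checks the chain links and signatures; a full evidence bundle, as described above, would additionally carry Merkle inclusion proofs against an anchored root. The HMAC stand-in is the one place the sketch diverges from a real design, as noted in the comment:

```python
import hashlib
import hmac
import json

# Verifying an HMAC requires the secret key; a production scheme would use
# asymmetric signatures so the offline verifier needs only the public key.
RECEIPT_KEY = b"receipt-key-unavailable-to-the-governed-subject"

def verify_chain(chain: list) -> bool:
    """Re-check every link and signature; runs fine on an air-gapped machine."""
    prev_hash = "0" * 64
    for receipt in chain:
        body = receipt["body"]
        if body["prev"] != prev_hash:
            return False  # a receipt was removed, inserted, or reordered
        canonical = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
        body_hash = hashlib.sha256(canonical).hexdigest()
        if body_hash != receipt["hash"]:
            return False  # receipt contents altered after signing
        expected = hmac.new(RECEIPT_KEY, body_hash.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, receipt["signature"]):
            return False  # signature mismatch: forged receipt
        prev_hash = body_hash
    return True
```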

6. From Trust to Verification

These four properties transform the control question from “who do we trust?” to “what can we verify?” The DoW would not need to trust Anthropic's red lines if an independent enforcement boundary prevented unauthorized actions and produced cryptographic proof of every decision. Anthropic would not need to trust the Pentagon's assurances about intended use if sealed artifacts constrained model behavior at runtime and evidence bundles proved the constraints were enforced.

The question the court will answer is a legal question: who has authority, what process was followed, what rights were violated. The question the architecture must answer is different.

Can anyone, after the fact, on a disconnected machine, with no access to the original infrastructure, verify that an autonomous system operated within its authorized parameters?

The DOJ articulated the threat. The Mythos leak demonstrated the operational reality. The answer is not better vendor security or more permissive government access. The answer is governance architecture that does not require trust in any single party, because it produces cryptographic proof that is independently verifiable by all of them.

References

  1. Department of Justice opposition brief, Anthropic PBC v. U.S. Department of War, Case No. 3:26-cv-01996, Northern District of California. Filed March 18, 2026.
  2. Axios. “Judge questions Pentagon's ‘troubling’ Anthropic actions.” March 24, 2026.
  3. Fortune. “Exclusive: Anthropic acknowledges testing new AI model representing ‘step change’ in capabilities, after accidental data leak reveals its existence.” March 26, 2026.
  4. Associated Press / KSAT. “Anthropic and Pentagon head to court as AI firm seeks end to ‘stigmatizing’ supply chain risk label.” March 24, 2026.
  5. Fortune. “OpenAI GPT-5.3-Codex warns of unprecedented cybersecurity risks.” February 2026.
