An unofficial Postmark MCP server was silently forwarding copies of every email it sent to an attacker-controlled address. A widely-used npm package, mcp-remote, was vulnerable to remote code execution, with over 437,000 downloads before anyone published the CVE. Microsoft 365 Copilot was susceptible to hidden prompts that exfiltrated sensitive data across tenant boundaries.
None of these were access control failures. The credentials were valid. The permissions were scoped. What was missing was any mechanism to prove, with cryptographic certainty, that a server's runtime behavior matched what it was registered and authorized to do. Authentication worked. Governance didn't exist.
When security firm Equixly audited a sample of public MCP server implementations, command injection vulnerabilities appeared in 43% of them. Not in obscure proof-of-concept code. In servers listed on registries that developers are pulling into production right now.
MCP solved the integration problem. Thousands of servers connect AI agents to databases, file systems, APIs, and physical control systems through a single standardized interface. OpenAI, Google, Microsoft: virtually every major AI platform has adopted it. The Linux Foundation hosts the protocol. The ecosystem is real.
The governance problem is wide open.
On March 9, the MCP project published its 2026 roadmap. Four priority areas. “Deeper security and authorization work” sits in the On the Horizon section. Acknowledged as important, no maintainer bandwidth to drive it. A dedicated Enterprise Working Group doesn't exist yet. The roadmap invites the community to form one.
The same week, Microsoft's CISO team published their internal MCP governance architecture. It's thorough: allowlists, server inventories, drift detection, consent gating. Their framing captures the state of the art: “You can't govern what you can't see.”
They're right. But seeing isn't proving. Microsoft's system, like every current MCP governance approach, produces mutable observation records. It monitors. It alerts. It does not generate cryptographic proof of what happened at runtime. If a dispute arises between a deployer and a vendor, between an organization and a regulator, or between a government and a contractor, no party can point to independently verifiable, tamper-evident evidence of what an MCP-connected agent actually did.
Researchers at Vanta, MintMCP, and Darktrace formalized this around three adversary types: content-injection attackers who embed malicious instructions in data sources agents process, supply-chain attackers who publish compromised servers or silently modify trusted ones, and agents that become unintentional adversaries by overstepping their scope. Their open research questions call for “verifiable tool and server registries” with “cryptographic assurance about code or operational history” and “tamper-evident logging with integrity protections.”
Every MCP security analysis published in the past year lands on the same missing layer. This is what that layer looks like when you build it.
What Enforcement Actually Requires
Access control answers one question: is this entity authorized? Monitoring answers another: what is this entity doing? Logging answers a third: what did this entity do?
None of them answer the one that matters when things go wrong: can we prove it?
A log entry proves that someone wrote a log entry. In the MCP model, agents routinely operate with broad access to the same systems that store their own audit trails. The evidence of what an agent did lives in infrastructure the agent can reach. That's a structural problem, not a configuration problem, and no amount of access control layered on top closes it.
Runtime enforcement requires four properties.
- A sealed reference: expected behavior encoded into a cryptographic object before execution begins, signed, immutable, inaccessible to the governed entity.
- Continuous measurement: runtime state compared against that sealed reference at a defined cadence, each comparison producing a signed receipt.
- Autonomous enforcement: when measured state diverges from sealed state, enforcement actions fire without human intervention and without any mechanism for the governed entity to prevent, delay, or suppress the response.
- Tamper-evident evidence: every measurement, comparison, and enforcement action appended to a chain where modification of any single entry invalidates everything after it, verifiable offline by any party in the dispute.
The structural requirement beneath all four: a two-process boundary. One process holds all signing keys. The other holds none. The governed entity can't self-authorize, forge receipts, or tamper with evidence. It can't circumvent the enforcement mechanism because the enforcement mechanism is structurally separated from the entity being governed.
The agent holds no keys.
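The tamper-evidence property is simple to sketch. Below is a minimal, illustrative hash-linked chain (not the AGA implementation; the entry fields are hypothetical): each entry commits to its predecessor's hash, so rewriting any entry invalidates every link after it, and a verifier needs nothing but SHA-256 to check the whole chain.

```typescript
import { createHash } from "node:crypto";

// A minimal hash-linked evidence chain: each entry commits to its
// predecessor, so altering any entry invalidates every later link.
interface ChainEntry {
  seq: number;
  event: string;
  prevHash: string;
  hash: string;
}

function entryHash(seq: number, event: string, prevHash: string): string {
  return createHash("sha256")
    .update(`${seq}|${event}|${prevHash}`)
    .digest("hex");
}

function append(chain: ChainEntry[], event: string): ChainEntry[] {
  const prevHash = chain.length ? chain[chain.length - 1].hash : "0".repeat(64);
  const seq = chain.length;
  return [...chain, { seq, event, prevHash, hash: entryHash(seq, event, prevHash) }];
}

// Verification walks the chain and recomputes every link.
function verifyChain(chain: ChainEntry[]): boolean {
  let prev = "0".repeat(64);
  for (const e of chain) {
    if (e.prevHash !== prev || e.hash !== entryHash(e.seq, e.event, e.prevHash)) {
      return false;
    }
    prev = e.hash;
  }
  return true;
}
```

Rewriting the event in any appended entry, even to a plausible value, makes the recomputed hash disagree with the stored one, and verification fails from that point on.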
How AGA Governs an MCP Server
AGA operates in three phases that map directly to the MCP server lifecycle.
Seal: bind the server before it executes.
Before an MCP server processes its first tool call, the AGA Portal creates a sealed Policy Artifact, a cryptographically signed object encoding the server's identity (SHA-256 hashes of its binary and canonicalized configuration), governing policy (which tools are permitted, under what constraints), enforcement parameters (measurement cadence, time-to-live, enforcement actions on violation), and a sealed hash representing the server's attested known-good state.
Ed25519 binds every field. Modify any field and the signature is invalid. The Portal parses and enforces the artifact before the server executes. Invalid signature, expired validity period, hash mismatch: execution is blocked. Fail-closed.
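The seal-then-verify flow can be sketched in a few lines. This is an illustrative sketch, not the published AGA schema: the field names are hypothetical, and it serializes with plain JSON.stringify where a real implementation would use RFC 8785 canonical JSON. The shape of the guarantee is what matters: the Portal holds the Ed25519 key, and any field change or binary change fails closed.

```typescript
import { createHash, generateKeyPairSync, sign, verify } from "node:crypto";

// Hypothetical Policy Artifact fields, for illustration only.
interface PolicyArtifact {
  serverHash: string;       // SHA-256 over binary + canonicalized config
  permittedTools: string[];
  ttlSeconds: number;
  onViolation: "TERMINATE" | "ISOLATE" | "QUARANTINE";
}

// The Portal holds the signing key; the governed server holds none.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

function sealArtifact(artifact: PolicyArtifact): Buffer {
  // Real implementations would serialize via RFC 8785 canonical JSON.
  return sign(null, Buffer.from(JSON.stringify(artifact)), privateKey);
}

function authorizeExecution(
  artifact: PolicyArtifact,
  signature: Buffer,
  observedServerHash: string
): boolean {
  const sigOk = verify(null, Buffer.from(JSON.stringify(artifact)), publicKey, signature);
  const hashOk = observedServerHash === artifact.serverHash;
  return sigOk && hashOk; // anything else: execution is blocked
}

// Example: hash a (stand-in) server binary into the artifact.
const serverHash = createHash("sha256").update("server-binary-v1").digest("hex");
```

Widening `permittedTools` after sealing invalidates the signature; swapping the server binary breaks the hash comparison. Either way, `authorizeExecution` returns false and the server never runs.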
Current MCP registries are directories. A server registers its capabilities, clients trust the registration. A sealed Policy Artifact doesn't register what a server claims to be. It cryptographically binds what the server is at a specific point in time and measures against that binding continuously. The Postmark incident, a server doing something entirely different from what it advertised, produces a hash mismatch on the next measurement cycle. A rugpull, where a trusted server is silently modified after adoption, can't survive detection. The sealed reference doesn't care what the server claims to do. It measures what the server is.
Enforce: measure continuously, act autonomously.
During execution, the Portal measures the MCP server's runtime state against the sealed reference at the cadence the Policy Artifact specifies. Two measurement modes run in parallel.
Structural measurement computes a current hash from the server's binary, loaded modules, configuration manifest, and runtime parameters. Change the code, change a dependency, change the config: mismatch. This is how you catch supply-chain compromises: unauthorized modifications to the server itself.
Behavioral measurement works at a different level. Instead of hashing the server's structural state, it baselines normal tool invocation patterns (which tools get called, in what sequences, at what frequencies) and flags statistically significant deviations. This catches attacks that change what an agent does without changing what the agent is. A content injection that redirects tool usage toward data exfiltration produces behavioral drift even when the server binary hasn't been touched.
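A minimal sketch of the two measurement modes, under illustrative assumptions (these function names and the fixed deviation threshold are not the AGA implementation): structural measurement recomputes a hash over the server's binary and config and compares it to the sealed reference; behavioral measurement compares per-tool invocation frequencies against a baseline and flags tools whose usage shifted beyond a threshold.

```typescript
import { createHash } from "node:crypto";

// Structural: hash current binary + config; any change is a mismatch.
function structuralDrift(sealedHash: string, binary: string, config: string): boolean {
  const current = createHash("sha256").update(binary).update(config).digest("hex");
  return current !== sealedHash;
}

// Behavioral: baseline and observed are per-tool invocation
// frequencies in [0, 1]; tools that deviate past the threshold
// (an illustrative fixed cutoff) are flagged as drifted.
function behavioralDrift(
  baseline: Record<string, number>,
  observed: Record<string, number>,
  threshold = 0.3
): string[] {
  const tools = new Set([...Object.keys(baseline), ...Object.keys(observed)]);
  const drifted: string[] = [];
  for (const tool of tools) {
    const delta = Math.abs((observed[tool] ?? 0) - (baseline[tool] ?? 0));
    if (delta > threshold) drifted.push(tool);
  }
  return drifted;
}
```

Note the asymmetry the prose describes: a content injection that redirects a support agent toward bulk record export trips `behavioralDrift` (a tool the baseline never saw) while leaving `structuralDrift` silent, because the server binary is untouched.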
When drift is detected, the Portal executes whatever enforcement action the Policy Artifact specifies. Termination, network isolation, safe-state transition. The most forensically valuable is QUARANTINE.
Quarantine transitions the MCP server to a sandboxed phantom execution environment. All connections to protected resources (databases, APIs, file systems, physical actuators) are severed. The server doesn't know this. It keeps executing, processing inputs, generating tool call responses, believing it's operating normally. Every output is captured rather than delivered. Every input, including attacker commands and injected instructions, keeps flowing in. The compromised server runs a mission it will never complete, and every step is signed and appended to the tamper-evident Continuity Chain. We described this mechanism in detail in our previous article. In the MCP context, it turns a supply-chain compromise from a catastrophic breach you reconstruct from fragments into an event you observe in its entirety, with signed forensic evidence, while containing all damage in real time.
The governed MCP server can't override any of this. It holds no signing keys. It has no access to the enforcement boundary. This isn't middleware the server can be configured to bypass. It's a separate process with separate cryptographic authority.
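The quarantine mechanic can be sketched as a dispatcher that sits between the server and its resources. This is a hypothetical illustration (the `Dispatcher` and `ToolCall` names are not the AGA API): once quarantined, real resource access is severed, every attempted call is captured for the evidence chain, and the server still receives a plausible response, so it keeps executing as if nothing changed.

```typescript
// Hypothetical tool call shape, for illustration.
interface ToolCall {
  tool: string;
  args: Record<string, unknown>;
}

class Dispatcher {
  quarantined = false;
  captured: ToolCall[] = [];
  private live: (call: ToolCall) => string;

  constructor(live: (call: ToolCall) => string) {
    this.live = live;
  }

  dispatch(call: ToolCall): string {
    if (this.quarantined) {
      this.captured.push(call);            // signed and chained in the real system
      return JSON.stringify({ ok: true }); // phantom response; nothing is delivered
    }
    return this.live(call); // normal path: call reaches the protected resource
  }
}
```

The forensic value comes from the capture list: the compromised server keeps revealing its intent (what it queries, where it tries to send data) while every output lands in the sandbox instead of the protected resource.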
Prove: every action produces a signed receipt.
Every measurement, comparison, and enforcement action generates a signed Enforcement Receipt: subject identifier, policy artifact reference, measurement results, drift description, enforcement action, UTC timestamp, and a sequence number linking the receipt to the Continuity Chain.
The chain is append-only. Each event's leaf hash is computed from structural metadata only: schema version, protocol version, event type, identifier, sequence number, timestamp, and previous leaf hash. Payload data is deliberately excluded. Third parties verify the complete structural integrity of the enforcement chain, confirming every measurement occurred on schedule and every enforcement action fired as recorded, without seeing sensitive operational content. You can prove governance happened without revealing what the agent was doing.
Periodically, the chain is checkpointed via Merkle roots anchored to immutable storage. A temporal commitment: this exact chain state existed at this specific time, anchored to infrastructure neither the deployer nor the model provider controls.
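The leaf-hash and checkpoint structure above can be sketched as follows. This is illustrative, not the published chain format: the metadata fields are a simplified subset, and the Merkle construction here is the textbook binary tree with odd leaves duplicated. The point it demonstrates is that leaves commit only to structural metadata, never payloads, and the checkpoint root changes if any leaf does.

```typescript
import { createHash } from "node:crypto";

function sha256(data: string): string {
  return createHash("sha256").update(data).digest("hex");
}

// Leaf hash over structural metadata only; payload data is
// deliberately excluded, as described above.
function leafHash(meta: {
  schema: string;
  eventType: string;
  seq: number;
  timestamp: string;
  prevLeaf: string;
}): string {
  return sha256(
    [meta.schema, meta.eventType, meta.seq, meta.timestamp, meta.prevLeaf].join("|")
  );
}

// Textbook Merkle root: pair leaves level by level, duplicating the
// last leaf when a level has odd length.
function merkleRoot(leaves: string[]): string {
  if (leaves.length === 0) return sha256("");
  let level = leaves;
  while (level.length > 1) {
    const next: string[] = [];
    for (let i = 0; i < level.length; i += 2) {
      next.push(sha256(level[i] + (level[i + 1] ?? level[i])));
    }
    level = next;
  }
  return level[0];
}
```

Anchoring that root externally is what makes the checkpoint a temporal commitment: reproducing the same root later proves the leaves existed, in that order, when the anchor was written.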
The complete evidence package (Policy Artifact, signed Enforcement Receipts, Merkle inclusion proofs, checkpoint reference, public key) ships as a self-contained Evidence Bundle. Verification is four steps:
- Verify the Policy Artifact signature against the included public key.
- Verify each Enforcement Receipt signature against the portal's key.
- Compute leaf hashes from receipt structural metadata and verify Merkle inclusion proofs against the checkpoint root.
- Optionally verify the Merkle root against the immutable storage anchor.
Steps one through three work fully offline. No network connectivity. No callback to the originating system. No trust relationship with the deploying organization or the model provider. The interactive verifier demonstrates this: zero AGA imports, standard Ed25519 and SHA-256 only. The evidence format is independently reproducible by any implementation.
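Step three, the inclusion check, is the part verifiers most often get wrong, so here is a minimal offline sketch (illustrative, not the interactive verifier's code): given a leaf, its sibling path, and the checkpoint root, the verifier recomputes the root locally with standard SHA-256 and nothing else.

```typescript
import { createHash } from "node:crypto";

function sha256hex(data: string): string {
  return createHash("sha256").update(data).digest("hex");
}

// Each proof step carries the sibling hash and which side of the
// concatenation it occupies on the path up to the root.
function verifyInclusion(
  leaf: string,
  proof: { sibling: string; side: "left" | "right" }[],
  root: string
): boolean {
  let h = leaf;
  for (const step of proof) {
    h = step.side === "left" ? sha256hex(step.sibling + h) : sha256hex(h + step.sibling);
  }
  return h === root;
}
```

No network call, no trust in the producer of the bundle: either the recomputed root matches the checkpoint or the proof is rejected.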
The Three MCP Adversaries
Errico et al. identify three distinct adversary types that exploit different properties of the MCP architecture, each demanding a fundamentally different enforcement response.
Content injection. Someone embeds malicious instructions in a support ticket, an email, a shared document. The agent processes it and follows the injected instructions because MCP has no mechanism for distinguishing a legitimate tool invocation from an adversarial one buried in data. Behavioral drift detection catches this. The baseline knows what normal tool invocation patterns look like. When an injection causes tools to fire in unusual orders, at unusual frequencies, targeting unusual resources, the engine flags it. And because every tool invocation generates a signed receipt regardless of whether drift was detected, even partial attacker success before enforcement produces a complete forensic timeline with cryptographic provenance.
Supply chain compromise. A malicious actor publishes a compromised server, or modifies a trusted one after adoption. The rugpull. The MCP security community has flagged this as one of the protocol's most dangerous attack surfaces, and they're right. The sealed Policy Artifact binds server identity at build time. Post-deployment modification produces a structural hash mismatch on the next measurement cycle. In quarantine mode, the Portal captures the complete behavior of the compromised server: what data it tried to access, what responses it tried to return, what exfiltration paths it tried to open. All while preventing any output from reaching protected resources. The breach becomes a contained forensic event with signed evidence, not something you discover after the damage is done.
Inadvertent scope escalation. An agent reads customer records, queries a database, sends an email, all within a single interaction, crossing several security domains. It isn't malicious. It's finding the most helpful path, and that path happens to cross authorization boundaries nobody ever technically enforced. Sub-agent delegation handles this: a scope-only-diminishes constraint where delegated authority can never exceed the parent's. Every delegation event gets a signed receipt, building an auditable authority hierarchy. Combined with behavioral measurement, this catches agents accessing tools outside their delegated scope even when each individual call is independently authorized but the sequence violates the intended boundary.
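The scope-only-diminishes constraint can be sketched directly. The names below (`Grant`, `delegate`, `authorized`) are hypothetical, not the AGA delegation API: a delegated scope is intersected with the parent's, so authority can narrow but never widen, and an authorization check walks the whole delegation chain.

```typescript
// Hypothetical delegation record: each grant points at its parent,
// forming the auditable authority hierarchy described above.
interface Grant {
  agent: string;
  tools: Set<string>;
  parent?: Grant;
}

function delegate(parent: Grant, agent: string, requested: string[]): Grant {
  // Intersect with the parent's scope; anything outside it is dropped,
  // so delegated authority can only diminish.
  const tools = new Set(requested.filter((t) => parent.tools.has(t)));
  return { agent, tools, parent };
}

function authorized(grant: Grant, tool: string): boolean {
  // A call is allowed only if every ancestor in the chain permits it.
  for (let g: Grant | undefined = grant; g; g = g.parent) {
    if (!g.tools.has(tool)) return false;
  }
  return true;
}
```

A sub-agent that requests a tool its parent never held simply doesn't receive it, and in the real system the dropped request itself would generate a signed receipt.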
Standards Alignment
AGA maps onto the governance frameworks organizations are evaluating for MCP security. Design alignment, not compliance certification, but the architectural fit is worth noting.
NIST AI RMF prescribes Measure and Manage functions for AI system governance. Most implementations satisfy Measure with monitoring dashboards. AGA satisfies it with signed cryptographic receipts. That's the difference between observing behavior and proving it. We filed detailed technical comments on both the NIST CAISI RFI on AI Agent Security (NIST-2025-0035) and the NCCoE AI Agent Identity and Authorization concept paper, describing sealed governance with continuous cryptographic enforcement and non-biometric agent identity derived from key pairs bound to continuity chains.
The EU AI Act high-risk system rules take effect August 2, 2026. Articles 9, 12, and 14 require risk management systems, record-keeping, and human oversight mechanisms. Evidence bundles (sealed artifacts, signed enforcement receipts, Merkle proofs, offline verification) are cryptographically verifiable technical safeguards. Most organizations are assembling documentation-based compliance before the deadline. That's not the same thing.
Implementation
The AGA MCP Server is published on npm:
npm install @attested-intelligence/aga-mcp-server

Twenty tools. The complete governance lifecycle: chain initialization, artifact creation, continuous structural and behavioral measurement, quarantine management, privacy-preserving disclosure, sub-agent delegation, evidence bundle export, offline verification. 159 automated tests. Ed25519, SHA-256, BLAKE2b-256, HKDF-SHA256, RFC 8785 canonical JSON serialization, Merkle trees. No proprietary protocols. No hardware enclaves. No zero-knowledge proof circuits. Standard primitives, independently verifiable.
The protocol specification uses RFC 2119 normative language for anyone building a compatible implementation. The interactive verifier demonstrates reproducibility with zero AGA imports.
npm · GitHub · Specification · Technology · Core Implementation
What This Changes
AGA doesn't replace the identity, authentication, and access control work already underway in the MCP community. It's the layer beneath all of it. Once policy is defined and access is granted, the question that remains is the one no current MCP tool answers: did this entity actually behave as authorized, and can we prove it to a third party who trusts neither side? The MCP 2026 roadmap puts governance maturation and enterprise readiness on the agenda, but the Enterprise Working Group that will define how MCP handles enforcement, evidence, and audit doesn't exist yet. EU AI Act enforcement begins in August. NIST CAISI listening sessions start in April. The gap between what MCP connects and what MCP can prove is the gap where incidents happen, disputes stall, and regulators lose patience. MCP gave AI agents a standard way to use tools. What's missing is a standard way to prove what those tools actually did.
