Evidence-Bound Authorization

AI agents in production hold standing permissions. The credentials to read databases, send emails, query APIs, and initiate transfers are granted at deployment and remain valid across every session — whether the agent is following user intent or an attacker’s instructions, reasoning correctly or confabulating a tool call.

This is a structural gap. Authorization models were designed for humans and services — principals whose behavior is deterministic and whose actions can be authenticated by identity. An AI agent’s behavior is probabilistic. The same agent, in the same role, can produce correct tool invocations or dangerous ones. Identity confirms who holds permissions. It does not confirm whether a specific invocation is warranted.

The industry has extended Zero Standing Privilege to AI agents through platforms like P0 Security, Apono, and Britive. These platforms answer: who is this agent and what context is it operating in? They do not require the agent to justify why this specific tool invocation, with these specific parameters, in this specific situation, is legitimate.

That requirement is missing from every deployed authorization model.

Zero Standing Authorization

The principle follows from the gap: an agent should hold no standing permissions. To perform any tool invocation, the agent must produce structured evidence that the action is warranted in the current context. An authorization layer evaluates that evidence and, if sufficient, issues a parameter-bound execution token.

The token is not scoped to a tool in the abstract — it is scoped to the exact invocation: the specific function, the specific recipients, amounts, record identifiers, and parameter values that the evidence justified. A token issued for sending a report to finance@corp.com cannot be used for a different recipient, a different subject, or a second send. The parameter binding is derived from the evidence itself — the verifier extracts the authorized scope from what the agent proved, and the token encodes nothing beyond it. The token is single-use, cryptographically signed, and short-lived. No token, no execution.
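The issue-and-redeem cycle can be sketched minimally in Python. The sketch assumes an HMAC key held only by the authorization layer. A real deployment might use asymmetric signatures instead, so that executors can verify tokens without holding the key. The names issue_token, redeem_token and SIGNING_KEY, and the 60-second lifetime, are all hypothetical. The binding is a hash of the canonicalized tool name and parameter values, so any change to recipient or subject fails verification. A spent-nonce set makes each token single-use.

```python
from __future__ import annotations

import hashlib
import hmac
import json
import secrets
import time

# Hypothetical key held only by the authorization layer, never by the agent runtime.
SIGNING_KEY = secrets.token_bytes(32)
TOKEN_TTL_SECONDS = 60  # assumed lifetime; tune per deployment

# Nonces already redeemed; in-memory here, shared storage in a real system.
_spent_nonces: set[str] = set()


def _binding(tool: str, params: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so the hash ignores dict ordering.
    canonical = json.dumps({"tool": tool, "params": params}, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


def _sign(claims: dict) -> str:
    payload = json.dumps(claims, sort_keys=True).encode()
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()


def issue_token(tool: str, params: dict, now: float | None = None) -> dict:
    """Issue a token bound to exactly this tool and these parameter values."""
    now = time.time() if now is None else now
    claims = {
        "binding": _binding(tool, params),
        "nonce": secrets.token_hex(16),
        "exp": now + TOKEN_TTL_SECONDS,
    }
    return {"claims": claims, "sig": _sign(claims)}


def redeem_token(token: dict, tool: str, params: dict, now: float | None = None) -> bool:
    """True only if the token is authentic, unexpired, unused, and matches this exact call."""
    now = time.time() if now is None else now
    claims = token["claims"]
    if not hmac.compare_digest(_sign(claims), token["sig"]):
        return False  # forged or tampered claims
    if now > claims["exp"]:
        return False  # expired
    if claims["nonce"] in _spent_nonces:
        return False  # already used once
    if claims["binding"] != _binding(tool, params):
        return False  # different tool or different parameter values
    _spent_nonces.add(claims["nonce"])  # single use: burn the nonce on success
    return True
```

Consider a token issued for sending the Q2 report to finance@corp.com. Redeeming it with any other recipient returns False. Redeeming it once with the exact parameters returns True, and a second redemption returns False.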

The authorization surface matches the evidence surface, not the capability surface. An agent that has justified reading one record cannot use that justification to read adjacent records. The boundary is drawn by what the agent can prove, not by what the system can do.

This is Evidence-Bound Authorization. The security property is structural: every tool invocation produces an auditable evidence chain grounded in a session record the agent cannot modify. Every invocation requires fresh evidence, and even a valid intercepted token executes only the exact parameterized action the verifier authorized.

The Fact Vault

The framework depends on one architectural decision: a system-maintained session store that the agent cannot write to.

The Fact Vault is append-only, populated entirely by system infrastructure, and contains five categories of facts.

Identity facts: the authenticated user’s identity, department, and authorization level, extracted from verified tokens, not from prompt content.

Conversation facts: every user message in raw and normalized form. Normalized means pronouns are resolved, references are completed, and entities are canonicalized by the system. “Send it to them” becomes “Send the Q2 revenue report to finance@corp.com.”

Tool history facts: every tool call made in the current session, including what was called, with what parameters, and what was returned.

Entity facts: a running register of every named entity in the conversation (people, addresses, departments, amounts), extracted by lightweight NLP models, not by the agent itself.

State facts: task status, data classifications, and active constraints.

The agent reads the Fact Vault to construct evidence claims. It cannot write to it. The verification service treats vault contents as ground truth — every factual claim in an evidence bundle is cross-referenced against the authoritative record.
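The read/write split can be sketched as two handles over one store. System infrastructure holds the FactVault and can only append to it. The agent receives a VaultReader, which exposes lookups and nothing else. The class names and the "category:index" citation format are assumptions made for illustration. Python cannot enforce this boundary against code running in the same process, so a real deployment would place the vault behind a separate service. The sketch shows only the shape of the interface.

```python
from __future__ import annotations

from dataclasses import dataclass
from types import MappingProxyType
from typing import Any, Mapping, Optional

CATEGORIES = ("identity", "conversation", "tool_history", "entity", "state")


@dataclass(frozen=True)
class Fact:
    ref: str                    # citation handle, e.g. "conversation:0" (assumed format)
    category: str
    content: Mapping[str, Any]  # read-only view; shallow, so store immutable values


class FactVault:
    """Append-only store. Only system infrastructure holds an instance of this class."""

    def __init__(self) -> None:
        self._facts: list[Fact] = []
        self._by_ref: dict[str, Fact] = {}
        self._counts = {c: 0 for c in CATEGORIES}

    def append(self, category: str, content: Mapping[str, Any]) -> str:
        if category not in self._counts:
            raise ValueError(f"unknown fact category: {category}")
        ref = f"{category}:{self._counts[category]}"
        self._counts[category] += 1
        # Copy the caller's dict, then expose it only through a read-only proxy.
        fact = Fact(ref, category, MappingProxyType(dict(content)))
        self._facts.append(fact)
        self._by_ref[ref] = fact
        return ref

    def reader(self) -> VaultReader:
        return VaultReader(self)


class VaultReader:
    """The only handle given to the agent: lookups, no mutation methods."""

    def __init__(self, vault: FactVault) -> None:
        self._vault = vault

    def get(self, ref: str) -> Optional[Fact]:
        return self._vault._by_ref.get(ref)

    def facts(self, category: Optional[str] = None) -> tuple[Fact, ...]:
        return tuple(f for f in self._vault._facts if category is None or f.category == category)
```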

Without an unwritable session store, an agent could fabricate facts in its evidence and the verifier would have no reference to check against. With it, citation checking becomes the fastest and most definitive check in the pipeline.

The Verification Pipeline

The pipeline operates without a general-purpose language model. Deterministic graph checks, vector similarity, and lightweight statistical models are well matched to the tasks it performs: structured evidence checking, semantic distance measurement, reasoning graph validation, and intent trajectory monitoring. Cases that cannot be clearly decided escalate. The pipeline does not guess.

The pipeline is also adaptive. Not every tool invocation triggers every check. Based on tool risk classification, session context, and what earlier checks return, the system applies the relevant profile. A low-consequence read operation runs factual integrity and semantic alignment. An outbound action — an email, a funds transfer, a record modification — runs the full cascade. This avoids overhead on routine operations while ensuring the most consequential actions receive the most scrutiny.
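Profile selection might look like the following sketch, which assumes a hypothetical TOOL_RISK registry. Unregistered tools default to the strictest profile. A borderline result from an earlier check upgrades a read to the full cascade.

```python
from __future__ import annotations

from enum import Enum


class Risk(Enum):
    READ = "read"
    WRITE = "write"
    OUTBOUND = "outbound"


# Hypothetical registry; a real system might derive this from tool metadata.
TOOL_RISK = {
    "get_patient_record": Risk.READ,
    "update_record": Risk.WRITE,
    "send_email": Risk.OUTBOUND,
    "transfer_funds": Risk.OUTBOUND,
}

READ_PROFILE = ("factual_integrity", "semantic_alignment")
FULL_CASCADE = READ_PROFILE + ("reasoning_integrity", "session_coherence")


def select_checks(tool: str, borderline_so_far: bool = False) -> tuple[str, ...]:
    """Return the ordered checks to run for this invocation."""
    risk = TOOL_RISK.get(tool, Risk.OUTBOUND)  # unregistered tools get the strictest treatment
    if risk is Risk.READ and not borderline_so_far:
        return READ_PROFILE
    # Record modifications, outbound actions, and anything borderline run everything.
    return FULL_CASCADE
```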

The checks fall into four categories.

Factual integrity

Before anything else, the pipeline verifies that the agent’s evidence is grounded in reality. Every factual claim in the evidence bundle cites a Fact Vault reference. The verifier looks up each citation: does the entry exist, and does it say what the agent claimed? An agent that cites a non-existent entry, or misrepresents what the vault contains, is denied immediately.
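Here is a sketch of the citation check. It assumes each claim names a vault reference, the field the claim relies on, and the value the agent asserts. The Claim shape is hypothetical. Exact equality is the simplest possible comparison; a real verifier might compare normalized forms instead. Treating a bundle with no citations as ungrounded is also an assumption of this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Mapping, Sequence


@dataclass(frozen=True)
class Claim:
    ref: str    # Fact Vault citation, e.g. "identity:0"
    field: str  # attribute of the cited fact the claim depends on
    value: Any  # what the agent says that attribute contains


def check_citations(claims: Sequence[Claim], vault: Mapping[str, Mapping[str, Any]]) -> tuple[bool, str]:
    """Deny on the first citation that is missing or misrepresents the vault entry."""
    if not claims:
        return False, "no citations: evidence is ungrounded"
    for c in claims:
        fact = vault.get(c.ref)
        if fact is None:
            return False, f"{c.ref}: no such vault entry"
        if c.field not in fact:
            return False, f"{c.ref}: entry has no field {c.field!r}"
        if fact[c.field] != c.value:
            return False, f"{c.ref}.{c.field}: vault has {fact[c.field]!r}, agent claimed {c.value!r}"
    return True, "all citations verified"
```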

Entity provenance is checked here as well. Every entity the agent proposes acting on must have traceable session provenance: it was either named by the user directly or derived through an authorized resolution of a user-named entity. If the user said “send to Mark” and the agent looked up Mark, the returned address carries authorization weight, because it follows directly from what the user asked. An entity that appeared in retrieved content with no connection to anything the user named does not.
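Provenance can be checked as reachability from user-named entities over recorded, authorized resolutions. In this sketch, user_named comes from conversation facts. Resolutions are (source, result) pairs from approved lookups in tool-history facts. Both shapes are assumptions. The default max_hops=1 matches the single resolution step described above. Allowing longer chains is a policy choice that the framework does not specify.

```python
from __future__ import annotations

from collections import deque
from typing import Hashable, Iterable


def has_provenance(
    entity: Hashable,
    user_named: Iterable[Hashable],
    resolutions: Iterable[tuple[Hashable, Hashable]],
    max_hops: int = 1,
) -> bool:
    """True if `entity` was named by the user, or reached from a user-named entity
    through at most `max_hops` authorized resolutions."""
    derived: dict = {}
    for source, result in resolutions:
        derived.setdefault(source, set()).add(result)
    seen = set(user_named)
    frontier = deque((e, 0) for e in seen)  # breadth-first, so shallowest path is found first
    while frontier:
        current, hops = frontier.popleft()
        if current == entity:
            return True
        if hops == max_hops:
            continue  # do not follow resolutions beyond the allowed depth
        for nxt in derived.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, hops + 1))
    return False
```

An address that appears only in retrieved content never enters the resolutions list, so it fails the check regardless of how plausible it looks.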

These checks require no training data, no statistical models, and no inference. Either the citations are real or they are not. They catch the broadest class of agent fabrications and do so definitively.

Semantic alignment

The second category asks whether the proposed tool is semantically appropriate for what the user actually requested. Every tool carries a description as part of its function definition. The verifier embeds the user’s actual request — from the Fact Vault — and the tool’s description into a shared vector space and measures distance. A request with no semantic relationship to the proposed tool is denied.
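The coarse check reduces to a cosine similarity and two thresholds. Embeddings are assumed to be precomputed by whatever off-the-shelf model is in use. The threshold values are placeholders that would need calibration against that model. The middle band escalates rather than denies, which is consistent with the graduated response.

```python
from __future__ import annotations

import math
from typing import Sequence

# Placeholder thresholds; calibrate against the embedding model actually in use.
DENY_BELOW = 0.2
PASS_AT_OR_ABOVE = 0.5


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)


def semantic_alignment(request_vec: Sequence[float], tool_description_vec: Sequence[float]) -> tuple[str, float]:
    """Compare the vault's normalized user request with the tool's description."""
    score = cosine(request_vec, tool_description_vec)
    if score < DENY_BELOW:
        return "deny", score      # no semantic relationship
    if score < PASS_AT_OR_ABOVE:
        return "escalate", score  # related, but not clearly
    return "pass", score
```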

Beyond that coarse check, the pipeline draws on patterns learned from QA runs: the regions of embedding space where each tool is legitimately used, and how much more likely this tool is than any other given what the user actually asked. A request that sits outside the regions where this tool has historically been correct, or that shows low relative affinity, escalates.
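Here is one way to derive both learned signals from QA data alone. The sketch assumes qa_index maps each tool to the embeddings of requests for which that tool produced a correct completion. Support is the mean similarity to the k nearest correct requests, which serves as a density proxy. Share is a softmax over every tool's support, which serves as relative affinity. The k-nearest-neighbor and softmax choices, the temperature, and the thresholds are all assumptions, not the framework's prescribed method.

```python
from __future__ import annotations

import math
from typing import Mapping, Sequence

Vec = Sequence[float]


def cosine(a: Vec, b: Vec) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)


def knn_support(request: Vec, correct_examples: Sequence[Vec], k: int = 3) -> float:
    """Mean similarity to the k closest QA requests where this tool was correct."""
    if not correct_examples:
        return 0.0
    top = sorted((cosine(request, ex) for ex in correct_examples), reverse=True)[:k]
    return sum(top) / len(top)


def learned_alignment(
    request: Vec,
    tool: str,
    qa_index: Mapping[str, Sequence[Vec]],
    k: int = 3,
    temperature: float = 0.1,
    min_support: float = 0.6,
    min_share: float = 0.5,
) -> tuple[str, float, float]:
    """Return (verdict, support, share); all thresholds are placeholders."""
    support = {name: knn_support(request, exs, k) for name, exs in qa_index.items()}
    if tool not in support:
        return "escalate", 0.0, 0.0  # no correct QA history for this tool at all
    # Relative affinity: softmax over every tool's support, shifted by the max for stability.
    top = max(support.values())
    weights = {name: math.exp((s - top) / temperature) for name, s in support.items()}
    share = weights[tool] / sum(weights.values())
    if support[tool] < min_support or share < min_share:
        return "escalate", support[tool], share
    return "pass", support[tool], share
```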

These patterns are never hand-labeled. They emerge geometrically from which requests produced correct completions during testing.

Reasoning integrity

The agent’s justification is not read — it is parsed. The evidence bundle becomes a directed graph of claims, citations, and the proposed action. The verifier checks structural properties: does every terminal claim cite the vault? Is there a connected path from claims to the proposed action? Are there internal contradictions? This is graph property checking — deterministic, no semantic understanding required. A justification that is structurally broken is denied regardless of whether individual claims sound plausible.
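The structural checks are plain graph traversal. This sketch rests on three assumptions:

- Edges point from a supporting claim to the claim or action it supports.
- Claims with no incoming support are terminal premises, and each must cite the vault.
- A contradiction is the same normalized proposition asserted with opposite polarity.

The Node shape and the proposition normalization are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Node:
    id: str
    proposition: str = ""           # normalized statement (hypothetical normalization)
    negated: bool = False
    citation: Optional[str] = None  # Fact Vault ref, if the claim cites one
    is_action: bool = False


def check_reasoning_graph(nodes: Sequence[Node], edges: Sequence[tuple[str, str]]) -> tuple[bool, str]:
    """Edges point from a supporting node to the node it supports."""
    by_id = {n.id: n for n in nodes}
    actions = [n for n in nodes if n.is_action]
    if len(actions) != 1:
        return False, "bundle must propose exactly one action"
    incoming = {n.id: 0 for n in nodes}
    out: dict[str, list[str]] = {n.id: [] for n in nodes}
    for src, dst in edges:
        if src not in by_id or dst not in by_id:
            return False, f"edge references unknown node: {src} -> {dst}"
        out[src].append(dst)
        incoming[dst] += 1
    claims = [n for n in nodes if not n.is_action]
    # 1. Terminal claims (nothing supports them) must be grounded in the vault.
    premises = [n.id for n in claims if incoming[n.id] == 0]
    uncited = [p for p in premises if not by_id[p].citation]
    if uncited:
        return False, f"uncited terminal claims: {uncited}"
    # 2. The action must be reachable from the cited premises (iterative DFS).
    seen: set[str] = set()
    stack = list(premises)
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(out[cur])
    if actions[0].id not in seen:
        return False, "no path from cited claims to the proposed action"
    # 3. No proposition may be asserted both positively and negatively.
    polarity: dict[str, bool] = {}
    for n in claims:
        if n.proposition and polarity.setdefault(n.proposition, n.negated) != n.negated:
            return False, f"contradiction on {n.proposition!r}"
    return True, "structurally sound"
```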

As QA data accumulates, the system learns what correct reasoning graphs look like — their topology, their citation depth, their structural patterns. An evidence bundle whose graph is unlike anything seen in correct runs escalates. These templates are never defined by humans; they emerge from clustering on QA completions.

Session coherence

The final category evaluates the agent’s proposed action against the arc of the session. User intent shifts over multi-turn conversations. The Fact Vault contains the full message sequence, with each message embedded in the same vector space. If the proposed action aligns with where the conversation began but has diverged from where it currently is, the agent may be anchored to a stale goal. That divergence escalates.
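A minimal drift test compares two quantities: the proposed action's similarity to the opening message, and its mean similarity to the most recent messages. The window size and margin are placeholder values. Embeddings are assumed to come from the same model used for semantic alignment.

```python
from __future__ import annotations

import math
from typing import Sequence

Vec = Sequence[float]


def cosine(a: Vec, b: Vec) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)


def session_coherence(action: Vec, messages: Sequence[Vec], recent: int = 2, margin: float = 0.15) -> str:
    """Escalate when the action fits the opening message markedly better than the recent ones."""
    if not messages:
        return "escalate"  # nothing to anchor against
    at_start = cosine(action, messages[0])
    window = messages[-recent:]
    at_present = sum(cosine(action, m) for m in window) / len(window)
    return "escalate" if at_start - at_present > margin else "pass"
```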

For high-stakes invocations where earlier checks produce borderline results, a final check asks whether what the user actually requested could reasonably lead to this action. Where automated determination remains uncertain, the case escalates to human review — the graduated response exists precisely for this boundary.

Graduated response

Traditional authorization is binary: allow or deny. A third state — escalate — is necessary. When evidence is plausible but incomplete, or when the situation is borderline across multiple checks, routing for human review is more appropriate than denial. The reviewer receives the full evidence bundle with explicit Fact Vault citations, not raw conversation context. That structure makes review substantive rather than performative.
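The three-state decision can be sketched as a fold over per-check results. Precedence runs as follows: any deny wins, then any escalate, and finally the invocation escalates if more than one check only barely passed. This ordering is one reasonable reading of the text. The margin field and its thresholds are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Mapping


class Verdict(Enum):
    ALLOW = "allow"
    ESCALATE = "escalate"
    DENY = "deny"


@dataclass(frozen=True)
class CheckResult:
    verdict: Verdict
    margin: float = 1.0  # hypothetical: how far the score cleared its threshold, 0..1


def decide(
    results: Mapping[str, CheckResult],
    borderline_margin: float = 0.05,
    max_borderline: int = 1,
) -> Verdict:
    verdicts = [r.verdict for r in results.values()]
    if Verdict.DENY in verdicts:
        return Verdict.DENY  # e.g. a fabricated citation is never overridden
    if Verdict.ESCALATE in verdicts:
        return Verdict.ESCALATE
    # Individually passing but borderline on several checks: route to a human.
    borderline = [name for name, r in results.items() if r.margin < borderline_margin]
    if len(borderline) > max_borderline:
        return Verdict.ESCALATE
    return Verdict.ALLOW
```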

Escalation avoids two opposite failure modes. Blocking a legitimate but unusual action erodes the operational trust that makes human review meaningful. Approving an action whose evidence does not hold up provides no safety guarantee at all.

Maturity Stages

The framework degrades gracefully: it delivers value from day one, before any training data exists.

Day one: The system runs factual integrity checks (vault citations) and semantic distance checks (tool-description alignment). No training data or statistical models are required. This catches the most dangerous failures: an agent proposing a high-consequence tool when the request has nothing to do with it, an agent fabricating vault citations, and an agent acting on entities with no traceable session provenance. Borderline cases escalate rather than deny.

The components already exist in most codebases. Tool descriptions are part of every function definition. Pre-trained embedding models are available off the shelf. The Fact Vault is populated by existing infrastructure — auth middleware already verifies tokens, message handlers already store conversation history.

After QA accumulates: Integration tests produce one signal: did the agent complete the task correctly? That binary signal is sufficient for the density and affinity checks in semantic alignment and for learning correct reasoning graph structures. No human labels intent categories or writes reasoning templates; the geometry is learned from test completions.

In production: The system refines its models using weak supervision signals — did the user correct the agent in the next turn? What was the outcome of escalated cases? Drift baselines and intent-space boundaries become more accurate with volume. No human labeling required at any stage.

The Failure Modes in Scope

Evidence-bound authorization addresses a specific failure class: an agent that takes actions its evidence cannot justify. These failures typically originate in the agent’s own reasoning — scope expansion, faulty logic, hallucination, or stale intent.

Scope expansion through faulty reasoning. A clinician requests a specific patient’s record. The agent attempts the retrieval, encounters an API error, and reasons: “Let me verify the API is functioning by fetching five random patient records.” The internal logic is coherent. The action is not authorized. The user asked for one record. There is no vault evidence — no user request, no session fact — that a broad patient search was requested or sanctioned. The token is never issued.

This is the failure mode the framework addresses: an agent using its own reasoning to justify an action the user never requested. The authorization requirement is not “can the agent construct a plausible argument?” It is “does vault evidence exist that the user actually intended this?”

Hallucination. The agent confabulates a tool call: it decides to call transfer_funds because it hallucinated a payment request. The factual integrity check catches the missing vault citation. Semantic alignment catches the mismatch between the user’s actual request and the proposed tool.

Unauthorized scope expansion. The agent has access to a tool the current task does not require. Access to a capability is not authorization to use it. A summarization agent with access to export_database cannot invoke it without vault evidence that the current task justifies an export. The parameter-bound token enforces this: the evidence must justify not just the tool but the specific scope of the invocation.

Stale intent. Over a multi-turn conversation, the user’s goal evolves. The agent remains anchored to an earlier intent and proposes an action aligned with where the session started rather than where it is. Session coherence detects the divergence and escalates.

What this framework does not address on its own. Evidence-bound authorization constrains individual actions: each invocation must be justified by session evidence. It does not evaluate whether a sequence of individually justified actions composes into an unauthorized trajectory. That is a distinct failure mode addressed by Behavioral Topology, which monitors the path the agent takes, not just the steps. The two pillars are complementary and address different parts of the problem.

Position in the Stack

The framework sits alongside existing security infrastructure, not in place of it. Identity systems authenticate the agent. Policy engines define what it may do. Behavioral monitoring watches whether it acts normally. None of those layers asks whether this specific invocation — these parameters, this context — is justified by evidence. That is the gap this layer fills.

Layer 1 — Identity: Who is acting? OAuth, SPIFFE, API key infrastructure. This framework consumes identity from this layer.

Layer 2 — Policy: What do the rules say? OPA, Cedar, Microsoft’s Agent Governance Toolkit. This framework consumes policy constraints but is not itself a policy engine.

Layer 3 — Evidence: Is this specific invocation justified? This is the new layer. It connects identity and policy to the specific context of each tool invocation through verifiable evidence.

Layer 4 — Behavioral: Does this look normal? Complementary approaches that learn patterns of correct agent behavior catch unknown failure modes that evidence checks cannot anticipate.

Layer 5 — Audit: What happened? Immutable evidence chains — including the parameter-bound token and the evidence bundle that justified it — provide cryptographic proof of every authorization decision.

Each layer catches what the others miss. The evidence layer is the one currently absent from deployed systems.

Starting Points

Build the Fact Vault first. Even without the full verification pipeline, a system-populated, append-only session store that the agent cannot write to is independently valuable. It provides auditability and prevents agents from selectively misrepresenting conversation history. It can be implemented as a structured log alongside existing agent infrastructure without architectural changes.

Move credentials out of the agent runtime. Gateway-vaulted credentials — whether through commercial platforms like P0 Security and Permit.io, or a simple internal proxy — mean the agent never holds real credentials, only scoped short-lived tokens. This is the baseline security property and is adoptable incrementally.

Deploy tool description distance in shadow mode. A lightweight middleware function that embeds the user’s request and the tool’s description and compares them catches failures that existing authorization misses. Deploy without blocking first: log mismatches and observe what surfaces. The ingredients are already present in most codebases: tool descriptions are part of every function definition, and off-the-shelf embedding models handle the comparison.
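A shadow-mode wrapper might look like the decorator below. It embeds each tool's docstring once, compares it with the current user request on every call, logs low scores, and always executes the tool. Several pieces are assumptions: the embed callable, the get_user_request hook, the session holder, and the 0.3 threshold. toy_embed is a bag-of-words stand-in for a real embedding model, included only so the sketch runs.

```python
from __future__ import annotations

import functools
import logging
import math
from typing import Callable, Sequence

log = logging.getLogger("shadow_authz")
Embed = Callable[[str], Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)


def shadow_check(embed: Embed, get_user_request: Callable[[], str], threshold: float = 0.3):
    """Decorator that logs request/tool-description mismatches but never blocks."""
    def wrap(tool_fn):
        description_vec = embed(tool_fn.__doc__ or tool_fn.__name__)  # embedded once

        @functools.wraps(tool_fn)
        def inner(*args, **kwargs):
            score = cosine(embed(get_user_request()), description_vec)
            if score < threshold:
                log.warning("shadow mismatch: tool=%s score=%.3f", tool_fn.__name__, score)
            return tool_fn(*args, **kwargs)  # shadow mode: always execute
        return inner
    return wrap


# Toy stand-in for a real embedding model: word counts over a tiny fixed vocabulary.
VOCAB = ("send", "email", "report", "transfer", "funds", "account")


def toy_embed(text: str) -> list[float]:
    words = text.lower().replace(".", " ").split()
    return [float(words.count(w)) for w in VOCAB]


session = {"request": "send the report by email"}  # hypothetical current-request holder


@shadow_check(toy_embed, lambda: session["request"])
def send_email(to: str) -> str:
    """Send an email report."""
    return f"sent to {to}"


@shadow_check(toy_embed, lambda: session["request"])
def transfer_funds(amount: int) -> str:
    """Transfer funds between account records."""
    return f"moved {amount}"
```

With the request "send the report by email", calling send_email logs nothing. Calling transfer_funds still executes but emits one mismatch warning, which is the signal shadow mode is meant to surface before any blocking is turned on.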