Accountability for AI agents requires two things your audit log has never had to record: the chain of delegation that produced the action, and a measure of how close a human was to the decision.
01 — Two New Dimensions
AI agents add two structural dimensions to governance that existing audit infrastructure has never had to record: authority chains and human proximity.
Authority chains arise from agent orchestration. When an agent decides to invoke another agent, which invokes another, authority flows through a chain the system created at runtime. Who authorized the final action? Not the executor — the executor received authority. Tracing accountability requires following that chain: who originated the authority, who exercised judgment in the middle, who performed the action at the end.
Human proximity arises from the spectrum of human involvement. A single agent, in a single workflow, may perform actions a human explicitly requested, actions the agent decided were necessary to fulfill a goal, and actions triggered by another agent with no human in the loop at all. These are different governance realities. An accountability record that cannot distinguish between them cannot answer the questions governance actually asks.
Standard audit infrastructure has no concept of either dimension. It was built for a world with two kinds of actors: humans making decisions and deterministic code executing them. For humans, accountability is self-evident. For deterministic code, the code is the explanation. Agents occupy the space between. Their behavior is shaped by instructions, a model, and available tools. The same agent given the same input may reason differently depending on what it retrieves, which version of its configuration is active, and what other agents are in the workflow.
The gap is visible in any production audit log. The record loan.status: PENDING → APPROVED is identical whether a human underwriter typed the approval, an automated rule triggered above threshold, an agent acting on explicit user instruction made the change, or an autonomous agent processed a queued application with no human in the loop. Four fundamentally different governance realities. One indistinguishable record.
The data model for audit records has no architecture for delegation chains or a spectrum of human involvement. Closing the gap requires adding authority chains and human proximity as first-class elements of the governance record.
02 — What Accountability Requires
If existing audit infrastructure cannot answer governance questions about agent actions, the question becomes: what does a complete accountability record have to contain?
The foundational shift is in what the record is anchored to. Database audit logs record mutations — a field changed from value A to value B at a timestamp, attributed to an identity. This is necessary but insufficient. A mutation record answers what changed. A governance question asks what business operation caused the change, and who authorized it. These are different questions requiring different records.
A complete accountability record for an agent action contains five things.
| Component | What it records |
|---|---|
| Business operation | The domain-level action, not the database write: LoanApproved, PatientRecordViewed, DatasetExported. |
| Authority chain | The full sequence of identities from the originator through any intermediaries to the executor, with each link’s role and the judgment it exercised. |
| Human proximity classification | How close a human was to this specific decision. |
| Agent configuration | The instructions, model version, and tool manifest the agent was running at the moment of execution, not today’s version. |
| Execution context | The session, device, and network environment in which the action occurred. |
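
As a rough illustration, the five components can be pictured as one immutable record type. This is a minimal sketch, not a schema defined by this article: every class name, field name, identifier format, and sample value below is an assumption chosen for readability.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ChainLink:
    identity: str  # hypothetical identifier format, e.g. "user:underwriter-17"
    role: str      # "ORIGINATOR", "INTERMEDIARY", or "EXECUTOR"
    judgment: str  # what this link decided, in plain language


@dataclass(frozen=True)
class AgentConfiguration:
    instructions: str
    model_version: str
    tool_manifest: tuple[str, ...]


@dataclass(frozen=True)
class ExecutionContext:
    session_id: str
    device: str
    network: str


@dataclass(frozen=True)
class AccountabilityRecord:
    business_operation: str                  # domain-level action, not a table write
    authority_chain: tuple[ChainLink, ...]   # ordered, originator first
    human_proximity: str                     # classification for this one action
    agent_configuration: AgentConfiguration  # as of execution time, not today
    execution_context: ExecutionContext
    occurred_at: datetime


# Sample values are invented for illustration only.
record = AccountabilityRecord(
    business_operation="LoanApproved",
    authority_chain=(
        ChainLink("user:underwriter-17", "ORIGINATOR", "asked for the application to be processed"),
        ChainLink("agent:loan-processor", "EXECUTOR", "recommended approval, executed after confirmation"),
    ),
    human_proximity="CONFIRMED",
    agent_configuration=AgentConfiguration(
        instructions="(instructions text as registered)",
        model_version="example-model-1",
        tool_manifest=("read_customer_record", "check_credit_score"),
    ),
    execution_context=ExecutionContext("session-001", "managed-laptop", "corporate-vpn"),
    occurred_at=datetime(2025, 1, 15, 9, 30, tzinfo=timezone.utc),
)
```

A plain mutation log would keep only the equivalent of `occurred_at` plus a field change; everything else in this record is what the governance questions depend on.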
None of these are exotic requirements. Each answers a question that governance, compliance, and investigation workflows already ask. They are absent from most production systems because they were never required before. Deterministic code doesn’t delegate to other actors at runtime. Human decisions are inherently attributed. Only agents — actors that exercise judgment, delegate authority, and operate across a spectrum from fully directed to fully autonomous — require all five.
03 — The Provenance Principle
Decision provenance is the architectural property of a system that can answer: who authorized this action, through what chain of delegation, and how close was a human to the decision?
Two components produce this property together.
Authority chains trace the full path of accountability from the identity where authority originated, through any identities that exercised judgment, to the executor that performed the action. A single-link chain — a human performs an action directly — is the simple case. In agent workflows, the chain grows with each delegation step, recording who decided to involve whom and what judgment each intermediary exercised.
Human proximity classifies each action by how close a human was to the decision that produced it. It is not the agent’s overall autonomy tier, which describes deployment posture: how much independence the agent has in general. Human proximity answers a more specific question for each individual action: was a human involved in this decision, and if so, did they initiate it or ratify an agent’s recommendation?
These two components are kept separate because they answer different questions. The authority chain answers who — the full accountability path. Human proximity answers how — the nature of human involvement in that specific decision. A complete governance record requires both. Neither is sufficient without the other.
The authority chain records the path of delegation. Human proximity records where on the spectrum from human judgment to autonomous action each specific decision fell. Together they produce an accountability record that existing audit infrastructure cannot generate.
The principle that makes human proximity tractable: the classification follows the human, not the agent topology. What matters is whether a human was involved in the decision — not how many agents the request passed through, not whether one agent asked another for approval. Agent-to-agent approval is a coordination fact, not a governance fact. The classification asks a single question for each action: was a human in the decision chain for this specific action, and how close?
04 — Human Proximity
The same agent, in the same workflow, can produce actions at different human proximity levels within a single session. This is expected behavior for any agent given a goal rather than a specific sequence of steps. It decides which sub-actions to take, when to pause for human input, and when to act on its own judgment. Each of those decisions produces a different classification.
| Level | Human involvement | Definition |
|---|---|---|
| DIRECTED | Right here | A human explicitly initiated and authorized this specific action. The human made this decision. The agent is the executor of a human choice. |
| CONFIRMED | One step back | An agent determined this action was appropriate and a human approved it before execution. The initiative was the agent’s; the human ratified the recommendation. |
| INFERRED | At the goal level | A human authorized a goal. The agent decided this specific action was necessary to achieve it, without explicit per-action approval. |
| DELEGATED | Several links back | Another agent — not a human — decided this action was needed. Authority may trace back to a human, but the immediate trigger was agent-to-agent delegation. |
| AUTONOMOUS | Not in the workflow | No human triggered this workflow. The agent acted on a standing deployment policy, a schedule, or a monitored condition. Authority derives from deployment configuration, not a person. |
The determining principle. The classification does not track approval mechanics, request routing, or agent coordination topology. It answers one question per action: was a human involved in the decision to perform this specific action, and if so, did the human initiate it or ratify an agent’s recommendation? If no human was involved, the classification distinguishes whether a human authorized the broader goal (INFERRED), another agent triggered the action (DELEGATED), or no human is in the workflow at all (AUTONOMOUS).
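
The determining principle can be read as a short decision procedure. The sketch below is one possible encoding, under stated assumptions: the four boolean inputs are hypothetical names, and the precedence between DELEGATED and INFERRED (agent-to-agent triggering is checked first) is an interpretation of the table above rather than something the table spells out.

```python
from enum import Enum


class HumanProximity(Enum):
    DIRECTED = "DIRECTED"
    CONFIRMED = "CONFIRMED"
    INFERRED = "INFERRED"
    DELEGATED = "DELEGATED"
    AUTONOMOUS = "AUTONOMOUS"


def classify(
    *,
    human_initiated_action: bool,   # a human asked for this specific action
    human_approved_action: bool,    # a human ratified it before execution
    triggered_by_agent: bool,       # another agent decided it was needed
    human_authorized_goal: bool,    # a human authorized the broader goal
) -> HumanProximity:
    # A human was involved in the decision for this specific action.
    if human_initiated_action:
        return HumanProximity.DIRECTED
    if human_approved_action:
        return HumanProximity.CONFIRMED
    # No human was involved in this specific decision.
    if triggered_by_agent:          # assumed to take precedence over INFERRED
        return HumanProximity.DELEGATED
    if human_authorized_goal:
        return HumanProximity.INFERRED
    return HumanProximity.AUTONOMOUS


# An agent working toward a user-authorized goal decides a read is needed.
read_record = classify(
    human_initiated_action=False,
    human_approved_action=False,
    triggered_by_agent=False,
    human_authorized_goal=True,
)

# The same agent pauses and asks the user before a consequential step.
approve = classify(
    human_initiated_action=False,
    human_approved_action=True,
    triggered_by_agent=False,
    human_authorized_goal=True,
)
```

Note that the function takes no information about how many agents a request passed through; only facts about human involvement and the immediate trigger enter the classification.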
A concrete scenario. A user tells an agent: “Process this loan application.” The agent performs four actions:
| Action | Classification | Reason |
|---|---|---|
| ReadCustomerRecord | INFERRED | The agent decided this was needed. The user authorized the goal, not this specific read. |
| CheckCreditScore | INFERRED | Agent judgment within the authorized goal. |
| PullFinancialHistory | INFERRED | Same — the agent decided this step was warranted. |
| ApproveLoan | CONFIRMED | The agent determined approval was warranted, paused, and asked the user: “Should I approve this?” The user confirmed. |
The last action differs in an important way from a scenario in which the user had said “approve loan 711” directly. In the direct case, the classification is DIRECTED: the human initiated the specific action. In the scenario above it is CONFIRMED: the agent recommended, the human ratified. These are different accountability statements. “The underwriter decided to approve” is not the same as “the agent recommended approval and the underwriter confirmed.” Both involve a human. The governance record captures which.
05 — The Authority Chain
Every governance activity records accountability through an ordered sequence of identities — the authority chain — describing how authority flowed from origin to execution.
| Role | Definition |
|---|---|
| ORIGINATOR | The identity where authority originates. A human user who triggered the operation, or a deployment configuration that authorized autonomous action. |
| INTERMEDIARY | An identity that received authority and exercised judgment about how to proceed — deciding which actions to take or which identities to involve. A chain may have zero or many intermediaries. |
| EXECUTOR | The identity that performed the specific action recorded by this governance activity. |
The INTERMEDIARY role is where the most important governance questions live. The executor performed the action. The originator authorized the goal. The intermediary — an agent that decided to invoke another agent, or concluded that a specific sub-action was warranted — is where judgment was exercised and where accountability for decisions, not just actions, is assigned. In a loan processing workflow: the user is the originator, the loan-processing agent is the intermediary that decided a credit check was needed, and the credit-checking agent is the executor. The middle link carries the accountability for the judgment call.
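
The loan-processing example can be sketched as an ordered list of links. This is an illustrative sketch only. It assumes that roles can be derived from position (first link is the originator, last is the executor, everything between is an intermediary) and that in a single-link chain the one identity fills both end roles; the identifiers and helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    identity: str
    judgment: str = ""  # what this link decided, if anything


def _require_links(chain: list[Link]) -> None:
    if not chain:
        raise ValueError("an authority chain needs at least one link")


def originator(chain: list[Link]) -> Link:
    _require_links(chain)
    return chain[0]


def executor(chain: list[Link]) -> Link:
    _require_links(chain)
    return chain[-1]


def intermediaries(chain: list[Link]) -> list[Link]:
    # Everything strictly between the first and last link; may be empty.
    _require_links(chain)
    return list(chain[1:-1])


chain = [
    Link("user:loan-officer", "asked for the application to be processed"),
    Link("agent:loan-processor", "decided a credit check was needed"),
    Link("agent:credit-checker"),
]

judgment_calls = [(link.identity, link.judgment) for link in intermediaries(chain)]
```

Querying `intermediaries(chain)` is the programmatic form of asking where judgment was exercised, as distinct from who acted or who authorized the goal.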
Agent versioning. A governance record for an agent’s action is incomplete without the configuration the agent was running at the time. An agent’s behavior is shaped by its instructions, model version, and tool manifest — not by deterministic code. If the instructions change, the model is upgraded, or a new tool is added, the agent’s judgment changes. A governance record that references the agent identity without capturing the version is forensically incomplete: it records who acted, but not what that actor was configured to do.
Each agent version carries a snapshot of its full configuration at registration. Governance activities reference both the agent identity and the active version. An investigator examining an incident from three months ago retrieves the exact instructions and tool manifest that were active at the time of execution — not the current configuration.
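
A minimal sketch of that lookup, assuming an in-memory registry keyed by agent identity and version identifier. The class and method names are hypothetical, and the sample instructions are invented; the point is only that a registered snapshot is never overwritten, so an old governance record always resolves to the configuration that was active when it was written.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigurationSnapshot:
    instructions: str
    model_version: str
    tool_manifest: tuple[str, ...]


class AgentVersionRegistry:
    def __init__(self) -> None:
        self._snapshots: dict[tuple[str, str], ConfigurationSnapshot] = {}

    def register(self, agent_id: str, version: str, snapshot: ConfigurationSnapshot) -> None:
        key = (agent_id, version)
        if key in self._snapshots:
            # A changed configuration must be registered as a new version.
            raise ValueError(f"{agent_id} {version} is already registered")
        self._snapshots[key] = snapshot

    def snapshot_for(self, agent_id: str, version: str) -> ConfigurationSnapshot:
        return self._snapshots[(agent_id, version)]


registry = AgentVersionRegistry()
registry.register(
    "loan-processor", "v2.3",
    ConfigurationSnapshot("Escalate approvals to a human.", "example-model-1", ("check_credit_score",)),
)
registry.register(
    "loan-processor", "v2.4",
    ConfigurationSnapshot("Approve low-risk loans directly.", "example-model-2", ("check_credit_score", "approve_loan")),
)

# A governance record written under v2.3 stores ("loan-processor", "v2.3"),
# so the investigator sees the old instructions, not the current ones.
at_time_of_incident = registry.snapshot_for("loan-processor", "v2.3")
```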
06 — The Questions It Answers
The combined model — business operations linked to authority chains, human proximity classifications, and agent version snapshots — produces a governance narrative rather than an audit log. It answers the questions that governance, compliance, and incident investigation workflows actually ask.
| Question | Data source |
|---|---|
| What business operation occurred? | Governance activity — action type, resource, timestamp |
| What data changed as a result? | Entity mutations linked to the activity |
| Who executed the operation? | Authority chain → EXECUTOR |
| Where did authority originate? | Authority chain → ORIGINATOR |
| Which agents exercised judgment? | Authority chain → INTERMEDIARY entries |
| How close was a human to this decision? | Human proximity level |
| What configuration was the agent running? | Agent identity → version → configuration snapshot |
| From what device and network? | Execution context |
| What else happened in this session? | All activities sharing the execution context |
An investigation. An unexpected credit limit increase is detected. An investigator queries governance activities and identifies a CreditLimitUpdated activity with human proximity level INFERRED. Traversing the authority chain surfaces loan-processor-v2.3 as the intermediary that decided the increase was warranted while processing a user-initiated application. Retrieving the version snapshot shows the exact instructions active at the time. Examining the entity mutations confirms the specific field changes and their values.
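
The investigation steps above can be sketched as a query over linked records. Everything concrete here is assumed for illustration: the dictionary layout, the originator and executor identifiers, the field values, and the snapshot contents are invented, and a real store would be queried rather than held in memory.

```python
def investigate(activities: list[dict], snapshots: dict, action: str) -> list[dict]:
    """Collect, per matching activity, who decided, under what configuration, and what changed."""
    findings = []
    for activity in activities:
        if activity["action"] != action:
            continue
        deciders = [
            link for link in activity["authority_chain"]
            if link["role"] == "INTERMEDIARY"
        ]
        findings.append({
            "human_proximity": activity["human_proximity"],
            "decided_by": [(d["identity"], d["version"]) for d in deciders],
            "configurations": [snapshots[(d["identity"], d["version"])] for d in deciders],
            "mutations": activity["mutations"],
        })
    return findings


# Invented sample data.
activities = [
    {
        "action": "CreditLimitUpdated",
        "human_proximity": "INFERRED",
        "authority_chain": [
            {"identity": "user:applicant-4821", "role": "ORIGINATOR"},
            {"identity": "loan-processor", "version": "v2.3", "role": "INTERMEDIARY"},
            {"identity": "service:credit-limits", "role": "EXECUTOR"},
        ],
        "mutations": [{"field": "credit_limit", "old": 5000, "new": 8000}],
    },
    {
        "action": "CustomerRecordRead",
        "human_proximity": "INFERRED",
        "authority_chain": [],
        "mutations": [],
    },
]
snapshots = {
    ("loan-processor", "v2.3"): {"instructions": "(instructions text as registered)"},
}

findings = investigate(activities, snapshots, "CreditLimitUpdated")
```

Each step of the narrative (find the activity, walk the chain, fetch the snapshot, read the mutations) is a lookup on data the record already links together, with no reconstruction from unrelated logs.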
This is what governance narrative means in practice: not a field-and-timestamp entry that tells you what changed, but a structured record that tells you what happened, who authorized it, through what chain of delegation, at what level of human involvement, and what the agent was configured to do when it acted.
07 — For Practitioners
Decision provenance does not require building parallel infrastructure from scratch. The governance model is structurally equivalent to the trace and span model used by distributed tracing systems, specifically OpenTelemetry and its wire protocol, the OpenTelemetry Protocol (OTLP). Organizations already running OpenTelemetry are closer to this than they typically realize.
| Governance concept | OTLP primitive |
|---|---|
| Execution Context | Trace — the full lifecycle of a request or session |
| Governance Activity | Span — a unit of work with attributes and timestamps |
| Hierarchical Activities | Parent-child spans |
| Entity Mutations | Span events — field changes attached to a span |
| Authority Chain | Span hierarchy + attributes — each link produces its own span |
| Human Proximity | Span attribute on the executing span |
| Agent Provenance | Span attributes — agent identity, version, configuration reference |
Governance spans carry a semantic convention — a standardized set of attribute names that distinguish a governance activity from a diagnostic trace. Any span carrying these attributes routes to governance infrastructure; spans without them remain in the standard tracing pipeline.
The dual-path architecture. Tracing tolerates sampling and short retention. Governance requires completeness, immutability, and long-term retention. The OpenTelemetry Collector identifies governance spans by their semantic attributes and routes them to an append-only store with governance-specific indexes, while the same spans also flow to the standard tracing backend for operational visibility. Two paths, one instrumentation, one transport.
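
The routing rule can be sketched in a few lines. This is a standalone illustration, not collector configuration: the `governance.` attribute prefix is a hypothetical naming choice, the two lists stand in for the append-only store and the tracing backend, and the per-span sampling decision is supplied by hand.

```python
GOVERNANCE_PREFIX = "governance."  # hypothetical attribute namespace


def is_governance_span(span: dict) -> bool:
    # A span is a governance activity if it carries any governance attribute.
    return any(key.startswith(GOVERNANCE_PREFIX) for key in span["attributes"])


def route(span: dict, governance_store: list, tracing_backend: list, keep_for_tracing: bool) -> None:
    # Governance path: complete, never subject to sampling.
    if is_governance_span(span):
        governance_store.append(span)
    # Tracing path: operational visibility, sampling is acceptable.
    if keep_for_tracing:
        tracing_backend.append(span)


governance_store: list = []
tracing_backend: list = []

spans = [
    {
        "name": "LoanApproved",
        "attributes": {
            "governance.action": "LoanApproved",
            "governance.human_proximity": "CONFIRMED",
        },
    },
    {"name": "GET /healthz", "attributes": {"http.request.method": "GET"}},
]
sampling_decisions = [False, True]  # the sampler happened to drop the first span

for span, keep in zip(spans, sampling_decisions):
    route(span, governance_store, tracing_backend, keep_for_tracing=keep)
```

Here the governance span reaches the governance store even though the sampler dropped it from the tracing path, which is the property the dual-path design exists to guarantee.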
What stays separate. The Identity Registry — accounts, identities, agent versions, and delegation relationships — remains dedicated governance infrastructure. Identities have lifecycle semantics that tracing backends do not support: an identity that leaves the organization transitions to ARCHIVED, never to deleted. Governance records reference it indefinitely. Agent configuration snapshots are stored in the registry; governance spans reference the version identifier.
Where to start. Instrument at business operation boundaries, not at database write boundaries. A loan approval is one governance activity. The database writes it produces are entity mutations on that activity — not separate activities. Add the authority chain and human proximity attributes at the points where business operations begin. Every service already instrumented with OpenTelemetry has the transport in place. The work is identifying which operations matter for governance and attaching the semantic convention at those boundaries.
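
A minimal sketch of instrumenting at the business operation boundary, using only the standard library so that it runs without dependencies. The helper name, the record layout, and the `outcome` field are assumptions. With OpenTelemetry the same shape would correspond to one span per business operation, with the field changes attached as span events.

```python
from contextlib import contextmanager


@contextmanager
def governance_activity(sink: list, action: str, authority_chain: list, human_proximity: str):
    """Open one governance activity; database writes inside it become its mutations."""
    activity = {
        "action": action,
        "authority_chain": authority_chain,
        "human_proximity": human_proximity,
        "mutations": [],
    }

    def record_mutation(field: str, old, new) -> None:
        activity["mutations"].append({"field": field, "old": old, "new": new})

    try:
        yield record_mutation
    except Exception:
        activity["outcome"] = "failed"
        raise
    else:
        activity["outcome"] = "completed"
    finally:
        # The activity is recorded whether or not the operation succeeded.
        sink.append(activity)


sink: list = []
chain = ["user:loan-officer", "agent:loan-processor"]  # hypothetical identifiers

with governance_activity(sink, "LoanApproved", chain, "CONFIRMED") as record_mutation:
    # Two database writes, one business operation.
    record_mutation("loan.status", "PENDING", "APPROVED")
    record_mutation("loan.approved_by", None, "user:loan-officer")
```

The result is a single `LoanApproved` activity carrying two mutations, rather than two unrelated write records that an investigator would have to correlate afterwards.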
08 — For CXOs
Regulated industries have required accountability records for decades. The requirement has not changed. What has changed is that AI agents make the existing approach structurally insufficient. The data model was designed for humans and deterministic code — actors that do not delegate judgment at runtime or operate across a spectrum of human involvement. It has no architecture for what agents introduce.
Decision provenance is the infrastructure that makes agent expansion defensible — a structural property of the system, built in or absent.
Without it, every increase in agent autonomy is a leap of faith. There is no record that could demonstrate how that autonomy was exercised — only logs that record what changed, with no account of who authorized the change or how. Regulators, auditors, and incident investigators ask questions the system cannot answer. The organization’s only response is reconstruction and approximation.
With it, accountability is structural. An auditor asking “was a human involved in this credit approval decision?” gets a precise answer: the specific human proximity classification for that action, the full authority chain that produced it, and the agent configuration that was active. The organization can answer governance questions from its own records, at the level of precision those questions require.
The underlying principle is correctness-by-governance rather than correctness-by-construction. You cannot credibly extend what you cannot explain. Organizations that build decision provenance into their agent infrastructure will be able to demonstrate, from structural records, how their agents make decisions, when humans are involved, and what configuration produced each action. That demonstration is what earns the institutional trust required to extend agent autonomy further — and to defend it when it is questioned.
Governance that depends on policy documents and periodic reviews cannot keep up with agents that act continuously. Decision provenance produces governance records at the pace of agent execution. That is the only governance posture that scales.
Decision provenance is one pillar of the Agentic Runtime Governance framework. The Runtime Governance Checklist provides a structured assessment of your current architectural posture across all four pillars.