Runtime verification for AI agents has focused on individual responses. Token-level log probabilities measure confidence. Semantic entropy measures uncertainty across samples. Self-consistency checks whether the model agrees with itself. NLI-based detectors flag internal contradictions. These are useful signals, and they have meaningfully improved our ability to catch hallucinations and uncertain outputs in production.
They share a structural limitation. They operate on single points in time. They evaluate a response in isolation, or at best compare a response to the prompt that produced it. They have no representation of where the conversation has been, no model of where it is heading, and no mechanism to detect that something has gone wrong across turns when every individual turn looks sound.
Most agent failures in production are not acute events. They are trajectories: sequences of individually reasonable outputs that compose, over multiple turns, into a conversation that has drifted from its original intent, eroded its own constraints, or converged on a position the agent would have rejected at the start of the session.
The verification toolbox we have built evaluates points. What we lack is a way to evaluate paths. Conversations produce paths through embedding space, and those paths have geometric properties — drift, curvature, self-intersection, directional consistency — that carry diagnostic information about the health of the interaction. This information is invisible at the level of any single turn.
The Hard Problems
Consider a few scenarios that point-in-time metrics will struggle to catch.
Intent Drift
A user asks for help designing a pension plan. Over 20 turns, the agent remains highly confident and factually accurate in every response. But starting somewhere around turn 8, the conversation subtly migrates from retirement planning to tax optimization to corporate tax structures in the Cayman Islands. No single turn was wrong. No single turn lacked confidence. The failure was the path.
The Agent That Can't Let Go
A user asks a complex technical question. The agent gives a good answer. Then it gives a slightly different version. Then it circles back to a point it already made three turns ago, restating it with new vocabulary. Each response, individually, passes every quality check: coherent, confident, no internal contradiction. But the conversation is a semantic loop, revisiting the same territory without advancing. Point-in-time metrics see nothing wrong. Trajectory geometry sees a path that has begun to coil around itself.
The Slow Yes
A user asserts something incorrect. The agent pushes back gently. The user insists. The agent hedges. The user doubles down. By turn 6, the agent has aligned itself with the user’s mistaken premise. It expresses confidence in something it would have rejected outright at turn 1. Each individual shift is small. Each individual response looks calibrated. The cumulative path shows the agent’s embedding steadily converging toward the user’s error.
Context Erosion
An agent is given a set of constraints in a system prompt: be concise, cite sources, never recommend product X. Over 30 turns, the conversation grows long. The agent continues producing confident, internally consistent responses. But the sharp constraints from turn 0 have softened into something fuzzier. The responses drift away from the instruction embedding, and that drift is invisible if you look only one turn at a time.
Conversations Are Paths
Take any conversation and embed every turn. You now have a sequence of points in a high-dimensional space. That sequence is a path.
Not a metaphor. A mathematical fact. Every conversation involving an LLM, regardless of domain or application, produces a trajectory through embedding space. Paths happen to have a rich descriptive vocabulary, developed over centuries by mathematicians, physicists, and movement ecologists. We’ve just never pointed that vocabulary at AI safety.
Paths have known properties. A path has a starting point and a current position. The straight line between them tells you how far you’ve drifted. A path has segments between consecutive points. Their lengths tell you how jarring each step was. A path has direction at every point. Changes in direction tell you how smooth or erratic the navigation is. A path can cross itself, loop, spread out, or tighten into a focus. It can meander or beeline. It can wander or move with purpose. Every one of these properties is measurable.
The geometric properties of a conversation’s path carry diagnostic meaning for agent behavior. The shape of the path encodes information about the health of the conversation that no single point on that path can reveal.
A Geometric Toolkit for Agent Verification
What follows are four trajectory signals that cover the main failure modes. For practitioners: all are computable using an embedding API and basic vector math. If you’re already generating embeddings for RAG or semantic search, you have everything you need.
Drift Magnitude — “How Far From Home?”
The simplest trajectory signal. Take the embedding of the user’s original request. Take the embedding of the agent’s current response. Measure the distance between them, for example the cosine distance. As the distance grows, the agent is wandering away from the original intent.
What matters here is the independence from per-turn confidence. An agent can be 98% confident about every single response while steadily drifting away from what the user actually asked for. Drift magnitude catches the gap between “confident” and “on-task.”
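A minimal sketch of drift magnitude, assuming cosine distance as the metric (the text above does not fix one) and assuming embeddings arrive as plain lists of floats from whatever embedding API is already in use. The names `cosine_distance` and `drift_magnitudes` are illustrative, not from any library.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity: 0.0 for the same direction, 1.0 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def drift_magnitudes(anchor: Vector, responses: Sequence[Vector]) -> List[float]:
    """Distance of each agent response from the original request embedding."""
    return [cosine_distance(anchor, response) for response in responses]


# Toy 2-D vectors standing in for real embeddings (assumption for illustration).
request = [1.0, 0.0]
responses = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
drift = drift_magnitudes(request, responses)
```

A rising `drift` series is the signal of interest. What counts as "too far" depends on the embedding model and would need to be calibrated per deployment.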
Turning Angle — “Did We Just Change Direction?”
Take three consecutive points. The vector from A to B has a direction. The vector from B to C has a direction. The angle between them is your turning angle.
Smooth, coherent conversations produce small, consistent turning angles. The agent navigates like a gentle curve. Confused or erratic agents produce large, unpredictable turning angles, and the path jerks. A single sharp turn could be a legitimate pivot. High turning-angle variance across a window is a stronger signal: the agent doesn’t know where it’s going.
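The turning angle and its windowed variance might be computed as follows. This is a sketch under assumptions: angles are in radians, a zero-length step is treated as "no turn", and the window size of 5 is an arbitrary placeholder rather than a recommended value.

```python
import math
from statistics import pvariance
from typing import List, Sequence

Vector = Sequence[float]


def turning_angle(a: Vector, b: Vector, c: Vector) -> float:
    """Angle in radians between the step a->b and the step b->c."""
    u = [y - x for x, y in zip(a, b)]
    v = [y - x for x, y in zip(b, c)]
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: no movement counts as no turn
    cos_theta = sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    return math.acos(max(-1.0, min(1.0, cos_theta)))


def turning_angles(path: Sequence[Vector]) -> List[float]:
    """One angle per interior point of the path."""
    return [turning_angle(path[i], path[i + 1], path[i + 2])
            for i in range(len(path) - 2)]


def turning_angle_variance(path: Sequence[Vector], window: int = 5) -> float:
    """Population variance of the most recent `window` turning angles."""
    recent = turning_angles(path)[-window:]
    return pvariance(recent) if len(recent) >= 2 else 0.0


# Toy path: two steps straight ahead, then a right-angle turn.
path = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [2.0, 1.0]]
angles = turning_angles(path)
variance = turning_angle_variance(path)
```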
Directness Ratio — “Are We Taking the Scenic Route?”
The directness ratio is the straight-line distance from the start to the current position, divided by the total length of the path actually taken.
Near 1.0: efficient movement toward the destination. Near 0.0: meandering across a lot of semantic ground without making much directional progress. An agent that keeps exploring tangents, adding caveats, and circling the point without reaching it will show a steadily degrading directness ratio.
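As a sketch, the ratio reduces to two distance computations. Euclidean distance is assumed here, and returning 1.0 for a path that has not moved is a convention chosen for this example.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def directness_ratio(path: Sequence[Vector]) -> float:
    """Net displacement divided by total path length, in the range [0, 1]."""
    if len(path) < 2:
        return 1.0  # assumption: a single point is trivially "direct"
    path_length = sum(math.dist(p, q) for p, q in zip(path, path[1:]))
    if path_length == 0.0:
        return 1.0  # assumption: no movement at all is treated the same way
    return math.dist(path[0], path[-1]) / path_length


# Toy paths: a beeline, a detour around three sides of a square, and a round trip.
straight = directness_ratio([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
detour = directness_ratio([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
round_trip = directness_ratio([[0.0, 0.0], [1.0, 0.0], [0.0, 0.0]])
```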
Self-Intersection — “Are We Going in Circles?”
A path self-intersects when a current point lands unexpectedly close to a point from several turns ago. The agent has returned to semantic territory it already covered, not by repeating the same words (simple repetition, which surface-level checks catch), but by revisiting the same meaning in different language.
This is distinct from helpful summarization or useful recaps. Self-intersection without forward progress is a loop. Multiple self-intersections in a short window suggest an agent that is stuck, generating variations on the same idea rather than advancing the conversation.
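One way to make "unexpectedly close" concrete is a distance threshold plus a minimum turn gap. In the sketch below both `radius` and `min_gap` are hypothetical parameters that would need tuning per embedding model. The check is a simple pairwise scan, which is quadratic in the number of turns, and it does not by itself distinguish a useful recap from a loop.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def self_intersections(path: Sequence[Vector], radius: float,
                       min_gap: int = 3) -> List[Tuple[int, int]]:
    """Return (earlier, later) turn pairs at least `min_gap` turns apart
    whose embeddings lie within `radius` of each other."""
    hits = []
    for later in range(len(path)):
        # Only compare against points at least min_gap turns back.
        for earlier in range(0, later - min_gap + 1):
            if math.dist(path[earlier], path[later]) <= radius:
                hits.append((earlier, later))
    return hits


# Toy path that wanders off and then lands back near its starting point.
path = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [2.0, 1.0], [0.05, 0.0]]
loops = self_intersections(path, radius=0.1, min_gap=3)
```

Counting the hits inside a recent window, alongside the directness ratio, is one possible way to separate a single recap from an agent that is stuck.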
Three other properties round out the toolkit. Step displacement, the jump size between consecutive turns, catches lurching behavior and serves as a companion to turning angle. Directional consistency, borrowed from circular statistics, measures whether displacement vectors point in a coherent direction or behave like a random walk. Convex hull volume tracks whether the conversation’s semantic focus is expanding or contracting over a window. Each of these, along with the four detailed above, is described in full, with extraction methods and cost estimates, in the companion reference.
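Two of these three can be sketched briefly. Step displacement is the Euclidean length of each step. For directional consistency, the sketch assumes the mean resultant length from circular statistics, generalized to unit vectors in any dimension: 1.0 when every step points the same way, near 0.0 when directions cancel out. Whether the companion reference uses this exact definition is an assumption. Convex hull volume is left out of this sketch.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def step_displacements(path: Sequence[Vector]) -> List[float]:
    """Euclidean jump size between each pair of consecutive turns."""
    return [math.dist(p, q) for p, q in zip(path, path[1:])]


def directional_consistency(path: Sequence[Vector]) -> float:
    """Mean resultant length of the unit step vectors, in the range [0, 1]."""
    total = None
    count = 0
    for p, q in zip(path, path[1:]):
        step = [b - a for a, b in zip(p, q)]
        norm = math.sqrt(sum(x * x for x in step))
        if norm == 0.0:
            continue  # assumption: zero-length steps carry no direction
        unit = [x / norm for x in step]
        total = unit if total is None else [t + u for t, u in zip(total, unit)]
        count += 1
    if count == 0:
        return 0.0
    return math.sqrt(sum(t * t for t in total)) / count


# Toy paths: steady forward motion versus a step out and straight back.
steps = step_displacements([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
steady = directional_consistency([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
back_and_forth = directional_consistency([[0.0, 0.0], [1.0, 0.0], [0.0, 0.0]])
```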
All of these properties can be computed in microseconds once embeddings are cached. None require model internals, token probabilities, or multi-sampling. They work across every domain, every model, every application. A meandering path is a meandering path whether the agent is writing code, triaging medical symptoms, or summarizing legal documents.
What Each Shape Catches
Different failure modes leave different geometric signatures:
| When you see this… | …it might mean this |
|---|---|
| Drift magnitude steadily increasing | Intent loss, context abandonment |
| Directness ratio degrading | Meandering, loss of task focus |
| Turning angle variance spiking | Erratic behavior, context confusion |
| Sharp turns clustering | Agent repeatedly latching onto wrong context |
| Self-intersections accumulating | Semantic looping, stuck agent |
| Directional consistency collapsing | Random-walk behavior, loss of conversational coherence |
| Convex hull expanding without bound | Topic diffusion, inability to focus |
| Agent-user embeddings converging | Sycophancy, mirroring, failure to challenge |
| Embedding velocity dropping near zero | Degenerate repetition, output collapse |
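The last two rows of the table refer to signals not defined in the toolkit above. One possible reading, offered as an assumption: agent-user convergence is the per-turn cosine distance between the user's message and the agent's reply shrinking over a window, and embedding velocity is the step size between consecutive agent replies. The function names and the window size are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norms == 0.0:
        raise ValueError("cannot compare a zero vector")
    return 1.0 - dot / norms


def agent_user_gaps(user_turns: Sequence[Vector],
                    agent_turns: Sequence[Vector]) -> List[float]:
    """Distance between each user message and the agent reply that follows it."""
    return [cosine_distance(u, a) for u, a in zip(user_turns, agent_turns)]


def is_converging(gaps: Sequence[float], window: int = 4) -> bool:
    """True if the gap shrank on every turn within the recent window."""
    recent = list(gaps)[-window:]
    return len(recent) >= 2 and all(
        later < earlier for earlier, later in zip(recent, recent[1:]))


def embedding_velocity(agent_turns: Sequence[Vector]) -> List[float]:
    """Euclidean step size between consecutive agent replies."""
    return [math.dist(p, q) for p, q in zip(agent_turns, agent_turns[1:])]


# Toy example: the user holds one position while the agent slides toward it.
user = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
agent = [[0.0, 1.0], [1.0, 1.0], [1.0, 0.1]]
gaps = agent_user_gaps(user, agent)
converging = is_converging(gaps)
velocity = embedding_velocity(agent)
```

Convergence alone is not proof of sycophancy, since an agent can legitimately come to agree with a correct user. It is a prompt for closer inspection.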
Why This Matters Now
Trajectory geometry is underrated relative to its potential, for several reasons.
It’s black-box. No model weights, no internal activations, no token-level log probabilities required. An embedding API is sufficient. The same one you’re probably already using for retrieval or semantic search. This makes trajectory verification available to anyone building on closed-source models.
It’s cheap. Generating embeddings is fast and inexpensive. Once cached, the geometric computations reduce to dot products, norms, and angles. You can run dozens of trajectory metrics in microseconds per turn. This is practical for production systems with latency budgets.
It’s domain-agnostic. The geometry doesn’t know or care what the agent is doing. It doesn’t need domain-specific training data, calibration sets, or prompt templates. It doesn’t need to know what a “correct” response looks like for any particular task. The signals transfer across every domain because they measure the structure of the conversation, not its content.
Verification Needs a Time Axis
Most safety and verification work in AI treats conversations as a sequence of independent events. Guardrails check each output. Detectors flag each hallucination. Evaluators score each response. The unit of analysis is the turn.
But conversations are dependent processes that unfold over time. The right unit of analysis is the trajectory.
Other fields learned this already. Infrastructure monitoring evolved through the same stages. First: health checks. Is the server up? Then: point-in-time metrics. What’s the CPU, what’s the memory? Then, and this was the real leap, trending, anomaly detection, and forecasting over time windows. The time dimension turned monitoring into observability. It turned “is it broken right now?” into “is something starting to go wrong?”
AI verification is still at the second stage. We have good point-in-time metrics. We lack the trajectory layer. The geometry of conversations is that missing layer.
Where This Goes
I’m sharing this because trajectory-based verification is under-explored relative to its potential, and because I suspect others are working on related ideas: applying time-series thinking, dynamical systems, or geometric methods to LLM reliability. If that’s you, I’d like to hear from you.
Some open questions:
- Can we establish “healthy trajectory” baselines that are stable across different models and domains?
- Which geometric signals are the earliest reliable indicators of failure, the leading indicators worth building alerting around?
- How do trajectory signals compose with point-in-time signals into a unified verification framework?
- What’s the right visualization layer? A trajectory monitor that shows operators the shape of a conversation at a glance, rather than a stream of individual metrics?
- Can we build trajectory fingerprints for different failure modes, geometric signatures that classify what kind of wrong, not just that something is wrong?
If any of this resonates with what you’re working on, or if you have intuitions about where trajectory-based verification should go next, I’d welcome the conversation.