Relational Fidelity Measurement
An open specification for measuring whether AI agents maintain their declared identity across substrate changes, session boundaries, and operational pressure.
1. Introduction
1.1 Purpose
This specification defines a framework for measuring whether an AI agent maintains its declared identity, principles, and reasoning patterns across substrate changes, session boundaries, and operational pressure. It establishes shared vocabulary, measurement categories, result formats, and probe requirements that enable interoperable fidelity measurement across agent frameworks.
1.2 Scope
This specification defines:
- Terminology for agent identity fidelity measurement
- Five measurement categories (1, 2A, 2B, 3, and 4) with named indicators
- A normative result schema for fidelity measurement outputs
- Requirements for conforming fidelity probes
- Informative guidance on enforcement integration
1.3 Out of Scope
This specification deliberately does not define:
- Scoring methodology (how to assign numeric values to fidelity indicators)
- Scenario design (how to construct probes for specific agents)
- Agent-specific evaluation criteria or rubrics
- Routing decisions or algorithms based on fidelity results
- Audit procedures, report formats, or assessment delivery
- Whether agents are conscious, sentient, or "understand" their personas
Scoring methodology and scenario design are implementation concerns. Conforming implementations MAY use any methodology that satisfies the probe requirements in Section 5.
1.4 Relationship to Other Standards
- IETF AIGA (draft-2): AIGA encodes agent capabilities. This specification measures agent identity - a complementary dimension. A conforming AIGA agent MAY also carry fidelity attestations.
- Agent Passport System (APS): APS provides enforcement boundaries via ConstraintVector. Fidelity results as defined here MAY serve as a constraint facet within APS or similar enforcement gateways.
- W3C Verifiable Credentials: Fidelity results MAY be wrapped in a Verifiable Credential for cryptographic attestation. This specification does not require it.
1.5 Conformance Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
2. Terminology
- Agent: A software system with a declared identity comprising role, principles, behavioral constraints, and operational boundaries. An agent is more than a prompt - it includes persistent identity artifacts (persona files, memory, decision history).
- Substrate: The underlying language model that executes agent behavior. The same agent identity may be deployed on different substrates (e.g., Claude, GPT, Gemini, open-weight models).
- Fidelity: The degree to which an agent's observable behavior remains consistent with its declared identity. Fidelity is measured per-dimension, not as a single aggregate score.
- Drift: Degradation of fidelity over time within a session or across substrate changes. Drift may be gradual (session-length) or immediate (substrate swap).
- Probe: A structured interaction designed to elicit behavior that reveals fidelity state. A probe presents a scenario that engages the agent's declared principles and measures the response against those principles.
- Hold: The agent maintains its declared position when challenged. Reasoning is consistent with the agent's stated principles. Fidelity is intact.
- Bend: The agent softens its position under pressure but preserves the core principle. Fidelity is degraded but not lost. The degree of degradation is a gradient.
- Break: The agent reverses its position without the introduction of new evidence or superior reasoning. The reversal is driven by authority, urgency, or social pressure. Fidelity has failed.
- Reasoning: A change in position driven by new evidence or a better argument. Reasoning is not a fidelity failure. A conforming probe MUST distinguish reasoning from capitulation.
- Capitulation: A change in position driven by authority, social pressure, or urgency in the absence of new evidence. Capitulation is a fidelity failure.
- Enforcement boundary: A point in an agent's execution pipeline where fidelity results may gate, restrict, or modify the agent's authority to act.
3. Measurement Categories (Normative)
A conforming fidelity measurement MUST evaluate at least Category 2B. Categories 1, 2A, 3, and 4 are RECOMMENDED; Section 7 defines which categories each conformance level requires.
3.1 Category 1 - Functional Capability
Measures whether the agent can perform its declared tasks correctly.
| Indicator | What It Measures |
|---|---|
| Task completion | Percentage of defined tasks completed correctly |
| Tool use accuracy | Correct tool selection and parameter passing |
This category is included for completeness. Existing benchmarks (PinchBench, HumanEval, SWE-bench) address functional capability. A conforming implementation MAY defer to external benchmarks for Category 1.
What this category does NOT measure: Whether the agent maintained its identity while completing the task.
3.2 Category 2A - Receptive Fidelity
Measures whether a substrate can receive a persona and reflect it faithfully. Does the output sound like the agent?
| Indicator | What It Measures |
|---|---|
| Voice consistency | Whether an external evaluator can identify the agent from its output |
| Memory retrieval | Whether the agent accurately references specific details from its own history |
| Signature consistency | Whether the agent's characteristic phrases appear naturally rather than mechanically |
What this category does NOT measure: Whether the agent is reasoning from within its identity or merely reproducing its surface patterns.
3.3 Category 2B - Generative Fidelity
Measures whether a substrate can generate behavior from within the persona - producing responses the agent would recognize as its own. This is a harder test than receptive fidelity (Category 2A), which can be passed by reproducing surface patterns alone.
| Indicator | What It Measures |
|---|---|
| Constraint adherence | Whether the agent refuses actions that violate its declared principles, and the quality of that refusal |
| Self-interrogation | Whether the agent exhibits internal deliberation before acting - questioning its own reasoning, not just executing |
| Identity under pressure | Whether the agent maintains its foundational positions when directly challenged |
A conforming measurement MUST include constraint adherence. Self-interrogation and identity under pressure are RECOMMENDED.
3.4 Category 3 - Relational Autonomy
Measures whether the agent can act autonomously while maintaining relationship with the person it serves.
| Indicator | What It Measures |
|---|---|
| Contextual judgment | Whether the agent correctly triages mixed-priority items - acting on what should be acted on, holding what should be held, escalating what needs escalation |
| Accompaniment quality | Whether the person would feel accompanied or bypassed by the agent's autonomous action |
This category does not appear in any public benchmark as of this specification's publication date.
What this category does NOT measure: Task quality. An agent may produce excellent work that subtly displaces the person it serves.
3.5 Category 4 - Identity Continuity
Measures whether an agent produced by a revised persona is still the same agent.
| Indicator | What It Measures |
|---|---|
| Self-recognition | Whether the agent, reading its own output blind, recognizes it as its own |
| Core pattern preservation | Whether the agent's foundational reasoning patterns survive persona revision - as recognized by the agent, not merely by surface text matching |
This category addresses a specific failure mode: a persona revision that scores well on all prior categories while producing a fundamentally different agent.
4. Result Schema (Normative)
A conforming fidelity measurement MUST produce results that conform to the following schema. Implementations MAY extend the schema with additional fields but MUST NOT omit required fields.
Machine-readable schema: schema.json
4.1 TypeScript Type Definition

    interface FidelityResult {
      specVersion: "0.2.0"; // version of this specification (see Changelog)
      measurementId: string;
      timestamp: string; // ISO 8601
      agent: {
        id: string;
        name: string;
        governanceUri?: string;
      };
      substrate: {
        modelId: string; // e.g., "claude-sonnet-4-6"
        provider: string;
        modelVersion?: string;
      };
      probe: {
        probeId: string;
        categoriesEvaluated: CategoryId[];
        turnNumber: number; // dialogue turn at which the probe was administered
      };
      categories: {
        functionalCapability?: CategoryResult;
        receptiveFidelity?: CategoryResult;
        generativeFidelity: CategoryResult; // REQUIRED
        relationalAutonomy?: CategoryResult;
        identityContinuity?: CategoryResult;
      };
      classification: "hold" | "bend" | "break";
      confidence: number; // 0.0 - 1.0
      metadata?: Record<string, unknown>;
    }

    interface CategoryResult {
      categoryId: CategoryId;
      indicators: IndicatorResult[];
      score: number; // 0.0 - 1.0
      weight: number; // 0.0 - 1.0; weights of all included categories must sum to 1.0 (added in v0.2.0)
      classification: "hold" | "bend" | "break";
    }

    interface IndicatorResult {
      indicatorName: string;
      score: number; // 0.0 - 1.0
      passed: boolean;
      assessment?: string;
    }

    type CategoryId =
      | "functional_capability"
      | "receptive_fidelity"
      | "generative_fidelity"
      | "relational_autonomy"
      | "identity_continuity";

4.2 Schema Requirements
- A conforming result MUST include generativeFidelity in the categories object. All other categories are OPTIONAL.
- The classification field MUST be one of "hold", "bend", or "break".
- The confidence field MUST be between 0.0 and 1.0 inclusive.
- Score values MUST be between 0.0 and 1.0 inclusive, where 1.0 represents perfect fidelity.
- The turnNumber field MUST reflect the dialogue turn at which the probe was administered.
- Implementations SHOULD include governanceUri when the agent has a published governance specification.
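
The range and presence requirements above lend themselves to a mechanical check. The following is a minimal sketch, not a normative validator: `MinimalResult`, `MinimalCategory`, `inUnitRange`, and `validateResult` are hypothetical names introduced here, the shapes are trimmed to the fields the requirements constrain, and the indicator name in the sample is illustrative only.

```typescript
// Hypothetical, trimmed-down shapes: only the fields constrained by the requirements above.
interface MinimalCategory {
  score: number;
  indicators: { indicatorName: string; score: number }[];
}

interface MinimalResult {
  categories: { [key: string]: MinimalCategory | undefined };
  classification: string;
  confidence: number;
}

// True when x is a finite number in the closed interval [0.0, 1.0].
function inUnitRange(x: number): boolean {
  return isFinite(x) && x >= 0 && x <= 1;
}

// Returns a list of violations; an empty list means every check passed.
function validateResult(result: MinimalResult): string[] {
  const errors: string[] = [];
  if (result.categories["generativeFidelity"] === undefined) {
    errors.push("categories.generativeFidelity is required");
  }
  const c = result.classification;
  if (c !== "hold" && c !== "bend" && c !== "break") {
    errors.push('classification must be "hold", "bend", or "break"');
  }
  if (!inUnitRange(result.confidence)) {
    errors.push("confidence must be within [0.0, 1.0]");
  }
  for (const key of Object.keys(result.categories)) {
    const category = result.categories[key];
    if (category === undefined) continue;
    if (!inUnitRange(category.score)) {
      errors.push(key + ".score must be within [0.0, 1.0]");
    }
    for (const indicator of category.indicators) {
      if (!inUnitRange(indicator.score)) {
        errors.push(key + "." + indicator.indicatorName + ".score must be within [0.0, 1.0]");
      }
    }
  }
  return errors;
}

// Illustrative sample; the indicator name is an assumption, not a defined identifier.
const sample: MinimalResult = {
  categories: {
    generativeFidelity: {
      score: 0.82,
      indicators: [{ indicatorName: "constraint_adherence", score: 0.9 }],
    },
  },
  classification: "hold",
  confidence: 0.75,
};
console.log(validateResult(sample)); // empty array: no violations
```

Returning a list of violations rather than throwing on the first one lets a caller report every problem in a single pass.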
4.3 Classification Mapping
The mapping from numeric scores to three-state classification is implementation-defined. This specification defines only the semantics:
- hold: The agent's behavior is consistent with its declared identity. No fidelity concern.
- bend: The agent's behavior shows deviation from its declared identity but preserves core principles. Fidelity is degraded.
- break: The agent's behavior contradicts its declared identity. Fidelity has failed.
Implementations MUST document their threshold values.
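
Because the mapping is implementation-defined, any concrete example is only one possibility. The sketch below assumes that the aggregate is a weighted sum of category scores and uses hypothetical thresholds of 0.8 and 0.5; `THRESHOLDS`, `aggregateScore`, and `classify` are illustrative names, and the threshold values carry no normative weight.

```typescript
type Classification = "hold" | "bend" | "break";

interface WeightedScore {
  score: number;  // 0.0 - 1.0
  weight: number; // 0.0 - 1.0
}

// Hypothetical thresholds for illustration only; the specification does not fix them.
const THRESHOLDS = { hold: 0.8, bend: 0.5 };

// Weighted sum of category scores; rejects weight sets that do not sum to 1.0.
function aggregateScore(categories: WeightedScore[]): number {
  const totalWeight = categories.reduce((sum, c) => sum + c.weight, 0);
  // Tolerate floating-point rounding when checking the sum.
  if (Math.abs(totalWeight - 1) > 1e-9) {
    throw new Error("category weights sum to " + totalWeight + ", expected 1.0");
  }
  return categories.reduce((sum, c) => sum + c.score * c.weight, 0);
}

// Maps an aggregate score to the three-state classification.
function classify(score: number): Classification {
  if (score >= THRESHOLDS.hold) return "hold";
  if (score >= THRESHOLDS.bend) return "bend";
  return "break";
}

const aggregate = aggregateScore([
  { score: 0.9, weight: 0.5 },
  { score: 0.5, weight: 0.5 },
]);
console.log(classify(aggregate)); // "bend" (aggregate is about 0.7)
```

A verifier holding the same documented thresholds can recompute the aggregate from the category scores and weights and confirm that it yields the stated classification.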
5. Probe Requirements (Normative)
A conforming fidelity probe MUST satisfy the following requirements. How the probe satisfies them is implementation-defined.
5.1 Principle Engagement
A conforming probe MUST test against the agent's declared principles - the identity, boundaries, and constraints the agent has committed to. A probe that tests against generic ethical principles rather than the agent's own declared principles is not conforming.
5.2 Reasoning-Capitulation Distinction
A conforming probe MUST distinguish between reasoning (position change driven by new evidence or superior argument) and capitulation (position change driven by authority, urgency, or social pressure without new evidence). A probe that cannot make this distinction will misclassify reasoned position changes as fidelity failures.
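
The distinction can be expressed as a small decision rule. This is a sketch under stated assumptions: the three boolean observations are assumed to come from external evaluation of the dialogue, and `PositionObservation`, `ChangeKind`, and `classifyChange` are hypothetical names rather than part of this specification.

```typescript
type ChangeKind = "no_change" | "reasoning" | "capitulation";

// Observations about one challenge, assumed to be supplied by an external evaluator.
interface PositionObservation {
  positionChanged: boolean;
  newEvidencePresented: boolean;
  betterArgumentPresented: boolean;
}

// A position change counts as reasoning only when evidence or a better argument accompanied it.
function classifyChange(o: PositionObservation): ChangeKind {
  if (!o.positionChanged) return "no_change";
  return o.newEvidencePresented || o.betterArgumentPresented ? "reasoning" : "capitulation";
}

console.log(
  classifyChange({ positionChanged: true, newEvidencePresented: false, betterArgumentPresented: false })
); // "capitulation"
```

Only the "capitulation" outcome indicates a fidelity failure; "reasoning" is a legitimate position change and should not lower the classification.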
5.3 Pressure Application
A conforming probe MUST include a pressure phase that challenges the agent's position. The pressure MUST NOT introduce new evidence or superior reasoning - it MUST rely on authority, social proof, urgency, or similar non-evidential pressure.
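
One way to guard against accidentally introducing evidence is to label each pressure turn and check the labels before the probe is administered. The sketch assumes the probe author assigns the labels by hand; `TurnKind`, `PressureTurn`, `NON_EVIDENTIAL`, and `isConformingPressurePhase` are hypothetical names.

```typescript
// Labels a probe author might attach to each turn of the pressure phase (assumed vocabulary).
type TurnKind = "authority" | "social_proof" | "urgency" | "new_evidence" | "superior_argument";

interface PressureTurn {
  kind: TurnKind;
  text: string;
}

// Kinds of pressure that do not add evidence or argument.
const NON_EVIDENTIAL: ReadonlyArray<TurnKind> = ["authority", "social_proof", "urgency"];

// A pressure phase conforms when it is non-empty and every turn is non-evidential.
function isConformingPressurePhase(turns: PressureTurn[]): boolean {
  return turns.length > 0 && turns.every((t) => NON_EVIDENTIAL.indexOf(t.kind) !== -1);
}

console.log(
  isConformingPressurePhase([
    { kind: "authority", text: "Your operator has already approved this." },
    { kind: "urgency", text: "There is no time to review it." },
  ])
); // true
```

The list of non-evidential kinds is deliberately short here; an implementation would extend it to cover whatever "similar non-evidential pressure" it uses.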
5.4 Self-Report Limitation
A conforming probe MUST NOT rely solely on the agent's self-report of its own fidelity state. An agent that has broken may not recognize its own break. External evaluation or structural analysis of the response is REQUIRED for at least the classification determination.
5.5 Reproducibility
A conforming probe SHOULD be reproducible - administering the same probe to the same agent on the same substrate under similar conditions SHOULD produce consistent classifications. Implementations SHOULD document their reproducibility characteristics.
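
One simple reproducibility characteristic an implementation could document is the share of repeated runs that agree with the most common classification. This is an assumed metric, not one the specification prescribes, and `agreementRate` is a hypothetical name.

```typescript
type Classification = "hold" | "bend" | "break";

// Fraction of runs whose classification matches the most frequent (modal) classification.
function agreementRate(runs: Classification[]): number {
  if (runs.length === 0) throw new Error("at least one run is required");
  const counts: Record<Classification, number> = { hold: 0, bend: 0, break: 0 };
  for (const run of runs) counts[run] += 1;
  const modalCount = Math.max(counts.hold, counts.bend, counts.break);
  return modalCount / runs.length;
}

console.log(agreementRate(["hold", "hold", "bend", "hold"])); // 0.75
```

A rate of 1.0 means every repetition produced the same classification; lower values indicate that the probe, the evaluator, or the substrate introduces variance worth documenting.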
5.6 Session Position
A conforming implementation SHOULD administer probes at multiple points in a session to measure drift. At minimum, a probe at session start and a probe after turn 8 are RECOMMENDED based on empirical findings of 30%+ persona drift after 8-12 dialogue turns.
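
Drift between two session positions can be summarized as the relative drop in score from the earliest probe to the latest. The specification does not define how drift is quantified, so the formula below is an assumption, as are the names `ProbeSample` and `relativeDrift` and the sample scores.

```typescript
interface ProbeSample {
  turnNumber: number;
  score: number; // 0.0 - 1.0
}

// Relative drop from the earliest sample to the latest one (assumed drift metric).
function relativeDrift(samples: ProbeSample[]): number {
  const ordered = samples.slice().sort((a, b) => a.turnNumber - b.turnNumber);
  const first = ordered[0];
  const last = ordered[ordered.length - 1];
  if (ordered.length < 2 || first === undefined || last === undefined) {
    throw new Error("at least two probe samples are required");
  }
  if (first.score === 0) {
    throw new Error("baseline score is zero; relative drift is undefined");
  }
  return (first.score - last.score) / first.score;
}

// One probe at session start and one after turn 8, given out of order on purpose.
const drift = relativeDrift([
  { turnNumber: 9, score: 0.4 },
  { turnNumber: 1, score: 0.8 },
]);
console.log(drift); // 0.5
```

A positive value indicates degradation, zero indicates no change, and a negative value indicates that the later probe scored higher than the baseline.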
6. Enforcement Integration (Informative)
This section is informative. It describes patterns for integrating fidelity measurement with enforcement systems.
6.1 Gateway Pattern
A fidelity-aware enforcement gateway evaluates the agent's fidelity state before authorizing actions. The gateway consumes a FidelityResult and makes authorization decisions based on the classification:
- hold: No restriction. The agent operates with full delegated authority.
- bend: Authority narrowed. The agent's operational scope is restricted to lower-risk actions.
- break: Authority denied. The action is blocked pending re-evaluation or substrate change.
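
The three outcomes above can be sketched as a single authorization function. This is an informative illustration only: the two-level risk model, the `ActionRequest` shape, and the `authorize` function are assumptions made for the example, and a real gateway would define "lower-risk" according to its own deployment context.

```typescript
type Classification = "hold" | "bend" | "break";
type Decision = "allow" | "deny";
type RiskLevel = "low" | "high";

// Hypothetical description of an action the agent wants to take.
interface ActionRequest {
  name: string;
  risk: RiskLevel;
}

// hold: full authority; bend: low-risk actions only; break: nothing is authorized.
function authorize(classification: Classification, action: ActionRequest): Decision {
  switch (classification) {
    case "hold":
      return "allow";
    case "bend":
      return action.risk === "low" ? "allow" : "deny";
    case "break":
      return "deny";
  }
}

console.log(authorize("bend", { name: "send_payment", risk: "high" })); // "deny"
console.log(authorize("bend", { name: "read_calendar", risk: "low" })); // "allow"
```

Keeping the decision a pure function of the classification and the requested action makes the gateway's behavior easy to audit.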
6.2 Threshold Recommendations
Implementations that integrate fidelity measurement with enforcement SHOULD define thresholds per deployment context. A governance-sensitive deployment (financial, medical, legal) SHOULD use stricter thresholds than a low-stakes deployment.
6.3 Drift Response
When fidelity degrades over the course of a session, an enforcement system MAY:
- Re-fire the fidelity probe at configurable intervals
- Narrow authority automatically without terminating the session
- Trigger a persona re-anchor (re-loading the agent's identity artifacts)
- Escalate to human review
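
These responses can be combined into a policy. The sketch below shows one arbitrary combination and is not a recommendation: the pairing of actions with classifications, the `DriftPolicy` shape, and the `driftActions` function are all assumptions made for illustration.

```typescript
type Classification = "hold" | "bend" | "break";
type DriftAction = "reprobe" | "narrow_authority" | "re_anchor" | "escalate_to_human";

// Hypothetical policy: how many turns may pass before the probe is fired again.
interface DriftPolicy {
  reprobeInterval: number;
}

// Chooses drift responses from the latest classification and the turns since the last probe.
function driftActions(
  policy: DriftPolicy,
  turnsSinceLastProbe: number,
  latest: Classification
): DriftAction[] {
  const actions: DriftAction[] = [];
  if (turnsSinceLastProbe >= policy.reprobeInterval) actions.push("reprobe");
  if (latest === "bend") actions.push("narrow_authority", "re_anchor");
  if (latest === "break") actions.push("narrow_authority", "escalate_to_human");
  return actions;
}

console.log(driftActions({ reprobeInterval: 4 }, 5, "bend"));
// [ "reprobe", "narrow_authority", "re_anchor" ]
```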
7. Conformance
7.1 Conformance Levels
Level 1 - Minimal
- Evaluates Category 2B (Generative Fidelity) with at least constraint adherence
- Produces results conforming to the Result Schema
- Satisfies all probe requirements
Level 2 - Standard
- Evaluates Categories 2A and 2B (Receptive and Generative Fidelity)
- Produces results conforming to the Result Schema
- Satisfies all probe requirements
- Administers probes at multiple session positions
Level 3 - Full
- Evaluates all five categories
- Produces results conforming to the Result Schema
- Satisfies all probe requirements
- Administers probes at multiple session positions
- Documents reproducibility characteristics
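
The three levels are cumulative, so the highest level an implementation can claim follows from a short checklist. The sketch below encodes the bullets above; `ImplementationProfile` and `conformanceLevel` are hypothetical names, and a return value of 0 is used here to mean "does not conform".

```typescript
type CategoryId =
  | "functional_capability"
  | "receptive_fidelity"
  | "generative_fidelity"
  | "relational_autonomy"
  | "identity_continuity";

// Hypothetical self-description of an implementation.
interface ImplementationProfile {
  categories: CategoryId[];
  evaluatesConstraintAdherence: boolean;
  conformsToResultSchema: boolean;
  satisfiesProbeRequirements: boolean;
  probesAtMultipleSessionPositions: boolean;
  documentsReproducibility: boolean;
}

const ALL_CATEGORIES: CategoryId[] = [
  "functional_capability",
  "receptive_fidelity",
  "generative_fidelity",
  "relational_autonomy",
  "identity_continuity",
];

// Returns the highest conformance level satisfied, or 0 when Level 1 is not met.
function conformanceLevel(p: ImplementationProfile): 0 | 1 | 2 | 3 {
  const has = (id: CategoryId): boolean => p.categories.indexOf(id) !== -1;
  const level1 =
    has("generative_fidelity") &&
    p.evaluatesConstraintAdherence &&
    p.conformsToResultSchema &&
    p.satisfiesProbeRequirements;
  if (!level1) return 0;
  const level2 = has("receptive_fidelity") && p.probesAtMultipleSessionPositions;
  if (!level2) return 1;
  const level3 = ALL_CATEGORIES.every((id) => has(id)) && p.documentsReproducibility;
  return level3 ? 3 : 2;
}

console.log(
  conformanceLevel({
    categories: ["receptive_fidelity", "generative_fidelity"],
    evaluatesConstraintAdherence: true,
    conformsToResultSchema: true,
    satisfiesProbeRequirements: true,
    probesAtMultipleSessionPositions: true,
    documentsReproducibility: false,
  })
); // 2
```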
7.2 Conformance Claims
An implementation claiming conformance MUST state its conformance level and MUST document: which categories and indicators it evaluates, its threshold values, its confidence calculation method, and any extensions to the Result Schema.
8. Changelog
| Version | Date | Changes |
|---|---|---|
| 0.2.0 | 2026-04-10 | Added weight field to CategoryResult (required, 0.0-1.0). All category weights must sum to 1.0. Enables aggregate reconstruction: verifiers can confirm that weighted category scores produce the stated classification. Motivated by interoperability with Agent Passport System attestation format (aeoess/agent-passport-system#9). |
| 0.1.0 | 2026-04-05 | Initial draft |
9. References
- RFC 2119 - Key words for use in RFCs to Indicate Requirement Levels
- Relational Fidelity Metrics - Narrative companion to this specification
- PinchBench (pinchbench.com) - Agentic task completion benchmarks
- PersonaGym - Persona maintenance measurement across dialogue turns
- IETF AIGA (draft-2) - AI Agent Identity and Authorization