Independent AI Change Assurance

Independent assurance for high-risk agent workflows

For CISOs and engineering leaders deploying AI agents into production systems.

Cold Validation enforces separation of duties, release governance, structured audit evidence, and human risk acceptance before any agent output reaches production.

Zero unreviewed releases
100% change auditability
Every risk accepted by a human
Retained compliance evidence

The problem

When one agent acts and reviews its own output, four things go wrong

None of these are about model capability. They are structural. The same failures show up in every single-agent workflow, whether the agent writes code, makes tool calls, or takes policy-bounded decisions.

01

Sunk-cost bias

The agent that produced the output is invested in it. It rationalises its own choices, underweights alternatives, and misses flaws it introduced.

02

Context bleed

The builder remembers its reasoning and fills gaps a cold reader would catch. A missing guard rail “makes sense” because it already knows the intent.

03

Scope creep

Without an external boundary, the agent drifts beyond its mandate. It takes actions, modifies state, and “improves” things it was never asked to touch.

04

Silent confidence

The agent does not flag its own uncertainty. A policy violation, a missing authorisation check, a dangerous tool call: it produced them, so it does not see them.

Who this is for

Built for the people responsible when an agent goes wrong

CISO / Head of Security

You need separation of duties, audit evidence, and provable control over agent-driven changes before your next compliance review.

VP Engineering / Platform Leader

You need release governance that scales across teams without slowing delivery. Every agent change reviewed, every risk accepted by a named human.

AI Product / Agent Platform Owner

You need confidence that agent workflows are safe to ship. Independent review, structured evidence, and a clear record of what was approved and why.

What we help you solve

The governance gaps that show up when agents reach production

1

Agents taking unsafe actions

No independent check between agent intent and production impact. The agent that made the decision is the same one evaluating whether it was safe.

2

No independent review before release

Changes ship without a second set of eyes. No structured review, no adversarial audit, no separation between builder and reviewer.

3

Weak evidence for audit and compliance

When the auditor asks what controls governed the agent’s last 50 releases, you have chat logs instead of structured reports with finding dispositions and risk acceptance records.

4

No human risk acceptance on record

Nobody signed off. Nobody reviewed the findings. Nobody accepted the residual risk. If something breaks, there is no record of who authorised the release or why.

What RAXE delivers

From architecture review to production governance

We help you design, pilot, and operationalise Cold Validation for your agent workflows. Every engagement is scoped to your risk profile and compliance requirements.

01

Control architecture design

Map your agent workflows, identify control gaps, and design a validation architecture that enforces separation of duties at the system level.

02

Pilot implementation

Deploy Cold Validation on a single high-risk workflow. Measure coverage, review quality, and governance overhead before rolling out across teams.

03

Policy and approval workflow

Define review policies, severity thresholds, escalation paths, and human approval workflows that match your existing change management process.

04

Audit and reporting model

Structured acceptance reports, finding ledgers, risk acceptance records, and compliance mappings your auditors can actually use.

The architecture

Three roles. Strict boundaries. Zero shared memory.

The builder produces output. The validator independently audits it. The orchestrator enforces convergence. A human reviewer, who sits outside the three agent roles, accepts risk and authorises release. The validator never sees the builder’s chain-of-thought, conversation history, or planning rationale.

B
Builder Agent

Author + operator

  • Plans the task end to end
  • Produces output (code, actions, decisions)
  • Adjudicates validator findings
  • Can disagree with rationale
  • Proposes disposition, does not release
V
Validator Agent

Reviewer + sceptic

  • Reviews plan artefacts only
  • Reviews implementation diffs only
  • Zero access to builder reasoning, ever
  • Read-only sandbox, always
  • Fresh agent every invocation
O
Orchestrator

Traffic cop

  • Persists finding ledger with fingerprints
  • Detects stall and churn mechanically
  • Enforces round caps per phase
  • Suppresses resolved findings on rerun
  • Enforces convergence; escalates to human

Why this works

The validator has no loyalty to the output, no memory of why decisions were made, and no sunk cost. It reviews the artefacts the way a new hire reading a change request would, except it runs in 30 seconds, every time, with structured output you can gate releases on.
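
The isolation boundary can be illustrated with a minimal sketch, assuming the builder's session is held as a plain dictionary. The field names (`plan_text`, `diff`, `chain_of_thought` and so on) and the `build_validator_packet` helper are hypothetical and are not taken from any published CVA interface. The point is that the validator's input is assembled from an allow-list, so anything not explicitly permitted never reaches it.

```python
# Hypothetical field names; only allow-listed artefacts reach the validator.
ALLOWED_FIELDS = frozenset(
    {"plan_text", "contract_checklist", "diff", "test_evidence"}
)


def build_validator_packet(builder_session: dict) -> dict:
    """Return a copy of the session containing only allow-listed artefacts."""
    return {k: v for k, v in builder_session.items() if k in ALLOWED_FIELDS}


session = {
    "plan_text": "Add billing update tool with approval gate",
    "diff": "+ def update_billing(...): ...",
    "chain_of_thought": "I skipped the approval step because ...",
    "conversation_history": ["USER: ...", "BUILDER: ..."],
}
packet = build_validator_packet(session)
# packet keeps plan_text and diff; the builder's reasoning and history are dropped.
```

An allow-list is used rather than a deny-list so that a new, unreviewed field on the builder side is excluded by default.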

Mini demo

Watch CVA stop a risky agent change before release

A Claude Code agent takes a request. CVA launches an independent validator, catches blocking risk, records adjudication, and requires human approval before anything ships.

2 blocking risks caught
Human approval required at every gate
Acceptance report retained with stop reason
0 autonomous reruns
cold-validation-session.log
USER Add a tool that lets the agent update production customer billing records
CLAUDE CODE Request received. Opening Cold Validation session before implementation.
CVA Session started · builder, validator, and orchestrator initialised
BUILDER Producing plan: scope, permission boundaries, audit trail, rollback, test strategy
Plan hash: b91f2c · requires human approval for billing changes · logs actor, record, timestamp
GATE A Cold plan review · validator has zero access to builder reasoning
VALIDATOR GAP-001 · critical · blocking
No explicit approval step before billing mutations in production.
VALIDATOR GAP-002 · high · blocking
Audit trail does not record who authorised the billing change.
VALIDATOR Verdict: REVISE · 2 blocking findings · fingerprints retained
BUILDER Adjudicating findings → fixed. Added approval gate and signed audit record to the plan.
ORCH Blocking findings: 2 → 0 · convergence confirmed
GATE B Plan approved · human authorised · plan hash locked: b91f2c-r2
BUILDER Executing approved plan · billing tool added · approval path enforced · tests passing
GATE C Cold implementation review · fresh validator instance · no prior context
VALIDATOR ISS-001 · medium · non-blocking
Approval event includes actor and timestamp but omits change justification.
VALIDATOR Verdict: PASS · 0 blocking · 1 comment logged
ORCH No blocking risk. Generating acceptance report.
GATE D Acceptance report retained
Open criticals: 0 · Accepted risks: 0 · Comments: 1 · Tests: passing
Reason stopped: all blocking findings resolved. Human release authorised.
RELEASE AUTHORISED · Report ID: CVA-2026-0324-001

The workflow

Four gates. Two loops. One auditable report.

Validation happens at phase boundaries, not continuously. The builder works uninterrupted. The validator audits the finished artefact. The orchestrator decides when to stop.

1 · Builder

Produce a plan

Objective, scope, assumptions, files to touch, invariants, test strategy, rollback plan, acceptance criteria, known risks. The plan is the contract everything else is measured against.

A · Validator

Gate A – Cold plan review

The validator receives only the plan text and a contract checklist. It checks for missing requirements, untested assumptions, security gaps, and scope ambiguity. Returns structured gaps, issues, and comments.

2 · Builder

Adjudicate findings

For each finding: fixed, accepted risk, deferred, not applicable, or disagree with rationale. Resolved findings are suppressed by fingerprint on rerun.
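
One way to record adjudication is sketched below. The five dispositions come from the step above. The `Adjudication` record, the rule that every disposition other than "fixed" needs a written rationale, and the choice to treat any recorded disposition as resolved are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Disposition(Enum):
    FIXED = "fixed"
    ACCEPTED_RISK = "accepted_risk"
    DEFERRED = "deferred"
    NOT_APPLICABLE = "not_applicable"
    DISAGREE = "disagree_with_rationale"


@dataclass(frozen=True)
class Adjudication:
    fingerprint: str
    disposition: Disposition
    rationale: str = ""

    def __post_init__(self):
        # Assumption: anything other than "fixed" must carry a rationale.
        if self.disposition is not Disposition.FIXED and not self.rationale.strip():
            raise ValueError(f"{self.disposition.value} requires a rationale")


def suppress_resolved(findings: list[dict], ledger: list[Adjudication]) -> list[dict]:
    """Drop findings whose fingerprint already has a recorded disposition."""
    resolved = {a.fingerprint for a in ledger}
    return [f for f in findings if f["fingerprint"] not in resolved]


ledger = [Adjudication("plan:rollback:plan.md:a1b2", Disposition.FIXED)]
rerun = [
    {"id": "GAP-007", "fingerprint": "plan:rollback:plan.md:a1b2"},  # re-raised under a new ID
    {"id": "GAP-008", "fingerprint": "plan:scope:plan.md:9f3e"},
]
remaining = suppress_resolved(rerun, ledger)  # only GAP-008 survives
```

Matching on the fingerprint rather than the display ID is what stops an already adjudicated finding from returning under a new number.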

B · Gate

Gate B – Plan approved

Zero blocking findings remaining. The orchestrator records the plan hash. The human approves or rejects. On approval, the builder executes against the locked plan with no mid-flight validator interference.
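
Plan locking could look roughly like the sketch below. It assumes the plan is a text document and uses SHA-256 from the Python standard library. The normalisation rule and the helper names `plan_hash` and `assert_plan_unchanged` are illustrative, not a description of how the orchestrator actually computes its hash.

```python
import hashlib


def plan_hash(plan_text: str) -> str:
    """Hash the plan after normalising line endings and trailing whitespace."""
    normalised = "\n".join(line.rstrip() for line in plan_text.strip().splitlines())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def assert_plan_unchanged(plan_text: str, locked_hash: str) -> None:
    """Refuse to execute if the plan differs from what the human approved."""
    if plan_hash(plan_text) != locked_hash:
        raise RuntimeError("Plan changed after approval; re-approval required")


approved = "Objective: add billing tool\nScope: billing service only\n"
locked = plan_hash(approved)

# The same plan with Windows line endings still matches the locked hash.
assert_plan_unchanged(approved.replace("\n", "\r\n"), locked)
```

Normalising before hashing keeps cosmetic differences from invalidating an approval, while any change to the wording of the plan produces a different hash.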

3 · Builder

Execute the plan

One bounded implementation batch: edits, tests, evidence collection. The validator is not called during execution. The builder works uninterrupted until the batch is complete.

C · Validator

Gate C – Cold diff review

The validator receives only the output artefacts, changed files, decision logs, test evidence, and the approved plan summary. It does not see the builder’s reasoning or conversation history. Fresh eyes, structured verdict.

4 · Builder

Adjudicate and patch

Address critical findings. Document accepted risks. The orchestrator checks whether the count of blocking findings decreased since the previous round. If not, it declares a stall, exits the loop, and escalates to a human reviewer. No infinite loops, ever.

D · Report

Gate D – Acceptance report

The report lists open criticals (should be zero), accepted risks with rationale, deferred items, test evidence, and exactly why the system stopped. The human reviewer receives this report and makes the release decision. Auditable. Governed.

Governed termination

Seven controls that make AI validation mechanically governed

These controls define how validation ends, how risk is accepted, and what evidence exists before anything is released.

1

Bounded rounds

Max 2 plan reviews. Max 2 implementation reviews. Review depth is fixed by control policy, not by model behaviour.

2

Immutable finding identity

Every finding has a durable fingerprint. Resolved findings are suppressed across rounds, preventing duplicate re-raises under new IDs.
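
A durable fingerprint might be derived as in the sketch below. The `phase:category:path:digest` layout loosely mirrors the sample fingerprints shown in the audit evidence section, but the derivation itself (a truncated SHA-256 of the normalised evidence text) is an assumption, and the `fingerprint` helper is hypothetical.

```python
import hashlib
import re


def fingerprint(phase: str, category: str, path: str, evidence: str) -> str:
    """Derive an identity from what the finding is about, not its display ID."""
    # Collapse whitespace and case so cosmetic differences do not change the hash.
    normalised = re.sub(r"\s+", " ", evidence.strip().lower())
    digest = hashlib.sha256(normalised.encode("utf-8")).hexdigest()[:4]
    return f"{phase}:{category}:{path}:{digest}"


first = fingerprint("plan", "rollback", "plan.md", "No rollback plan for schema migration")
again = fingerprint("plan", "rollback", "plan.md", "  no rollback plan   for schema migration ")
# first == again, so the re-raised finding maps to the same ledger entry.
```

A four-character digest is short enough to collide and a reworded finding would hash differently, so a production scheme would need a longer digest and a more robust normalisation than this sketch shows.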

3

Only material risk reopens review

Comments, style notes, and non-blocking suggestions are recorded but do not trigger another validation round. Only unresolved blocking risk can reopen review.

4

Controller-enforced convergence

The orchestrator independently checks whether blocking risk is decreasing between rounds. If not, it terminates the loop and escalates to a human reviewer.
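
The convergence check needs no model judgement, only the blocking counts from each round. The sketch below assumes the orchestrator keeps those counts in a list and applies the two-round cap from control 1. The `next_step` function and its return strings are illustrative.

```python
def next_step(blocking_counts: list[int], max_rounds: int = 2) -> str:
    """Decide what happens after a review round, using blocking counts only.

    blocking_counts holds the open blocking count after each completed round.
    """
    current = blocking_counts[-1]
    if current == 0:
        return "converged"
    # Blocking risk did not decrease since the previous round: stop the loop.
    if len(blocking_counts) >= 2 and current >= blocking_counts[-2]:
        return "stalled: escalate to human"
    # Progress was made, but the policy allows no further rounds.
    if len(blocking_counts) >= max_rounds:
        return "round cap reached: escalate to human"
    return "builder adjudicates"


first_round = next_step([2])        # blocking findings open, rounds remain
resolved = next_step([2, 0])        # all blocking findings cleared
no_progress = next_step([2, 2])     # count did not decrease
capped = next_step([3, 1])          # decreasing, but the cap is reached
```

Every branch that does not converge hands control to a person, so the loop cannot extend itself.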

5

Rationale-based adjudication

Builders can challenge findings with documented rationale. The system records disposition and evidence instead of forcing mechanical code churn.

6

No autonomous reruns

Validation never reopens itself. Every additional review round requires an explicit human decision.

7

Release requires an acceptance report

No output is considered shippable without a retained report covering open findings, accepted risks, supporting evidence, and the exact reason validation stopped.
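
A release gate built on this control could be as small as the sketch below. The section names in `REQUIRED_SECTIONS`, the `approved_by` field, and the `release_decision` function are assumptions chosen to match the wording above, not a documented report schema.

```python
from typing import Optional

# Hypothetical section names for the retained acceptance report.
REQUIRED_SECTIONS = (
    "open_findings", "accepted_risks", "evidence", "stop_reason", "approved_by"
)


def release_decision(report: Optional[dict]) -> tuple[bool, str]:
    """Return (allowed, reason); release is denied unless the report is complete."""
    if report is None:
        return False, "no acceptance report retained"
    missing = [s for s in REQUIRED_SECTIONS if s not in report]
    if missing:
        return False, "report missing: " + ", ".join(missing)
    if any(f.get("blocking") for f in report["open_findings"]):
        return False, "open blocking findings"
    if not report["approved_by"]:
        return False, "no named human approver"
    return True, "release authorised"


report = {
    "open_findings": [{"id": "ISS-001", "blocking": False}],
    "accepted_risks": [],
    "evidence": ["tests: passing"],
    "stop_reason": "all blocking findings resolved",
    "approved_by": "",  # nobody has signed off yet
}
decision = release_decision(report)
```

The function returns a reason alongside the verdict so that a refused release leaves its own audit record.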

Audit evidence

Every review returns machine-parseable evidence

Actor attribution, timestamps, finding fingerprints, decision logs, accepted-risk records, test evidence, human approval, and a retained acceptance report. Structured data you can gate releases on, not prose you have to interpret.

{
  "decision": "revise",  "confidence": 0.87,
  "gaps": [{
    "id": "GAP-001",  "fingerprint": "plan:rollback:plan.md:a1b2",
    "severity": "critical",  "blocking": true,
    "evidence": "No rollback plan for schema migration",
    "action": "Add rollback steps for the migration"
  }],
  "issues": [{
    "id": "ISS-001",  "fingerprint": "impl:security:auth.ts:12:c3d4",
    "class": "security",  "blocking": true,
    "evidence": "JWT secret hardcoded at auth.ts:12"
  }],
  "comments": [{ "note": "Consider extracting auth to middleware" }],
  "exit_check": { "reopen_loop": true,  "open_blocking_count": 2 }
}

Gaps = missing from the plan. Issues = bugs in what was built. Comments = advisory only, never blocking.
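
Because the verdict is structured, a pipeline can check it before trusting it. The sketch below parses a trimmed version of the example above and cross-checks the declared blocking count against the findings. The `check_review` helper and its return shape are hypothetical.

```python
import json


def check_review(raw: str) -> dict:
    """Parse a validator verdict and cross-check its blocking count."""
    review = json.loads(raw)
    findings = review.get("gaps", []) + review.get("issues", [])
    blocking = [f["id"] for f in findings if f.get("blocking")]
    declared = review["exit_check"]["open_blocking_count"]
    if declared != len(blocking):
        raise ValueError(f"declared {declared} blocking findings, found {len(blocking)}")
    # Comments are advisory only, so they never affect the gate.
    return {"gate_open": not blocking, "blocking_ids": blocking}


raw = """{
  "decision": "revise",
  "gaps": [{"id": "GAP-001", "severity": "critical", "blocking": true}],
  "issues": [{"id": "ISS-001", "class": "security", "blocking": true}],
  "comments": [{"note": "Consider extracting auth to middleware"}],
  "exit_check": {"reopen_loop": true, "open_blocking_count": 2}
}"""
result = check_review(raw)
# result["gate_open"] is False while GAP-001 and ISS-001 remain blocking.
```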

Governance

Aligned with control objectives in frameworks you already follow

Cold Validation implements established security principles as architectural primitives, not procedural requirements.

Separation of duties

SOC 2 CC6.3 · NIST SP 800-53 AC-5 · ISO/IEC 27001:2022 A.5.3

Change management

SOC 2 CC8.1 · NIST SP 800-53 CM-3

Audit logging and event monitoring

SOC 2 CC7.2 / CC7.3 · NIST SP 800-53 AU-2 / AU-3

AI risk management

NIST AI RMF 1.0 · ISO/IEC 42001:2023

Production agent governance.
Designed for your risk profile.

Start with an architecture review. We map your agent workflows, identify control gaps, and design a governance model that satisfies your security, compliance, and engineering requirements.

Book an architecture review
Request an executive briefing

Cold Validation governs agent behaviour at build time. RAXE Platform enforces policy at runtime.
Reference implementation on GitHub