When one agent acts and reviews its own output, four things go wrong
None of these are about model capability. They are structural. The same failures show up in every single-agent workflow, whether the agent writes code, makes tool calls, or takes policy-bounded decisions.
Sunk-cost bias
The agent that produced the output is invested in it. It rationalises its own choices, underweights alternatives, and misses flaws it introduced.
Context bleed
The builder remembers its reasoning and fills gaps a cold reader would catch. A missing guard rail “makes sense” because it already knows the intent.
Scope creep
Without an external boundary, the agent drifts beyond its mandate. It takes actions, modifies state, and “improves” things it was never asked to touch.
Silent confidence
The agent does not flag its own uncertainty. A policy violation, a missing authorisation check, a dangerous tool call. It produced them, so it does not see them.
Built for the people responsible when an agent goes wrong
CISO / Head of Security
You need separation of duties, audit evidence, and provable control over agent-driven changes before your next compliance review.
VP Engineering / Platform Leader
You need release governance that scales across teams without slowing delivery. Every agent change reviewed, every risk accepted by a named human.
AI Product / Agent Platform Owner
You need confidence that agent workflows are safe to ship. Independent review, structured evidence, and a clear record of what was approved and why.
The governance gaps that show up when agents reach production
Agents taking unsafe actions
No independent check between agent intent and production impact. The agent that made the decision is the same one evaluating whether it was safe.
No independent review before release
Changes ship without a second set of eyes. No structured review, no adversarial audit, no separation between builder and reviewer.
Weak evidence for audit and compliance
When the auditor asks what controls governed the agent’s last 50 releases, you have chat logs instead of structured reports with finding dispositions and risk acceptance records.
No human risk acceptance on record
Nobody signed off. Nobody reviewed the findings. Nobody accepted the residual risk. If something breaks, there is no record of who authorised the release or why.
From architecture review to production governance
We help you design, pilot, and operationalise Cold Validation for your agent workflows. Every engagement is scoped to your risk profile and compliance requirements.
Control architecture design
Map your agent workflows, identify control gaps, and design a validation architecture that enforces separation of duties at the system level.
Pilot implementation
Deploy Cold Validation on a single high-risk workflow. Measure coverage, review quality, and governance overhead before rolling out across teams.
Policy and approval workflow
Define review policies, severity thresholds, escalation paths, and human approval workflows that match your existing change management process.
Audit and reporting model
Structured acceptance reports, finding ledgers, risk acceptance records, and compliance mappings your auditors can actually use.
Three roles. Strict boundaries. Zero shared memory.
Builder produces output. Validator independently audits. Orchestrator enforces convergence. Human reviewer accepts risk and authorises release. The validator never sees the builder’s chain-of-thought, conversation history, or planning rationale.
Author + operator
- Plans the task end to end
- Produces output (code, actions, decisions)
- Adjudicates validator findings
- Can disagree with rationale
- Proposes disposition, does not release
Reviewer + sceptic
- Reviews plan artefacts only
- Reviews implementation diffs only
- Zero access to builder reasoning, ever
- Read-only sandbox, always
- Fresh agent every invocation
Traffic cop
- Persists finding ledger with fingerprints
- Detects stall and churn mechanically
- Enforces round caps per phase
- Suppresses resolved findings on rerun
- Enforces convergence; escalates to human
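The zero-shared-memory boundary can be pictured as a projection: the validator only ever receives a subset of what the builder holds. This is an illustrative sketch, not CVA's actual data model; all names below are hypothetical.

```python
# Hypothetical sketch of the context partition between roles.
# Field names are illustrative, not CVA's actual data model.

VALIDATOR_CONTEXT = {"plan", "diffs", "checklist"}  # artefacts only


def validator_view(builder_state):
    """Project the builder's state down to what the validator may see:
    plan artefacts and implementation diffs, never reasoning, history,
    or planning rationale."""
    return {k: v for k, v in builder_state.items() if k in VALIDATOR_CONTEXT}
```

Whatever the builder accumulates, the projection guarantees its reasoning and conversation history never cross the boundary.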
Watch CVA stop a risky agent change before release
A Claude Code agent takes a request. CVA launches an independent validator, catches blocking risk, records adjudication, and requires human approval before anything ships.
Four gates. Two loops. One auditable report.
Validation happens at phase boundaries, not continuously. The builder works uninterrupted. The sceptic audits the finished artefact. The orchestrator decides when to stop.
Produce a plan
Objective, scope, assumptions, files to touch, invariants, test strategy, rollback plan, acceptance criteria, known risks. The plan is the contract everything else is measured against.
Gate A – Cold plan review
The validator receives only the plan text and a contract checklist. It checks for missing requirements, untested assumptions, security gaps, and scope ambiguity. Returns structured gaps, issues, and comments.
Adjudicate findings
For each finding: fixed, accepted risk, deferred, not applicable, or disagree with rationale. Resolved findings are suppressed by fingerprint on rerun.
Gate B – Plan approved
Zero blocking findings remaining. The orchestrator records the plan hash. The human approves or rejects. On approval, the builder executes against the locked plan with no mid-flight validator interference.
Execute the plan
One bounded implementation batch: edits, tests, evidence collection. The validator is not called during execution. The builder works uninterrupted until the batch is complete.
Gate C – Cold diff review
The validator receives only the output artefacts, changed files, decision logs, test evidence, and the approved plan summary. It does not see the builder’s reasoning or conversation history. Fresh eyes, structured verdict.
Adjudicate and patch
Address critical findings. Document accepted risks. The orchestrator checks whether the blocking finding count decreased. If not, it declares a stall and exits. No infinite loops, ever.
Gate D – Acceptance report
Open criticals (should be zero), accepted risks with rationale, deferred items, test evidence, and exactly why the system stopped. The human reviewer receives this report and makes the release decision. Auditable. Governed.
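In outline, each review loop (plan or implementation) follows the same mechanical shape. The sketch below is illustrative only; the function names and finding fields are assumptions, not the actual CVA API.

```python
# Illustrative sketch of one bounded review loop (plan or implementation).
# Names and fields here are assumptions, not the actual CVA API.


def review_loop(artefact, review_fn, patch_fn, resolved, max_rounds=2):
    """Cold-review `artefact` until no blocking findings remain, the
    blocking count stops decreasing (stall), or the round cap is hit.
    `resolved` holds fingerprints of findings already dispositioned."""
    prev = None
    findings = []
    for _ in range(max_rounds):
        report = review_fn(artefact)                      # fresh validator each round
        findings = [f for f in report["findings"]
                    if f["fingerprint"] not in resolved]  # suppress resolved findings
        blocking = [f for f in findings if f["blocking"]]
        if not blocking:
            return "converged", findings                  # gate passes
        if prev is not None and len(blocking) >= prev:
            return "stalled", findings                    # escalate to human
        prev = len(blocking)
        artefact = patch_fn(artefact, blocking)           # builder adjudicates and patches
    return "round_cap", findings                          # human decides next step
```

Every exit path is explicit, so the acceptance report can state exactly why the system stopped: convergence, stall, or round cap.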
Seven controls that make AI validation mechanically governed
These controls define how validation ends, how risk is accepted, and what evidence exists before anything is released.
Bounded rounds
Max 2 plan reviews. Max 2 implementation reviews. Review depth is fixed by control policy, not by model behaviour.
Immutable finding identity
Every finding has a durable fingerprint. Resolved findings are suppressed across rounds, preventing duplicate re-raises under new IDs.
Only material risk reopens review
Comments, style notes, and non-blocking suggestions are recorded but do not trigger another validation round. Only unresolved blocking risk can reopen review.
Controller-enforced convergence
The orchestrator independently checks whether blocking risk is decreasing between rounds. If not, it terminates the loop and escalates to a human reviewer.
Rationale-based adjudication
Builders can challenge findings with documented rationale. The system records disposition and evidence instead of forcing mechanical code churn.
No autonomous reruns
Validation never reopens itself. Every additional review round requires an explicit human decision.
Release requires an acceptance report
No output is considered shippable without a retained report covering open findings, accepted risks, supporting evidence, and the exact reason validation stopped.
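A durable fingerprint can be derived from the attributes of a finding that survive re-review, so the same defect maps to the same identity even when a validator re-raises it under a new ID. The scheme below is a sketch, not CVA's actual fingerprint format.

```python
import hashlib


def fingerprint(phase, cls, location, evidence):
    """Build a durable finding identity from attributes that survive
    re-review: phase, finding class, code location, and a short hash of
    the normalised evidence text. Illustrative scheme, not CVA's format."""
    digest = hashlib.sha256(evidence.strip().lower().encode()).hexdigest()[:4]
    return f"{phase}:{cls}:{location}:{digest}"
```

Because the identity ignores incidental variation such as whitespace or casing in the evidence text, a disposition recorded against one round's finding suppresses the duplicate in the next.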
Every review returns machine-parseable evidence
Actor attribution, timestamps, finding fingerprints, decision logs, accepted-risk records, test evidence, human approval, and a retained acceptance report. Structured data you can gate releases on, not prose you have to interpret.
{
  "decision": "revise",
  "confidence": 0.87,
  "gaps": [{
    "id": "GAP-001",
    "fingerprint": "plan:rollback:plan.md:a1b2",
    "severity": "critical",
    "blocking": true,
    "evidence": "No rollback plan for schema migration",
    "action": "Add rollback steps for the migration"
  }],
  "issues": [{
    "id": "ISS-001",
    "fingerprint": "impl:security:auth.ts:12:c3d4",
    "class": "security",
    "blocking": true,
    "evidence": "JWT secret hardcoded at auth.ts:12"
  }],
  "comments": [{ "note": "Consider extracting auth to middleware" }],
  "exit_check": { "reopen_loop": true, "open_blocking_count": 2 }
}
Gaps = missing from the plan. Issues = bugs in what was built. Comments = advisory only, never blocking.
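Because the report is structured JSON, a release gate can be a few lines in CI. The check below is a sketch against the example report's field names; wiring it into a pipeline is left as an assumption about your CI system.

```python
import json


def release_allowed(report_json):
    """Gate a release on the structured report: no open blocking gaps
    or issues, and the validator's own exit check agrees. Field names
    follow the example report shown on this page."""
    r = json.loads(report_json)
    blocking = [f for f in r.get("gaps", []) + r.get("issues", [])
                if f.get("blocking")]
    return not blocking and r["exit_check"]["open_blocking_count"] == 0
```

A CI step can fail the pipeline whenever this returns false. Comments never enter the check, so advisory notes can never block a release.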
Aligned with control objectives in frameworks you already follow
Cold Validation implements established security principles as architectural primitives, not procedural requirements.
Separation of duties
SOC 2 CC6.3 · NIST AC-5 · ISO/IEC 27001:2022 A.5.3
Change management
SOC 2 CC8.1 · NIST CM-3
Audit logging and event monitoring
SOC 2 CC7.2 / CC7.3 · NIST AU-2 / AU-3
AI risk management
NIST AI RMF 1.0 · ISO/IEC 42001:2023
Production agent governance.
Designed for your risk profile.
Start with an architecture review. We map your agent workflows, identify control gaps, and design a governance model that satisfies your security, compliance, and engineering requirements.
Cold Validation governs agent behaviour at build time. RAXE Platform enforces policy at runtime.
Reference implementation on GitHub