When one agent acts and reviews its own output, four things go wrong
None of these are about model capability. They are structural. The same failures show up in every single-agent workflow, whether the agent writes code, makes tool calls, or takes policy-bounded decisions.
Sunk-cost bias
The agent that produced the output is invested in it. It rationalises its own choices, underweights alternatives, and misses flaws it introduced.
Context bleed
The builder remembers its reasoning and fills gaps a cold reader would catch. A missing guard rail “makes sense” because it already knows the intent.
Scope creep
Without an external boundary, the agent drifts beyond its mandate. It takes actions, modifies state, and “improves” things it was never asked to touch.
Silent confidence
The agent does not flag its own uncertainty. A policy violation, a missing authorisation check, a dangerous tool call. It produced them, so it does not see them.
Built for the people responsible when an agent goes wrong
CISO / Head of Security
You need separation of duties, audit evidence, and provable control over agent-driven changes before your next compliance review.
VP Engineering / Platform Leader
You need release governance that scales across teams without slowing delivery. Every agent change reviewed, every risk accepted by a named human.
AI Product / Agent Platform Owner
You need confidence that agent workflows are safe to ship. Independent review, structured evidence, and a clear record of what was approved and why.
The governance gaps that show up when agents reach production
Agents taking unsafe actions
No independent check between agent intent and production impact. The agent that made the decision is the same one evaluating whether it was safe.
No independent review before release
Changes ship without a second set of eyes. No structured review, no adversarial audit, no separation between builder and reviewer.
Weak evidence for audit and compliance
When the auditor asks what controls governed the agent’s last 50 releases, you have chat logs instead of structured reports with finding dispositions and risk acceptance records.
No human risk acceptance on record
Nobody signed off. Nobody reviewed the findings. Nobody accepted the residual risk. If something breaks, there is no record of who authorised the release or why.
From architecture review to production governance
We help you design, pilot, and operationalise Cold Validation for your agent workflows. Every engagement is scoped to your risk profile and compliance requirements.
Control architecture design
Map your agent workflows, identify control gaps, and design a validation architecture that enforces separation of duties at the system level.
Pilot implementation
Deploy Cold Validation on a single high-risk workflow. Measure coverage, review quality, and governance overhead before rolling out across teams.
Policy and approval workflow
Define review policies, severity thresholds, escalation paths, and human approval workflows that match your existing change management process.
Audit and reporting model
Structured acceptance reports, finding ledgers, risk acceptance records, and compliance mappings your auditors can actually use.
Three roles. Strict boundaries. Zero shared memory.
Builder produces output. Validator independently audits. Orchestrator enforces convergence. Human reviewer accepts risk and authorises release. The validator never sees the builder’s chain-of-thought, conversation history, or planning rationale.
Author + operator
- Plans the task end to end
- Produces output (code, actions, decisions)
- Adjudicates validator findings
- Can disagree with rationale
- Proposes disposition, does not release
Reviewer + sceptic
- Reviews plan artefacts only
- Reviews implementation diffs only
- Zero access to builder reasoning, ever
- Read-only sandbox, always
- Fresh agent every invocation
Traffic cop
- Persists finding ledger with fingerprints
- Detects stall and churn mechanically
- Enforces round caps per phase
- Suppresses resolved findings on rerun
- Enforces convergence; escalates to human
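The zero-shared-memory boundary can be pictured as a projection: the validator only ever receives a subset of what the builder holds. This is an illustrative sketch, not CVA's actual data model; all names below are hypothetical.

```python
# Hypothetical sketch of the context partition between roles.
# Field names are illustrative, not CVA's actual data model.

VALIDATOR_CONTEXT = {"plan", "diffs", "checklist"}  # artefacts only


def validator_view(builder_state):
    """Project the builder's state down to what the validator may see:
    plan artefacts and implementation diffs, never reasoning, history,
    or planning rationale."""
    return {k: v for k, v in builder_state.items() if k in VALIDATOR_CONTEXT}
```

Whatever the builder accumulates, the projection guarantees its reasoning and conversation history never cross the boundary.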
Watch CVA stop a risky agent change before release
A Claude Code agent takes a request. CVA launches an independent validator, catches blocking risk, records adjudication, and requires human approval before anything ships.
Four gates. Two loops. One auditable report.
Validation happens at phase boundaries, not continuously. The builder works uninterrupted. The sceptic audits the finished artefact. The orchestrator decides when to stop.
Produce a plan
Objective, scope, assumptions, files to touch, invariants, test strategy, rollback plan, acceptance criteria, known risks. The plan is the contract everything else is measured against.
Gate A – Cold plan review
The validator receives only the plan text and a contract checklist. It checks for missing requirements, untested assumptions, security gaps, and scope ambiguity. Returns structured gaps, issues, and comments.
Adjudicate findings
For each finding: fixed, accepted risk, deferred, not applicable, or disagree with rationale. Resolved findings are suppressed by fingerprint on rerun.
Gate B – Plan approved
Zero blocking findings remaining. The orchestrator records the plan hash. The human approves or rejects. On approval, the builder executes against the locked plan with no mid-flight validator interference.
Execute the plan
One bounded implementation batch: edits, tests, evidence collection. The validator is not called during execution. The builder works uninterrupted until the batch is complete.
Gate C – Cold diff review
The validator receives only the output artefacts, changed files, decision logs, test evidence, and the approved plan summary. It does not see the builder’s reasoning or conversation history. Fresh eyes, structured verdict.
Adjudicate and patch
Address critical findings. Document accepted risks. The orchestrator checks whether the blocking finding count decreased. If not, it declares a stall and exits. No infinite loops, ever.
Gate D – Acceptance report
Open criticals (should be zero), accepted risks with rationale, deferred items, test evidence, and exactly why the system stopped. The human reviewer receives this report and makes the release decision. Auditable. Governed.
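In outline, each review loop (plan or implementation) follows the same mechanical shape. The sketch below is illustrative only; the function names and finding fields are assumptions, not the actual CVA API.

```python
# Illustrative sketch of one bounded review loop (plan or implementation).
# Names and fields here are assumptions, not the actual CVA API.


def review_loop(artefact, review_fn, patch_fn, resolved, max_rounds=2):
    """Cold-review `artefact` until no blocking findings remain, the
    blocking count stops decreasing (stall), or the round cap is hit.
    `resolved` holds fingerprints of findings already dispositioned."""
    prev = None
    findings = []
    for _ in range(max_rounds):
        report = review_fn(artefact)                      # fresh validator each round
        findings = [f for f in report["findings"]
                    if f["fingerprint"] not in resolved]  # suppress resolved findings
        blocking = [f for f in findings if f["blocking"]]
        if not blocking:
            return "converged", findings                  # gate passes
        if prev is not None and len(blocking) >= prev:
            return "stalled", findings                    # escalate to human
        prev = len(blocking)
        artefact = patch_fn(artefact, blocking)           # builder adjudicates and patches
    return "round_cap", findings                          # human decides next step
```

Every exit path is explicit, so the acceptance report can state exactly why the system stopped: convergence, stall, or round cap.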
Seven controls that make AI validation mechanically governed
These controls define how validation ends, how risk is accepted, and what evidence exists before anything is released.
Bounded rounds
Max 2 plan reviews. Max 2 implementation reviews. Review depth is fixed by control policy, not by model behaviour.
Immutable finding identity
Every finding has a durable fingerprint. Resolved findings are suppressed across rounds, preventing duplicate re-raises under new IDs.
Only material risk reopens review
Comments, style notes, and non-blocking suggestions are recorded but do not trigger another validation round. Only unresolved blocking risk can reopen review.
Controller-enforced convergence
The orchestrator independently checks whether blocking risk is decreasing between rounds. If not, it terminates the loop and escalates to a human reviewer.
Rationale-based adjudication
Builders can challenge findings with documented rationale. The system records disposition and evidence instead of forcing mechanical code churn.
No autonomous reruns
Validation never reopens itself. Every additional review round requires an explicit human decision.
Release requires an acceptance report
No output is considered shippable without a retained report covering open findings, accepted risks, supporting evidence, and the exact reason validation stopped.
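A durable fingerprint can be derived from the attributes of a finding that survive re-review, so the same defect maps to the same identity even when a validator re-raises it under a new ID. The scheme below is a sketch, not CVA's actual fingerprint format.

```python
import hashlib


def fingerprint(phase, cls, location, evidence):
    """Build a durable finding identity from attributes that survive
    re-review: phase, finding class, code location, and a short hash of
    the normalised evidence text. Illustrative scheme, not CVA's format."""
    digest = hashlib.sha256(evidence.strip().lower().encode()).hexdigest()[:4]
    return f"{phase}:{cls}:{location}:{digest}"
```

Because the identity ignores incidental variation such as whitespace or casing in the evidence text, a disposition recorded against one round's finding suppresses the duplicate in the next.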
Every review returns machine-parseable evidence
Actor attribution, timestamps, finding fingerprints, decision logs, accepted-risk records, test evidence, human approval, and a retained acceptance report. Structured data you can gate releases on, not prose you have to interpret.
{
  "decision": "revise",
  "confidence": 0.87,
  "gaps": [{
    "id": "GAP-001",
    "fingerprint": "plan:rollback:plan.md:a1b2",
    "severity": "critical",
    "blocking": true,
    "evidence": "No rollback plan for schema migration",
    "action": "Add rollback steps for the migration"
  }],
  "issues": [{
    "id": "ISS-001",
    "fingerprint": "impl:security:auth.ts:12:c3d4",
    "class": "security",
    "blocking": true,
    "evidence": "JWT secret hardcoded at auth.ts:12"
  }],
  "comments": [{ "note": "Consider extracting auth to middleware" }],
  "exit_check": { "reopen_loop": true, "open_blocking_count": 2 }
}
Gaps = missing from the plan. Issues = bugs in what was built. Comments = advisory only, never blocking.
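Because the report is structured JSON, a release gate can be a few lines in CI. The check below is a sketch against the example report's field names; wiring it into a pipeline is left as an assumption about your CI system.

```python
import json


def release_allowed(report_json):
    """Gate a release on the structured report: no open blocking gaps
    or issues, and the validator's own exit check agrees. Field names
    follow the example report shown on this page."""
    r = json.loads(report_json)
    blocking = [f for f in r.get("gaps", []) + r.get("issues", [])
                if f.get("blocking")]
    return not blocking and r["exit_check"]["open_blocking_count"] == 0
```

A CI step can fail the pipeline whenever this returns false. Comments never enter the check, so advisory notes can never block a release.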
Aligned with control objectives in frameworks you already follow
Cold Validation implements established security principles as architectural primitives, not procedural requirements.
Separation of duties
SOC 2 CC6.3 · NIST AC-5 · ISO/IEC 27001:2022 A.5.3
Change management
SOC 2 CC8.1 · NIST CM-3
Audit logging and event monitoring
SOC 2 CC7.2 / CC7.3 · NIST AU-2 / AU-3
AI risk management
NIST AI RMF 1.0 · ISO/IEC 42001:2023
Production agent governance.
Designed for your risk profile.
Start with an architecture review. We map your agent workflows, identify control gaps, and design a governance model that satisfies your security, compliance, and engineering requirements.
Cold Validation governs agent behaviour at build time. RAXE Platform enforces policy at runtime.
Reference implementation on GitHub