Why is the Human Review Rule important for AI agents?

It concentrates human judgment on the small fraction of high-consequence actions while letting agents handle routine work autonomously, keeping a clear, pre-agreed line of accountability for risky decisions.

How do you set the Human Review Rule threshold?

You set it by the consequence and reversibility of the action — spend limits, irreversibility, legal or safety impact — and you write it down before the agent goes live so the boundary is explicit and auditable.

Glossary · Coined framework

The Human Review Rule

Q: What is the Human Review Rule?

The Human Review Rule is the principle that above a defined risk threshold, a named human must approve an autonomous action before it commits. The threshold is decided before deployment, not improvised after something goes wrong.

Coined by Ross Blankenship

Illustration: a human eye and checkmark reviewing AI output

Definition

The Human Review Rule is the principle that above a defined risk threshold, a named human must approve an autonomous action before it commits — and that threshold is decided before deployment, not after the incident. It is the deliberate line between what an AI agent may do on its own and what it may only propose.

Origin

I developed this in Agent of Record as a practical safeguard for the moment AI agents stop answering and start acting. The failure mode it prevents is the most common one in autonomous systems: nobody decides where the human belongs until after the agent has already done something expensive or irreversible, and then the line is drawn defensively, in hindsight, amid blame.

The rule forces the question forward in time. Before an agent goes live, you state the threshold — by dollar amount, by reversibility, by legal or safety consequence — above which a specific person must sign. It is a precondition for deployment, not a reaction to disaster.

Worked example

A company deploys an agent to manage refunds and account adjustments. Under the Human Review Rule, the team decides in advance: the agent may auto-approve refunds under $200 that are reversible, but anything above that, or anything that closes an account, routes to a named support lead for approval before it executes.

On day one, a prompt-injected message tries to trigger a $50,000 “refund.” The agent does what it always does — it prepares the action and, because it crosses the threshold, hands it to a human, who immediately rejects it. No loss, no scramble, no after-the-fact policy meeting. The boundary existed before it was tested, which is the entire point.

Start a conversation Back to glossary