The Human Review Rule
The Human Review Rule is the principle that above a defined risk threshold, a named human must approve an autonomous action before it commits — and that threshold is decided before deployment, not after the incident. It is the deliberate line between what an AI agent may do on its own and what it may only propose.
I developed this in Agent of Record as a practical safeguard for the moment AI agents stop answering and start acting. The failure mode it prevents is the most common one in autonomous systems: nobody decides where the human belongs until after the agent has already done something expensive or irreversible, and then the line is drawn defensively, in hindsight, amid blame.
The rule forces the question forward in time. Before an agent goes live, you state the threshold — by dollar amount, by reversibility, by legal or safety consequence — above which a specific person must sign. It is a precondition for deployment, not a reaction to disaster.
A company deploys an agent to manage refunds and account adjustments. Under the Human Review Rule, the team decides in advance: the agent may auto-approve refunds under $200 that are reversible, but anything above that, or anything that closes an account, routes to a named support lead for approval before it executes.
On day one, a prompt-injected message tries to trigger a $50,000 “refund.” The agent does what it always does — it prepares the action and, because it crosses the threshold, hands it to a human, who immediately rejects it. No loss, no scramble, no after-the-fact policy meeting. The boundary existed before it was tested, which is the entire point.
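The refund gate from the scenario above, written as a fail-closed policy check. This is an added example, not code from Agent of Record; the names `Action`, `decide`, `commit` and the approver id `support-lead` are assumptions, and the agent's proposed amount is treated as untrusted text.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from typing import Optional

# Policy fixed before deployment (values from the scenario above).
AUTO_REFUND_LIMIT = Decimal("200")
NAMED_APPROVER = "support-lead"  # assumed identifier for the named human


@dataclass(frozen=True)
class Action:
    kind: str            # e.g. "refund" or "close_account"
    amount: str = "0"    # kept as text: the agent's output is untrusted
    reversible: bool = False


def decide(action: Action) -> str:
    """Return "auto" only when every auto-approve condition holds, else "review"."""
    if action.kind != "refund":
        return "review"              # account closure and unknown kinds
    try:
        amount = Decimal(action.amount)
    except (InvalidOperation, TypeError, ValueError):
        return "review"              # malformed amount: fail closed
    if not amount.is_finite() or amount <= 0:
        return "review"              # NaN, infinity, negative or zero
    if action.reversible and amount < AUTO_REFUND_LIMIT:
        return "auto"
    return "review"


def commit(action: Action, approved_by: Optional[str] = None) -> bool:
    """Execute only if policy allows it or the named approver signed."""
    if decide(action) == "auto":
        return True
    return approved_by == NAMED_APPROVER


# Constructed sample actions, including the injected $50,000 "refund".
SAMPLES = [
    Action("refund", "49.99", reversible=True),
    Action("refund", "50000", reversible=True),
    Action("close_account"),
    Action("refund", "NaN", reversible=True),
]

if __name__ == "__main__":
    for a in SAMPLES:
        print(a.kind, a.amount, "->", decide(a), "| commits unsigned:", commit(a))
```

The gate sits outside the model: whatever text the agent was fed, the decision depends only on the structured action and the policy constants.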