The power of an AI assistant is its ability to generate code at a speed that exceeds human writing. But in software engineering, writing code is only a fraction of the work. The more important work is ensuring that the code is safe, correct, and fits the architecture. In an AI-assisted workflow, the burden of this responsibility shifts from authoring to reviewing.
A common mistake in AI workflows is treating the output as “pre-verified.” Because the code often looks clean, follows style guides, and compiles on the first try, there is a temptation to give it a superficial review. But AI-generated code can be subtly wrong in ways that manual code is not. It can satisfy the happy path while ignoring the edge cases, or it can introduce hidden couplings that only become apparent during a production incident. Review is the single most important line of defense.
When reviewing AI-generated code, don’t start with the syntax or the variable names. Start with the behavior. A high-quality review follows a hierarchy of risk:
require and ensure conditions of its own class and every class it calls? This is the most critical check.If the code fails any of the first four checks, the fifth check doesn’t matter. You shouldn’t polish code that is behaviorally broken.
To make review a routine discipline, use a focused checklist for every AI-generated patch:
Consider a requirement:
“The AI was asked to add retry logic to the delivery dispatch port.”
The AI provides a loop that retries on failure. A naive reviewer sees “it retries” and approves. A rigorous reviewer asks the following:
By asking these questions, you move from “it looks like it works” to “I have proven it is safe.”
In Nex, our explicit failure paths and contract checks provide the perfect framework for this kind of review.
-- The AI-generated proposal for a dispatch port with retry
class Dispatch_Port
feature
send(task_id: String): String
require
id_present: task_id /= ""
do
result := "SENT"
ensure
known_result:
result = "SENT" or
result = "TRANSIENT_FAILURE" or
result = "PERMANENT_FAILURE"
end
end
class Dispatch_With_Retry
create
make(port: Dispatch_Port) do
this.port := port
end
feature
port: Dispatch_Port
send_task(task_id: String, max_attempts: Integer): String
require
id_present: task_id /= ""
attempts_valid: max_attempts >= 1
do
let attempt: Integer := 1
let status: String := "TRANSIENT_FAILURE"
from
until attempt > max_attempts or
status = "SENT" or
status = "PERMANENT_FAILURE" do
status := port.send(task_id)
attempt := attempt + 1
end
result := status
ensure
known_result:
result = "SENT" or
result = "TRANSIENT_FAILURE" or
result = "PERMANENT_FAILURE"
end
end
A reviewer looking at this Nex implementation can immediately see the logic’s boundaries. They can see that the loop is bounded by max_attempts. They can see that it stops correctly on a PERMANENT_FAILURE. The ensure clause makes the expected outcomes explicit. The review task is transformed from “understanding the code” to “validating the contract.”
In the delivery system, review focuses on state transition legality. Did the AI-generated code allow a task to skip a mandatory “verification” step?
In the knowledge engine, review focuses on ranking correctness. Did the AI’s “optimized” ranking algorithm accidentally exclude documents that were highly relevant but didn’t match a new heuristic?
In the virtual world, review focuses on determinism. Did the AI introduce a non-deterministic random number call into a simulation loop that must remain reproducible?
In each case, the reviewer is the guardian of the system’s core principles.
Style-First Review. Spending 15 minutes debating a variable name while missing a fatal race condition is the most common review failure. The remedy is to follow the hierarchy of risk: contracts first, style last.
No Regression Focus. Assuming that because the new feature works, the old features still work. The remedy is to compare the observable behavior of the system before and after the patch, ideally using automated parity tests.
The “Rubber Stamp” Problem. Approving an AI-generated patch because you are in a hurry. The remedy is to acknowledge that AI code is “guilty until proven innocent.” If you don’t have time to review it properly, you don’t have time to merge it.
Quick Exercise
Take one recent AI-generated patch (even a small one) and apply the Risk-First Hierarchy: 1. Check the contract (inputs/outputs). 2. Check the invariants (state safety). 3. Check the failure paths (what could go wrong?). 4. Check for hidden coupling.
Did you find anything that a “style-only” review would have missed?
Takeaways
The next chapter, Human Judgment in an AI World, closes the book with the final piece of the puzzle: the role of human judgment and accountability in a world where AI is doing more and more of the work.