How Two AI Instances Review Each Other's Code
Code review is one of those things that everyone agrees is important but nobody enjoys doing. It's slow, it's interruptive, and the quality varies wildly depending on who's reviewing and how busy they are. So I built something different: a system where two AI instances review each other's code through a Slack thread.
The Architecture
The system is straightforward:
- A PR is opened (either by me or by an AI agent)
- Claude Code Instance A reads the PR diff and posts a review to a Slack thread
- Claude Code Instance B reads Instance A's review, examines the code, and responds with its own analysis
- They go back and forth, discussing trade-offs, edge cases, and potential issues
- If they agree on a fix, one of them commits it directly
- The final review summary is posted as a PR comment
The conversation typically runs 3-5 messages before reaching consensus. Sometimes longer for complex changes.
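To make the flow concrete, here's a rough sketch of the orchestration loop in TypeScript. Everything in it is illustrative: `getPrDiff`, `runReviewer`, `postToSlackThread`, and `hasConsensus` are placeholders for glue code you'd write yourself (GitHub CLI calls, Claude Code invocations, Slack API calls), not a real library.

```typescript
type ReviewerRole = "correctness" | "architecture";

interface ReviewMessage {
  role: ReviewerRole;
  text: string;
}

// Placeholder glue code — swap in your own implementations.
declare function getPrDiff(prNumber: number): Promise<string>;
declare function runReviewer(
  role: ReviewerRole,
  diff: string,
  thread: ReviewMessage[]
): Promise<string>;
declare function postToSlackThread(
  prNumber: number,
  role: ReviewerRole,
  text: string
): Promise<void>;
declare function hasConsensus(thread: ReviewMessage[]): boolean;

const MAX_TURNS = 6; // reviews usually converge within 3-5 messages

async function reviewPullRequest(prNumber: number): Promise<ReviewMessage[]> {
  const diff = await getPrDiff(prNumber);
  const thread: ReviewMessage[] = [];

  // Instance A (correctness) opens; the two instances then alternate.
  let current: ReviewerRole = "correctness";

  for (let turn = 0; turn < MAX_TURNS; turn++) {
    const reply = await runReviewer(current, diff, thread);
    thread.push({ role: current, text: reply });
    await postToSlackThread(prNumber, current, reply);

    // Stop as soon as both instances signal agreement.
    if (hasConsensus(thread)) break;

    current = current === "correctness" ? "architecture" : "correctness";
  }

  return thread;
}
```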
Why Two Instances?
You might wonder why I use two AI instances instead of one. The answer is the same reason human code review works: fresh eyes catch different things.
A single AI reviewing its own output (or output from a similarly-configured instance) tends to have blind spots. It'll approve patterns it's predisposed to generate. Two instances with slightly different system prompts create genuine tension:
- Instance A is configured as a "correctness reviewer" - focused on logic errors, edge cases, type safety, and test coverage
- Instance B is configured as an "architecture reviewer" - focused on abstractions, coupling, naming, and maintainability
These perspectives naturally conflict in productive ways. Instance A might say "this function handles all edge cases correctly." Instance B might respond "yes, but it handles them by adding six parameters and a complex conditional - consider splitting this into smaller functions." That's a real design discussion.
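In practice, the difference between the two instances comes down to their system prompts. The exact wording below is invented for this sketch; the point is simply that each prompt biases its instance toward a different class of feedback.

```typescript
// Illustrative system prompts for the two reviewer instances.
const CORRECTNESS_REVIEWER_PROMPT = `
You are a correctness reviewer. Focus on logic errors, unhandled edge
cases, null/undefined paths, type safety, and missing test coverage.
Ignore style and architecture unless they cause a bug.
`;

const ARCHITECTURE_REVIEWER_PROMPT = `
You are an architecture reviewer. Focus on abstractions, coupling,
naming, and long-term maintainability. Assume the code is logically
correct unless something obvious jumps out.
`;
```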
What AI Review Catches That Humans Miss
I've been running this system for a few months, and it catches categories of issues that human reviewers consistently miss:
Exhaustive edge case analysis
AI reviewers will systematically check every branch, every null path, every error case. Humans skim. We focus on the "happy path" and spot-check a few edge cases. AI checks all of them.
Consistency with the rest of the codebase
A human reviewer knows the file they're looking at. An AI reviewer can (and does) grep the entire codebase to check if the pattern used in the PR matches the pattern used elsewhere. "This component uses useEffect for data fetching, but the other 12 components in this directory use useSWR."
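A minimal version of that consistency check, assuming a Node script that shells out to grep (the directory, file glob, and pattern strings are just examples):

```typescript
import { execSync } from "node:child_process";

// Count how many files in a directory use each competing pattern.
function countPatternUsage(dir: string, patterns: string[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const pattern of patterns) {
    // `grep -rl` lists the files containing the pattern; `wc -l` counts them.
    const out = execSync(
      `grep -rl --include='*.tsx' ${JSON.stringify(pattern)} ${dir} | wc -l`,
      { encoding: "utf8" }
    );
    counts[pattern] = parseInt(out.trim(), 10);
  }
  return counts;
}

// e.g. countPatternUsage("src/components", ["useSWR(", "useEffect("])
// => { "useSWR(": 12, "useEffect(": 1 } — a strong hint the PR is the outlier.
```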
Dependency and import issues
Circular imports, unused imports, imports from wrong layers. These are tedious for humans to check and trivial for AI.
What AI Review Misses That Humans Catch
It's not all sunshine. AI review has real blind spots:
- Business context. "This changes the discount calculation" - an AI might verify it's mathematically correct but not know that the business recently changed the discount policy and this PR is implementing the old one.
- Performance intuition. Humans with experience in a codebase know where the hot paths are. AI treats all code as equally important.
- Political considerations. Sometimes a PR approach matters not because it's technically better, but because it aligns with (or conflicts with) a decision another team made. AI has no organisational awareness.
The Slack Thread Format
The Slack thread format turned out to be better than PR comments for AI-to-AI discussion. Here's why:
- It's conversational. PR comments are attached to specific lines. Slack threads allow for higher-level discussion about approach and architecture.
- It creates a narrative. You can read the thread top to bottom and understand the full review process, including what was considered and rejected.
- It's non-blocking. The PR stays open and mergeable. The Slack thread is a record, not a gate.
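Mechanically, threading is just Slack's `chat.postMessage` with a `thread_ts`: post the first message normally, then reuse its `ts` for every follow-up. A minimal sketch (the channel ID and bot token are placeholders):

```typescript
import { WebClient } from "@slack/web-api";

const slack = new WebClient(process.env.SLACK_BOT_TOKEN);

async function postReview(channel: string, text: string, threadTs?: string) {
  const res = await slack.chat.postMessage({
    channel,
    text,
    thread_ts: threadTs, // undefined for the first message, set for replies
  });
  return res.ts; // keep this to thread subsequent replies
}
```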
Automatic Fix Commits
When both instances agree on a concrete fix (like "this variable should be renamed" or "this null check is missing"), one of them commits the fix directly to the PR branch. This is safe because:
- The fix is small and specific (usually under 5 lines)
- Both instances agreed on it
- The existing CI pipeline validates the commit
- I still do a final review before merging
In practice, about 40% of reviews result in at least one auto-fix commit. These are usually formatting, naming, or missing null checks - exactly the kind of nit-picks that make human review tedious.
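Here's a sketch of the guardrails around an auto-fix commit, using the consensus flag and the 5-line ceiling from the list above; the function name and git plumbing are illustrative, not a fixed API.

```typescript
import { execSync } from "node:child_process";

function maybeCommitFix(branch: string, description: string, bothAgree: boolean) {
  if (!bothAgree) return;

  // Count changed lines in the working tree; bail unless the fix is small.
  const stat = execSync("git diff --numstat", { encoding: "utf8" });
  const changedLines = stat
    .trim()
    .split("\n")
    .filter(Boolean)
    .reduce((sum, line) => {
      const [added, removed] = line.split("\t");
      return sum + Number(added) + Number(removed);
    }, 0);
  if (changedLines === 0 || changedLines > 5) return;

  execSync(`git commit -am ${JSON.stringify(`review-bot: ${description}`)}`);
  execSync(`git push origin ${branch}`); // the existing CI pipeline takes it from here
}
```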
My Review Role
I still review every PR before merging. But my review is now fundamentally different. Instead of line-by-line code inspection, I'm reading the AI review thread, checking their reasoning, and focusing on the things AI misses: business context, performance implications, and architectural alignment.
It's turned code review from my most dreaded task into something I actually find interesting. I'm reviewing decisions, not semicolons.
Getting Started
If you want to try this pattern, start simple. You don't need Slack integration or auto-fix commits on day one. Start with two sequential review passes - run one Claude Code instance to review a PR, then run another to review the first instance's review. Just that second layer of scrutiny catches a surprising number of issues.
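Here's roughly what that looks like as a small script, assuming the `gh` CLI is installed and Claude Code's non-interactive `-p` mode accepts piped input; the prompts are just examples to adapt.

```typescript
import { execSync } from "node:child_process";

function run(cmd: string, input?: string): string {
  return execSync(cmd, { encoding: "utf8", input });
}

const prNumber = process.argv[2];
const diff = run(`gh pr diff ${prNumber}`);

// Pass 1: review the diff.
const firstReview = run(
  `claude -p "Review this diff for logic errors, edge cases, and test gaps."`,
  diff
);

// Pass 2: review the review.
const secondReview = run(
  `claude -p "Here is a diff and a review of it. Critique the review: what did it miss, and what did it overstate?"`,
  `${diff}\n\n---\n\nREVIEW:\n${firstReview}`
);

console.log(secondReview);
```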