Code Review Checklist for AI-Generated Code: 12 Things to Verify
AI writes code faster than developers can review it. Here are 12 things to check in every AI-generated PR — from hallucinated packages to security gaps, logic errors, and test coverage.
TL;DR: AI does not write bad code — it writes plausible code. The lines compile, the linter passes, the variable names look reasonable, and on a fast PR review it slides through. That is the exact reason bugs ship. This checklist covers 12 specific things that AI gets wrong differently than humans do, with a code example for each, a comparison table showing which items Git AutoReview catches automatically, and a copy-paste version for your team's PR template.
The real problem with AI-generated code is not quality. It is that the failure mode looks like the success mode. A human author who hands you a confused PR usually leaves obvious signs — inconsistent naming, abandoned comments, a TODO that says "fix this." AI strips those signals out by default, which means your standard review pass approves things it would normally catch. 62% of developers use AI tools according to Stack Overflow's 2024 survey, but most review processes have not changed since AI showed up. The result lands in your codebase as a slow leak of subtle bugs.
Why AI-generated code fails differently than human code
GitClear's 2024 analysis of 153 million lines of code put numbers on the shift: code churn doubled compared to 2023, and copy-pasted code climbed from 8.3% of all code to 12.3%. The 2025 follow-up found the trend kept going. More code lands, more of it gets thrown away within weeks, and more of it is duplicated from somewhere else — usually an AI suggestion that the developer accepted without rewriting.
CodeRabbit ran a separate study on 470 real pull requests and found AI-authored PRs averaged 10.83 issues per PR compared to 6.45 for human-written ones. That is roughly 1.7x more problems in the same diff size, with error handling gaps and naming inconsistencies both hitting 2x the human baseline. The Clutch 2025 developer survey added a piece nobody wants to acknowledge: 59% of developers admit they use AI-generated code they don't fully understand. The reviewer is often the only person who reads it carefully.
The core failure is alignment. AI optimizes for syntactic correctness and surface plausibility — code that parses, passes type checks, and matches the pattern of training-data examples. It does not optimize for semantic correctness, architectural fit, or whether the result actually solves the problem the ticket described. Those last three are exactly what code review exists to verify, and they are the parts a human reviewer has to do manually because the AI cannot self-check them.
The 12-item checklist
1. Requirement alignment — does the code actually do what the ticket asks?
AI reads the ticket literally and fills in the gaps with its own assumptions. "Add export button" turns into CSV export of the current page only — not Excel, not all users, not the format your finance team actually needs. The gap between ticket intent and code behavior is where most "this works but it's wrong" PRs come from.
// Ticket: "Export user data"
// AI generated: exports only current page
function exportUsers() {
  return currentPageUsers.map(u => u.toCSV());
  // Missing: pagination, all users, format preference
}
What to check: read the ticket once, read the code, ask whether a non-technical stakeholder would call this "done."
2. Hallucinated package names (slopsquatting)
USENIX Security 2025 measured commercial LLMs hallucinating package names at a 5.2% rate. Attackers race to pre-register those exact names as malicious packages on npm and PyPI — the practice is called slopsquatting, and it works because developers copy AI output into package.json without checking.
{
  "dependencies": {
    "react-data-fetcher": "^2.1.0",
    "mongoose-validator-utils": "^1.0.0"
  }
}
What to check: for every new dependency, run npm info <package-name> (or the equivalent for your registry), look at the download count, the GitHub repo, and the publish date. A package that appeared three weeks ago with no downloads is a red flag regardless of how reasonable the name sounds.
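If you want to script the check, the npm registry serves package metadata as plain JSON over HTTPS. A minimal TypeScript sketch (the helper name and the 30-day threshold are our own illustration, not part of any tool mentioned here; Node 18+ is assumed for the built-in fetch):

// Hypothetical helper: verify a dependency exists on the npm registry and
// warn when it was first published very recently.
async function checkPackage(name: string): Promise<void> {
  const res = await fetch(`https://registry.npmjs.org/${name}`);
  if (res.status === 404) {
    console.warn(`${name}: not on the registry, likely hallucinated`);
    return;
  }
  const meta = await res.json();
  const ageDays = (Date.now() - new Date(meta.time.created).getTime()) / 86_400_000;
  if (ageDays < 30) {
    console.warn(`${name}: first published ${Math.round(ageDays)} days ago, inspect before installing`);
  }
}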
3. Cross-file side effects (what changed outside the diff?)
AI sees the diff. It does not see the rest of your codebase. A rename that looks clean in the PR can break import paths in files that are not part of the change, and TypeScript or your linter might not catch it if the imports are dynamic or wrapped in conditional logic.
// AI renamed: formatDate → formatDateTime
// Clean in this file, but 8 other files still import formatDate
export function formatDateTime(date: Date): string { ... }
What to check: grep the old name across the codebase before approving. For function renames, search for both the import and the call site. For config changes, look at every build script that reads the file.
4. Hardcoded credentials and secrets
GitGuardian's research found AI-assisted commits leak secrets at roughly 2x the rate of human commits. The pattern is consistent — AI fills in placeholder values it has seen in training data, and those placeholders are sometimes real keys somebody published to a public repo years ago.
OPENAI_API_KEY = "sk-proj-abc123..."
DATABASE_URL = "postgresql://user:password123@prod-db:5432/myapp"
What to check: any string literal that looks like a key, token, or credential. Tools that scan for entropy patterns catch most of them, but the review pass should still flag anything that is not loaded from environment variables.
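The fix is mechanical: move the values out of the source and read them at startup. A minimal sketch, reusing the variable names from the example above:

// Sketch: load secrets from the environment and fail fast when one is missing.
const apiKey = process.env.OPENAI_API_KEY;
const databaseUrl = process.env.DATABASE_URL;
if (!apiKey || !databaseUrl) {
  throw new Error('Missing OPENAI_API_KEY or DATABASE_URL in the environment');
}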
5. Error handling completeness
CodeRabbit's PR study put error handling gaps at 2x the human baseline. AI tends to write the happy path cleanly and skip the failure modes — network errors, null returns, partial responses, timeouts. The function "works" in development where nothing fails, then breaks in production where everything does.
async function fetchUser(id) {
  const response = await fetch(`/api/users/${id}`);
  const data = await response.json();
  return data.user;
}
What to check: every await, every external call, every place data crosses a trust boundary. If there is no try/catch and no response.ok check, the code is incomplete.
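For contrast, here is one shape the same function can take with the failure modes covered. Treat it as a sketch, not the only correct form; the point is that the response.ok check and the catch exist at all:

async function fetchUser(id: string) {
  try {
    const response = await fetch(`/api/users/${id}`);
    if (!response.ok) {
      throw new Error(`fetchUser failed: HTTP ${response.status}`);
    }
    const data = await response.json();
    return data.user ?? null; // guard against a missing payload
  } catch (err) {
    // Wrap network errors and bad responses so callers see one failure type
    // instead of an unhandled rejection.
    throw new Error(`Could not load user ${id}`, { cause: err });
  }
}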
6. Logic correctness (not just syntax)
The most dangerous AI bugs are the ones where every line is grammatically correct and the overall logic is wrong. Off-by-one errors in date ranges, inverted conditionals, wrong operator precedence — the linter cannot help here, the type checker cannot help here, and a fast review will miss it because the code looks right.
def is_eligible(user):
    return not user.is_premium and user.subscription_active
    # Should be: user.is_premium and user.subscription_active
What to check: read the code out loud. If the function name says "is eligible" and the code returns true for non-premium users, something is off. Trace one real input through the function by hand.
7. Naming and consistency
CodeRabbit measured naming inconsistencies at 2x the human rate. AI generates code in its own naming style and ignores the conventions of the surrounding file. You end up with snake_case, camelCase, and PascalCase instances of the same kind of thing in the same module.
const user_service = new UserService();
const UserRepo = new UserRepository();
const getuser = async () => {};
What to check: scan for naming style mismatches before merging. If the project uses camelCase for variables, every new variable should be camelCase. Configure a linter rule if you can — most teams cannot maintain this manually.
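If the project uses typescript-eslint, its naming-convention rule can hold the line automatically. A minimal .eslintrc sketch; the selectors below are illustrative and should match your own conventions:

{
  "rules": {
    "@typescript-eslint/naming-convention": [
      "error",
      { "selector": "variable", "format": ["camelCase", "UPPER_CASE"] },
      { "selector": "function", "format": ["camelCase"] },
      { "selector": "typeLike", "format": ["PascalCase"] }
    ]
  }
}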
8. Dead and unreachable code
AI often generates belt-and-suspenders code that the type system or runtime would never execute. Redundant null checks after an early return, fallback branches that cannot be reached, unused variables that compile but add cognitive load to every future reader.
function processPayment(amount: number): Result {
  if (amount <= 0) throw new Error('Invalid amount');
  if (amount <= 0) return { error: 'Invalid' }; // Dead
  // ...
}
What to check: every conditional branch — could the runtime ever reach it? Every variable declaration — does it get used? A reviewer with five minutes can catch most of this by reading top to bottom and asking "why is this here?"
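TypeScript can catch a share of this at compile time. These are standard tsconfig options; whether they fit your build is a team decision:

{
  "compilerOptions": {
    "noUnusedLocals": true,
    "noUnusedParameters": true,
    "allowUnreachableCode": false
  }
}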
9. Test coverage for new paths
AI rarely writes tests for the edge cases that matter. The happy path gets a test. Error paths often do not. The new code might pass existing tests by changing what those tests actually verify — a subtle form of test rot that takes weeks to surface.
What to check: every new function should have at least one test. Error paths need tests too, not just success paths. If existing tests pass after a refactor, look at whether they pass for the right reasons or whether the refactor accidentally weakened the assertions.
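As a concrete example, here is what an error-path test could look like for the fetchUser function from item 5, sketched with Vitest (the runner and the import path are assumptions; substitute your own):

import { describe, expect, it, vi } from 'vitest';
import { fetchUser } from './fetchUser'; // hypothetical module path

describe('fetchUser', () => {
  it('rejects when the API returns a 500', async () => {
    // Stub the global fetch to simulate a server error.
    vi.stubGlobal('fetch', vi.fn().mockResolvedValue(
      new Response(null, { status: 500 })
    ));
    await expect(fetchUser('42')).rejects.toThrow();
    vi.unstubAllGlobals();
  });
});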
10. Console.log and debug artifacts
AI leaves debugging artifacts everywhere. console.log statements with PII in the output, commented-out blocks with TODO markers that were never meant to ship, debugger keywords that crash production builds. These are individually small and collectively a noise problem in production logs.
async function processOrder(order) {
  console.log('Processing order:', order);
  console.log('User:', JSON.stringify(order.user)); // PII in logs
  // TODO: add validation here
  return await db.orders.create(order);
}
What to check: grep for console.log, print(, debugger, and TODO in the diff. None of them belong in production code unless your team has explicit policy for it.
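Both checks also exist as core ESLint rules, no plugin required:

{
  "rules": {
    "no-console": "error",
    "no-debugger": "error"
  }
}

If the team logs intentionally through a wrapper, no-console also takes an allow list (for example { "allow": ["warn", "error"] }).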
11. Security vulnerabilities (OWASP Top 10)
Veracode's 2025 GenAI Code Security Report measured 45% of AI-generated code introducing at least one OWASP Top 10 vulnerability. The leading patterns are injection (SQL, NoSQL, command), XSS through unsafe innerHTML, broken access control on new endpoints, and missing input validation on user-supplied data.
def get_user(username):
    query = f"SELECT * FROM users WHERE name = '{username}'"
    return db.execute(query)
What to check: parameterized queries, sanitized HTML output, authorization checks on every new endpoint, validation on every input that crosses a trust boundary. Tools help here, but the reviewer still has to verify that the tools are configured to scan the new code.
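The fix for the Python example is to pass the value separately instead of interpolating it into the SQL string. The same idea in TypeScript, sketched with the node-postgres driver (the driver choice is an assumption):

import { Pool } from 'pg';

const pool = new Pool(); // connection settings come from PG* environment variables

async function getUser(username: string) {
  // $1 is a bound parameter: the driver sends the value separately from the
  // SQL text, so user input can never become query syntax.
  const { rows } = await pool.query(
    'SELECT * FROM users WHERE name = $1',
    [username],
  );
  return rows;
}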
12. Architectural fit
AI writes code that works in isolation. It does not know that your team made a deliberate choice to route all DB access through a service layer, that this React component is in the presentation tier and should not call Supabase directly, or that the middleware pattern you established three months ago exists for a reason.
// In a React component — AI added a direct Supabase call
const { data } = await supabase.from('users').select('*');
// Should go through: userService.getUsers()
What to check: does the new code follow the project's existing patterns for its layer? If the codebase has a service layer, repository layer, or controller pattern, the new code should respect those boundaries. This is the item where Deep Review (full codebase exploration) catches things diff-only review misses.
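For the Supabase example above, "going through the service layer" could look like this sketch with supabase-js; the userService name comes from the comment in the example, and the client import path is assumed:

import { supabase } from './lib/supabaseClient'; // assumed shared client module

export const userService = {
  async getUsers() {
    // Only this layer knows the data lives in Supabase; components call
    // userService.getUsers() and stay in the presentation tier.
    const { data, error } = await supabase.from('users').select('*');
    if (error) throw error;
    return data;
  },
};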
Which items can be automated?
Some of these 12 items need a human reading the ticket and the code together. Others are pattern matches that an automated tool catches faster and more consistently than a tired reviewer at 4 PM on a Friday. Here is the honest split:
| Item | Without the tool | With Git AutoReview | How |
|---|---|---|---|
| 1. Requirement alignment | Manual | Jira Integration | Reads ticket, flags code-ticket gaps |
| 2. Hallucinated packages | Manual | Manual | Check npm/PyPI directly |
| 3. Cross-file side effects | Partial | Deep Review | Explores full codebase, follows imports |
| 4. Hardcoded secrets | Manual | 20+ security rules | Catches API keys, passwords, tokens |
| 5. Error handling | Manual | Deep Review | Traces async paths across files |
| 6. Logic errors | Manual | Quick Review | Flags inverted conditions, off-by-one |
| 7. Naming consistency | Manual | Quick Review | Compares to codebase conventions |
| 8. Dead code | Manual | 20+ rules | Flags unreachable code, unused vars |
| 9. Test coverage | Manual | Deep Review | Checks coverage for new code paths |
| 10. Debug artifacts | Manual | 20+ rules | console.log, debugger, TODO detection |
| 11. Security (OWASP) | Manual | 20+ rules | SQL injection, XSS, validation |
| 12. Architectural fit | Manual | Deep Review | Checks layer boundaries, patterns |
Git AutoReview's 20+ built-in rules catch items 4, 8, 10, and 11 on every PR, and Quick Review flags items 6 and 7. Deep Review adds items 3, 5, 9, and 12 through full codebase exploration. Jira Integration covers item 1. Item 2, verifying new packages against the registry, stays with the human reviewer where it belongs.
Install Free Extension →
The copyable checklist
Drop this into your PR template, your Notion doc, or wherever your team keeps review process documentation. The wording is intentionally short so reviewers can scan it without losing focus on the code.
## AI-Generated Code Review Checklist
- [ ] 1. Requirement alignment — code does what the ticket actually asked
- [ ] 2. No hallucinated packages — every new import verified on registry
- [ ] 3. Cross-file side effects — renames/refactors don't break files outside the diff
- [ ] 4. No hardcoded secrets — keys/tokens/passwords loaded from env
- [ ] 5. Error handling complete — try/catch on async, response.ok checks, null guards
- [ ] 6. Logic correct — traced one real input through every new function
- [ ] 7. Naming consistent — matches surrounding file's conventions
- [ ] 8. No dead code — every branch reachable, every variable used
- [ ] 9. Tests for new paths — error cases tested, not just happy path
- [ ] 10. No debug artifacts — console.log, debugger, TODO stripped
- [ ] 11. Security clean — parameterized queries, sanitized HTML, validated inputs
- [ ] 12. Architectural fit — respects existing layer boundaries and patterns
On GitHub, save this as .github/PULL_REQUEST_TEMPLATE.md and it auto-fills every new PR. For more PR template patterns, including GitLab and Bitbucket setups, see our full guide.
Where this connects to the rest of your review process
The 12 items above sit on top of standard code review practice, not in place of it. The GitHub code review best practices guide covers the broader workflow — PR size targets, review SLAs, the metrics that matter. The VS Code PR review guide walks through three ways to run reviews inside the editor, including the AI-assisted approach where most of this checklist gets automated. For teams running GitHub Copilot Code Review, the June 2026 pricing change adds a billing wrinkle worth reading about before your invoice arrives.
If your team is already drowning in PRs because AI cranked up the write side without changing the review side — that is the exact problem Git AutoReview was built to solve. Free plan covers 10 reviews per day with no credit card. Team plan handles unlimited reviews for $14.99 flat across the entire team.*
* Git AutoReview subscription price only. AI compute costs of approximately $2–5/month per developer are billed directly by your AI provider (Anthropic, Google, or OpenAI). CodeRabbit and Qodo bundle AI compute into their per-user price.
Tired of slow code reviews? AI catches issues in seconds. You decide what gets published.
Frequently Asked Questions
How is reviewing AI-generated code different from reviewing human code?
What is slopsquatting in AI code review?
How often does AI-generated code have security vulnerabilities?
What percentage of developers use AI coding tools?
Can AI tools review AI-generated code?
How do I add this checklist to our PR template?
Try it on your next PR
AI reviews your code for bugs, security issues, and logic errors. You approve what gets published.
Free: 10 AI reviews/day, 1 repo. No credit card.
Related Articles
GitHub Copilot Code Review Cost 2026: What Changes on June 1
GitHub Copilot Code Review starts consuming Actions minutes on June 1. We broke down exactly what teams of 5, 10, and 20 developers will pay — and when the math tips against staying.
How to Review Pull Requests in VS Code 2026
Three methods for reviewing PRs without leaving VS Code — the GitHub Pull Requests extension (34.5M installs), Claude Code CLI for AI pre-review, and Git AutoReview for AI review on GitHub, GitLab, and Bitbucket. Step-by-step setup for each.
Best AI Code Review Tools for Bitbucket 2026: How to Choose (Scoring Matrix)
Scored every AI code review tool on Bitbucket Cloud, Server, and Data Center support. Pricing, BYOK, human approval, setup complexity — compared in one place.