AI Code Review for Java: Tools, Virtual Threads & Setup (2026)
SpotBugs and PMD catch patterns. AI catches the logic errors they miss. We tested traditional Java tools vs AI reviewers on real PRs, including Java 21 virtual thread bugs that no static analyzer detects.
Tired of slow code reviews? AI catches issues in seconds. You decide what gets published.
AI Code Review for Java: What Works, What Doesn't, and What Nobody Tells You
TL;DR: SpotBugs, PMD, and Checkstyle catch known bug patterns fast and free. AI catches logic errors, missing edge cases, and Java 21 virtual thread bugs that no static analyzer detects. Best setup for Java teams: run SpotBugs + PMD in CI for deterministic checks, use Git AutoReview ($14.99/mo flat) or CodeRabbit ($24/user/mo) for AI-powered PR review. Don't replace traditional tools with AI. Use both.
Java has arguably the best static analysis tooling of any language. SpotBugs has been catching null dereferences since it was FindBugs back in 2006. PMD ships 400+ rules. Checkstyle will yell at you for putting a brace on the wrong line.
So why would a Java team bother with AI code review?
Static analysis checks patterns. Your code either matches a known bug signature or it doesn't. What these tools can't do is read your code and ask, "does this actually do what you think it does?" They won't notice that validateEmail accepts any string with an @ sign in it. They have no opinion on retry logic that retries errors that should never be retried. And the virtual thread bug that caused a production outage at Netflix? Completely invisible to them.
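To make that concrete, here's a hypothetical validateEmail of the kind described above (EmailValidator and its logic are illustrative, not taken from any tested PR). It compiles cleanly and matches no SpotBugs or PMD bug pattern, yet the method name promises far more than the code delivers:

```java
public class EmailValidator {
    // Accepts any string containing '@' -- "a@", "@", and "x@@y" all pass.
    // No static analyzer flags this, because nothing here is a known bug
    // pattern. Only a reviewer who compares the name to the logic sees it.
    public static boolean validateEmail(String input) {
        return input != null && input.contains("@");
    }
}
```

An AI reviewer (or a human) catches this by reading intent; a pattern matcher has no intent to read.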
I tested both approaches on real Java PRs over a few weeks. This is what came out of it.
What's the best AI code review tool for Java in 2026?
There isn't a single best tool. Static analysis and AI review both matter, they just catch completely different things, and if you don't know where each one stops being useful, you'll waste time.
SpotBugs
SpotBugs (FindBugs' successor) runs bytecode analysis on compiled .class files. It knows about 400 bug patterns across correctness, performance, security, and multithreading.
What it catches well:
- Null pointer dereferences (NP_NULL_ON_SOME_PATH)
- Infinite recursive loops
- Ignored return values on immutable objects (RV_RETURN_VALUE_IGNORED)
- Synchronization issues with mutable statics
- SQL injection via string concatenation
What it misses:
- Business logic errors (it doesn't know what your code is supposed to do)
- Missing validation on inputs that come from external APIs
- Architectural problems like a service that calls the database directly instead of going through the repository layer
- Any bug pattern that isn't in its 400-item list
SpotBugs processes roughly 1000 classes per second. Fast, deterministic, free. Hard to argue with.
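For a sense of what RV_RETURN_VALUE_IGNORED looks like in practice, here's a minimal sketch (the class and method are illustrative): String is immutable, so calling trim() and discarding the result does nothing, and SpotBugs flags exactly this shape.

```java
public class TrimBug {
    // SpotBugs: RV_RETURN_VALUE_IGNORED. String is immutable, so the
    // trimmed result is silently thrown away and the padding survives.
    public static String normalize(String raw) {
        raw.trim();   // bug: return value discarded
        return raw;   // still padded; should be: return raw.trim();
    }
}
```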
PMD
PMD works at the source level. It checks style, complexity, and common mistakes. About 400 Java rules split into rulesets like errorprone, bestpractices, design, and performance.
Good at catching:
- Empty catch blocks
- Unused variables and imports
- God classes (classes with too many methods)
- Cyclomatic complexity violations
- Common copy-paste mistakes (CompareObjectsWithEquals)
PMD also ships a copy-paste detector (CPD) that finds duplicated code blocks across your codebase. Good for catching the "I'll just copy this method and change two lines" approach before it multiplies.
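The CompareObjectsWithEquals rule is worth a concrete example, because the bug it targets hides well: Integer caches only values from -128 to 127, so reference comparison with == gives the "right" answer in small unit tests and fails in production with larger values. A sketch (class and method names are illustrative):

```java
public class BoxedCompare {
    // PMD's CompareObjectsWithEquals flags this: == compares references,
    // not values. It happens to work for cached small Integers.
    public static boolean sameByReference(Integer a, Integer b) {
        return a == b;        // the bug
    }

    // The fix: compare by value.
    public static boolean sameByValue(Integer a, Integer b) {
        return a.equals(b);
    }
}
```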
Checkstyle
Checkstyle is formatting and style enforcement. Indentation, naming conventions, Javadoc requirements, import ordering. It won't find bugs, but it kills style arguments in code reviews, which might be worth more than you think.
Most Java teams use Google Java Format or the Sun/Oracle style preset as a base.
ErrorProne (Google)
ErrorProne is different from the rest. It's a compiler plugin that hooks into javac and catches bugs at compile time. Google runs it across their entire Java codebase internally.
It catches things like:
- String.equals() called on a null reference
- Missing @Override annotations
- Precondition check ordering issues
- Some concurrency bugs (SynchronizeOnNonFinalField)
ErrorProne is more opinionated than SpotBugs. It ships about 500 checks and many are enabled by default. If a pattern is in its list, it will catch it. If it isn't, it won't.
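Because ErrorProne hooks into javac, enabling it is a build-config change rather than a separate CI step. A minimal Maven sketch, per the ErrorProne installation docs (the version number is illustrative; check the current release):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <configuration>
    <compilerArgs>
      <arg>-XDcompilePolicy=simple</arg>
      <arg>-Xplugin:ErrorProne</arg>
    </compilerArgs>
    <annotationProcessorPaths>
      <path>
        <groupId>com.google.errorprone</groupId>
        <artifactId>error_prone_core</artifactId>
        <version>2.27.0</version>
      </path>
    </annotationProcessorPaths>
  </configuration>
</plugin>
```

After this, ErrorProne findings show up as compiler warnings or errors on every build, with no extra tool to run.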
What AI catches that none of these tools do
I ran the same 30 Java PRs through SpotBugs, PMD, and then through Claude (via Git AutoReview). The AI flagged issues that none of the static tools noticed:
A method called isEligibleForDiscount checked customer tier and purchase history but didn't check whether the discount period had expired. SpotBugs and PMD saw valid Java. The AI read the method name, read the code, and pointed out the mismatch.
A REST endpoint caught HttpClientErrorException but not HttpServerErrorException. The catch block returned a user-friendly message for 4xx errors, but 5xx errors bubbled up as unhandled 500s. Static tools don't reason about HTTP semantics.
A unit test for a validation method tested six inputs but missed empty string. The validation method had a StringUtils.isBlank() check that no test exercised.
A controller called a repository directly, bypassing the service layer. PMD can catch that if you configure it, but the AI flagged it out of the box.
The AI was wrong about 15% of the time. But the remaining 85% were real issues that no static tool would have caught.
Java 21 virtual threads: the bug nobody's tool catches
Java 21 shipped virtual threads as a production feature. They're cheap to create, the JVM manages them, and they do make concurrent code simpler.
They also introduced bugs that didn't exist before. No tool catches them. Not SpotBugs, not PMD, not SonarQube, not AI review tools (at least not reliably). If your team writes concurrent Java, you need to know what to look for yourself.
Thread pinning with synchronized
This is what bit Netflix. Virtual threads run on a small pool of carrier threads (platform threads managed by the JVM). When a virtual thread hits a blocking operation inside a synchronized block, it can't unmount from its carrier thread. The carrier is stuck until the blocking call finishes. This is called "pinning."
// This pins the carrier thread — bad with virtual threads
public synchronized String fetchData(String url) {
    HttpResponse<String> response = httpClient.send(
        HttpRequest.newBuilder().uri(URI.create(url)).build(),
        HttpResponse.BodyHandlers.ofString()
    );
    return response.body();
}
A 4-vCPU machine typically runs 4 carrier threads. Pin all 4, and your application stops processing virtual threads entirely. Netflix hit exactly this in their auth service. The fix is simple once you know to look for it:
// Use ReentrantLock instead — virtual threads can unmount
private final ReentrantLock lock = new ReentrantLock();

public String fetchData(String url) {
    lock.lock();
    try {
        HttpResponse<String> response = httpClient.send(
            HttpRequest.newBuilder().uri(URI.create(url)).build(),
            HttpResponse.BodyHandlers.ofString()
        );
        return response.body();
    } finally {
        lock.unlock();
    }
}
ReentrantLock cooperates with the virtual thread scheduler. When the thread blocks on I/O, it unmounts from the carrier. synchronized doesn't.
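Until review tools grow rules for this, you can make the JVM surface pinning itself during load testing. On JDK 21 the jdk.tracePinnedThreads system property prints a stack trace every time a virtual thread pins its carrier (later JDKs move this diagnostic to the jdk.VirtualThreadPinned JFR event, and JDK 24's changes to synchronized remove most pinning entirely). The jar name below is a placeholder:

```
# JDK 21: print a stack trace whenever a virtual thread pins its carrier
java -Djdk.tracePinnedThreads=full -jar app.jar
```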
Carrier thread starvation
Related to pinning, but harder to spot. If all carrier threads are busy with CPU-heavy work (not I/O), no new virtual threads can get scheduled. Your application looks deadlocked, but it isn't.
// Bad: CPU-intensive work on virtual threads
try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
    for (var task : tasks) {
        executor.submit(() -> {
            // Heavy computation — can starve carrier threads
            return computeHash(largePayload);
        });
    }
}
CPU-bound work should stay on platform threads. Virtual threads are for I/O-bound operations.
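A sketch of the platform-thread alternative (the class and workload are illustrative): a fixed pool sized to the machine keeps CPU-heavy tasks off the carrier threads, leaving virtual threads free for I/O. Note that ExecutorService is AutoCloseable since Java 19, so try-with-resources waits for the tasks to finish.

```java
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CpuBoundPool {
    // CPU-bound tasks on a bounded platform-thread pool sized to the
    // available cores, instead of a virtual-thread-per-task executor.
    public static long sumOfSquares(List<Integer> inputs) {
        try (ExecutorService cpuPool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors())) {
            List<Future<Long>> futures = inputs.stream()
                .map(n -> cpuPool.submit(() -> (long) n * n))
                .toList();
            long total = 0;
            for (Future<Long> f : futures) {
                try {
                    total += f.get();
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException(e);
                }
            }
            return total;
        }
    }
}
```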
ThreadLocal misuse
Virtual threads are meant to be short-lived and numerous. ThreadLocal values are tied to the thread instance. With millions of virtual threads, you get millions of ThreadLocal entries that may never get cleaned up.
// Bad with virtual threads — millions of entries, no cleanup
private static final ThreadLocal<SimpleDateFormat> dateFormat =
ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd"));
Use scoped values (preview in Java 21, finalized in Java 25) or pass the state explicitly.
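For the SimpleDateFormat case specifically there's an even simpler fix that needs no preview features: the ThreadLocal only exists because SimpleDateFormat is mutable. DateTimeFormatter is immutable and thread-safe, so one shared constant serves any number of virtual threads with no per-thread state to leak. A sketch:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class DateFormatting {
    // Immutable and thread-safe: safe to share across millions of
    // virtual threads, unlike a ThreadLocal<SimpleDateFormat>.
    private static final DateTimeFormatter DATE_FORMAT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd");

    public static String format(LocalDate date) {
        return DATE_FORMAT.format(date);
    }
}
```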
Stampede effect
Virtual threads are so cheap to create that you can accidentally launch thousands of concurrent I/O operations. Your code handles it fine. Your database doesn't.
// Launches thousands of concurrent DB queries — connection pool exhaustion
try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
    List<Future<User>> futures = userIds.stream()
        .map(id -> executor.submit(() -> userRepository.findById(id)))
        .toList();
}
Fix: use a Semaphore to limit concurrency.
private final Semaphore dbSemaphore = new Semaphore(20); // match pool size

try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
    List<Future<User>> futures = userIds.stream()
        .map(id -> executor.submit(() -> {
            dbSemaphore.acquire();
            try {
                return userRepository.findById(id);
            } finally {
                dbSemaphore.release();
            }
        }))
        .toList();
}
Why can't tools catch these?
SpotBugs, PMD, and SonarQube don't have rules for virtual thread patterns. They're too new. ErrorProne doesn't check for them either.
AI models sometimes spot the synchronized + blocking I/O pattern if you prompt them. I tested this: Claude caught the synchronized issue in 3 out of 5 PRs. It missed it twice. It never flagged carrier starvation or ThreadLocal issues unprompted.
This is a code review skill Java developers have to learn. No tool covers it yet.
Java 21+ code review checklist
Virtual threads aside, Java 21 brought features that change what you look for in code review.
Records
Records work well for DTOs, but they're immutable and final. Watch for:
- The team isn't adding mutable state to records (collections should be wrapped in Collections.unmodifiableList() in the compact constructor)
- Records aren't being used where inheritance is needed
- equals() and hashCode() behavior is understood (records auto-generate them from all components)
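A sketch of the defensive-copy point (Order is a hypothetical DTO): List.copyOf in the compact constructor is a common alternative to Collections.unmodifiableList, since it makes an unmodifiable copy, so later mutation of the caller's list can't leak into the record.

```java
import java.util.List;

public class OrderDemo {
    public record Order(String id, List<String> items) {
        public Order {
            // Defensive, unmodifiable copy; also rejects null elements.
            items = List.copyOf(items);
        }
    }
}
```

Without the copy, a record holding a live ArrayList is only nominally immutable.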
Sealed classes
Sealed classes restrict which classes can extend them. In review:
- Is the permits clause complete?
- Do switch statements over sealed types handle all permitted subtypes?
- Adding a new subtype later means updating every switch that matches on the sealed type
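The review points above fit in one small sketch (Shape, Circle, and Rectangle are hypothetical types): a sealed hierarchy plus an exhaustive switch means adding a Triangle later is a compile error in every such switch until it's handled.

```java
public class SealedDemo {
    public sealed interface Shape permits Circle, Rectangle {}
    public record Circle(double radius) implements Shape {}
    public record Rectangle(double w, double h) implements Shape {}

    public static double area(Shape s) {
        return switch (s) {
            case Circle c -> Math.PI * c.radius() * c.radius();
            case Rectangle r -> r.w() * r.h();
            // No default branch: the compiler proves exhaustiveness,
            // and will reject this switch if a new subtype is permitted.
        };
    }
}
```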
Pattern matching in switch
Java 21 finalized pattern matching in switch. Easy to get the ordering wrong:
// Watch for ordering — more specific patterns must come first
switch (shape) {
    case Circle c when c.radius() > 10 -> handleLargeCircle(c);
    case Circle c -> handleCircle(c); // must come after guarded pattern
    case Rectangle r -> handleRectangle(r);
    // no default needed if switch is exhaustive over sealed type
}
AI reviewers catch ordering issues here better than static tools, because the compiler only catches some cases.
Spring Boot 3.x patterns
If your team uses Spring Boot (72% of Java web developers do), version 3.x introduced changes worth knowing during review:
- Jakarta EE namespace migration (javax.* to jakarta.*)
- Native image support with GraalVM requires different patterns
- @HttpExchange interfaces for declarative HTTP clients
- Virtual thread support via spring.threads.virtual.enabled=true
How to set up AI code review for Java projects
If you want to add AI review alongside your existing static analysis, here's a setup that works.
Keep your static tools in CI
Don't replace SpotBugs or PMD. Keep them in your CI pipeline running on every commit. They finish in seconds, produce the same results every time, and cost nothing.
<!-- Maven: typical static analysis setup -->
<plugin>
<groupId>com.github.spotbugs</groupId>
<artifactId>spotbugs-maven-plugin</artifactId>
<version>4.8.4</version>
<executions>
<execution>
<phase>verify</phase>
<goals><goal>check</goal></goals>
</execution>
</executions>
</plugin>
Add AI review for PRs
Install Git AutoReview from the VS Code Marketplace. Takes about 5 minutes:
- Install the extension
- Connect your Git platform (GitHub, GitLab, or Bitbucket)
- Add your API key for Claude, Gemini, or GPT
- Open a pull request and run a review
The extension grabs the PR diff, sends it to the AI model, and shows suggestions inside VS Code. You decide which comments to publish to the PR. Nothing auto-posts.
It works with GitHub, GitLab, and Bitbucket. SpotBugs catches the patterns, AI catches everything else, and you approve every comment.
Install Git AutoReview →
What the combined setup catches
| Category | SpotBugs/PMD | AI review |
|---|---|---|
| Null dereferences | Yes (pattern-based) | Yes (context-aware) |
| SQL injection | Yes (string concat patterns) | Yes (also catches ORM misuse) |
| Empty catch blocks | Yes | Yes |
| Logic errors | No | Yes |
| Missing edge cases | No | Yes |
| Virtual thread bugs | No | Sometimes |
| Code style violations | Checkstyle handles this | Not the best use of AI |
| Architecture violations | Limited (needs config) | Yes (no config needed) |
| Security (OWASP) | Yes (known patterns) | Yes (also catches custom auth issues) |
The tools overlap on about 30% of findings. The remaining 70% is split between issues only static analysis produces and issues only AI review produces.
How much does AI code review cost for a Java team?
Java teams tend to be larger and more enterprise-oriented. Here's what AI code review actually costs at different team sizes:
| Tool | Per-user/mo | Team of 5 | Team of 10 | Team of 20 |
|---|---|---|---|---|
| Git AutoReview Team | $14.99 flat | $14.99 | $14.99 | $14.99 |
| CodeRabbit Pro | $24/user | $120 | $240 | $480 |
| Qodo Merge Teams | $30/user | $150 | $300 | $600 |
| GitHub Copilot Business | $19/user | $95 | $190 | $380 |
Git AutoReview uses BYOK (bring your own key). You pay for the extension ($14.99/month) plus your own API usage. A typical review costs about $0.10 with Gemini 2.5 Pro, $0.23 with Claude Sonnet. A team doing 20 reviews per day spends roughly $40-90/month in API costs on top of the $14.99 subscription.
So a 10-person team pays about $55-105/month total. Per-user tools charge $240-300/month for the same team.
SpotBugs, PMD, Checkstyle, and ErrorProne are free and open source. SonarQube Community edition is also free up to 100K lines of code.
What I'd actually recommend
If you write Java for a living, you probably already run some static analysis. Keep it. Add AI review on top, not instead of.
The setup that's worked best for me:
- Checkstyle in the IDE, catches formatting while you type
- SpotBugs + PMD in CI, catches known bugs on every push
- ErrorProne as a compiler plugin, catches bugs at compile time
- AI review on pull requests, catches logic errors, missing tests, architecture issues
Static tools are the safety net: milliseconds to run, deterministic results, zero cost. AI is the second pair of eyes: slower, sometimes wrong, but it catches the stuff that only a human reviewer would normally notice.
That combination covers more ground than either approach alone. And nobody has to spend 45 minutes reading a 400-line PR line by line.
Try it on your next PR
AI reviews your code for bugs, security issues, and logic errors. You approve what gets published.
Free: 10 AI reviews/day, 1 repo. No credit card.