Automated Code Review with AI

by Tomáš
5 min read

TL;DR

AI code review catches pattern-level issues that linters miss — security vulnerabilities, performance antipatterns, and logic errors — and integrates directly into your pull request workflow.

Code review is one of the highest-leverage activities in software development — and one of the most time-consuming. AI-powered code review does not replace human reviewers, but it handles the mechanical checks that slow them down: security vulnerabilities, performance antipatterns, style violations, and common logic errors.

What AI Code Review Actually Catches

Traditional linters check syntax and formatting. AI code review operates at a higher level — analyzing patterns, data flow, and intent. The categories of issues it handles well include:

  • Security vulnerabilities — SQL injection, XSS, hardcoded secrets, insecure deserialization
  • Performance antipatterns — N+1 queries, unnecessary re-renders, blocking I/O in async contexts
  • Logic errors — off-by-one errors, null reference risks, unreachable code paths
  • API misuse — incorrect method signatures, deprecated function calls, missing error handling

AI code review works best as a first pass. It surfaces potential issues for human reviewers to evaluate, reducing the time humans spend on mechanical checks and letting them focus on architecture and business logic.
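As a concrete illustration, consider a query builder that interpolates user input into SQL. Most linters pass it without complaint, but an AI reviewer can flag the injection risk and suggest a parameterized alternative. A hypothetical example (function names are illustrative):

```typescript
// Flagged by AI review: user input interpolated directly into SQL — injection risk.
// A linter sees valid TypeScript; the reviewer sees the data flow.
function findUserUnsafe(email: string): string {
  return `SELECT * FROM users WHERE email = '${email}'`;
}

// Suggested fix: a parameterized query, so the driver escapes the value.
function findUserSafe(email: string): { text: string; values: string[] } {
  return { text: "SELECT * FROM users WHERE email = $1", values: [email] };
}
```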

What It Does Not Catch

AI reviewers struggle with:

  • Architecture-level decisions
  • Business logic correctness
  • Performance implications that require runtime profiling
  • Subtle concurrency bugs in complex distributed systems

Setting Up AI Review in CI/CD

The most effective integration point is your pull request workflow. The AI reviewer runs automatically on every PR, posts comments inline, and blocks merging only for critical issues.

GitHub Actions Configuration

name: AI Code Review

on:
  pull_request:
    types: [opened, synchronize]

permissions:
  contents: read
  pull-requests: write

jobs:
  ai-review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Get changed files
        id: diff
        run: |
          echo "files=$(git diff --name-only origin/${{ github.base_ref }}...HEAD | tr '\n' ' ')" >> $GITHUB_OUTPUT

      - name: Run AI review
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          node scripts/ai-review.js \
            --files "${{ steps.diff.outputs.files }}" \
            --pr ${{ github.event.pull_request.number }}

The Review Script

The review script reads the diff, sends it to the model with a structured prompt, and posts results as PR comments.

import Anthropic from "@anthropic-ai/sdk";
import { Octokit } from "@octokit/rest";

interface ReviewIssue {
  file: string;
  line: number;
  severity: "info" | "warning" | "critical";
  message: string;
  suggestion: string;
}

async function reviewDiff(diff: string): Promise<ReviewIssue[]> {
  const client = new Anthropic();

  const response = await client.messages.create({
    model: "claude-sonnet-4-6",
    max_tokens: 4096,
    system: `You are a senior code reviewer. Analyze the diff and identify issues.
Return a JSON array of issues. Only flag genuine problems — no style nitpicks.
Each issue must have: file, line, severity, message, suggestion.`,
    messages: [{ role: "user", content: diff }],
  });

  // Content blocks are a union type; narrow to a text block before reading .text.
  const block = response.content[0];
  if (block.type !== "text") {
    throw new Error(`Unexpected content block type: ${block.type}`);
  }
  return JSON.parse(block.text) as ReviewIssue[];
}

async function postReviewComments(
  octokit: Octokit,
  prNumber: number,
  issues: ReviewIssue[]
): Promise<void> {
  for (const issue of issues) {
    await octokit.pulls.createReviewComment({
      owner: process.env.REPO_OWNER!,
      repo: process.env.REPO_NAME!,
      pull_number: prNumber,
      // Required by the API; COMMIT_SHA is assumed to be exported by the workflow.
      commit_id: process.env.COMMIT_SHA!,
      body: `**${issue.severity.toUpperCase()}**: ${issue.message}\n\n**Suggestion**: ${issue.suggestion}`,
      path: issue.file,
      line: issue.line,
      side: "RIGHT",
    });
  }
}
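The workflow's merge-blocking behavior lives in the script's exit status. A minimal sketch of the gating logic (the `main` wiring shown in comments is illustrative; `reviewDiff` and `postReviewComments` are the functions above):

```typescript
type Severity = "info" | "warning" | "critical";

// Only critical findings should fail the CI job; warnings and info
// remain advisory inline comments.
function shouldBlockMerge(issues: { severity: Severity }[]): boolean {
  return issues.some((issue) => issue.severity === "critical");
}

// Illustrative entry point, tying the pieces together:
// async function main() {
//   const issues = await reviewDiff(diff);
//   await postReviewComments(octokit, prNumber, issues);
//   if (shouldBlockMerge(issues)) process.exit(1); // fails the PR check
// }
```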

Comparing AI Code Review Tools

Several tools offer AI-powered code review with different trade-offs.

| Tool | Model | Integration | Strengths | Pricing |
| --- | --- | --- | --- | --- |
| Custom (API) | Any LLM | Full control | Customizable prompts, no vendor lock-in | API token cost |
| CodeRabbit | Multiple | GitHub, GitLab | Automatic summaries, inline comments | Free tier available |
| Sourcery | Proprietary | GitHub, IDE | Python-focused, refactoring suggestions | Per-seat license |
| Amazon CodeGuru | Proprietary | AWS ecosystem | Java/Python, runtime profiling | Per-line scanned |

Choosing the Right Approach

For teams that need full control over the review prompt and model selection, a custom integration using the API approach above is the most flexible option. For teams that want quick setup with minimal maintenance, a managed tool like CodeRabbit provides a solid default.

Tuning for Your Codebase

Generic AI review produces too many false positives. To make it useful, you need to tune the system prompt with your project’s conventions.

const systemPrompt = `You are reviewing code for a TypeScript monorepo.

Project conventions:
- Error handling: Always use Result<T, E> types, never throw exceptions
- Database: All queries go through the repository layer, never direct DB access
- Auth: JWT tokens validated via middleware, never in route handlers
- Logging: Use structured logging with correlation IDs

Only flag violations of these conventions and genuine bugs.
Do not flag style preferences or formatting issues.`;

Investing time in a project-specific system prompt reduces false positives by 40–60%. Review and update it as your conventions evolve.
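For instance, under the conventions above, the reviewer should flag a function that throws instead of returning a Result. A hypothetical sketch of that convention (names are illustrative):

```typescript
// The convention from the prompt: errors as values, never thrown.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

// Violates the convention — a tuned reviewer flags the `throw`.
function parsePortUnsafe(raw: string): number {
  const port = Number(raw);
  if (Number.isNaN(port)) throw new Error(`invalid port: ${raw}`);
  return port;
}

// Follows the convention: failure is part of the return type.
function parsePort(raw: string): Result<number, string> {
  const port = Number(raw);
  return Number.isNaN(port)
    ? { ok: false, error: `invalid port: ${raw}` }
    : { ok: true, value: port };
}
```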

Measuring Effectiveness

Track these metrics to evaluate whether AI review is providing value:

  • True positive rate — percentage of flagged issues that humans confirm as genuine
  • Time to first review — how quickly the PR gets initial feedback
  • Human review time — whether human reviewers spend less time per PR
  • Issue escape rate — whether bugs that reach production decrease over time

A well-tuned AI review pipeline should achieve a true positive rate above 70% and reduce average human review time by 20–30%.
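The first metric is straightforward to compute from review data. A sketch, assuming each flagged issue records whether a human reviewer confirmed it (the `confirmedByHuman` field is an assumption about how your tracking is structured):

```typescript
interface FlaggedIssue {
  confirmedByHuman: boolean; // set when a reviewer resolves the AI comment
}

// True positive rate: share of AI-flagged issues humans confirmed as genuine.
function truePositiveRate(issues: FlaggedIssue[]): number {
  if (issues.length === 0) return 0;
  const confirmed = issues.filter((i) => i.confirmedByHuman).length;
  return confirmed / issues.length;
}
```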

FAQ

Can AI replace human code reviewers?

No. AI catches mechanical issues and pattern violations efficiently, but human reviewers are essential for evaluating architecture decisions, business logic correctness, and code maintainability. The most effective setup uses AI as a first pass that handles routine checks, freeing human reviewers to focus on higher-level concerns.

How accurate is AI code review?

Modern AI code review tools achieve 70–85% accuracy on common patterns like security vulnerabilities and API misuse. They work best as a first-pass filter that surfaces potential issues for human reviewers to evaluate. Accuracy improves significantly when the system prompt includes project-specific conventions and constraints.
