State of Code report series

The Coding Personalities of Leading LLMs

Make smarter AI adoption decisions with Sonar's latest report in The State of Code series. Explore the habits, blind spots, and archetypes of the top five LLMs to uncover the critical risks each brings to your codebase.

Key findings

Our deep analysis of LLM-generated code goes beyond standard benchmarks.

Coding personalities

Each LLM has a distinct style that impacts your production environment.

Shared strengths

All models consistently produce valid code and create viable solutions for well-defined problems.

Shared blind spots

All models have a fundamental lack of security awareness and a bias for messy code.

Upgrades increase risk

Newer models can generate bugs that are almost twice as likely to be of the highest severity.

What our analysis uncovered

more likely for new Claude model to be of 'BLOCKER' severity than its predecessor.

of all issues found in LLM-generated code create long-term technical debt.

of the vulnerabilities for one LLM are of ‘BLOCKER’ severity.

of all bugs from one popular LLM are control-flow mistakes.

Methodology

Our analysis is based on 4,442 identical programming tasks performed by each LLM. We measured each model's output across multiple dimensions to build a comprehensive picture of its coding personality and risk profile.

Verbosity Measurement

Verbosity quantifies the sheer volume of code each model generates to solve identical tasks.

Lines of Code (LOC): Total number of lines of code generated across all 4,442 tasks, including blank lines and comments. This metric reveals whether a model tends toward concise or elaborate implementations.
Token Count: Total tokens generated in the code output, providing a language-agnostic measure of code volume that accounts for the actual content density.
Code Density: Ratio of executable statements to total lines, indicating how compact or spread out the code structure is.
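
As a rough illustration of the three verbosity metrics above, the sketch below computes them for a single Python snippet. It is not the tooling used in the report: the function name `verbosity_metrics`, the regex-based tokenizer, and the use of Python's `ast` module to count statements are all assumptions made for this example.

```python
import ast
import re


def verbosity_metrics(source: str) -> dict:
    """Approximate LOC, token count, and code density for Python source."""
    lines = source.splitlines()
    loc = len(lines)  # counts blank lines and comments, as in the LOC definition

    # Assumption: a crude tokenizer that splits on words and single symbols.
    tokens = re.findall(r"\w+|[^\w\s]", source)

    # Assumption: every AST statement node counts as one executable statement.
    statements = sum(isinstance(node, ast.stmt) for node in ast.walk(ast.parse(source)))

    density = statements / loc if loc else 0.0
    return {"loc": loc, "tokens": len(tokens), "statements": statements, "density": density}


SAMPLE = '''# add two numbers
def add(a, b):

    return a + b
'''

print(verbosity_metrics(SAMPLE))
```

For this sample, two statements spread over four lines give a density of 0.5. A model that pads the same logic with more blank lines or comments would raise LOC and lower density without changing behavior.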

The coding archetypes of leading LLMs

Our analysis shows that each LLM has a unique and measurable coding personality. Which one have you "hired" for your team?

GPT-5-minimal

The baseline performer

Strong performance with a traditional risk profile, but generates the most verbose and complex code.

This is the entry-level reasoning mode. It delivers strong performance that is superior to most non-reasoning models. Its personality is defined by having a more "traditional" risk profile compared to more advanced models.

It produces common and well-understood flaws, such as a significant rate of "Path-traversal & Injection" vulnerabilities (20%) and basic "Control-flow mistake" bugs.

At the same time, it introduces a new class of risk with its high verbosity and complexity, leading to the highest proportion of CRITICAL code smells of any model.
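
For readers unfamiliar with the path-traversal class of flaw mentioned above, here is a minimal sketch of the pattern and one common mitigation. It is not taken from the report or from any model's output; the function names `naive_join` and `resolve_inside` and the directory `/srv/app/uploads` are hypothetical, and `Path.is_relative_to` assumes Python 3.9 or later.

```python
from pathlib import Path


def naive_join(base: Path, user_input: str) -> Path:
    """Vulnerable pattern: trusts user input, so '../' can escape the base directory."""
    return base / user_input


def resolve_inside(base: Path, user_input: str) -> Path:
    """Safer pattern: normalize the path, then confirm it is still under the base."""
    base = base.resolve()
    candidate = (base / user_input).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes base directory: {user_input!r}")
    return candidate


uploads = Path("/srv/app/uploads")  # hypothetical upload directory

print(resolve_inside(uploads, "report.txt").name)

try:
    resolve_inside(uploads, "../../etc/passwd")
except ValueError as exc:
    print("rejected:", exc)
```

The check happens after normalization on purpose: inspecting the raw string for `..` misses absolute paths and other encodings of the same escape.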

The "trust but verify" mandate for AI

AI is now a core part of software development, but performance benchmarks alone are misleading. They can steer teams toward LLMs that solve difficult challenges but fail to write good, secure, and reliable code. To harness these powerful models responsibly, you must look beyond the benchmark. Our report provides the critical insights needed to choose the right models and use them safely.

The three qualities of software source code

Sonar classifies the issues found in every project or codebase across three deeply interconnected software qualities: reliability, security, and maintainability.

Reliability

Bugs that would affect the software's capability to maintain its level of performance under promised conditions, potentially compromising its reliability and operational effectiveness.

Security

Vulnerabilities and security hotspots. Vulnerabilities are code weaknesses that could be exploited for attacks, while hotspots are security-sensitive code requiring manual review.

Maintainability

Code smells, which could indicate weaknesses in design that can increase technical debt, slow down development, or increase the risk of bugs or failures down the line.

Security Vulnerability Analysis

Security vulnerabilities in AI-generated code pose significant risks. Our analysis reveals distinct patterns in how each LLM handles security-critical code, with some models producing vulnerabilities of BLOCKER severity at alarming rates.

Severity Distribution

Newer models show alarming increases in BLOCKER-severity vulnerabilities, with some models producing critical security flaws in over 70% of cases.

Common Vulnerability Types

Injection flaws, path-traversal, and insecure cryptography are prevalent across all models, indicating fundamental gaps in security awareness.

Model Upgrades = Higher Risk

Claude Sonnet 4 is 93% more likely to produce BLOCKER vulnerabilities compared to its predecessor, showing that newer isn't always safer.
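
A figure such as "93% more likely" can be read as a relative increase in the share of vulnerabilities rated BLOCKER. The sketch below shows that arithmetic under this assumption; the page does not state the exact calculation. The helper names and the issue counts are invented for illustration and are not data from the report.

```python
from collections import Counter


def blocker_share(severities: list[str]) -> float:
    """Fraction of issues whose severity is BLOCKER."""
    counts = Counter(severities)
    total = sum(counts.values())
    return counts["BLOCKER"] / total if total else 0.0


def relative_increase(old_share: float, new_share: float) -> float:
    """How much more likely the new share is, relative to the old one."""
    if old_share == 0:
        raise ValueError("old share must be non-zero")
    return (new_share - old_share) / old_share


# Hypothetical severity lists, NOT figures from the report.
old_model = ["BLOCKER"] * 25 + ["MAJOR"] * 75
new_model = ["BLOCKER"] * 50 + ["MAJOR"] * 50

increase = relative_increase(blocker_share(old_model), blocker_share(new_model))
print(f"{increase:.0%} more likely")  # prints "100% more likely"
```

A relative figure says nothing about the absolute number of vulnerabilities, so it is worth reading alongside the total issue count for each model.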
