State of Code report series

The Coding Personalities of Leading LLMs

Make smarter AI adoption decisions with Sonar's latest report in The State of Code series. Explore the habits, blind spots, and archetypes of the top five LLMs to uncover the critical risks each brings to your codebase.

Key findings

Our deep analysis of LLM-generated code goes beyond standard benchmarks.

Coding personalities

Each LLM has a distinct style that impacts your production environment.

Shared strengths

All models consistently produce valid code and viable solutions for well-defined problems.

Shared blind spots

All models have a fundamental lack of security awareness and a bias for messy code.

Upgrades increase risk

Newer models can generate bugs that are almost twice as likely to be of the highest severity.

What our analysis uncovered

…% more likely for the new Claude model to be of 'BLOCKER' severity than its predecessor.
…% of all issues found in LLM-generated code create long-term technical debt.
…% of the vulnerabilities for one LLM are of 'BLOCKER' severity.
…% of all bugs from one popular LLM are control-flow mistakes.

Methodology

Our analysis is based on 4,442 identical programming tasks performed by each LLM. We measured their output across multiple dimensions to build a comprehensive picture of each model's coding personality and risk profile.

Verbosity Measurement

Verbosity quantifies the sheer volume of code each model generates to solve identical tasks.

  • Lines of Code (LOC)
    Total number of lines of code generated across all 4,442 tasks, including blank lines and comments. This metric reveals whether a model tends toward concise or elaborate implementations.
  • Token Count
    Total tokens generated in the code output, providing a language-agnostic measure of code volume that accounts for the actual content density.
  • Code Density
    Ratio of executable statements to total lines, indicating how compact or spread out the code structure is.
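
As a rough illustration of the three measures above, the Python sketch below computes them for a single Python source string. It is not Sonar's tooling, and it rests on several assumptions: the function name `verbosity_metrics` is hypothetical, "tokens" here means Python lexical tokens from the standard `tokenize` module (the report may count model tokens instead), and an "executable statement" is approximated as any `ast.stmt` node.

```python
import ast
import io
import tokenize

# Token types that carry layout only, not content (assumption: excluded from the count).
_LAYOUT_TOKENS = {
    tokenize.NEWLINE,
    tokenize.NL,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENDMARKER,
}


def verbosity_metrics(source: str) -> dict:
    """Return rough verbosity metrics for one Python source string."""
    # Lines of code: every line counts, including blank lines and comments.
    loc = len(source.splitlines())

    # Token count: lexical tokens (comments included), minus layout-only tokens.
    reader = io.StringIO(source).readline
    tokens = sum(
        1 for tok in tokenize.generate_tokens(reader)
        if tok.type not in _LAYOUT_TOKENS
    )

    # Statements: every statement node in the syntax tree, nested ones included.
    statements = sum(
        isinstance(node, ast.stmt) for node in ast.walk(ast.parse(source))
    )

    # Code density: statements per line (0.0 for empty input).
    density = statements / loc if loc else 0.0

    return {
        "loc": loc,
        "tokens": tokens,
        "statements": statements,
        "density": density,
    }
```

Summing `loc` and `tokens` over every generated solution would give per-model totals in the spirit of the metrics described here. A lower density suggests code that is spread across more blank lines and comments.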

The coding archetypes of leading LLMs

Our analysis shows that each LLM has a unique and measurable coding personality. Which one have you "hired" for your team?

The "trust but verify" mandate for AI

AI is now a core part of software development, but performance benchmarks alone are misleading. A model can solve difficult benchmark challenges and still fail to write good, secure, and reliable code. To harness these powerful models responsibly, you must look beyond the benchmark. Our report provides the critical insights needed to choose the right models and use them safely.

The three qualities of software source code

Sonar classifies the issues found in every project or codebase across three deeply interconnected software qualities: reliability, security, and maintainability.

Reliability

Bugs that would affect the software's capability to maintain its level of performance under promised conditions, potentially compromising its reliability and operational effectiveness.

Security

Vulnerabilities and security hotspots. Vulnerabilities are code weaknesses that could be exploited for attacks, while hotspots are security-sensitive code requiring manual review.

Maintainability

Code smells, which could indicate weaknesses in design that can increase technical debt, slow down development, or increase the risk of bugs or failures down the line.

Security Vulnerability Analysis

Security vulnerabilities in AI-generated code pose significant risks. Our analysis reveals distinct patterns in how each LLM handles security-critical code, with some models producing vulnerabilities of BLOCKER severity at alarming rates.

Severity Distribution

Newer models show alarming increases in BLOCKER-severity vulnerabilities, with some models producing critical security flaws in over 70% of cases.

Common Vulnerability Types

Injection flaws, path traversal, and insecure cryptography are prevalent across all models, indicating fundamental gaps in security awareness.

Model Upgrades = Higher Risk

Claude Sonnet 4 is 93% more likely than its predecessor to produce BLOCKER vulnerabilities, showing that newer isn't always safer.
