How much of the Linux kernel is written by AI?

Assisted-by: | Linux mainline · since 2026-01-01

In context

click any square to zoom in

Models in the kernel

by lines added · by commits

By vendor

who trained the model

By tool

how the model was invoked

Per-vendor model breakdown

merged tags only

Notable

two patches worth pointing at

Activity

lines added/removed · commits

Top authors

who wrote the patch

Top committers

who landed the patch

Recent commits

How this was made

methodology, sources, disclosure

Merged side

A shallow clone of torvalds/linux with --shallow-since="2026-01-01", then git log --all --grep="Assisted-by:" -i. Each merged commit and each Assisted-by: line is counted directly. The denominator is git log --since=2026-01-01 --oneline | wc -l on the same clone.
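The per-commit tally can be sketched as a small helper. This is a hypothetical illustration of the counting rule only; parse_commits.py in the linked repo is the authoritative implementation.

```python
import re

# One Assisted-by: trailer line per model/tool used; a commit may carry several.
ASSISTED_BY = re.compile(r"^\s*assisted-by:", re.IGNORECASE | re.MULTILINE)

def count_assisted_by(message: str) -> int:
    """Count Assisted-by: trailer lines in one commit message (case-insensitive)."""
    return len(ASSISTED_BY.findall(message))

example = """net: fix a thing

Assisted-by: Claude Opus 4.5 (via Claude Code)
Signed-off-by: A Developer <a@example.org>
"""
```

Counting per line rather than per commit is what lets one commit contribute to several model buckets at once.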

Submitted side

Submissions to LKML and the subsystem lists are pulled with lei from lore.kernel.org/all: lei q -d mid -f mboxrd 'b:"Assisted-by:" AND d:20260101..'. Replies, cover letters (0/N), and bot accounts (Patchwork, the kernel test robot, syzbot, 0day) are dropped. Patch-series respins (v1, v2, v3) collapse to one entry per (canonical subject, sender). The Assisted-by: line must appear in non-quoted body text.
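The respin-collapsing and cover-letter rules can be sketched roughly like this. These are hypothetical helpers; parse_lei.py in the linked repo is the authoritative implementation.

```python
import re

def series_key(subject: str, sender: str) -> tuple[str, str]:
    """Collapse v1/v2/v3 respins to one entry: strip every leading
    [PATCH ...] bracket group, then key on (canonical subject, sender)."""
    canon = re.sub(r"^\s*(\[[^\]]*\]\s*)+", "", subject).strip().lower()
    return (canon, sender.strip().lower())

def is_cover_letter(subject: str) -> bool:
    """A 0/N (or 00/NN) marker means a series cover letter, which is dropped."""
    return re.search(r"\b0+/\d+\b", subject) is not None
```

Keying on (canonical subject, sender) rather than Message-ID is what makes a v3 resend of the same patch count once, not three times.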

Vendor / model / tool buckets

Each tag string is parsed into {vendor, model, tool}. Claude variants (Opus 4.5, 4.6, 4.7; Sonnet 4.5, 4.6) collapse cleanly. For wrapper tools (Claude Code, Cursor, GitHub Copilot, OpenCode, Kiro, Cody), the model is attributed to its actual lab and the wrapper appears under "By tool" instead. Tags that do not parse (free text, joke names, FOO:BAR.baz) bucket as Unknown.
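The bucketing can be sketched as follows. The lookup table and the tag shape assumed here ("Model Name (via Tool)") are illustrative guesses; the parser source on GitHub is the real answer for any bucketing question.

```python
import re

# Hypothetical model-family table -- the real mapping lives in the repo's parser.
VENDOR_BY_MODEL_FAMILY = {"claude": "Anthropic", "gpt": "OpenAI", "gemini": "Google"}

def bucket(tag: str) -> dict:
    """Split one Assisted-by: tag string into {vendor, model, tool}.

    Tags that do not parse fall through with vendor "Unknown".
    """
    # "Model Name X.Y (via Some Tool)" -- the wrapper goes to the tool bucket,
    # the model is attributed to the lab that trained it.
    m = re.match(r"(?P<model>[\w .\-]+?)\s*\(via\s+(?P<tool>[^)]+)\)\s*$",
                 tag, re.IGNORECASE)
    model = (m.group("model") if m else tag).strip()
    tool = m.group("tool").strip() if m else None
    family = model.split()[0].lower() if model.split() else ""
    vendor = next(
        (v for k, v in VENDOR_BY_MODEL_FAMILY.items() if family.startswith(k)),
        "Unknown",
    )
    return {"vendor": vendor, "model": model, "tool": tool}
```

A tag like "Claude Opus 4.5 (via Claude Code)" lands under Anthropic in "By vendor" and under Claude Code in "By tool"; "FOO:BAR.baz" falls through to Unknown.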

What the page does not show

Merge rate (different humans use different tools for different patch types; the ratio is not a model-quality signal). Submitted-versus-merged percentages (same reason). Authorial intent or motivation behind any tag string. Patches that landed without disclosure (this page measures policy compliance, not actual AI usage). Share of kernel-wide lines: GitHub's stats API refuses repos with more than 10k commits, and computing diffstats from the partial clone would take hours; the AI-side line counts stand on their own.

Reproducibility

Source on GitHub: snek-git/assisted-by. Three small Python scripts: parse_commits.py (merged side), parse_lei.py (submitted side), build_data.py (assembles the JSON payload). refresh.sh runs the full pipeline end to end.

About this page

Built with assistance from Anthropic's Claude (Opus 4.7, via Claude Code) by a human collaborator. Tag string normalisation choices are judgement calls; the parser source on GitHub is the authoritative answer for any "why was X bucketed as Y?" question. Where contributors have publicly explained their setup (e.g., Greg KH's local LLM fuzzer rig), the page links the source; where they have not, it does not guess at intent.

Source: torvalds/linux mainline as of . Code: github.com/snek-git/assisted-by.

git log --grep="Assisted-by:" --since="2026-01-01"