Show HN: Running Gemma-4 26B at 124 tokens/sec on a CPU, no GPU

10 points · 1 visible top comment · 2026-06-30 12:17:31 UTC

I wanted to know how fast a 26B mixture-of-experts model could run on a desktop CPU with no GPU. I got ~40 tok/s single-stream (lossless) and ~124 tok/s batched. The surprising part was the byte budget: for this model, the output head accounts for 32% of per-token bytes and the experts only 16%, so the head is the thing to compress, not the experts. The writeup has the bandwidth roofline and the dead ends; the repo has the reproducible recipe. Happy to answer questions.
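
To make the byte-budget point concrete, here is a rough sketch of the arithmetic, not the code from the repo. It assumes decoding is purely memory-bandwidth bound, so tok/s scales inversely with bytes read per token. The 32% and 16% shares are the ones quoted above; the 2x compression ratio and the bandwidth and byte figures in the usage lines are made-up placeholders, not measurements.

```python
def roofline_tok_per_s(bandwidth_gb_s: float, gb_per_token: float) -> float:
    """Upper bound on decode speed if every token must stream gb_per_token from RAM."""
    return bandwidth_gb_s / gb_per_token


def speedup_from_compression(share: float, ratio: float) -> float:
    """Amdahl-style gain from shrinking one component of the per-token bytes.

    share: fraction of per-token bytes the component accounts for (0..1)
    ratio: compression factor applied to that component (2.0 = half the bytes)
    """
    remaining = 1.0 - share * (1.0 - 1.0 / ratio)
    return 1.0 / remaining


HEAD_SHARE = 0.32    # output head share of per-token bytes (from the post)
EXPERT_SHARE = 0.16  # expert share of per-token bytes (from the post)
RATIO = 2.0          # hypothetical compression factor, for illustration only

head_gain = speedup_from_compression(HEAD_SHARE, RATIO)
expert_gain = speedup_from_compression(EXPERT_SHARE, RATIO)

# Placeholder numbers: 80 GB/s of memory bandwidth, 2 GB read per token.
baseline = roofline_tok_per_s(80.0, 2.0)

print(f"head compressed {RATIO}x:    {head_gain:.3f}x")
print(f"experts compressed {RATIO}x: {expert_gain:.3f}x")
print(f"placeholder roofline:       {baseline:.1f} tok/s")
```

Under these assumptions, the same compression ratio applied to the head removes twice as many bytes per token as applying it to the experts, which is why the head is the better target here.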

Repo: https://github.com/arun-prasath2005/gemma4-cpu-moe

Comments