WebGL without a GPU — Microlink Blog

Source: https://microlink.io/blog/webgl-without-a-gpu

How one chrome flag makes 3D pages render 4× faster

June 29, 2026 (2 days ago)

WebGL is everywhere now: 3D maps, seat charts, product configurators, shader-art landing pages. It was also the slowest thing you could ask Microlink to screenshot. One Chrome flag fixed that.

A WebGL page (three.js) captured as an animated screenshot, rendered through Mesa llvmpipe on a GPU-less node

**TL;DR**

- Our servers have no GPU. WebGL still has to render somewhere.

- Chrome's default software path (SwiftShader) took **~24s** per 3D page.

- Pointing ANGLE at Mesa llvmpipe (`--use-angle=gl`) dropped it to **~6s**.

- The one-line flag is the easy part. The display, the from-source Mesa, and proving it stays on the fast path are the rest of the story.

No GPU, on purpose

Our browser fleet runs on commodity Linux nodes with no graphics card and no `/dev/dri`. Cheaper, simpler, fewer drivers to babysit. But WebGL is a GPU API, so without one, something has to emulate it on the CPU. _Which_ something was the difference between a 24-second screenshot and a 6-second one.

Chrome doesn't render WebGL. ANGLE does.

Chrome hands WebGL to **ANGLE**, which translates it to whatever backend the platform has: Direct3D, Metal, native OpenGL or Vulkan, or a software renderer when there's no GPU.

With no GPU, that software renderer is the whole ballgame. Chrome ships two: **SwiftShader**, its bundled default, or the system OpenGL stack, which on our Linux nodes is **Mesa llvmpipe**. Same pixels on the CPU, wildly different speed.

Why the gap is 4×

**SwiftShader** emulates the whole pipeline conservatively, optimizing for "draws correctly anywhere." A heavy 3D scene takes ~24s; the 2D pages next to it, 2-3s.

**llvmpipe** is built differently:

- **It JITs to native code.** LLVM compiles the live shader and GL state into real x86-64. No interpreter loop.

- **It is tiled and multi-threaded.** It actually uses every core.

Several times faster, same output.

The change: One line

```

- '--use-angle=swiftshader',

+ '--use-angle=gl', ```

What you must **not** add back:

- `--disable-gpu` silently forces SwiftShader on again. It is the most-copied flag in every headless tutorial.

- `--in-process-gpu` kills the GL surface ANGLE needs.

The catch: You need a display

`--use-angle=gl` has to bind a GL surface, which needs an X display, even headless. No display, and WebGL **silently degrades to a flat 2D fallback**: the screenshot still succeeds, the request still returns `200`, and the output is wrong but plausible.

So every container boots a virtual display (Xvfb) before Chrome starts, with `LIBGL_ALWAYS_SOFTWARE=1` pinning Mesa to llvmpipe.

Why Mesa is built from source

Ubuntu jammy's Mesa is too old for this, and the PPAs that used to backport it are gone. So the base image compiles its own:

``` meson setup build \ -Dbuildtype=release -Dgallium-drivers=llvmpipe -Dvulkan-drivers= \ -Dllvm=enabled -Dshared-llvm=enabled ```

llvmpipe only, no Vulkan, **shared LLVM** (that's where the JIT speed lives). The build toolchain is huge (LLVM, clang, Rust, ~160 `-dev` packages), so the Dockerfile is multi-stage: compile Mesa, then `COPY` only the artifacts into a clean image. **2.65GB instead of 4.5GB.**

Proving it: browserless.report()

You can't tell which renderer a node uses by looking at it. `apt list` lies (we side-load Mesa over the package), and the real answer lives inside the page. So browserless.report() asks the live GL context directly:

``` const browserless = require('browserless') const report = await browserless.report() console.log(report) ```

{

browser:{

name:

"Chrome"

version:

"150"

headless:

true

build:

"chrome-for-testing"

customArguments:[

...

]

environment:{

virtualized:

true

container:

true

os:{

platform:

"linux"

distro:

"Ubuntu 22.04.5 LTS"

cpu:{

model:

"DO-Premium-Intel"

cores:

threads:

speed:

2100

arch:

"x64"

flags:[

...

]

memory:{

total:

16775352320

gpu:{

vendor:

"Mesa"

device:

"llvmpipe"

type:

"software"

graphics:{

...

mesa:

"26.1.3"

llvm:

"19.1.7"

simdWidth:

256

webgl:{

...

webgpu:{

...

}

`browserless.report()` from a production node. Expand `gpu` and `cpu` for the full picture.

The `gpu` block is the whole story:

- **`type`** is `software` / `llvmpipe` here. `swiftshader` would mean we fell back; `hardware` would mean a GPU appeared.

- **`mesa`** is read from the loaded `libgallium-<ver>.so`, not dpkg, which reports the stale package version under our side-load.

- **`simdWidth: 256`** means llvmpipe is using AVX2, which is most of why it's fast.

`report({ benchmark: true })` adds a deterministic shader benchmark (~300ms on llvmpipe) for comparing nodes.

That report is also the CI gate

The flat-2D fallback is dangerous because it looks like success. So CI asserts on `report()`: `gpu.type` must be `software`, `gpu.device` must be `llvmpipe`. Any drift fails the build instead of shipping flat 3D. The same call runs against production pods.

Most of the work was benchmarking, not coding

The diff is one line. Proving it was the _right_ line took weeks of measurement, mostly fighting two traps:

- **Dev machines lie.** A real GPU renders pages that come back black on prod. Every number had to come from prod-shaped hardware.

- **Single runs lie.** Cold JIT, first-paint races, shared cores. The fastest-looking result was sometimes the _wrong_ one: the flat fallback shipped ~1s quicker.

That's why the deterministic benchmark exists: fixed shader, forced frames, one stable number. With it, the comparison stopped being anecdotal. SwiftShader landed at ~24-31s; llvmpipe at ~6s warm and correct.

The numbers

Same 3D chart, same GPU-less hardware, measured on production:

| | SwiftShader (before) | Mesa llvmpipe (after) | | --- | --- | --- | | Render time (isolated) | ~24s | **~6s (~4×)** | | Render time (under load) | ~24s | **7–14s (~2×)** | | Failed requests | timed out → errors | none | | Active renderer | SwiftShader | llvmpipe (asserted in CI) |

Isolated, the chart finishes in ~6s. Under real traffic, where captures share cores, expect ~2×. Either way, the requests that used to time out now finish.

The following examples show how to use the Microlink API with CLI, cURL, JavaScript, Python, Ruby, PHP & Golang, targeting 'https://threejs.org/examples/webgl_animation_skinning_blending' URL with 'screenshot' API parameter:

CLI Microlink API example

`microlink https://threejs.org/examples/webgl_animation_skinning_blending&screenshot.animated`

cURL Microlink API example

``` curl -G "https://api.microlink.io" \ -d "url=https://threejs.org/examples/webgl_animation_skinning_blending" \ -d "screenshot.animated=true" ```

JavaScript Microlink API example

``` import mql from '@microlink/mql'

const { data } = await mql('https://threejs.org/examples/webgl_animation_skinning_blending', { screenshot: { animated: true } }) ```

Python Microlink API example

``` import requests

url = "https://api.microlink.io/"

querystring = { "url": "https://threejs.org/examples/webgl_animation_skinning_blending", "screenshot.animated": "true" }

response = requests.get(url, params=querystring)

print(response.json()) ```

Ruby Microlink API example

``` require 'uri' require 'net/http'

base_url = "https://api.microlink.io/"

params = { url: "https://threejs.org/examples/webgl_animation_skinning_blending", screenshot.animated: "true" }

uri = URI(base_url) uri.query = URI.encode_www_form(params)

http = Net::HTTP.new(uri.host, uri.port) http.use_ssl = true

request = Net::HTTP::Get.new(uri) response = http.request(request)

puts response.body ```

PHP Microlink API example

``` <?php

$baseUrl = "https://api.microlink.io/";

$params = [ "url" => "https://threejs.org/examples/webgl_animation_skinning_blending", "screenshot.animated" => "true" ];

$query = http_build_query($params); $url = $baseUrl . '?' . $query;

$curl = curl_init();

curl_setopt_array($curl, [ CURLOPT_URL => $url, CURLOPT_RETURNTRANSFER => true, CURLOPT_ENCODING => "", CURLOPT_MAXREDIRS => 10, CURLOPT_TIMEOUT => 30, CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1, CURLOPT_CUSTOMREQUEST => "GET" ]);

$response = curl_exec($curl); $err = curl_error($curl);

curl_close($curl);

if ($err) { echo "cURL Error #: " . $err; } else { echo $response; } ```

Golang Microlink API example

``` package main

import ( "fmt" "net/http" "net/url" "io" )

func main() { baseURL := "https://api.microlink.io"

u, err := url.Parse(baseURL) if err != nil { panic(err) } q := u.Query() q.Set("url", "https://threejs.org/examples/webgl_animation_skinning_blending") q.Set("screenshot.animated", "true") u.RawQuery = q.Encode()

req, err := http.NewRequest("GET", u.String(), nil) if err != nil { panic(err) }

client := &http.Client{} resp, err := client.Do(req) if err != nil { panic(err) } defer resp.Body.Close()

body, err := io.ReadAll(resp.Body) if err != nil { panic(err) }

fmt.Println(string(body)) } ```

``` import mql from '@microlink/mql'

const { data } = await mql('https://threejs.org/examples/webgl_animation_skinning_blending', { screenshot: { animated: true } }) ```

The honest limit

Software GL closes most of the gap, not all. Heavy fragment-shader heroes can still come back black, because the canvas hasn't painted by capture time. That's a first-paint race, not a renderer problem, and no flag fixes it. The real fixes are gating capture on first paint, or real GPUs. We're doing the first.

For everything else, moving from SwiftShader to llvmpipe turned our slowest, flakiest requests into ordinary ones.