Improving graphics performance in Kaboom3000

tga, 01/15/2023

Use in-place matrix math

performance gain: 1.2x

Because of the batched renderer Kaboom uses, all vertex transformations are done on the CPU. Kaboom was using this style of matrix math:

class Mat4 {
    static translate(x, y) {
        return new Mat4([1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, x, y, 0, 1]);
    }
    translate(x, y) {
        return this.mult(Mat4.translate(x, y));
    }
}

This is inefficient because:

it’s allocating a new Mat4 on every transform
it’s doing a full matrix multiplication on every transform

In fact, these types of transforms only need to change a few fields in place, which gets rid of the extra allocation and the unused calculations.

class Mat4 {
    translate(x, y) {
        this.m[12] += this.m[0] * x + this.m[4] * y;
        this.m[13] += this.m[1] * x + this.m[5] * y;
        this.m[14] += this.m[2] * x + this.m[6] * y;
        this.m[15] += this.m[3] * x + this.m[7] * y;
        return this;
    }
}
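To check that the in-place update really matches the full multiplication, here is a minimal sketch. It assumes a column-major 16-element array stored in `this.m`, as the snippets above suggest; the method names `translateAlloc` and `translateInPlace`, and the `mult` implementation, are written for this comparison and are not Kaboom's actual code.

```javascript
// Minimal column-major 4x4 matrix, written only for this comparison.
class Mat4 {
    constructor(m = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1]) {
        this.m = m.slice();
    }
    static translate(x, y) {
        return new Mat4([1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, x, y, 0, 1]);
    }
    // Full multiply: returns this * other as a new matrix (64 multiplications).
    mult(other) {
        const out = new Array(16).fill(0);
        for (let col = 0; col < 4; col++) {
            for (let row = 0; row < 4; row++) {
                for (let k = 0; k < 4; k++) {
                    out[col * 4 + row] += this.m[k * 4 + row] * other.m[col * 4 + k];
                }
            }
        }
        return new Mat4(out);
    }
    // Old style: allocates a translation matrix and a result matrix.
    translateAlloc(x, y) {
        return this.mult(Mat4.translate(x, y));
    }
    // New style: only the last column changes, so update it directly.
    translateInPlace(x, y) {
        this.m[12] += this.m[0] * x + this.m[4] * y;
        this.m[13] += this.m[1] * x + this.m[5] * y;
        this.m[14] += this.m[2] * x + this.m[6] * y;
        this.m[15] += this.m[3] * x + this.m[7] * y;
        return this;
    }
}

// A matrix that already has a scale (2, 3) and a translation (5, 7).
const start = [2, 0, 0, 0, 0, 3, 0, 0, 0, 0, 1, 0, 5, 7, 0, 1];
const viaMult = new Mat4(start).translateAlloc(10, 20);
const inPlace = new Mat4(start).translateInPlace(10, 20);

console.log(viaMult.m.join() === inPlace.m.join()); // true
```

Both paths produce the same matrix, but the in-place version does 8 multiplications and no allocation.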

Getting rid of spread operators

performance gain: 2x

Kaboom uses a lot of spread operators to forward drawing options to other draw functions, for example:

function drawSprite(opts) {
    // doing some calculations above, then forward
    drawTexture({
        ...opts,
        tex: spr.data.tex,
        quad: q.scale(opts.quad || new Quad(0, 0, 1, 1)),
    });
}

However, we found that spread operators are extremely slow when called thousands of times per frame. Changing every spread operator to Object.assign() improved performance by 2x. Easiest performance gain ever.

function drawSprite(opts) {
    // Object.assign() copies the extra fields into opts itself, so opts is mutated
    drawTexture(
        Object.assign(opts, {
            tex: spr.data.tex,
            quad: q.scale(opts.quad ?? new Quad(0, 0, 1, 1)),
        }),
    );
}
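One thing to keep in mind with this swap: the two forms are not strictly equivalent. Spread always builds a new object, while Object.assign() writes into whatever target it is given. A small sketch of the difference, using made-up option objects rather than Kaboom's real draw options:

```javascript
// Hypothetical option objects; the field names are illustrative only.
const base = { pos: [10, 20], color: "red" };

// Spread: builds a fresh object, `base` is untouched.
const viaSpread = { ...base, tex: "atlas" };

// Object.assign with an empty target: same result, still non-mutating.
const viaAssignCopy = Object.assign({}, base, { tex: "atlas" });

// Object.assign with the options object as the target:
// no merged copy is created, but the caller's object is modified.
const shared = { pos: [10, 20], color: "red" };
const viaAssignInPlace = Object.assign(shared, { tex: "atlas" });

console.log(viaAssignInPlace === shared); // true
console.log("tex" in base);               // false
console.log("tex" in shared);             // true
```

Passing the options object as the target is fine as long as callers do not reuse it and expect it to be unchanged.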

Automatically batch textures

performance gain: up to 50x

It’s expensive to initiate a draw call (gl.drawElements(), gl.drawArrays() etc.) in WebGL. Kaboom uses a batched renderer that keeps all shape vertex data in a buffer and only initiates a draw call at frame end or when the texture changes. This approach makes it fast to draw a lot of 2D shapes with the same texture, but when the texture changes a lot it can be slower than the naive rendering approach.
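The flush-on-texture-change rule can be illustrated with a toy model. This is only a sketch: the `Batcher` class and `countDrawCalls` helper are hypothetical, no real WebGL is involved, and `flush()` simply stands in for a call like gl.drawElements().

```javascript
// Toy batcher: counts how many draw calls a sequence of textured quads would need.
class Batcher {
    constructor() {
        this.curTex = null;
        this.queued = 0;    // quads waiting in the vertex buffer
        this.drawCalls = 0; // how many times we would have hit the GPU
    }
    push(tex) {
        // A texture change forces the pending vertices to be drawn first.
        if (tex !== this.curTex && this.queued > 0) this.flush();
        this.curTex = tex;
        this.queued++;
    }
    flush() {
        if (this.queued === 0) return;
        this.drawCalls++;
        this.queued = 0;
    }
}

function countDrawCalls(textures) {
    const b = new Batcher();
    for (const t of textures) b.push(t);
    b.flush(); // end of frame
    return b.drawCalls;
}

const same = Array.from({ length: 5000 }, () => "texA");
const alternating = Array.from({ length: 5000 }, (_, i) => (i % 2 === 0 ? "texA" : "texB"));

console.log(countDrawCalls(same));        // 1
console.log(countDrawCalls(alternating)); // 5000
```

With a single texture the whole frame collapses into one draw call; with two alternating textures every quad triggers its own.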

Consider this example:

kaboom();

loadSprite("bean", "sprites/bean.png");

for (let i = 0; i < 5000; i++) {
    add([
        sprite("bean"),
        pos(rand(0, width()), rand(0, height())),
        anchor("center"),
    ]);
}

It renders 5000 sprites, all using the same texture.

However, if you draw the same number of sprites but alternate between 2 sprites, this example will crash your browser tab:

kaboom();

loadSprite("bean", "sprites/bean.png");
loadSprite("bag", "sprites/bag.png");

for (let i = 0; i < 5000; i++) {
    add([
        sprite(i % 2 === 0 ? "bean" : "bag"),
        pos(rand(0, width()), rand(0, height())),
        anchor("center"),
    ]);
}

In v3000 Kaboom introduced a mechanism to automatically batch all sprites into a large texture atlas when they are loaded. As a result, the latter example has exactly the same performance as the former.
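To give a rough idea of what packing sprites into an atlas involves, here is a minimal sketch of a shelf packer. It is not Kaboom's implementation: the `packAtlas` function, its parameters, and the image sizes are all assumptions made for illustration. Each image gets a quad in normalized texture coordinates, so every sprite can draw from the one atlas texture with a different quad.

```javascript
// Hypothetical shelf packer: places images left to right,
// starting a new row (shelf) when the current one is full.
function packAtlas(images, atlasW, atlasH) {
    const quads = {};
    let x = 0;
    let y = 0;
    let rowH = 0; // height of the tallest image on the current shelf
    for (const img of images) {
        if (x + img.w > atlasW) {
            // Row is full, move down to the next shelf.
            x = 0;
            y += rowH;
            rowH = 0;
        }
        if (img.w > atlasW || y + img.h > atlasH) {
            throw new Error(`atlas full: ${img.name}`);
        }
        // Normalized texture coordinates of this image inside the atlas.
        quads[img.name] = {
            x: x / atlasW,
            y: y / atlasH,
            w: img.w / atlasW,
            h: img.h / atlasH,
        };
        x += img.w;
        rowH = Math.max(rowH, img.h);
    }
    return quads;
}

// Assumed sizes: two 64x64 sprites in a 256x256 atlas.
const quads = packAtlas(
    [
        { name: "bean", w: 64, h: 64 },
        { name: "bag", w: 64, h: 64 },
    ],
    256,
    256,
);

console.log(quads.bag); // { x: 0.25, y: 0, w: 0.25, h: 0.25 }
```

Since "bean" and "bag" now live in the same texture, switching between them only changes the quad, not the bound texture.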