Improving graphics performance in Kaboom3000
tga, 01/15/2023
Use in-place matrix math
performance gain: 1.2x
Due to the batched renderer kaboom is using, all vertex transformation are done on the CPU. Kaboom was using this style of matrix math:
class Mat4 {
static translate(x, y) {
return new Mat4([1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, x, y, 0, 1]);
}
translate(x, y) {
return this.mult(Mat4.translate(x, y));
}
}
This is inefficient because:
- it’s allocating a new
Mat4
on every transform - it’s doing a full matrix multiplication on every transform
while in fact we only need to in-place change few fields while doing these types of transforms, which gets rid of the extra allocation and unused calculations.
class Mat4 {
translate(x, y) {
this.m[12] += this.m[0] * x + this.m[4] * y;
this.m[13] += this.m[1] * x + this.m[5] * y;
this.m[14] += this.m[2] * x + this.m[6] * y;
this.m[15] += this.m[3] * x + this.m[7] * y;
return this;
}
}
Getting rid of spread operators
performance gain: 2x
Kaboom uses a lot of spread operators to forward drawing options to other draw functions, for example
function drawSprite(opts) {
// doing some calculations above, then forward
drawTexture({
...opts,
tex: spr.data.tex,
quad: q.scale(opt.quad || new Quad(0, 0, 1, 1)),
}))
}
However we found spread operators are extremely slow if called thousands times
per frame, changing every spread operator to Object.assign()
ups the
performance by 2x. Easiest performance gain ever.
function drawSprite(opts) {
drawTexture(
Object.assign(opt, {
tex: spr.data.tex,
quad: q.scale(opt.quad ?? new Quad(0, 0, 1, 1)),
}),
);
}
Automatically batch textures
performance gain: up to 50x
It’s expensive to initiate a draw call (gl.drawElements()
, gl.drawArrays()
etc.) in WebGL. Kaboom uses a batched renderer that keeps all shape vertices
data in a buffer and only initiates a draw call at frame end or when texture
changes. This approach makes it fast to draw a lot of 2d shapes with the same
texture, however when texture changes a lot it can be slower than the naive
render approach.
Consider this example:
kaboom();
loadSprite("bean", "sprites/bean.png");
for (let i = 0; i < 5000; i++) {
add([
sprite("bean"),
pos(rand(0, width()), rand(0, height())),
anchor("center"),
]);
}
It renders 5000 sprites. After
However, if you draw the same amount of sprites but alternate between 2 sprites, this example will crash your browser tab:
kaboom();
loadSprite("bean", "sprites/bean.png");
loadSprite("bag", "sprites/bag.png");
for (let i = 0; i < 5000; i++) {
add([
sprite(i % 2 === 0 ? "bean" : "bag"),
pos(rand(0, width()), rand(0, height())),
anchor("center"),
]);
}
In v3000 Kaboom introduced a machanism to automatically batch all sprites to a large texture atlas when loaded. As a result, the later example have the exact same performance with the former one.