Showing posts with label programming. Show all posts

Wednesday, 17 March 2021

Super flexible 2D animations through the Blightbound Skin Editor

Last week I discussed how we used After Effects and Duik as our animation tools for Blightbound, achieving crisp animations at high framerates with small filesizes and nice tools for our animators. The real strength of this workflow however comes from our Skin Editor. This makes it possible to reuse animations, quickly tweak the looks of characters, swap gear, add special effects and attach objects to characters convincingly. Today I’d like to explain the ideas behind our Skin Editor and how this impacted the work of our artists.

As I showed in last week’s blogpost, the basic idea behind animations in Blightbound is that a character is made up of parts (hands, torso, face, upper arms, lower arms, etc.). These parts move and rotate and the game plays that back directly, instead of using sprite sheets.


Characters in Blightbound are animated using parts, and frame-to-frame variations on those parts. This shows some of the parts used in the animations of Malborys.

Since the game knows exactly what parts there are and where they are, there are a lot of fun things we can do with this besides simply playing back animations. How cool would it be to swap parts to create new characters, to swap weapons, to attach special effects to weapons and to attach effect-over-time animations to bodyparts? To achieve all of these things our artists made sure all characters have a similar structure, and I developed our Skin Editor.

The basic idea behind our Skin Editor is that it finds all the visible parts in all the animations for a rig and allows changing or hiding them. So a skin can change whatever it wants: a character’s head, weapon, shoulder, or even everything. When an animation is played back in-game, we inform it which skins are active and those are applied to modify the character’s looks during gameplay.


A demonstration of the interface and the various features of the Blightbound skin editor.

Skins allow us to create a new character by drawing new parts, without needing to create any new animations. This was an important goal for us, because for Blightbound we wanted quite a lot of characters. A core idea behind the game is that it should be a bit like Pokémon: gotta catch em all. There are a lot of playable heroes and the player collects those during gameplay. However, for Awesomenauts it took us 8 years to get to 34 characters and here we needed more animations per character and more characters (enemies included). So the Awesomenauts workflow wasn’t going to work and we needed to apply more reuse instead. The Skin Editor made this possible.

Note that above I said we apply skins, plural. Our skin system allows combining several skins, as well as changing which skins are active on the fly. A skin can add parts, but it can also remove or replace parts from the skin below it. This opens up all kinds of possibilities. For example, weapons are skins, so swapping a weapon means swapping a skin while keeping the base body skin the same.
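To give an idea of how such a stack of skins might resolve, here's a small Python sketch. The data layout and all names are purely illustrative, not how our engine actually stores skins; it just demonstrates the add/replace/remove layering described above.

```python
# Illustrative sketch of resolving a stack of skins. Each skin maps part
# names to a texture, or to None (meaning "hide this part").

def resolve_skins(base, overlays):
    """Apply overlay skins on top of a base skin, in order."""
    result = dict(base)
    for skin in overlays:
        for part, texture in skin.items():
            if texture is None:
                result.pop(part, None)  # this skin hides the part
            else:
                result[part] = texture  # add or replace the part
    return result

body = {"head": "knight_head.png", "torso": "knight_torso.png",
        "hand_r": "knight_hand.png", "sword": "iron_sword.png"}
weapon_swap = {"sword": "flame_sword.png"}      # replace just the weapon
mask = {"head": None, "mask": "iron_mask.png"}  # hide the head, add a mask

final = resolve_skins(body, [weapon_swap, mask])
```

Because later skins win, swapping a weapon or piece of gear is just a matter of enabling a different overlay while the base body skin stays untouched.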


Weapons and shields are easily swapped by enabling and disabling additional skins.

Weapons are an obvious application of this, but we can do more outlandish stuff with this. If an enemy gets hit by a knife that applies a bleed effect (damage over time), then we would like to have an effect on the enemy for the duration of the bleed effect. In Awesomenauts we were heavily limited in these kinds of effects because the sprite sheet didn’t tell us where any of the limbs or feet of the character were. This meant status effects needed to be visualised with animations that are always centered on the character. In Blightbound we can do much better: we can have a knife sticking out of a leg, and the knife will move correctly with the leg. All it takes is adding a skin to the enemy that contains that knife.

Normally we put textures in skins, mostly body parts and clothing and weapons and such. But we also have a feature to attach any in-engine animation to a skin. This allows us to do things like adding sparkly particles to a staff, dripping blood to the knife, glows to the shackles, trails, and even physics capes. We also use this to attach gameplay dummies to skins, for example to make sure projectiles come out of the tip of the staff, no matter where that staff is in the current animation.
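As an illustration of how an attached dummy can follow a part, here's a sketch of the underlying maths: a standard 2D transform from part-local space to world space. The function and values are made up for this example; the engine of course does this with its own transform code.

```python
import math

# Illustrative sketch: positioning an attachment (e.g. a projectile spawn
# dummy at the tip of a staff) using the animated part's 2D transform.

def attach_point(part_pos, part_rotation, part_scale, local_offset):
    """Transform a point from part-local space to world space."""
    c, s = math.cos(part_rotation), math.sin(part_rotation)
    x = local_offset[0] * part_scale
    y = local_offset[1] * part_scale
    return (part_pos[0] + c * x - s * y,
            part_pos[1] + s * x + c * y)

# Staff tip 50 units along the part's local x-axis, part rotated 90 degrees:
# the tip ends up 50 units along the world y-axis from the part's position.
tip = attach_point((100.0, 200.0), math.pi / 2, 1.0, (50.0, 0.0))
```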


The skin system allows us to accurately attach visuals for status effects to the animation. Here the slow effect attaches to the legs. Also, the cape is a nice example of attaching a physics object to the character.

A fun but mostly unused feature I implemented is the scaling of parts. Skins can not only replace parts, but also change their size. We hoped we could use this to change a character’s silhouette, for example by giving one character longer arms and shorter legs. This feature also allows making all heads bigger (which is a common Easter egg in many games, but we didn’t add it to Blightbound).

While scaling worked fine from a technical perspective, in practice it was hard to make good looking characters this way so our artists ended up hardly using that feature. Also, a more complete implementation of limb scaling requires animation retargeting (for example to keep feet firmly on the ground when the ratio between upper and lower legs is changed), which is some pretty advanced tech that we didn’t have time to dive into.

So far it might seem like just having that skin editor enables all of these cool features. However, to make this work, all animations need to have consistent elements. For example, if we want to attach that knife to that underarm, then each animation that the character can play needs to have an underarm and it always needs to have the same name. In some cases we worked around this by adding empty visual parts to the rig, so that any characters that needed those parts could enable them. This is somewhat similar to adding attachment dummies to a skeleton in 3D animation.

Also, if we want to quickly reskin a character, then we should limit how many unique parts there are. In my previous post I mentioned that we do frame-to-frame animation on parts, for example by drawing several heads with different expressions. For every new character we add on this rig, we need to draw all of those heads. The larger the number of parts, the more work it is to add a new character that replaces all of those parts.

There are some tricks that we used to decrease the workload. For starters, our skinning system falls back to the default if variations don’t exist. If for example one character has a mask, then we don’t need to make copies of that mask for different facial expressions. If a character is less important, we can also simply choose not to draw the facial expressions for that character, even if they wouldn’t be hidden by a mask.
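That fallback can be sketched as a simple two-step lookup; this is just an illustration of the idea, not our actual code:

```python
# Sketch of the fallback idea: prefer a character-specific variation of a
# part, and use the default rig part when the variation doesn't exist.

def find_part(parts, character, part_name):
    """Look up a part for a character, falling back to the default rig."""
    return parts.get((character, part_name)) or parts.get(("default", part_name))

parts = {
    ("default", "head_happy"): "default_head_happy.png",
    ("default", "head_angry"): "default_head_angry.png",
    ("masked_hero", "head_happy"): "mask.png",
    # masked_hero has no head_angry: the mask hides expressions anyway
}

happy = find_part(parts, "masked_hero", "head_happy")  # the mask
angry = find_part(parts, "masked_hero", "head_angry")  # falls back to default
```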


For each character we need to draw all the parts used in all their animations. For four characters in Blightbound this shows all the heads needed for the female rig. Note that some characters have one head fewer. That's okay because in such cases our skin system automatically falls back to the default.

We also made use of the fact that not all characters use all animations. Enemies for example share some animations with heroes, but their animation set is a lot smaller. Enemies therefore only need to have the parts that are actually used in their animation set. For this same reason our animators had a lot more flexibility in adding parts whenever they made a character-specific animation: none of the other characters need to have those parts.

By carefully choosing which parts and variations are really needed, our artists managed to keep it all doable. Still, in total Blightbound currently has over 4,600 character parts. Together these form around 50 characters (players and enemies), 104 weapons and 45 shields.

This did require a lot of experimenting with exactly how a body is made up. While the basic parts may seem straightforward, there are a lot of subtle choices to be made. For example, feet are split in the front part (including the toes) and the back part, and the pelvis and torso are split. Also, some parts are in the rig twice: once in the front and once in the back, for more clothing possibilities.


Our artists spent a lot of time figuring out exactly how the rig should be split into parts to be able to do a lot of poses and clothing variations without introducing too many parts. For example, here we see that the belt is split into 3 parts and there are "waistflaps" in the rig.

This brings us to the biggest downside of this approach: it adds a lot of limitations to what our animators can do. For each animation they need to creatively reuse the existing parts as much as possible. They can add new parts when it’s really needed, but they need to be very conservative with this to avoid bloating the workload. The result is that animations are a bit less dynamic than in Awesomenauts and Swords & Soldiers 2. Also, our artists generally did not enjoy the extra challenge of these limitations.

Another downside is that reusing animations makes characters a lot more similar visually, and limits what we can do in terms of body types. Our artists did go out of their way to make the characters as visually unique as possible within the limitations of the rig, for example by adding variation through extreme headgear, capes and clothing.

To make characters feel more unique, our artists made a select few unique animations for each hero. Especially the idle animations are different, but in some cases also the walk animations. That’s one of the nice things about reuse: during development you quickly have a full character in the game with all animations, and then later on you can replace specific animations to make them more unique.


To make characters that use the same rig less samey, most have a unique idle animation.

Animation reuse is also great for prototyping: if for example a designer wants to try swapping skills around between enemies, they get an animation set for free to get a better idea of how that will feel in gameplay. In Awesomenauts on the other hand that often required an artist to make some extra concept frames for skill visualisation.

Before I finish this blogpost I would like to point out that the Skin Editor and the limitations we applied to be able to reuse animations are not necessary when using After Effects and Duik for animations. It’s perfectly possible to let go of all of those limitations and still get all the benefits that I discussed in last week’s blogpost. In fact, our After Effects exporter is technically capable of exporting Awesomenauts characters without using sprite sheets.

Despite the downsides, the scope of Blightbound would not have been doable for us without the approach we used. The combination of the Skin Editor and the consistent body setup that our artists applied to all animations made a lot of cool things possible for Blightbound. We were able to make more characters than we could otherwise have, create swappable gear, add special effects to weapons and attach animations to body parts. In total, I think the result is pretty spectacular. Also, playing around with the Skin Editor is just plain fun!

To conclude, here’s a video that shows what happens when you repeatedly randomise all parts of a character:


The Blightbound dev tools contain a feature to randomise skins. The results look both horrible and hilarious.

Special thanks to Ronimo artists Koen Gabriels, Tim Scheel and Gijs Hermans for providing feedback on this article.

Wednesday, 10 March 2021

Finding a suitable toolchain for animating Blightbound’s 2D characters

For Blightbound our technical ambitions for the 2D animation system were quite lofty: we wanted high quality animation, screen-filling bosses, crisp character art, high framerate and swappable gear. This required a complete rework of our animation pipeline, since the sprite sheets we used in Awesomenauts and Swords & Soldiers 2 can meet none of those requirements, except for high quality animations. In today’s blogpost I’d like to explain the problems we faced, and what combination of tools we chose to solve them. This is the first half of a two-parter: next week I’ll discuss our skin editor and its implications for animation.


Blightbound features huge screen-filling bosses that need to animate smoothly without taking up insane amounts of memory.

Traditionally 2D animation in games is done using sprite sheets. A sprite sheet is simply a big texture with all the frames for all a character’s animations in it. Sprite sheets aren’t limited to just character animation: they can also store other things, like special effects or plants. The frames might be in a grid, or spaced more efficiently like we did for Awesomenauts. In any case, it’s just a series of tightly packed images.
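For illustration, getting frame N out of a grid-layout sheet is just a bit of integer maths. This little sketch assumes a plain grid, not the tighter packing we used for Awesomenauts:

```python
# Sketch of looking up a frame's pixel rectangle in a grid sprite sheet.

def frame_rect(sheet_w, sheet_h, cols, rows, frame_index):
    """Return (x, y, w, h) of a frame in a grid-layout sprite sheet."""
    fw, fh = sheet_w // cols, sheet_h // rows
    col = frame_index % cols
    row = frame_index // cols
    return (col * fw, row * fh, fw, fh)

# Frame 5 in a 4x4 grid on a 1024x1024 sheet: second row, second column.
rect = frame_rect(1024, 1024, 4, 4, 5)
```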

Sprite sheets are great because they allow for complete freedom. The animator can draw anything they like and it will just work. Extreme squash-and-stretch? Perspective changes? Morphs? Do as you please, to the game it’s all just images!

However, sprite sheets come with a few huge downsides. First is size. If you have a big character and want it to be crisp and high-resolution, then each frame is going to take up a lot of space. Having lots of large images is going to cost too much memory. In Awesomenauts this was indeed a problem for some of the characters. For Clunk to fit in one 4096*4096 sprite sheet (the maximum we chose for memory and compatibility reasons), we had to either lower the resolution a bit, or reduce the framerate. And that’s not even that big of a character! The high memory usage of sprite sheets means that screen-filling bosses either can’t be done, or need to have very few frames, or can’t be crisp.
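To put some rough numbers on that, here's a quick back-of-the-envelope calculation. The compression format is my assumption for the example; the exact format Awesomenauts used isn't stated here.

```python
# Texture memory for a single 4096x4096 sprite sheet.

pixels = 4096 * 4096
uncompressed_mb = pixels * 4 / (1024 * 1024)  # RGBA8: 4 bytes per pixel
dxt5_mb = pixels * 1 / (1024 * 1024)          # DXT5/BC3: 1 byte per pixel
```

Even with block compression a single 4k sheet is a substantial chunk of memory, and a screen-filling boss at a decent framerate would need many such sheets.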


In Awesomenauts, red Clunk occupies an entire 4k sprite sheet texture for all his animation frames. Clunk fit only after slightly downsizing him.

The second major downside of sprite sheets is that since it’s just images, it’s very hard to do anything with them except simply showing them. For Awesomenauts we’ve often talked about letting players customise characters by changing for example their hat or other parts, but simple sprite sheets make that incredibly hard. The engine would need to know where the hat is exactly, but the sprite sheet doesn’t contain that information. Nor does it tell us whether there are different perspectives of the hat in different frames, or whether anything is ever in front of or behind the hat.

I can think of some workarounds for this problem, but it’s all so cumbersome that it’s not practical to actually do. This is the main reason why Awesomenauts only contains skins that swap the entire look of a character, and no swappable hats or clothing: those would be completely new sprite sheets for each combination.


Since every skin change in Awesomenauts requires an entirely new sprite sheet, we decided to make full reskins instead of allowing the user to customise individual parts (like hats or weapons). These are the four looks for Ted McPain.

So, sprite sheets can’t give us four of our requirements (screen-filling bosses, crisp character art, high animation framerate and swappable items). What can we do instead? The obvious direction for a solution is to split a character in parts, and let those parts move relative to each other. Or even to go for full skeletal animation.

If you have a background in 3D animation, you might think “well skeletal animation of course, it’s awesome!” And in 3D it is indeed, but in 2D it’s a lot more limited. Because we’re restricted to the 2D plane, there’s a lot of animation that can’t be done with a standard skeleton in 2D. Rotating the head of the character away from the camera is super easy in 3D, but impossible with 2D skeletal animation. Swinging a sword vertically is easy, but horizontally is very hard because that requires 3D perspective. Wanting to circumvent these limitations, we opted for a combination of 2D skeletal animation and traditional frame-to-frame animation on parts where needed.

A common tool for doing part-based 2D animation is Spine. Spine is used by a lot of games and we expected this to be our best option. We tried Spine and it was indeed quick and easy to use, as well as easy to integrate into our own engine. However, it turned out to be a bit too simple: Spine doesn’t allow swapping a part in the middle of an animation. This is needed if you want to mix skeletal animation with frame-to-frame animation, for example by switching to a different drawing of a head or hand. The workaround was to layer several skeletons on top of each other, but that was too clunky to work with. We reached out to Spine at the time, explaining this issue, and they confirmed this was the only way. Our evaluation was three years ago and we haven't tried Spine since, but according to this post by them their options for combining frame-to-frame with skeletal animation seem to have improved recently.


Spine by Esoteric Software is a 2D character animation tool for games for creating animations using parts, skeletons and deformations (screenshot taken from one of Spine's tutorial videos).

We also tried some other tools, including Blender and 3D Studio MAX, but those turned out to be too focussed on 3D to create a good workflow for 2D animation for us. In particular, their otherwise excellent skeletal systems became quite impractical when used for 2D animation.

So instead, we wanted to stick with After Effects. After Effects is mostly known as a tool for editing video and doing special effects and such, but it’s actually also just a really good general animation tool. Our artists happily used it for character animation in our previous games Awesomenauts and Swords & Soldiers 2. The basic idea is this: each part in an animation is a layer, and can be animated, linked and deformed freely. By turning layers on and off, you can do frame-to-frame animation on parts (like a hand opening and closing) or on the character as a whole.

I imagine After Effects would be a bad fit for full hand-drawn frame-to-frame animation, like in Cuphead, since After Effects doesn’t even allow drawing on layers directly (we draw all our parts in Photoshop). However, for our games it’s an excellent tool because we don’t want to redraw the entire character every frame anyway.

There was something missing though: After Effects is a great animation tool, but it’s not really tailored to 2D character animation. Good thing we already knew about the Duik plug-in. Duik adds the rigging features that are common in 3D animation to After Effects. Combined with the strong tools After Effects already has, it’s a really effective 2D animation tool for games.


Duik is an After Effects plugin that adds tons of 2D animation features (screenshot from this tutorial video by Motion Tutorials).

I previously mentioned we wanted to mix skeletal animation with frame-to-frame animation. That part is actually really simple with Duik. The artist just links several drawings of a part to a bone and switches which is visible. This way we can change facial expressions during animations, as well as do things like open and close hands, pivot or distort a torso and switch facial expressions.


Thumbnails of most animation parts needed for one character in Blightbound. The various hands, faces and chests are for frame-to-frame animation on parts of the body.

Happy as we were with After Effects + Duik as our animation tool, one important piece was still missing: After Effects files can’t be played back in real-time engines. So I set about the task of writing an exporter from After Effects that exports all the hierarchy and animation data. I also implemented the in-game side of playing that back in real-time.

At its core, this is a surprisingly simple task. Each part in an After Effects animation is a layer and we can just export all the layers with their animation frames and play them back. Where I went wrong though, is that I wanted to emulate Duik’s systems in-game. I thought Duik only did basic two-bone inverse kinematics and implemented that in the engine. This worked with simple animations, but for complete characters the animations were completely broken in-game. When I looked deeper I learned that Duik actually has a ton of different animation systems and approaches. Mimicking all of those in-game was totally undoable.


Duik has so many features that only supporting parenting and two-bone IK produced completely wrong results in-game, as can be seen in this 'beautiful' monstrosity. This was intended to be a normal walk animation, early during development of Blightbound.

Once I realised this, I went for an easy alternative: animation baking. I export the orientation, scale and position of each part at 30 frames per second, simply storing the values for each frame (unless they’re not changing). This is a lot more data, but compared to the size of textures it’s really negligible. I did keep the hierarchy of the bones, so parts don’t start to subtly 'float' compared to each other.
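Baking with that "skip unchanged values" optimisation can be sketched like this. The sampling helper and the curve are made up for the example; the real exporter reads the values from After Effects:

```python
# Sketch of baking an animation curve: sample a property at a fixed rate and
# drop samples whose value didn't change since the previous stored frame.

def bake(sample_fn, duration, fps=30):
    """Sample a property curve at `fps`, storing (time, value) pairs and
    dropping consecutive duplicates."""
    frames = []
    steps = int(duration * fps) + 1
    for i in range(steps):
        t = i / fps
        v = sample_fn(t)
        if not frames or frames[-1][1] != v:
            frames.append((t, v))
    return frames

# A rotation that holds at 0 for the first half second, then ramps up:
# the entire flat first half collapses into a single stored frame.
baked = bake(lambda t: 0.0 if t < 0.5 else (t - 0.5) * 90.0, duration=1.0)
```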


Our custom file format is simply a text file with a long list of all parts and the information needed to position and animate them correctly, including parenting and keyframes.

I previously said we wanted high framerate. The game actually interpolates between these frames, so while they’re exported at 30fps, they’ll also run smoothly at 60fps, 144fps, or any other framerate. That’s also great for when slowing an enemy: in Awesomenauts, if you slow an enemy, you can see the low framerate of their slowed down walk animation. In Blightbound, movement remains perfectly smooth.

A note there is that whenever something needs to jump from one position to another instead of smoothly interpolating there, the artist needs to use a hold key. Our tool exports those and applies them correctly during playback. Overlooking this caused some bugs here and there, because artists aren’t used to needing hold keys when two keyframes are directly next to each other. In-game however there can be frames in between, so that case still needs a hold key.
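Playback with hold keys then looks roughly like this: interpolate between neighbouring keys, unless the earlier key is marked as a hold key, in which case its value is held until the next key. A simplified Python sketch with made-up keys, not our actual playback code:

```python
# Sketch of sampling baked keys at an arbitrary time, respecting hold keys.

def sample(keys, t):
    """keys: sorted list of (time, value, is_hold). Returns the value at t."""
    for (t0, v0, hold), (t1, v1, _) in zip(keys, keys[1:]):
        if t0 <= t < t1:
            if hold:
                return v0                  # hold key: jump, don't interpolate
            f = (t - t0) / (t1 - t0)
            return v0 + f * (v1 - v0)      # linear interpolation
    return keys[-1][1]

keys = [(0.0, 0.0, False), (1.0, 10.0, True), (2.0, 50.0, False)]
halfway = sample(keys, 0.5)   # interpolated between the first two keys
held = sample(keys, 1.5)      # held at the second key's value
```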


A few animations that Ronimo animator Tim Scheel made for the Blightbound character Karrogh.

After Effects is a huge tool with tons of features, so it’s impossible to support them all in-game. Our exporter only supports what we actually need, and nothing else. This means our artists need to avoid using some After Effects features that they might want to use. In most cases this is fine, but one particular feature that we ended up not supporting due to a lack of time, is mesh deformations (also known as puppet pins). Being able to deform a part would have helped a lot, but we didn’t find the time to implement a good emulation of After Effects’ deformation features in-game. Quite a pity, and I wonder whether our artists have ever wished they had chosen Spine after all, since Spine supports deformations in-game out-of-the-box.

Our artists did apply a simple workaround for the lack of deformations. By doing them in Photoshop and saving them as different frames for a part, they work fine. Technically the engine then thinks it’s frame-to-frame animation, but it’s actually deformed instead of redrawn in Photoshop. This approach wasn’t used much because it requires a lot of handwork, especially when reusing animations on several characters, but it does get the job done when needed.



The chest deformation in this animation from Blightbound was made by deforming the chest part in Photoshop and then exporting it as separate parts.

From a memory perspective the result of our toolchain is pretty impressive. Whereas a single Awesomenauts character can take up 16MB of texture space (excluding skins and the blue team), all Blightbound characters and enemies together use only 65MB of texture space, spread out over 4,600 character parts. On top of that, Blightbound animations are much higher resolution and higher framerate and we can swap parts and weapons dynamically.

What we have so far is strong animation tools (After Effects + Duik) and a way to play animations back in-game. This is enough to make a playable game, but we wanted considerably more: easy reuse of animations between characters, equippable gear, attaching special effects to bodyparts and efficiently handling thousands of character parts. In next week’s blogpost I'll dive into how we achieved those goals using our own skin editor.

Special thanks to Ronimo artists Koen Gabriels and Gijs Hermans for providing feedback on this article.

Sunday, 6 December 2020

Softening polygon intersections in Blightbound

Our new game Blightbound features many types of foggy effects: mist, dust, smoke and more. Such effects are often made with planes and particles, allowing us to draw fog and effect textures by hand and giving us maximum artistic control. However, one issue with this is that the place where fog planes intersect with geometry creates a hard edge, which looks very fake and outdated. My solution was to implement depth fade. This is a commonly used technique for soft particles, but we use it on lots of objects, not just on particles.

In today’s blogpost I’ll explain how depth fade rendering works. I’ll also show just how widely this technique can be applied, by going through a bunch of examples from Blightbound.

First, let’s have a look at what problem we’re trying to solve here. When putting partially transparent planes in the world, the place where they intersect with other objects creates a straight cut-off line. Sometimes that’s desired, but often those transparent planes represent volumetric effects. They’re not supposed to look like flat planes, but that’s just the easiest and most efficient way of rendering them. This is fine when there are no intersections, but when there are then the hard lines where they touch other objects break the volumetric illusion.

There’s a simple solution for this that’s used in a lot of games: depth fade. The idea is to simply fade out the plane near the intersection. This produces an effect similar to how real fog works: objects that go into the fog seem to smoothly fade out. However, actually computing all polygon intersections is far too expensive, so we want a rendering trick instead.


This screenshot from Blightbound shows a fog plane just above the ground. In the top image it is rendered in the standard way, resulting in hard intersections with the characters, rocks and cart. At the bottom the intersections are smoothened by depth fade.

The trick to rendering with depth fade is to first render all normal geometry, excluding any transparent objects. This fills the depth buffer, so for every pixel we know what distance it has from the camera. Then when rendering the objects that need depth fade, the pixel shader looks up the distance in the depth buffer and compares that to its own distance. If these are close to each other, then we assume that we’re near an intersection and fade out this pixel. The nearer, the stronger the fade out, until the object is entirely invisible at the point of intersection.
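The fade itself boils down to a clamped linear ramp. In the real game this is a few lines in a pixel shader; here's the same idea as a Python sketch with made-up distance values:

```python
# Sketch of the depth fade factor. The result multiplies the pixel's alpha:
# 0 at the intersection, 1 once the geometry behind is far enough away.

def depth_fade(scene_depth, pixel_depth, fade_distance):
    """Fade factor based on how far behind this pixel the scene geometry is."""
    diff = scene_depth - pixel_depth  # distance between fog pixel and scene
    return max(0.0, min(1.0, diff / fade_distance))

edge = depth_fade(5.0, 5.0, 2.0)   # touching geometry: fully faded out
mid = depth_fade(6.0, 5.0, 2.0)    # halfway into the fade band
clear = depth_fade(9.0, 5.0, 2.0)  # far from any geometry: fully visible
```

Note that `fade_distance` is exactly the density control: a small value makes the fog feel dense with a narrow fade band, a large value makes the whole fog thinner.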

This technique has a few neat bonus features on top of just smoothing out intersections. By simply setting the distance over which the fade occurs, we can modify the density of the mist. Also, objects don't need to actually intersect with the fog plane to get depth fade applied. Being just beneath the fog plane also makes the effect visible. Thus depth fade is more than just a way to smoothen intersections.


The fog's density setting determines the width of the smoothing of the intersections. At a very high density the smoothing is almost lost. At a very low density the fog almost disappears because the ground is now also considered 'close' to the fog plane.

While this technique is traditionally mostly used for particles, it can easily be used for all objects with transparency. Since the world of Blightbound is covered by the blight (a thick, corrupting fog) we have a lot of types of fog in our game, including many fog planes and particles, as well as smoke and special effects. Our artists can apply depth fade rendering to all of those, not just to particles.


Depth fade is also great for hiding the seams of moving objects, like this fog wall.

A nice property of depth fade is that it doesn't cost all that much performance compared to traditional alpha blending. For each pixel of a particle or fog plane that we render, it costs one extra texture look-up (in the depth buffer) and a distance calculation. Compared to more advanced volumetric techniques, like voxel ray marching, that's a very low price. Since the performance impact of depth fade is low, our artists can use this technique on many objects, not just on the few that really, really need it.


Depth fade can also solve problems with camera facing glow planes. The glow on this torch is always oriented towards the camera, but that makes it intersect with the wall behind it under certain angles. Using depth fade, the intersection can be hidden. This animation shows alternating with and without depth fade.

When I implemented depth fade, I thought I was being pretty clever: I had only ever seen this technique used for soft particles, not for generic object rendering. However, while searching the web a bit to write this blogpost, I found out it's actually a standard feature in the Unreal engine. For Unity I only found the option on particles, but it might exist in a more generic form there as well.

Now that we know how depth fade works, let’s have a look at a bunch of example uses from Blightbound. Special thanks to my colleague Ralph Rademakers, who made most of the levels and is thus the prime user of depth fade in Blightbound. Ralph gave me a nice list of cool spots to show:


A compilation of examples from Blightbound where depth fade is used to great effect, showing both with and without depth fade.


Another application of depth fade is to hide seams of VFX with the world. In this example the smoke effect intersects with a black ground plane.

When I initially implemented depth fade in Blightbound, I thought it would mostly be used on fog planes that float just above the ground, to give the impression of heroes walking through a low hanging, milky fog. As soon as our artists got hold of this technique however they started using it on tons of other objects. This is to me one of the most fun parts of building graphics tech: seeing how much more artists can do with it than I had originally imagined!

Saturday, 24 October 2020

Bending Blightbound's world to lower the horizon

In 3D games perspective is often treated as a given; a law of nature. But it doesn’t have to be that way: with some shader trickery or clever modelling, perspective can be manipulated to achieve certain compositions that may not be realistic, but look more interesting and are still convincing to the player. One such example is how we kept the horizon on screen in our new game Blightbound by subtly bending the world.

At Ronimo we come from a world of 2D games. In 2D, composition can be whatever you like. That’s why our art director Gijs Hermans may sometimes want to ignore standard perspective rules and instead look at what he wants to achieve visually. Thus early in development Gijs came to me and said he wanted the camera to look down quite a bit, but still have the horizon in view. In fact, he wanted the horizon to be quite a bit below the top of the screen. His reasoning was that visuals look much better when you don’t see just the floor most of the time. The position of the horizon is an important tool for shaping a composition.

The origin of this request is a clash that often happens in game development: pretty visuals versus gameplay clarity. Our artists spend a lot of time on achieving both goals simultaneously. A very successful example of this is the way gameplay objects and backgrounds are drawn in a different style in our previous game Swords & Soldiers 2, as I described in this blogpost.

In a game where depth matters, like Blightbound, a low camera is problematic because it makes it difficult to see whether you are standing in front of an enemy or behind them. A high camera solves this, but a high camera removes the horizon from view, making the image a lot more boring.

When Gijs came to me with this request, I thought of two possible solutions: either give the camera a wider field-of-view, or bend the world to move the horizon down. We tried the easiest solution first: wide field-of-view. However, it turned out this needed to be set so wide that the entire perspective looked skewed. Extreme field-of-view often isn’t very pretty and it definitely wasn't in Blightbound.

The alternative I came up with is bending the world down the further it is from the camera. This is an effect that’s used in a bunch of games to create a sense that the world is very small, making the world feel cutesy and funny. However, Blightbound is intended to be a dark fantasy game, definitely not something cute and funny, so we didn’t want anything that extreme. I figured that with some tweaking it might be possible to achieve a more subtle version of this that still keeps the horizon in view but doesn’t have the funny vibe.

My implementation of this effect is quite simple: in the vertex shader I bend the world down depending on the Z-position of the vertex in the world. The nice thing about implementing it this way is that our gameplay code and level design tools can assume a flat world, making them a lot simpler. The bending only exists during rendering, so gameplay logic doesn't need to take it into account.

A minor challenge in implementing this bend is how to handle lighting and shadows. When the camera moves forward and the world bends, we don’t want the lighting on objects to change, since that would make the bending very obvious and would make the player focus on the backgrounds instead of on the gameplay. Also, objects in the background shouldn't be more bright because they are rotated towards the light by the bend. My solution was to calculate all lighting, shadows and fog as if there is no world bend.

Also, a little technical note: since the bend happens on the vertices, objects need to have enough vertices. A big square plane for the ground with no vertices in between can’t be bent. Occasionally this caused bugs where a small object would float above a big object because the big object didn’t have enough vertices to be bent correctly.

The bend effect is quite fun to see in action when set to an extreme value. However, any kind of geometric deformation is quite noticeable when the camera moves, so we chose to fix the bend in the world instead of letting it move with the camera.


A few different settings for the bend, including the final one used in Blightbound.

As you can see in the video, the bend effect is kept quite subtle in Blightbound. We didn’t want that cutesy/funny effect at all, since this is intended to be a dark fantasy game. Our level artist Ralph Rademakers tweaked the effect and the camera a lot until he got it to a point where it felt like there was no bend at all, just a natural camera. However, if you compare with and without bend, you can see that the bend makes a huge difference in what you actually see. And that’s exactly how it was intended: achieve the desired composition but don’t make it look like anything weird is going on.

And then came the fog! The bend effect was implemented when we hadn’t figured out the lore of the world yet. We didn’t know then that we would want to have so much fog. In fact, the working title of the game used to be AwesomeKnights instead of Blightbound! Once we finally decided on the lore we knew that the world of Blightbound is covered in “blight”, a corrupting fog. To match that, Ralph added a lot of fog to all the levels. This creates a great atmosphere, but… hides the horizon!

Does that make the world bend useless? No, definitely not. It's still used in quite a few levels to change the perspective and have a more horizontal view on the background, even if we can’t see as far as before. It’s a more subtle tool than originally intended, but still a very useful tool.

I think the bend effect we used here is a wonderful example of the kind of graphics programming I enjoy most: looking at what’s needed from an artistic standpoint, and then making tech that achieves that. I’m personally not very interested in realistic rendering: 3D is just a tool to make cool art, whatever the shape or type. The bend technique used here makes no sense whatsoever from a physical standpoint, but it adds to making Blightbound a prettier, more compelling game.

Sunday, 18 October 2020

Screen Space Reflections in Blightbound

An important focus during development of our new game Blightbound (currently in Early Access on Steam) is that we want to combine 2D character animation with high quality 3D rendering. Things like lighting, normal maps, depth of field blur, soft particles, fog and real-time shadows are used to make it all gel together and feel different from standard 2D graphics. One such effect I implemented into our engine is SSR: Screen Space Reflections. Today I’d like to explain what SSR is and what fun details I encountered while implementing it into our engine.

A compilation of places with reflections in rain puddles in Blightbound.

Reflections in games can be implemented in many ways. The most obvious way to implement reflections is through raytracing, but until recently GPUs couldn’t do this in any reasonable way, and even now that GPU raytracing exists, too few people have a computer that supports it to make it a feasible technique. This will change in the coming years, especially with the launch of Xbox Series X and Playstation 5, but for Blightbound that’s too late since we want it to look good on currently common GPUs.

So we need something else. The most commonly used techniques for implementing reflections in games are cubemaps, planar reflections and Screen Space Reflections (SSR). We mostly wanted to use reflections for puddles and such, so it’s important to us that characters standing in those puddles are actually reflected in real-time. That means that static cubemaps aren’t an option. A pity, since static cubemaps are by far the cheapest way of doing reflections. The alternative is dynamic reflections through cubemaps or planar reflections, using render-textures. These techniques are great, but require rendering large portions of the scene again to render-textures. I guessed that the additional rendercalls and fillrate would cost too much performance in our case. I decided to go for Screen Space Reflections (SSR) instead.

The basic idea behind SSR is that since you’re already rendering the scene normally, you might as well try to find what’s being reflected by looking it up in the image you already have.


SSR has one huge drawback though: it can only reflect things that are on-screen. So you can’t use a mirror to look around a corner. Nor can you reflect the sky while looking down at the ground. When you don't focus on the reflections this is rarely a problem, but once you look for it you can see some really weird artefacts in reflections due to this. For example, have a look at the reflections of the trees in this video of Far Cry 5.



So we’re looking up our reflections in the image of the scene we already have, but how do we even do that? The idea is that we cast a ray into the world, and then look up points along the ray in the texture. For this we need not just the image, but also the depth buffer. By checking the depth buffer, we can see whether a point on our ray is in front of the scene or behind it. If one point on the ray is in front of whatever is in the image and the next is behind it, then apparently this is where the ray crossed the scene, so we use the colour at that spot. It’s a bit difficult to explain this in words, so have a look at this diagram instead:


Since SSR can cast rays in any direction, it’s very well suited for reflections on curved surfaces and accurately handles normal maps. We basically get those features for free without any extra effort.
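In the actual shader this march happens in screen space, but the core loop can be sketched in C++ with a stand-in depth lookup. The function names and parameters here are illustrative, not Blightbound's shader code:

```cpp
#include <functional>
#include <optional>

//Toy version of the SSR march: step along the ray and compare the ray's
//depth at each sample against the scene depth there. Since the ray depth
//only increases here, the first sample where the ray ends up behind the
//scene is where the ray crossed it. sceneDepthAt stands in for sampling
//the depth buffer along the ray.
std::optional<float> marchRay(const std::function<float(float)>& sceneDepthAt,
                              float rayStartDepth, float depthPerStep,
                              int numSteps)
{
   for (int i = 1; i <= numSteps; ++i)
   {
      float t = static_cast<float>(i);
      float rayDepth = rayStartDepth + depthPerStep * t;
      if (rayDepth > sceneDepthAt(t))
         return t; //Crossed the scene: sample the reflection colour here
   }
   return std::nullopt; //Ray left the screen / never hit anything
}
```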

A demo in the Blightbound engine of SSR, with a scrolling normal map for waves.

At its core SSR is a pretty simple technique. However, it’s also full of weird limitations, and the challenge of implementing SSR comes from working around those. There are also some pretty cool ways to extend the possibilities of SSR, so let’s have a look at all the added SSR trickery I implemented for Blightbound.

First off there’s the issue of transparent objects. Since the world of Blightbound is covered in a corrupting mist (the “blight”), our artists put lots of foggy layers and particles into the world to suggest local fog. For the reflections to have the right colours it’s pretty important that these fog layers are included in the reflections. But there are also fog layers in front of the puddles, so we can't render the reflections last either.

The solution I chose is simple: reflections use the previous frame instead of the current frame. This way we always look up what's being reflected in a complete image, including all transparent objects. The downside of this is of course that our reflection isn't completely correct anymore: the camera might have moved since the previous frame, and characters might be in a different pose. However, in practice the difference is so small that this isn't actually noticeable while playing the game.


Transparent objects pose another problem for SSR: they're not in the depth buffer, so we can't locate them correctly. Since Blightbound has a lot of 2D animation, lots of objects are partially transparent. However, many objects can use alpha test. For example, the pixels of a character's texture are either fully transparent or not transparent at all. By rendering such objects with alpha test, they can write to the depth buffer without problems.

This doesn't solve the problem for objects with true transparency, like smoke, fog and many other special effects. This is something that I don't think can be reasonably solved with SSR, and indeed we haven't solved it in Blightbound. If you look closely, you can see that some objects aren't reflected at all because of this. However, in most cases this isn't noticeable because if there's an opaque object closely behind it, then we'll see the special effects through that. While quite nonsensical, in practice this works so well that it seems as if most explosions are actually reflected correctly.

Transparent objects like this fire aren't in the depth buffer and thus can't be found for reflections. However, if another object is close behind, like the character on the left, then the transparent object is reflected through that. The result is that it seems as if many special effects are properly reflected.

Having perfectly sharp reflections looks artificial and fake on many surfaces. Reflections in the real world are often blurry, even more so the further the reflected object is from the surface. To get reflections that accurately blur with distance I've applied a simple trick: the rays get a slight random offset applied to their direction. This way objects close remain sharp and objects get blurrier with distance. Artists can tweak the amount of blur per object.
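A minimal sketch of that jitter idea, assuming a per-surface `roughness` value controls the size of the random offset. The distribution and parameter names are illustrative; the actual tweakables in Blightbound may differ:

```cpp
#include <cmath>
#include <random>

struct Vec3 { float x, y, z; };

Vec3 normalize(Vec3 v)
{
   float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
   return { v.x / len, v.y / len, v.z / len };
}

//Hypothetical roughness jitter: nudge the reflection ray by a small random
//offset. Reflected objects close to the surface still land near the sharp
//reflection; distant objects accumulate more angular spread, so the
//reflection naturally blurs with distance.
Vec3 jitterDirection(Vec3 dir, float roughness, std::mt19937& rng)
{
   std::uniform_real_distribution<float> offset(-roughness, roughness);
   Vec3 jittered = { dir.x + offset(rng),
                     dir.y + offset(rng),
                     dir.z + offset(rng) };
   return normalize(jittered);
}
```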

However, this approach produces noise, since we only cast one ray per pixel. We could do more, but that would be really heavy on performance. Instead, to reduce the noise a bit, when far enough away I also sample from a low-resolution blurry version of the scene render-texture. This is a bit redundant but helps reduce the noise. Finally, by default in Blightbound we do supersampling anti-aliasing (SSAA) on the screen as a whole. This results in more than one ray being shot per screen-pixel. Only on older GPUs that can't handle SSAA is this turned off.


Another issue is precision. For really precise reflections, we would need to take a lot of samples along the ray. For performance reasons that's not doable, so instead we make small jumps. This however produces a weird type of jaggy artefact. This can be improved upon in many different ways. For example, if we rendered the reflections at a lower resolution, we could take a lot more samples per ray at the same performance cost. However, with how I implemented SSR into our rendering pipeline that would have been quite cumbersome, so I went for a different approach which works well for our specific situation:
  • More samples close to the ray's origin, so that close reflections are more precise.
  • Once the intersection has been found, I take a few more samples around it through binary search, to find the exact reflection point more precisely. (The image below is without binary search.)
  • The reflection fades into the fog with distance. This way the ray never needs to go further than a few meters. This fits the world of Blightbound, which is basically always foggy.
The result of these combined is that we get pretty precise reflections. This costs us 37 samples along a distance of at most 3.2 meters (so we never reflect anything that's further away than that from the reflecting surface).

Any remaining imperfections become entirely unnoticeable since blur and normal maps are often used in Blightbound, masking such artefacts even further.



A challenge when implementing SSR is what to do with a ray that passes behind an object. Since our basic hit-check is simply whether the previous sample was in front of an object and the next one is behind an object, a ray that should pass behind an object is instead detected as hitting that object. That's unintended and the result is that objects are smeared out beyond their edges, producing pretty ugly and weird artefacts. To solve this we ignore a hit if the ray dives too far behind an object in one step. This reduces the smearing considerably, but the ray might not hit anything else after, resulting in a ray that doesn't find an object to reflect. In Blightbound we can solve this quite elegantly by simply using the level's fog colour for such failed rays.

By default, SSR produces these kinds of vertical smears. Assuming objects have a limited thickness reduces this problem greatly, but adds other artefacts, like holes in reflections of tilted objects.
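The fix described above amounts to an extra acceptance test per sample. A minimal sketch, with `maxThickness` as an assumed tuning parameter:

```cpp
//Sketch of the thickness heuristic: only accept a crossing if the ray
//didn't dive too far behind the scene in a single step. A deep dive is
//treated as the ray passing behind the object rather than hitting it,
//which avoids smearing objects out beyond their edges.
bool acceptHit(float rayDepth, float sceneDepth, float maxThickness)
{
   return rayDepth >= sceneDepth
       && (rayDepth - sceneDepth) <= maxThickness;
}
```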

This brings us to an important issue in any SSR implementation: what to do with rays that don't hit anything? A ray might fly off the screen without having found an object, or it might even fly towards the camera. Since we can only reflect what we see, SSR can't reflect the backs of objects and definitely can't reflect anything behind the camera. A common solution is to have a static reflection cubemap as a fallback. This cubemap needs to be generated for each area so that the reflection makes sense somewhat, even if it isn't very precise. However, since the world of Blightbound is so foggy I didn't need to implement anything like that and can just fall back to the fog colour of the area, as set by an artist.

The final topic I would like to discuss regarding SSR is graphics quality settings. On PC users expect to be able to tweak graphics quality and since SSR eats quite a lot of performance, it makes sense to offer an option to turn it off. However, what to do with the reflecting objects when there's no SSR? I think the most elegant solution is to switch to static cubemaps when SSR is turned off. Static cubemaps cost very little performance and you at least get some kind of reflection, even if not an accurate and dynamic one.

However, due to a lack of time that's not what I did in Blightbound. It turned out that just leaving out the reflections altogether looks quite okay in all the spots where we use reflections. The puddles simply become dark and that's it.


For reference, here's the shader code of my SSR implementation. This code can't be copied into an engine directly, since it's dependent on the right render-textures and matrices and such from the engine. However, for reference when implementing your own version of SSR I expect this might be useful.

SSR is a fun technique to implement. The basics are pretty easy, and real-time reflections are a very cool effect to see. As I've shown in this post, the real trick in SSR is how to work around all the weird limitations inherent in this technique. I'm really happy with how slick the reflections ended up looking in Blightbound and I really enjoyed implementing SSR into our engine.

Sunday, 9 August 2020

Three nasty netcode bugs we fixed around Blightbound's launch

In the weeks before and after the Early Access launch of our new game Blightbound we've been fixing a lot of bugs, including 3 very stubborn netcode bugs. They weren't any fun while we were looking for them for days, but now that we've found and fixed these they're actually quite intriguing. So today I'd like to share 3 of the more interesting bugs I've seen in the past few years.
While these bugs mostly say something about issues in our code, I think they're also nice examples of how complex low level netcode is, and how easy it is to make a mistake that's overlooked for years, until a specific circumstance makes it come to the surface. All three of these bugs were also in the code of our previous game Awesomenauts, but none of them actually occurred there due to the different circumstances of an Awesomenauts match compared to a Blightbound party.

The one thing all three of these bugs have in common is that they were very hard to reproduce. While they happened often enough to be game-breaking, they rarely happened in testing. This made them very hard to find. For the first two we found a way to reproduce them after a lot of experimenting, while for the final one we never got it ourselves until we had figured out the solution and knew exactly how to trigger it. Finding that one without being able to see it was some solid detective work by my colleague Maarten.

Endless loading screens number 1


This bug occasionally caused clients to get stuck in the loading screen forever. After adding additional logging we saw that the client never received the HostReady message. This is a message that the host sends to the clients to let them know that loading is done and they should get into the game. The client was forever waiting for that HostReady message. Upon further investigation it turned out that the host did actually send the message, but the client received it and then threw it away as irrelevant. Why was that? 

To answer that question, I need to give some background information. An issue with the internet is that it's highly unreliable. Packets can be lost altogether, and they can also come in out-of-order or arrive very late. One odd situation this can cause is that a message from a previous level comes in once you're already in the next level, especially if loading times are very low. Applying a message from the previous level to the next level can cause all kinds of weird situations, so we need to recognise such an outdated message and throw it away. We do this with a simple level ID that tells us which level the message belongs to. If we get a message with the wrong level ID, we discard it.

That's simple enough, but we saw that both client and host were in the loading screen for the next level. Why then would the client discard the HostReady message as being from the wrong level? The reason turned out to be in our packet ordering system. Some messages we want to process in the order in which they are sent. So if one message goes missing, we store the ones after it and don't process them until the previous missing one is received and handled.

The bug turned out to be in that specific piece of code: the stored messages didn't store their level ID. Instead, when they were released, we used the level ID for the message that had just come in. This went wrong when a message from the previous level went missing and came in a second or two too late. By that time the HostReady message had already been received, but had been stored since it needed to be processed in order. Since the level ID from the previous room was used for all out-of-order messages, the HostReady message was processed as if it came from the previous room. Thus it was discarded.

I imagine reading this might be mightily confusing, so hopefully this diagram helps explain what went wrong:

Once we had figured this out, the fix was simple enough: store the level ID with each message that is stored for later processing, so that we know the correct level ID when we get to it.
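A sketch of what that fix can look like, with each buffered message carrying the level ID it arrived with instead of borrowing it from the latest incoming packet. Types and names here are illustrative, not Ronimo's actual code:

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

//Each buffered out-of-order message remembers the level ID it was received
//with, so a late packet from the previous level can't retroactively
//mislabel messages that were already waiting in the queue.
struct StoredMessage
{
   std::uint32_t sequenceNumber;
   std::uint32_t levelId; //Stored at receive time, not at release time
   std::string payload;
};

class OrderedChannel
{
public:
   void buffer(std::uint32_t seq, std::uint32_t levelId, std::string payload)
   {
      waiting[seq] = StoredMessage{ seq, levelId, std::move(payload) };
   }

   //Release the next in-order message, if present, with its own level ID.
   bool tryRelease(std::uint32_t& nextSeq, StoredMessage& out)
   {
      auto it = waiting.find(nextSeq);
      if (it == waiting.end())
         return false;
      out = it->second;
      waiting.erase(it);
      ++nextSeq;
      return true;
   }

private:
   std::map<std::uint32_t, StoredMessage> waiting;
};
```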

Endless loading screens number 2

The second loading screen bug we had around that time had a similar cause, but in a different way. When a client is done loading, it tells the host so in a ClientReady message. The host needs to wait for that. Similar to the bug I described above, the ClientReady message was received but discarded because it was from the wrong level. The cause however was a different one.

Sometimes the client starts loading before the host, for whatever reason. Since loading times between levels are very quick in Blightbound, the client can be done loading before the host starts loading. Once the client is done loading, it sends the ClientReady message. However, if the host isn't loading yet and is still in the previous room, then it discards the ClientReady message because it's from the wrong room. A moment later the host starts loading and when it's done loading, it starts waiting for the ClientReady message. Which never comes because it was already received (and discarded).

Here too the solution was pretty straightforward. We first changed the level ID to be incremental (instead of random) so that we can see in the level ID whether an incorrect message is from the next or the previous room. If it's from the previous room, we still discard the message. But if the message is from the next room, we store the message until we have also progressed to that room, and then process it.
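The resulting decision logic is small. A sketch, assuming level IDs simply increment by one per room:

```cpp
enum class MessageFate { Process, Discard, StoreForLater };

//With incrementing level IDs we can tell past from future: messages from a
//previous level are stale and dropped, while messages from a level we
//haven't reached yet are kept until we get there.
MessageFate classify(unsigned int messageLevelId, unsigned int currentLevelId)
{
   if (messageLevelId == currentLevelId)
      return MessageFate::Process;
   if (messageLevelId < currentLevelId)
      return MessageFate::Discard;     //From a previous room: stale
   return MessageFate::StoreForLater;  //From the next room: too early
}
```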

The big desync stink


The final netcode bug I'd like to discuss today is rather painful, because unlike the others it still existed at launch. In fact, we didn't even know about it until a day after release. Once we knew it existed it still took us days to find it. In the end this one was fixed in update 0.1.2, six days after launch.

So, what was the bug? What players reported to us was that, seemingly at random, the connection would occasionally be lost. In itself this is something that can happen: the internet is a highly unreliable place. However, it happened way too often, but never when we tried to reproduce it, not even with simulated extremely bad connections. Also, we have code that recognises connection problems, but this bug didn't trigger that code, so the game thought it was still connected and kept playing, with very odd results.

After a few days a pattern started to emerge: many players reported that this always happened once they had notoriety 3 or 4, when a legendary item was attainable. We didn't think that the bug was actually linked to notoriety or legendary items, because these are quite unrelated to netcode. Instead, we suspected something else: maybe the bug happened after playing together for a certain amount of time.

With this hypothesis in hand we started looking for anything where time matters. Maybe a memory leak that grew over time? Or, more likely: maybe some counter that looped incorrectly? As I described in the above issues, netcode can have many types of counters. I already described the level ID counter and the counter for knowing whether a message is received in-order. There are others as well.

The thing with data that is sent over the network is that it needs to be very efficient. We don't want to send any more bytes than necessary. So whenever we send a counter, we use as few bits as possible. Why send a 32 bit number when 16 bits will do? However, smaller numbers overflow more quickly. For example, a 16 bit number can store values up to 65,535. Once we go beyond that, it loops back to 0.

Often looping back to 0 isn't a problem: the last time we used the low numbers is so long ago that we don't need to expect any overlap with the new number 0. However, we do need to handle the looping numbers correctly and not blindly throw away number 0 because it's lower than number 65,535.

And this indeed turned out to be the problem. One of the counters we used was a 16 bit number and after about one hour it looped back. In that particular spot we had forgotten to include code to handle that situation. The result is that from there on the game received all messages but discarded them without applying them. Like in the cases above, the solution was simple enough to implement once we knew what the problem was.
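A common way to handle such wraparound, similar to TCP-style sequence number comparison, is to do the subtraction in the counter's own width and interpret the result as signed. This is a general sketch, not necessarily the exact fix used in Blightbound:

```cpp
#include <cstdint>

//Wraparound-safe "is a newer than b?" for 16 bit sequence numbers.
//Subtracting in 16 bit arithmetic and reinterpreting the result as signed
//works as long as the two numbers are less than half the range (32,768)
//apart, which holds when packets can't be delayed for that many messages.
bool isNewer(std::uint16_t a, std::uint16_t b)
{
   return static_cast<std::int16_t>(static_cast<std::uint16_t>(a - b)) > 0;
}
```

With this, number 0 correctly counts as newer than number 65,535 after the counter loops.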

One question remains: why wasn't this a problem in Awesomenauts? For starters, Awesomenauts matches are shorter and the connection is reset after a match. So the counter never reaches 65,535. In Blightbound however the connection is kept up in between dungeon runs, so if players keep playing together the number will go out of bounds. But there's more: if the bug would have happened in Awesomenauts, the game would have thought it was a disconnect and would have reconnected, resetting the counter in the process. In Blightbound we determine connection problems in a different way, causing the connection to not be reset in this particular case.


A lesson relearned


Netcode being hard is not new to us and getting weird bugs is inherent to coding such complex systems. However, there is one step we did for Awesomenauts that we didn't do here: auto testing. Since Blightbound uses a lot of netcode from Awesomenauts and Awesomenauts has been tested endlessly, we thought we didn't need to do extensive auto testing for this same code in Blightbound. This was a mistake: the circumstances in Blightbound turn out to be different enough that they cause hidden bugs to surface and wreak havoc.

We're currently running autotests for Blightbound to further stabilise the game. This already resulted in another netcode fix in update 0.1.2 and several obscure crash fixes that will be in the next major update.

To conclude I'd like to share this older but still very cool video of auto testing in Awesomenauts. It shows four computers each randomly bashing buttons and joining and leaving matches. The crazy movement of the Nauts is fun to watch. For more details on how we implemented auto testing in Awesomenauts, have a look at this blogpost.




Sunday, 1 March 2020

How we made particles twice as fast through cache optimisation

Cache optimisation is one of the most hardcore, low-level topics in game programming. It's also a topic that I haven't been very successful with myself: I tried optimising for cache efficiency a few times but never managed to achieve a measurable improvement in those specific cases. That's why I was quite intrigued when our colleague Jeroen D. Stout (famous as the developer of the experimental indie game Dinner Date) managed to make our particles twice as fast. What's the trick? Jeroen was kind enough to tell me and today I'll share his approach. Since it's such a hardcore topic, I'll start by giving a general overview of what cache optimisation is in the first place.



Let's have a look at computer memory, or RAM. Computers these days have gigabytes of very fast memory that's used to store whatever the game or program needs to use on the fly. But is that memory truly so fast? Physically, it's a separate part of the computer. Whenever the processor needs to get something from memory, it needs to be transported to the CPU. That only takes a tiny bit of time, but a modern processor can perform billions of operations per second. At those kinds of speeds, a tiny bit of time is actually a lot of operations that the processor could have done during that time. Instead, it's waiting for something to come out of memory.

This is where cache comes in. Cache is a small amount of even faster memory that's directly on the processor and is thus a lot faster to access than normal memory. So whenever something is already in cache, the processor doesn't have to wait for it (or at least not as long) and can keep working. According to this article cache can be up to 27 times faster than main memory.



As programmers we generally don't have direct control over what's in cache: the processor tries to utilise the cache as efficiently as possible of its own volition. However, depending on how we access memory, we can make it a lot easier for the processor to use cache efficiently. This is where cache optimisation comes in.

The goal of cache optimisation is to structure our code in such a way that we use cache as much as possible and need to wait for memory as little as possible. Whenever the processor needs to get something from memory and it's not already available in cache, that's called a cache miss. Our goal is to avoid cache misses as much as possible.

To be able to avoid cache misses, we need to know a little bit about how cache works. I'm no specialist in this field, but we only need a small bit of knowledge to already understand how a lot of cache optimisations work. These are the basic ideas:
  • Data is transferred from memory to cache in blocks. So whenever you're using something, the memory directly around that is likely also already available in cache.
  • Cache is often not cleared immediately. Things you've just used have a good chance of still being available in cache.
  • The processor tries to predict what you'll use next and get that into cache ahead of time. The more predictable your memory access, the better cache will work.

These three concepts lead to a general rule for writing code that's efficient for cache: memory coherence. If you write your code in such a way that it doesn't bounce all over memory all the time, then it will likely have better performance.
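As a toy illustration of that rule, compare a linear walk over contiguous data with chasing pointers to scattered allocations. Both loops compute the same result, but the first pattern is far friendlier to the cache and the prefetcher:

```cpp
#include <vector>

//Linear walk over contiguous memory: each cache line we fetch contains
//several upcoming values, and the access pattern is trivially predictable.
float sumContiguous(const std::vector<float>& values)
{
   float total = 0.0f;
   for (float v : values)
      total += v;
   return total;
}

//Pointer chasing: each dereference may land anywhere in memory, so every
//iteration risks a cache miss and the prefetcher can't help.
float sumScattered(const std::vector<float*>& pointers)
{
   float total = 0.0f;
   for (float* p : pointers)
      total += *p;
   return total;
}
```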

Now that the basics are clear, let's have a look at the specific optimisation that Jeroen did that halved the CPU usage of our particles.



To avoid dumping huge amounts of code here, I'm going to work with a highly simplified particle system. Please ignore any other inefficiencies: I'm focusing purely on the cache misses here. Let's have a look at a very naive approach to implementing a particle system:

struct Particle
{
   Vector3D position;
};

class ParticleSystem
{
   std::vector<Particle*> particles;
   Vector3D speed; //Simplified: one shared velocity for all particles

   void update()
   {
      for (int i = 0; i < particles.size(); ++i)
      {
         particles[i]->position += speed;
         //shouldDestroy and shouldAddParticle are assumed helpers,
         //for example based on particle lifetime and emission rate
         if (shouldDestroy(particles[i]))
         {
            delete particles[i];
            particles.erase(particles.begin() + i);
            --i;
         }
      }
      while (shouldAddParticle())
      {
         particles.push_back(new Particle{});
      }
   }
};

This code is simple enough, but if we look at it from a cache perspective, it's highly inefficient. The problem is that every time we create a new particle with new, it's placed in a random free spot somewhere in memory. That means that all the particles will be spread out over memory. The line particles[i]->position += speed; is now very slow, because it will result in a cache miss most of the time. The time the processor spends waiting for the particle's position to be read from memory is much greater than the time that simple addition takes.



I knew that this would give performance problems, so I immediately built the particles in a more efficient way. If we know the maximum number of particles a specific particle system can contain, then we can reserve a block of memory for that on startup and use that instead of calling new all the time.

This results in a bit more complex code, since we now need to manage that block of memory and create objects inside it using placement new. In C++, placement new allows us to call a constructor the normal way, but use a block of memory that we already have. This is what the resulting code can look like:

struct Particle
{
   Vector3D position;
};

class ParticleSystem
{
   //Maximum number of particles this system can contain
   static const unsigned int maxCount = 1000;

   unsigned char* particleMemory;
   std::vector<Particle*> particles;
   int numUsedParticles;
   Vector3D speed;

   ParticleSystem():
      numUsedParticles(0)
   {
      //One contiguous block, big enough for all the particles
      particleMemory = new unsigned char[maxCount * sizeof(Particle)];
      for (unsigned int i = 0; i < maxCount; ++i)
      {
         particles.push_back(reinterpret_cast<Particle*>(
            particleMemory + i * sizeof(Particle)));
      }
   }

   ~ParticleSystem()
   {
      //Destruct any particles that are still alive
      for (int i = 0; i < numUsedParticles; ++i)
         particles[i]->~Particle();
      delete [] particleMemory;
   }

   void update()
   {
      for (int i = 0; i < numUsedParticles; ++i)
      {
         particles[i]->position += speed;
         if (shouldDestroy(particles[i]))
         {
            //Call the destructor without releasing the memory
            particles[i]->~Particle();

            //Swap to place the dead particle in the last used position
            if (i < numUsedParticles - 1)
               std::swap(particles[i], particles[numUsedParticles - 1]);

            --numUsedParticles;
            --i;
         }
      }
      while (shouldAddParticle() && bufferNotFull())
      {
         //Placement new: run the constructor in memory we already have
         new(particles[numUsedParticles]) Particle{};
         ++numUsedParticles;
      }
   }
};

Now all the particles are close to each other in memory and there should be far fewer cache misses when iterating over them. Since I built it this way right away, I'm not sure how much performance this actually won, but I assumed it would be pretty efficient.

Nope.

Still lots of cache misses.

In comes Jeroen D. Stout, cache optimiser extraordinaire. He wasn't impressed by my attempts at cache optimisation, since he saw in the profiler that the line particles[i]->position += speed; was still taking a disproportionate amount of time, indicating cache misses there.

It took me a while to realise why this is, but the problem is in the swap-line. Whenever a particle is removed, it's swapped with the last one to avoid moving all particles after it one forward. The result however is that as particles are removed, the order in which we go through the particles becomes very random very quickly.



The particles in our example here are really small: just a single Vector3D, so 12 bytes per particle. This means that even if we go through the particles in random order, we might occasionally still be staying in the same cache line. But a real particle in the Secret New Game (tm) that we're developing at Ronimo has much more data, like speed, scale, orientation, rotation speed, vibration, colour, and more. A real particle in our engine is around 130 bytes. Now imagine a particle system with 100 particles in it. That's a block of memory of 13,000 bytes. Hopping through that in random order is much more problematic for cache!
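To make those numbers concrete, here's a hypothetical particle struct with the kinds of fields listed above. The field names and layout are invented for this example, not our actual engine code, which is different (and bigger, at around 130 bytes):

```cpp
#include <cassert>

// Invented example fields, purely to illustrate the sizes involved.
struct Vector3D { float x, y, z; };
struct Colour { float r, g, b, a; };

struct BigParticle
{
   Vector3D position;      // 12 bytes
   Vector3D speed;         // 12 bytes
   Vector3D scale;         // 12 bytes
   float orientation;      //  4 bytes
   float rotationSpeed;    //  4 bytes
   float vibration;        //  4 bytes
   Colour colour;          // 16 bytes
   float age;              //  4 bytes
   float lifeTime;         //  4 bytes
};                         // 72 bytes in total (on typical platforms)
```

Even this stripped-down version is already bigger than a typical 64-byte cache line, so every particle starts on a different cache line and hopping through them in random order means roughly one cache miss per particle.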

Thus, Jeroen set out to make it so that updating particles goes through memory linearly, not jumping around in memory at all anymore.

Jeroen's idea was to not have any pointers to the particles anymore. Instead we have that big block of memory with all the particles in it, and we store the indices of the first and last currently active particles in it. If a particle dies, we update the range of living particles and mark the particle as dead. If the particle happens to be somewhere in the middle of the list of living particles then we simply leave it there and skip it while updating.

Whenever we create a new particle, we just use the next bit of memory and increment the index of the last living particle. A tricky part here is what to do if we've reached the end of the buffer. For this we consider the memory to be a ring buffer: once we reach the end, we continue at the start, where there's room since those particles will have died by now. (Or, if there's no room there, then we don't create a particle since we don't have room for it.)
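To make the idea concrete, here's a minimal sketch of such a ring buffer. This is my simplified reconstruction, not our actual code: it tracks the live range with a first index plus a count (equivalent to storing first and last indices), and replaces shouldDestroy with a simple age check.

```cpp
#include <cstddef>
#include <vector>

struct Vector3D
{
   float x = 0, y = 0, z = 0;
   Vector3D& operator+=(const Vector3D& o) { x += o.x; y += o.y; z += o.z; return *this; }
};

struct Particle
{
   Vector3D position;
   float age = 0;
   bool alive = false; // dead particles inside the live range are skipped
};

class RingParticleSystem
{
public:
   explicit RingParticleSystem(size_t maxCount) : particles(maxCount) {}

   bool tryEmit(const Vector3D& pos)
   {
      if (count == particles.size())
         return false; // no room left: drop the new particle
      // Use the next slot in the ring, wrapping around at the end.
      particles[(first + count) % particles.size()] = Particle{pos, 0, true};
      ++count;
      return true;
   }

   void update(const Vector3D& speed, float lifeTime)
   {
      // Walk the live range linearly through memory (wrapping at the end),
      // skipping particles that died earlier but are still inside the range.
      size_t index = first;
      for (size_t n = 0; n < count; ++n)
      {
         Particle& p = particles[index];
         if (p.alive)
         {
            p.position += speed;
            p.age += 1.0f;
            if (p.age >= lifeTime)
               p.alive = false;
         }
         index = (index + 1) % particles.size();
      }
      // Shrink the live range from the front past any dead particles.
      while (count > 0 && !particles[first].alive)
      {
         first = (first + 1) % particles.size();
         --count;
      }
   }

   size_t liveRangeSize() const { return count; }

private:
   std::vector<Particle> particles; // one fixed block, reused as a ring
   size_t first = 0; // index of the oldest possibly-living particle
   size_t count = 0; // size of the live range, dead particles included
};
```

Note that update never chases a pointer to a random heap location: it only ever moves forward through one contiguous block, which is exactly what the cache prefers.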

The code for this isn't that complicated, but it's a bit too much to post here. Instead, I'm just going to post this scheme:



Compared to my previous version, Jeroen managed to make our particles twice as fast with this approach. That's a pretty spectacular improvement and shows just how important cache optimisation can be to performance!

Interestingly, my first hunch here was that this optimisation is a bad idea, since we need to touch all the dead particles in between the living ones to know that they're dead. Worst case, that's a lot of dead particles. In fact, in terms of computational complexity the worst case has gone from being linear in the number of currently living particles to being linear in the size of the entire buffer, dead particles included. When I studied computer science at university there was a lot of emphasis on Big O notation and complexity analysis, so to me this is a big deal.

However, as was mentioned above, accessing cache can be as much as 27 times faster than accessing main memory. Depending on how many dead particles are in the middle, touching all those dead particles can take a lot less time than getting all those cache misses.

In practice, the lifetimes of particles don't vary all that much. For example, realistic particles might randomly live between 1.2 seconds and 1.6 seconds. That's a relatively small variation. In other words: if there are dead particles in between, then the ones before them will also die soon. Thus the number of dead particles that we need to process doesn't become all that big.

The big lesson this has taught me personally is that cache optimisation can be more important for performance than Big O notation, even though my teachers at university talked about Big O all the time and rarely about cache optimisation.

EDIT: Several commenters on Twitter and Reddit suggested an even simpler and faster solution: we can swap the dead particles towards the end of the list. This removes the overhead of needing to skip the dead particles and also removes the memory that's temporarily lost because dead particles are in the middle. It also simplifies the code as we don't need to handle that ring buffer anymore.
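My interpretation of that suggestion, as a sketch (the names are mine, and a real implementation would also need to handle destructors and such): walk the array once per frame and move each surviving particle towards the front, so the living particles stay contiguous and in order, while the dead ones all end up past the end of the used range.

```cpp
#include <cstddef>
#include <vector>

struct Vector3D
{
   float x = 0, y = 0, z = 0;
   Vector3D& operator+=(const Vector3D& o) { x += o.x; y += o.y; z += o.z; return *this; }
};

struct Particle
{
   Vector3D position;
   float age = 0;
   float lifeTime = 0;
};

// One linear pass: update every particle, compact the survivors towards
// the front. No ring buffer and no skipping of dead particles needed.
void update(std::vector<Particle>& particles, size_t& numUsed,
            const Vector3D& speed)
{
   size_t write = 0;
   for (size_t read = 0; read < numUsed; ++read)
   {
      Particle& p = particles[read];
      p.position += speed;
      p.age += 1.0f;
      if (p.age < p.lifeTime) // survivor: move it towards the front
      {
         if (write != read)
            particles[write] = p;
         ++write;
      }
   }
   numUsed = write; // everything from numUsed onwards is dead and reusable
}
```

This is essentially the erase-remove idiom applied per frame, and memory access stays perfectly linear.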

For further reading, I would like to again recommend this article, as it explains the principles behind cache lines very well and shows some clever tricks for optimising to reduce cache misses.

As a final note, I would like to also stress the unimportance of cache optimisations. Computers these days are super fast, so for the vast majority of code it's fine if it has very poor cache behaviour. In fact, if you're making a small indie game, then it's probably fine to ignore cache misses altogether: the computer is likely fast enough to run your game anyway, or needs only simpler optimisations. As your game becomes bigger and more complex, optimisations become more important. Cache optimisations generally make code more complex and less readable, which is something we should avoid. So, use cache optimisations only where they're actually needed.

Have you ever successfully pulled off any cache optimisations? Are there any clever tricks that you used? Please share in the comments!

Sunday, 3 March 2019

The psychology of matchmaking

Matchmaking is a touchy subject and this has previously made me somewhat hesitant to write about it in an open and frank manner. Today's topic especially so, since some players might interpret this post as one big excuse for any faults in the Awesomenauts matchmaking. However, the psychology of matchmaking is a very important topic when designing a matchmaking system, so today I'm going to discuss it anyway. For science! :)

While we were designing the Galactron matchmaking systems I did quite a lot of research into how the biggest multiplayer games approach their matchmaking. The devs themselves often don't say all that much about it, but there are plenty of comments and analyses by the players of those games. The one thing they all have in common is that whichever game you look at, you'll always find tons of complaints from users claiming the matchmaking for that particular game sucks.



My impression is that no matter how big the budget and how clever the programmers, a significant part of the community will always think they did a bad job regarding matchmaking. Partially this might be because many games indeed have bad or mediocre matchmaking, but there's also a psychological factor: I think even a theoretical 'perfect' implementation will meet a lot of negativity from the community. Today I'd like to explore some of the causes for that.

The first and most obvious reason is that matchmaking is often a scapegoat. Lost a match? Must be because of the crappy matchmaking. My teammates suck? Must be the crappy matchmaking. Got disconnected? Definitely not a problem in my own internet connection, must be the crappy matchmaking. Undoubtedly in many cases the matchmaking is indeed part of the problem, but these issues will exist even with 'perfect' matchmaking. Sometimes you're just not playing well. Sometimes a teammate has a bad day and plays much worse than they normally do. Sometimes your own internet connection dropped out. No matchmaker can solve these issues.

There's a strong psychological factor at play here: it's human nature for many people to look for causes outside themselves. I think this is a coping mechanism: if you're not to blame, then you don't have to feel bad about yourself either.

Of course there is such a thing as better or worse matchmaking. That players use matchmaking as a scapegoat shouldn't be an excuse not to try to make better matchmaking. But it sure makes it difficult to assess the quality of your matchmaking systems. Whether your matchmaker is doing well or not, there will practically always be a lot of complaints. The more players you have, the more complaints.

For this reason it's critical to collect a lot of metrics about how your matchmaking is objectively doing. We gather overall metrics, like average ping and match duration and such, but we also store information about individual matches. This way when a player complains we can look up their match and check what happened exactly. This allows us to analyse whether specific complaints are caused by problems in the matchmaker, are something that the matchmaker can't fix (like a beginner and a pro being in a premade together) or whether the user is using matchmaking as a scapegoat for something else.



Another problem for matchmaking is that players don't have a single, predictable skill level. The matchmaker matches a player based on their average skill, but how well they actually play varies from match to match. One match they might do really well, and then the next they might do badly. For example, maybe the player gets overconfident and makes bad decisions in the next match because of that. Or maybe the player is just out of luck and misses a couple of shots by a hair that they would normally hit. Or maybe the player got home drunk from a party and decided to play a match in the middle of the night, playing far below their normal skill level. These are things that a matchmaker can't predict. This will often make it seem like the matchmaker didn't match players of similar skill. Sometimes this might result in a teammate who would normally be as good as you but happens to play like a bag of potatoes during this one match in which they're in your team.

While this problem is not truly solvable by matchmaking, there are some things developers can do to improve on it. For example, in Heroes Of The Storm you select your hero before you go into matchmaking. This allows the matchmaker to take into account that you might be better at some heroes than at others. If it detects that you're playing a hero that you haven't played in a long while, then maybe it should matchmake you below your skill level. I have no idea whether Heroes Of The Storm actually does this, but it's certainly a possibility. This would allow detecting some of the cases in which a player is normally really good, but is currently trying something new that they haven't practised yet.



However, this particular trick comes at a heavy cost, which is why we decided not to put it into Awesomenauts: if players select their hero before matchmaking happens then matchmaking is severely limited in who can play with whom, which damages other matchmaking criteria. (I've previously discussed this in my blogpost about why you need huge player numbers for good matchmaking.)

A very different kind of psychological aspect is that players are often bad at estimating their own skill level. The following example is something that I have no doubt will be recognised by many people who play multiplayer games. A while ago I played a match in which a teammate was constantly complaining about how badly I was playing and how I was causing us to lose the match. However, looking at the scoreboard I could see that he was dying constantly and doing by far the worst of all players in the match. Apparently that player didn't realise that he was much worse at the game than he thought he was.

There's more to this than anecdotal evidence however. The developers of League of Legends have described that on average, players rate their own skill 150 points higher than their real MatchMaking Rating. (That part of the post has since been removed, but you can still find it here on the Wayback Machine.) As they also mention there, psychology actually has a term for this: it's called the Dunning-Kruger effect. A League of Legends player analysed a poll about this here and gives an excellent explanation of how it works:
"According to the Dunning-Kruger-Effect people overestimate themselves more the more unskilled they are. This isn’t caused by arrogance or stupidity, but by the fact that the ability to judge a certain skill and actually being good at that skill require the same knowledge. For example if I have never heard of wave management in LoL I am unable to notice that I lack this skill, because how would I notice something if I don’t even know it exists? I would also not notice this skill in other people, which is why I would overestimate my own skill if I compared myself to others. This it what causes the Dunning-Kruger-Effect." - Humpelstilzche

As some readers suggested, all of these psychological factors invite an interesting thought: would it help to give players more control? After all, if you have no control you blame the system, while if you do have more choice, you also have yourself to blame and maybe accept the results more. Giving players choice over matchmaking is in many ways old-fashioned and reminds me of the early days of online multiplayer, where you selected your match yourself from a lobby browser. The common view these days seems to be that players expect a smooth and automated experience. They don't want to be bothered and just want to click PLAY and get a fun match. Or at least, most devs seem to assume that that's what players want.

For Awesomenauts I've actually been curious for a while what would happen if we removed the automated matchmaking and instead relied entirely on opening and selecting lobbies. The modding scene would surely thrive a lot more that way, but how would it affect player experience and player counts? It would be a risky change, and also too big a change to just try, so I doubt we'll ever get to know the answer to that question. Also, I'm not sure whether our current lobby browser provides a smooth enough experience to make it that important.

I do think it's interesting to explore this further though. I wonder what would be the result if matchmaking were from the beginning designed around being a combination of automation, communication and player control.

Before ending this blogpost I should mention one more psychological aspect of matchmaking: the developer's side. As a developer it can be really frustrating to get negative comments on something you've spent a lot of time on. Understanding why can help in coping with this frustration. Just like matchmaking can be a scapegoat for players after losing a match, the psychology of matchmaking can be a scapegoat for developers after getting negative feedback from players.

None of the psychological factors discussed in this post are an excuse to just claim that the matchmaking in a game is good despite players complaining. However, for a developer it's really important to realise that these factors do exist. Understanding the psychology of matchmaking allows you to build better matchmaking and helps to interpret player comments. I have no doubt that I've only scratched the surface of this topic, so I'm quite curious: what other psychological aspects influence how matchmaking is experienced by players?