Joost's Dev Blog: replays/spectator

Saturday, 4 January 2014

Bitcrunching numbers for bandwidth optimisation

An important part of making a complex multiplayer game like Awesomenauts is optimising bandwidth usage. Without optimisations, just sending position updates for one character to one other player already costs 240 bytes per second (30 updates per second at 4 bytes each for both X and Y). And this is just a fraction of what we need to send: Awesomenauts can have up to 35 characters running around the game world, and we need to synchronise much more than just their positions. As you can see, bandwidth usage quickly spirals out of control, so we need to be smarter than just naively sending all the data all the time.

In the many months we spent on bandwidth optimisation, we employed lots of different techniques. The one I would like to talk about today can massively decrease bandwidth by crunching numbers on the level of individual bits. This technique not only worked great for the multiplayer implementation of Awesomenauts, but I am now also using this extensively for reducing the size of our replays.

Let's have a look at the core idea here. Storing one unsigned int normally costs 32 bits. In this we can store a value in the range of 0 to over 4 billion. However, we rarely need a range as big as that. For example, the amount of Solar you collect during an Awesomenauts match rarely goes over 5,000. In a really long match we might go past that, so let's bet on the safe side and say we want to store Solar values up to 10,000. This could be stored in 14 bits, since 2^14 = 16,384. That is less than half of the normal 32 bits used for a value like this!

Using too many bits is not a problem in memory since modern computers have plenty of that. But when sending the Solar over the network, we would like to use only those 14 bits we actually need. However, normally we can only handle data on a byte level, not on a bit level, so the best we can easily do is to store this in two bytes (16 bits).

What we need here to solve this is a bitstream, something that we can use to group together a lot of values on a bit-level. This way we can just grab a block of memory and push in all the values we want to send over the network. The idea is quite straightforward, so let's jump right in and have a look at how one could use that:

//Class for pushing bits into a block of memory
class BitInStream
{
public:
    BitInStream(char* data);
    void addUnsignedInt(unsigned int value, unsigned int numBits);
    void addBool(bool value);
};

//Class for reading bits from a block of memory
class BitOutStream
{
public:
    BitOutStream(const char* data);
    unsigned int getUnsignedInt(unsigned int numBits) const;
    bool getBool() const;
};

//Example of using these
void main()
{
    //The bits are stored in this
    char data[4];
 
    //Example of writing some values
    BitInStream bitInStream(data);
    bitInStream.addUnsignedInt(solar, 14);
    bitInStream.addUnsignedInt(itemsBought, 6);
    bitInStream.addBool(isAlive);
    bitInStream.addUnsignedInt(damageDone, 10);
 
    //Read back values
    BitOutStream bitOutStream(data);
    solar       = bitOutStream.getUnsignedInt(14);
    itemsBought = bitOutStream.getUnsignedInt(6);
    isAlive     = bitOutStream.getBool();
    damageDone  = bitOutStream.getUnsignedInt(10);
}

The details of actually implementing the bitstreams itself is too much for this blogpost, so I will leave that as a fun exercise for the reader. Hint: you can use bitwise operators like |, ‹‹, &.

When looking at this code there are a number of observations to make. First is that I am actually using only 31 bits while storing them in 4 bytes. In other words: I am wasting 1 bit here! This is something that almost always happens when using bits: at some point they need to turn into bytes, since that is what network packets are measured in, and we usually waste a couple of bits there. This loss can be decreased by using larger bitstreams with many more values in them.

A much more important realisation here is how easy it is to break this code. If I read back the solar with the wrong number of bits, then not only will the solar be complete garbage, but also all values after it. The same happens if I accidentally read back values in the wrong order, or forget to read one. The compiler will not warn you of any of these things and debugging on a bit-level is difficult, so this produces extremely stubborn bugs! Another easy mistake is to use too few bytes for the data, resulting in buffer overflows and thus memory corruption, or to use too many bytes, resulting in wasting bandwidth. In short: use bitstreams only when you really need them and be extremely careful around them!

So far we have only stored the easy kinds of data: bools and unsigned ints translate very easily to bits. However, many things in a game are floats. When using an unsigned int, taking just its first 10 bits results in a valid number, just with a smaller range. But with a float taking those same 10 bits results in nonsense. How can we send floats with arbitrary numbers of bits?

The trick here is rather simple. A float has a number of properties that we usually don't need. Floats have very high precision near 0, but when storing a position, we want the same precision everywhere. Floats also have an extremely large range, but for that same position we already know the size of the world and we don't need to be able to store values outside that range.

Knowing these things we can store positions (and most other floats) in a different way. Levels in Awesomenauts happen to all be in the range -10.0 to 30.0. Knowing this we can just store the x-position as an unsigned int with an arbitrary number of bits. If we want to store a position in 8 bits, then the lowest value (0) means position -10.0, and the highest value (255) means position 30.0. Values in between are just interpolated. This is a simple and very effective trick. Depending on how much precision we need, we can use more or less bits.

When synching over the network we actually don't need all that much precision, since the receiver is not using the position for a physics simulation. He mostly just uses it for visualisation and checking whether a bullet hits. So in Awesomenauts we use 13 bits for synching the x-position and another 13 bits for the y-position, which is a lot less than the normal 32 bits each would need as a float!

The same idea can be applied to many other things that are also floats and for which we know the range. Some things even need extremely little precision. For example, the health of other players is only visualised in their healthbar, so we only need to synch as many bits of that as are needed to make the healthbar look correct. Since healthbars are usually in the order of 100 pixels wide, we can store health in 7 bits, even if the actual range of the health might be 0.0 to 2000.0. With 7 bits our for that range would be only 15.625, but more isn't needed because the difference is not visible in the healthbar anyway. The simulation is not influenced at all by this, since the owner of the character still stores the full precision float for the health.

Using tricks like these we managed to get the bandwidth usage of Awesomenauts down to around 16 kilobyte per second. This is the total upload needed at the player who manages bots and turrets. The five other players in a match don't manage those bots and turrets, so they use far less bandwidth still. For comparison: our very first network implementation in the Awesomenauts prototype we had four years ago used around 50x (!) more bandwidth. Bit-crunching numbers as explained in this blogpost is just one of the many tricks we used to decrease bandwidth, but it was definitely one of the strongest!

Friday, 29 November 2013

First steps towards massive replays and spectators in Awesomenauts

One of the biggest challenges for us in the coming period is implementing replays and spectator mode in Awesomenauts. It might not be immediately obvious, but this is actually an incredibly complex problem to solve. We have the very first bits working now, so here is a video in which I show and explain what we have so far:

Our first target with replays is to get them working offline, so you store replays of your own matches on your own harddisk and you can email those files to your friends by hand. Storing and browsing replays through online servers comes after that and is still a long way away. We hope that the offline replays can go in beta before Christmas, but right now it seems like that might still be too early for how complex this all is.

So why are replays so difficult to build? There are a ton of reasons for this. One is that players need to be able to do live spectating. In high-level tournament matches, we currently sometimes get near to 1000 simultaneous viewers for a single match. This is on Twitch, so the bandwidth is handled by Twitch, but for our spectator mode we will need to make that possible ourselves. Handling that kind of user numbers and internet traffic is by itself already a big challenge.

Another problem is that players need to somehow get their replays to a server, live while playing. By itself this is not very difficult, but to keep this from disrupting normal online play, we need to do this in an extremely small amount of bandwidth. We are targeting to stream out a replay from one player to the server in only 1 to 2 kilobyte per second. The total size of a 20 minute match will then be around 2mb. Streaming all data in so few bits is an interesting challenge, but I think it is quite achievable.

Another problem is version management. We release lots of patches for Awesomenauts, and most patches change significant things in our gameplay logic. That means that if we rely on that gameplay logic for playing back replays, then stored replays will often not be playable anymore once we release a patch. A solution for this would be to add version management to the gameplay code, but that would add a kind a code complexity that we definitely don't want there. And I haven't even mentioned how sensitive to bugs that would be... Another solution would be to store legacy versions of the game to play older replays, but that would produce very high loading times when switching between reviews, and would mean loads of downloading to get all those older game versions.

Our solution for this version problem is to store only graphical data in the replay. This way if gameplay logic changes, it doesn't matter for stored replays, since those don't use the gameplay logic anyway. The demo shown above already works like this. However, this does mean that we cannot use any of the current gameplay code or network packages for the replays. We need to rebuild it all from the ground up in a different way. That is why the demo above still lacks animations and bullets and pickups and such: they have not been rebuilt for replays yet.

Another issue in the replays implementation is server bandwidth and loading times. If players view lots of replays, this will cost us a lot of bandwidth and thus a lot of money. Also, if players need to download the entire replay before they can watch it, then waiting times might be long. We even want players to be able to immediately skip to specific moments in the replay without downloading everything in between. With this in mind, the replay system again becomes a lot more complex.

There are several other challenges in implementing replays and spectator mode, but these are the most important. The video shown actually solves the basics of several of those already: there is quite a bit of interesting code architecture behind it. This is also why it took so long before we got anything to show at all: storing and playing character positions is by itself not all that difficult, but doing it in a way that will work with the above issues makes it a lot more time-consuming to implement. I expect I will be able to do a great talk or series of blogposts on this once the whole replays system has been built...

As a conclusion to this blogpost, I would like to say that this is a mightily interesting and fun challenge! We at the Ronimo coding team are greatly enjoying this task: coding puzzles are the best puzzles!