Sunday, 14 September 2014

Core network structures for games

When starting to develop an online multiplayer game you need to choose how to structure the netcode. Especially important is the question which computer decides on what part of the gameplay. There are roughly four models in common use in games these days. Today I would like to explain which those are and what their benefits and downsides are.

Here are those four basic structures (of course all kinds of hybrids and variants are possible):



Client-server

In the two versions of client-server there is one computer who is alone responsible for the entire game simulation: the server. The clients cannot make real gameplay decisions. This means that if a player presses a button, it goes to the server, the server executes it and then sends back the results to the client.

This adds significant lag to all input, which is of course totally unacceptable and kills the gameplay feel. To make a game playable with this model all kinds of tricks are needed. The best trick I am aware of is described in this must-read article by Valve. The basic idea is this:

  • When the player presses a button, the client immediately processes it as if it has the authority to do so, starting animations and such. A message is also sent to the server.
  • The server receives the button press a little bit later, so the server rewinds to the time of the button press, executes it, and then re-simulates to the current time.
  • The server then sends the current state to the client
  • The client receives the latest state, but in the meanwhile more time has passed. So the client rewinds to the time at which the server sent the message, corrects its own state with what the authoritative server had decided, and then re-simulates locally to the current time.

In other words: both the client and the server rewind and then re-simulate whenever a packet is received. Implementing rewinding mechanisms is a complex task and very difficult to add to an existing game. As far as I know this is nevertheless the best and most used approach.

The difference between the two client-server architectures is who the server is. Either it is one of the players, or it is a computer that the game's developer/publisher manages. A dedicated server is usually better, but much more complex and expensive as the developer needs to manage a scalable amount of servers. The fiascos at the launches of Diablo III and Sim City showed how difficult this is to do. The more successful the game, the more difficult dedicated servers are to pull off. They are also simply expensive.

Peer to peer

The third architecture is pure peer to peer. Here no single computer is responsible for the entire game simulation. Instead the simulation is spread out over all of the players. The challenge then is how to divide responsibilities over the players. Awesomenauts uses this model and our distribution of the simulation is simple: each player simulates his own characters and bullets. This has a big benefit: player input can always be handled immediately. No rewinding structure are needed and there is never any input lag for the player. This also makes it much easier to add to an existing game.



Peer to peer has some heavy drawbacks though. The biggest one is that lag becomes much more unpredictable. While in a client server architecture only the lagging player suffers from his own lag, in a peer to peer game the other players will also notice if one player has a bad internet connection.

Peer to peer usually introduces complex synchronisation situations when the simulations of two players are not compatible. A good example of this can be found in my previous blogpost on Awesomenauts' infamous sliding bug. Care needs to be taken to recognise and handle such situations. In most game concepts few of these problems will pop up though: in Awesomenauts pushing other players is the only really complex part regarding conflicting synchronisation.

Another major downside of peer to peer is in the amount of network traffic needed. Since all players need to talk to all other players it requires many more network packets. In client-server only the server needs to talk to everyone, so only one player is affected instead of all of them. Even better for packet count is using dedicated servers: the entire burden falls on servers that the game developer provides.

Deterministic peer to peer lockstep

The fourth and final basic structure is deterministic peer to peer lockstep. This model is mostly used for RTS games. This is also a peer to peer model but here we don't need to worry about which player manages which objects. Instead every client simulates everything in the exact same way. The only thing that needs to be sent over the network is each player's actions. The game runs as lots of really short turns: every step the game collects the commands from all players over the network and then simulates the next step. This is not limited to turn-based games: by doing lots of really short steps it can feel like a real-time game.

Deterministic peer to peer has the enormous benefit that you hardly need to send anything. Only player actions need to be sent. If everyone starts the game in the same situation and runs the exact same steps, then the game will remain in synch without ever sending updates over the network. Therefore this model is highly suitable for RTS games, since they have so many units that synchronising everything is often infeasible. An old but still great article on implementing full determinism is this one: 1500 Archers on a 28.8: Network Programming in Age of Empires and Beyond.

A downside to this model is that it usually adds quite a lot of lag to controls, since actions cannot be executed until all players know about them. Such input lag can be hidden by playing sounds and visual effects immediately when the user clicks. This way the player won't notice that his units don't react immediately.

Note that deterministic lockstep can also be combined with a client/server connection model where the data always flows through the server instead of directly between all players.



Implementing full determinism is incredibly difficult. If any differences exist between the simulations on the clients, then these differences will grow over time and result in the desynchronisation of the game. Lots of tricks need to be used to achieve determinism. For example, floats cannot be used because of rounding errors: all logic needs to be build on integers. Random number generators can only be used if their seeds are synched and they are used in the exact same way. This might for example go wrong if one player runs on a higher graphics quality and thus has extra particles on his screen. Those particles might also use the random number generator and thus desynch it. A simple solution is to use a separate random generator for non-gameplay objects, but this is easy to forget, breaking the entire game.

Getting determinism right is such a challenge that many games that use it add a mechanism to check the correctness of the simulation. They regularly send a checksum of the entire gamestate over the network. Checksums are small so this uses hardly any bandwidth. If the checksums are not the same then the game has desynched. To fix a desynch we could pause the game, send the entire simulation over the network and then continue from there. In older games you might recognise this problem when you got kicked out of a game because of a "synchronisation error".

There are of course many more subtleties to network architecture than I have explained here. All kinds of hybrids are possible and there are many details that I have not mentioned, like vulnerability to cheaters and host migration. I cannot discuss them all today, but I hope this blogpost has given a good summary of the basics. One important topic that really needs to be explained in combination with the above information is relay servers so I will cover that next week.