Joost's Dev Blog: May 2013

Wednesday, 29 May 2013

Awesomenauts in the Humble Bundle!

Yesterday the Humble Indie Bundle 8 launched, including Awesomenauts! For $1 or more, you get Steam keys to the games in the bundle, and Awesomenauts even comes with an exclusive chicken skin for Clunk: Cluck.

Really happy with the line-up this time. The other games in the bundle include some of my personal favourites from the past years: Little Inferno, Capsized and Dear Esther. Dear Esther in particular absolutely blew my mind when I played it: it showed me a completely different kind of 'game' experience, with a story that deeply intrigued me.

Sunday, 19 May 2013

Detecting notes from a live cello: the core technology of Cello Fortress

My new game Cello Fortress is controlled by a cello. This is a really weird and unique thing, and comes with some serious challenges. So far I have discussed the game design aspect of this, but at the very core of the game lies a much more technical topic: detecting notes in real-time from a live cello. Cello Fortress really knows what notes I am playing. I developed my own algorithm for that, and although it is not perfect (it quite often briefly detects notes that are not actually played), it works surprisingly well for such a difficult technical problem. So how does it work? Let's have a look!

The big challenge here is that a cello produces a very complex sound pattern. There are all kinds of overtones, scratches and noises in it, and detecting the actual note from that is incredibly difficult. I did some research before I began programming the game, and it turns out that finding notes in a live acoustic instrument is in fact an unsolved problem. There is quite a lot of research in this field, but a perfect solution has not been found yet.

Of course, for a digital instrument like a keyboard this is easy, but as soon as it gets acoustic, it becomes problematic. For guitar there are some devices on the market that can find notes in sound: they can convert live guitar playing to MIDI with very little delay. However, they are supposedly still far from perfect, and don't work with cellos. A cello produces a much more complex tone than a guitar, since it lacks a clear strumming moment: notes just continue and smoothly flow into each other. People also often mention Melodyne to me, but as far as I know, Melodyne can only detect notes after recording, not in real-time.

Seeing that no easy existing solution was around, I decided to start experimenting with this problem myself. I'll have to admit that I didn't read all of the scientific literature on the subject, but it turned out I didn't have to: I managed to come up with an algorithm that detects notes well enough for what I need for Cello Fortress.

So, my basic input here is just a microphone signal. My first step is to grab the last 0.2 seconds of microphone input (9600 samples) and take the Fourier Transform of that. The Fourier Transform is a rather standard mathematical operation that results in a frequency spectrum: a list of how strong each frequency is within the signal. Since a cello has a very rich sound, a lot of frequencies are in a single note, but the Fourier Transform is still a great starting point for my algorithm.
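To give a feel for what this step produces, here is a minimal sketch that turns a block of samples into a magnitude spectrum. It is not the code from the game: it uses a slow, naive discrete Fourier transform where a real implementation would call an FFT from a library, and the names `magnitudeSpectrum` and `makeSine` are made up for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Naive O(n^2) discrete Fourier transform, for illustration only.
// Returns the magnitude of bins 0 .. n/2, where bin k corresponds to
// the frequency k * sampleRate / n.
std::vector<float> magnitudeSpectrum(const std::vector<float>& samples)
{
    const std::size_t n = samples.size();
    if (n == 0)
        return {};

    const double twoPi = 6.283185307179586;
    std::vector<float> spectrum(n / 2 + 1, 0.0f);
    for (std::size_t k = 0; k < spectrum.size(); ++k)
    {
        double re = 0.0;
        double im = 0.0;
        for (std::size_t t = 0; t < n; ++t)
        {
            const double angle = twoPi * double(k) * double(t) / double(n);
            re += samples[t] * std::cos(angle);
            im -= samples[t] * std::sin(angle);
        }
        // Normalise by n so a pure sine of amplitude 1 gives a peak of about 0.5.
        spectrum[k] = float(std::sqrt(re * re + im * im) / double(n));
    }
    return spectrum;
}

// Hypothetical helper: a pure sine wave to feed into the transform.
std::vector<float> makeSine(float frequency, float sampleRate, std::size_t count)
{
    assert(sampleRate > 0.0f);
    const double twoPi = 6.283185307179586;
    std::vector<float> samples(count);
    for (std::size_t t = 0; t < count; ++t)
        samples[t] = float(std::sin(twoPi * frequency * double(t) / sampleRate));
    return samples;
}
```

Calling `magnitudeSpectrum(makeSine(100.0f, 1000.0f, 200))` gives a spectrum with a single clear peak in the bin for 100hz. A real cello note would show a whole series of peaks instead.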

The math behind Fourier Transforms can look pretty creepy and complex, but the nice thing is that they are usually calculated using the standard Fast Fourier Transform algorithm, which can simply be grabbed from the internet. It is also included in many audio libraries, for example in the awesome FMOD. So to use the Fourier Transform, there is no need to understand the details of how it is calculated.

The spectrum below is from playing a G2 on my cello. Remarkably, there is a clear pattern in the spectrum for this note: there are peaks at evenly spaced frequencies. These are called overtones. I knew about those before I started working on this algorithm, but I had never realized they look so simple. I know that the G I played here is 98hz (that is the open G-string on a cello). And indeed, the left-most peak is at exactly 98hz. Other peaks are at multiples of that: at 196hz, 294hz, 392hz, etc.

This spectrum looks so simple that it seems very easy to write an algorithm that detects notes. However, it turns out to be quite a bit more complex than that. For starters, I would really like to be able to detect chords, because I want to make gameplay with that. Another reason to want chord detection is that when I move from one note to the next, the previous note is still audible, due to reverb within the big body of the cello and because the previous string is still vibrating. Being able to detect several notes at once will greatly help me handle the transition from one note to another. However, chords make the spectrum a whole lot more complex, as you can see in this example:

However complex this looks, a relatively simple algorithm comes to mind, and it is indeed the core idea of what I ended up with: let's list all the peaks, and then look for the peaks that would explain all other peaks through their overtones. For example, let's say I have peaks at 45hz, 90hz, 100hz, 135hz, 180hz and 200hz. Those are all multiples of 45hz and 100hz, so those must be the frequencies that are being played. In note names, those would be roughly F#1 and G2.
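One possible reading of this first idea can be sketched as follows. The function names and the `tolerance` parameter are assumptions made for this illustration, and the sketch only works on clean, exact peaks like the ones in the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// True if 'frequency' lies within 'tolerance' hz of a whole multiple of 'fundamental'.
bool isMultipleOf(float frequency, float fundamental, float tolerance)
{
    assert(fundamental > 0.0f);
    const float multiplier = std::round(frequency / fundamental);
    return multiplier >= 1.0f
        && std::fabs(frequency - multiplier * fundamental) <= tolerance;
}

// Walks through the peaks from low to high. A peak that is not explained as an
// overtone of an earlier fundamental is assumed to be a new fundamental itself.
// All peaks are expected to be positive frequencies in hz.
std::vector<float> findFundamentals(std::vector<float> peaks, float tolerance)
{
    std::sort(peaks.begin(), peaks.end());
    std::vector<float> fundamentals;
    for (float peak : peaks)
    {
        bool explained = false;
        for (float fundamental : fundamentals)
        {
            if (isMultipleOf(peak, fundamental, tolerance))
            {
                explained = true;
                break;
            }
        }
        if (!explained)
            fundamentals.push_back(peak);
    }
    return fundamentals;
}
```

For the peaks 45, 90, 100, 135, 180 and 200 with a tolerance of 0.5hz, this returns 45 and 100. Any peak caused by noise would be reported as a note of its own, which hints at why this approach needs so many extra rules in practice.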

This simple algorithm quickly turns into a big mess, though. It is difficult to make it handle peaks due to scratches and noise well, so I quickly ended up in a swamp of extra rules and magic numbers to compensate for those.

There is also a more fundamental problem: the spectrum is not very precise. For every block of about 3hz, the Fourier Transform tells me how strong it is. So I know the strength of 100hz, of 103hz, of 106hz, etc. Since a block is 3hz wide, the actual frequency for the bin at 103hz can be anything between 101.5hz and 104.5hz. This lack of precision is not a big problem for the higher notes: the difference between A3 (the open A-string) and the next note (A#3) is 13hz, which is well above 3hz. For low notes this is a bigger problem: the difference between C2 (the lowest note on a cello) and C#2 is only 3.9hz. This is so close to our 3hz precision, that this is bound to give problems.
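The note spacings quoted above can be checked with a few lines of code. This sketch assumes standard equal temperament with A4 tuned to 440hz and uses MIDI note numbers (C2 = 36, A3 = 57) purely as a convenient way to index notes. The function names are made up for this example.

```cpp
#include <cassert>
#include <cmath>

// Frequency in hz of a MIDI note number, assuming 12-tone equal temperament
// with A4 (note 69) tuned to 440hz.
double noteFrequency(int midiNote)
{
    return 440.0 * std::pow(2.0, (midiNote - 69) / 12.0);
}

// Distance in hz from a note to the next semitone above it.
double semitoneGap(int midiNote)
{
    return noteFrequency(midiNote + 1) - noteFrequency(midiNote);
}

// The same distance, expressed in spectrum bins of the given width.
double semitoneGapInBins(int midiNote, double binWidth)
{
    assert(binWidth > 0.0);
    return semitoneGap(midiNote) / binWidth;
}
```

With these, `semitoneGap(57)` comes out at about 13.1hz for A3 to A#3 and `semitoneGap(36)` at about 3.9hz for C2 to C#2. With 3hz bins, `semitoneGapInBins(36, 3.0)` is only about 1.3 bins, which is why the lowest notes are so hard to tell apart.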

One solution to this would be to somehow increase the precision of the Fourier Transform. The only way to do this is to feed it more samples. I currently feed it 0.2 seconds (9600 samples), so I could double the precision by feeding it 0.4 seconds (19200 samples). However, this has two big downsides. The first is that this increases the delay between playing a note and detecting it. The second downside, and this is much worse, is that many notes are way shorter than that. If I play really fast, I can play 6 notes per second, which means that a period of 0.4s contains several notes. This completely clutters the spectrum and makes it much more difficult to detect individual notes. For these reasons I really don't want to increase the precision of the Fourier Transform by taking a longer sample period.

This lack of precision makes the previous idea of finding the peaks that explain all other peaks not work well. In the above example, I assumed a peak at 45hz. However, because of the size of the bin, the real frequency is anywhere between 43.5hz and 46.5hz. This makes a huge difference for the frequency of the higher overtones. The 10th overtone of 43.5hz is at 435hz, which is five bins away from 450hz (the 10th overtone of 45hz). I tried to modify my peaks for this, but it quickly got stuck in more and more exception-cases.

The solution I came up with is a different approach based on the same idea. I step over all frequencies with steps of 0.5hz. For each frequency, I look up its own strength, and the strengths of its overtones up to the 20th. So for 43.5hz, I look up the strength at 43.5hz, and the strengths of the overtones at 87hz, 130.5hz, 174hz, etc.

Taking such small steps means I look in the same bin several times, since steps are only 0.5hz and bins are 3hz wide. But in the 20th overtone, those steps of 0.5hz correspond to 10hz, which is a difference of several bins. So doing more steps is actually relevant.
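The arithmetic in the last few paragraphs boils down to one small formula: a difference in the base frequency is multiplied by the overtone number. Here it is as a tiny helper, with a name and parameters chosen just for this illustration.

```cpp
#include <cassert>
#include <cmath>

// How many bins apart two guesses for the base frequency end up once both
// are multiplied by 'multiplier' (10 for the 10th overtone, and so on).
double overtoneDriftInBins(double guessA, double guessB, int multiplier, double binWidth)
{
    assert(multiplier >= 1 && binWidth > 0.0);
    return std::fabs(guessA - guessB) * multiplier / binWidth;
}
```

`overtoneDriftInBins(43.5, 45.0, 10, 3.0)` gives the five bins mentioned earlier. `overtoneDriftInBins(43.5, 44.0, 20, 3.0)` gives a bit more than three bins, so two neighbouring 0.5hz steps really do look at different parts of the spectrum in the high overtones.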

The final part of my algorithm is simple: a frequency must be a note if its own strength is strong enough, and if the summed strength of all its overtones is strong enough. These two minimum strengths are settings that depend mostly on the microphone's output volume. Through experimentation I have found good minimum strengths for both the own frequency and the overtones. The algorithm now looks roughly like this:

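// Try every candidate base frequency in steps of 0.5hz.
for (float frequency = 60; frequency < 450; frequency += 0.5f)
{
    // Strength of the candidate frequency itself.
    int baseBin = getBinFromFrequency(frequency);
    float baseStrength = spectrum[baseBin];

    // Summed strength of the overtones at 2x, 3x, ... 20x the candidate frequency.
    float overtonesStrength = 0;
    for (int multiplier = 2; multiplier <= 20; multiplier++)
    {
        int bin = getBinFromFrequency(frequency * multiplier);
        overtonesStrength += spectrum[bin];
    }

    // Both minimum strengths must be reached for the frequency to count as a note.
    if (baseStrength >= baseStrengthMinimum
        && overtonesStrength >= overtonesMinimum)
    {
        print("Found a note! Frequency:", frequency);
    }
}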
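For readers who want to experiment, here is one way the loop above could be turned into a self-contained function. Several details are assumptions and not taken from the game: the mapping from frequency to bin (bin k centred on k times the bin width), the `Candidate` struct, the choice to store base plus overtone strength as a single score, and the bounds checks for overtones that fall outside the spectrum.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Candidate
{
    float frequency;
    float strength; // assumption: base strength plus summed overtone strength
};

// Assumed mapping: bin k is centred on k * binWidth hz.
std::size_t getBinFromFrequency(float frequency, float binWidth)
{
    assert(frequency >= 0.0f && binWidth > 0.0f);
    return static_cast<std::size_t>(std::lround(frequency / binWidth));
}

std::vector<Candidate> findCandidates(const std::vector<float>& spectrum,
                                      float binWidth,
                                      float baseStrengthMinimum,
                                      float overtonesMinimum)
{
    std::vector<Candidate> candidates;
    for (float frequency = 60.0f; frequency < 450.0f; frequency += 0.5f)
    {
        const std::size_t baseBin = getBinFromFrequency(frequency, binWidth);
        if (baseBin >= spectrum.size())
            break; // the rest of the range lies above the spectrum
        const float baseStrength = spectrum[baseBin];

        float overtonesStrength = 0.0f;
        for (int multiplier = 2; multiplier <= 20; ++multiplier)
        {
            const std::size_t bin = getBinFromFrequency(frequency * multiplier, binWidth);
            if (bin >= spectrum.size())
                break; // higher overtones lie above the spectrum as well
            overtonesStrength += spectrum[bin];
        }

        if (baseStrength >= baseStrengthMinimum
            && overtonesStrength >= overtonesMinimum)
        {
            candidates.push_back({frequency, baseStrength + overtonesStrength});
        }
    }
    return candidates;
}
```

Feeding this a synthetic spectrum with peaks at the first twenty multiples of 98hz reports 98hz as a candidate, usually together with a few of its 0.5hz neighbours that share the same base bin.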

Note that this algorithm is bound to find the same tone at several frequencies: if 70hz is well above the thresholds, then 70.5hz and 69.5hz probably are too, even though they are the same note. So when several frequencies are close to each other, I simply take the strongest and throw the others away.
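This clean-up step could look something like the sketch below. The `Candidate` struct, the function name and the `maxGap` parameter are assumptions for this illustration. How close "close to each other" is in the game is not specified here, so the gap is left as a parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Candidate
{
    float frequency;
    float strength;
};

// Expects candidates sorted by ascending frequency. A candidate that is at most
// maxGap hz above its predecessor is treated as part of the same cluster, and
// only the strongest candidate of each cluster is kept.
std::vector<Candidate> keepStrongestPerCluster(const std::vector<Candidate>& candidates,
                                               float maxGap)
{
    std::vector<Candidate> notes;
    float previousFrequency = 0.0f;
    for (std::size_t i = 0; i < candidates.size(); ++i)
    {
        const Candidate& candidate = candidates[i];
        assert(i == 0 || candidate.frequency >= previousFrequency); // input must be sorted

        const bool startsNewCluster =
            notes.empty() || candidate.frequency - previousFrequency > maxGap;
        if (startsNewCluster)
            notes.push_back(candidate);
        else if (candidate.strength > notes.back().strength)
            notes.back() = candidate; // stronger member of the current cluster

        previousFrequency = candidate.frequency;
    }
    return notes;
}
```

Given candidates at 69.5hz, 70hz and 70.5hz plus two more at 98hz and 98.5hz, and a `maxGap` of 1hz, this keeps exactly two notes: the strongest of each group.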

Even though my bins are 3hz wide, the extra information from the overtones makes the result so precise that I can use it to reliably tune the strings of my cello. Beforehand, I had never expected to find an algorithm so simple that still gives such exact answers!

Does this mean I am done, that I found the golden solution? No, far from it... The result of this algorithm is indeed that with the right sensitivity settings, I can detect practically all notes I play on my cello. However, what we have now is still completely unusable, because it also detects tons of false positives: notes that are not really being played. Next week, I will explain why this happens, and what tricks I added to solve most of this problem. See you then!

Monday, 6 May 2013

How the cello controls the game in Cello Fortress

The most unique aspect of Cello Fortress is how a cellist does a live performance in front of an audience, while at the same time controlling a game. This is completely different from other music games, in which the musician usually plays on a fake plastic instrument, and even if he plays a real instrument, he does nothing but imitate an existing song. In most such other music games, there is hardly any real gameplay: just points based on how well you played the song.

Cello Fortress is a completely different affair: here the cellist is controlling a real game, with real choice and interaction. Depending on what his opponents do, the cellist plays different notes. The cellist can even do things like baiting the opponents with a certain attack and then switching to another.

So how does that work? What does the cellist need to do to trigger the various attacks? Check this trailer to see (and hear!) how it works:

Live video footage in the trailer shot by Zoomin.tv Games at the Indie Games Concert.

Here is an overview of the attacks as explained in the trailer:

Slow high notes: long range guns
Slow chords: homing missiles
Fast high notes: machine guns*
Fast chords: double machine guns*
Dissonant chords: flamethrowers
Slow low notes: create mines
Fast low notes: mines move towards the player
Special melody 1: obliterate left half of screen
Special melody 2: obliterate right half of screen

*Playing even faster notes increases the speed of the machine guns.

The key thing to realise is that the first seven of these attacks allow the cellist to play many different styles, melodies and rhythms, and still achieve that attack. The number of possibilities with "slow high notes" is literally infinite. This is a crucial aspect of the game, since it allows the cellist to improvise in many different ways, keeping each match of Cello Fortress fresh and varied. Having so much freedom also allows an experienced cellist to play fluently from one attack to the other.

There is real gameplay and choice in this. For example, something I often do when playing the cello in Cello Fortress, is play something slow to dare players to get close to my cannons. As soon as they do, I switch to fast chords to damage them from short range.

The special melodies are each 8 notes and have been defined beforehand. The fun in these is that the attack is announced when the 4th note is played, but the damage is not actually done until the 8th note is played. Players who pay close attention can hear the attack coming after only two notes, and thus flee before it even happens.
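The announce-then-fire rule described above maps nicely onto a small state machine. The sketch below is only an illustration of that rule: the class, the use of integer note numbers, and especially the way a wrong note resets the melody are assumptions, since the post does not say how the game handles mistakes.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

enum class MelodyEvent { None, Announced, Fired };

// Follows one predefined 8-note melody, fed one detected note at a time.
class MelodyTracker
{
public:
    explicit MelodyTracker(std::vector<int> melody) : melody_(std::move(melody))
    {
        assert(melody_.size() == 8);
    }

    MelodyEvent onNote(int note)
    {
        if (note != melody_[progress_])
        {
            // Assumption: a wrong note restarts the melody. The wrong note may
            // itself be the first note of a new attempt.
            progress_ = (note == melody_[0]) ? 1 : 0;
            return MelodyEvent::None;
        }

        ++progress_;
        if (progress_ == 4)
            return MelodyEvent::Announced; // attack is announced on the 4th note
        if (progress_ == 8)
        {
            progress_ = 0;
            return MelodyEvent::Fired; // damage is done on the 8th note
        }
        return MelodyEvent::None;
    }

private:
    std::vector<int> melody_;
    std::size_t progress_ = 0;
};
```

Because the tracker only advances when the next note arrives, playing the melody slower or faster automatically delays or hastens the attack.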

I can play the melody faster or slower to make the attack happen earlier or later. From a gameplay perspective, one would assume I always attack as quickly as possible, but my goal is actually not purely to win: I want to entertain the players and the audience. So I sometimes deliberately let them live to give them a more fun experience. This can be seen around 1:33 in the trailer: I make the final note very long to allow that player to escape. Just like in a film, the best moments are not when the hero dies, but when he narrowly escapes.

These controls were specifically chosen because they combine music and control in a natural way. Achieving this was more difficult than it may seem. In my very first prototype, the cello simply shot one bullet for every note, and the direction of the bullet depended on the pitch of the note. This turned out to play horribly: whenever the players moved from the left to the right, the cellist had to play a scale from low to high. When they moved back, the notes also had to go back from high to low. This made it completely impossible to play anything that sounded like good music.

Another thing I tweaked a lot is the mapping of which pattern triggers which attack. The current controls work quite well on an emotional level: the attack is linked to the feeling of the music. Slow, low notes often sound quite tense and sad on a cello (especially with the specific types of melodies I personally usually play), and alternating between slow and fast notes creates an awesomely menacing atmosphere. This can be seen in the trailer from around 1:00. Creating tension this way works incredibly well: I performed with Cello Fortress in front of an audience of several hundred people at the Indie Games Concert, and the noises from the audience made it clear that they experienced the tension very strongly.

One note I should make on this trailer is that in the real game, there is a slight delay between the music the cello plays and the moment the guns react to it. This is because analysing music in real-time takes a bit of time. To make the trailer more understandable, I have moved the sound a bit to make the music fit the gameplay exactly.

While I am already performing with it, I am also still working on Cello Fortress to improve it. So what is next? My focus for the coming period is first creating real graphics, and after that I want to add a couple more attacks for the cellist. In the meantime, I hope more events, venues and exhibits will contact me to perform with Cello Fortress! Check www.cellofortress.com for tour dates and contact info!