MIDI Music

After the deep dream article I wrote, I decided to play with my two other favourite things: MIDI and POVRay. MIDI, or Musical Instrument Digital Interface, is a way of encoding music. Basically, a MIDI file (typical extension .mid) is a multi-track “recording” of a piece of music. Each track consists of a number of events, with each event separated from the next by a time delta (which can be zero for simultaneous events). POVRay, on the other hand, is a multi-threaded ray tracer. A ray tracer lets you define a “scene” containing objects, which it then renders into an image file (a PNG, in this case).
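To give a flavour of the format, here's a minimal Python sketch (not the author's code; rose2 and midipov do their own parsing) that unpacks the fixed-size MThd header chunk found at the start of every standard MIDI file:

```python
import struct

def parse_mthd(data: bytes):
    """Parse the 14-byte MThd header chunk that starts every MIDI file."""
    magic, length, fmt, ntrks, division = struct.unpack(">4sIHHH", data[:14])
    assert magic == b"MThd" and length == 6
    # division is ticks per quarter note when the high bit is clear
    # (SMPTE timing otherwise); the track chunks with the actual
    # events follow this header.
    return fmt, ntrks, division

# A hand-built header: format 1, 2 tracks, 480 ticks per quarter note.
header = b"MThd" + struct.pack(">IHHH", 6, 1, 2, 480)
print(parse_mthd(header))  # (1, 2, 480)
```

The per-track event data (with its variable-length delta times) takes more code to decode, but this is the shape of the container.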

So you can imagine what I did. I combined Bach's Toccata and Fugue in D Minor with a bunch of ray traced cylinders, and came up with this “beauty:”

YouTube animation: Toccata and Fugue in D Minor by J S Bach (runtime 09:06)

It's neither awesome graphics (I used a boring background with cylinders of two different colours) nor awesome music synthesis (I used a square wave with no envelope), but it is awesomely synchronized. Technically, what you are looking at is a histogram of the remaining notes in the composition — the height of each cylinder represents the total number of remaining notes. When a note is played, the height of its corresponding cylinder decreases (and the cylinder turns red momentarily so you can follow along).
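The histogram logic is simple enough to sketch. This is a hedged illustration (not midipov's actual code): each cylinder starts at the total number of times its note occurs in the piece, and every note-on event decrements the corresponding bar.

```python
from collections import Counter

def remaining_note_heights(events):
    """events: list of (frame, pitch) note-on events in time order.
    Yields (frame, heights), where heights[pitch] is the number of
    notes on that pitch still to come -- i.e., the cylinder heights."""
    heights = Counter(pitch for _, pitch in events)  # start at the totals
    for frame, pitch in events:
        heights[pitch] -= 1  # this note has now been played
        yield frame, dict(heights)

# Hypothetical events: two notes on pitch 57, one on pitch 55.
for frame, h in remaining_note_heights([(276, 57), (281, 55), (285, 57)]):
    print(frame, h)
```

At the end of the piece every bar has shrunk to zero, which matches what you see in the video.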

The key piece to “trying this at home” is the following mencoder line:

mencoder mf://@tandf.files -mf fps=30 -ofps 30 -mc 0 -vf harddup -o tandf.avi -ovc x264 -audiofile tandf.wav -oac mp3lame

This basically says to create an output file tandf.avi at 30 frames per second. The combination of audio and video is where the magic comes in: in mencoder, you can specify a list of still images and an audio track, and mencoder will combine them into one glorious output file. (I used this technique to create the modern version of the QNX Training Courseware.)

Video

The video files are specified by the part mf://@tandf.files — this says “fetch the list of files from the file tandf.files and put them into the output in the same order as specified in the file.”

The tandf.files file contains the following:

tandf-000276.png
tandf-000276.png
tandf-000276.png
tandf-000276.png
tandf-000276.png
tandf-000281.png
tandf-000281.png
tandf-000281.png
tandf-000281.png
tandf-000285.png
tandf-000285.png
tandf-000285.png
tandf-000285.png
tandf-000285.png
tandf-000285.png
tandf-000285.png
tandf-000285.png
tandf-000285.png
tandf-000285.png
tandf-000285.png
...

There are 16,375 lines in the tandf.files file, so I didn't list them all.

Notice that there are a lot of duplicates! That's because the same picture (e.g., tandf-000276.png) stays on screen for multiple frames. Since we're showing 30 frames per second, each frame lasts 1/30th of a second. From that, you can see that tandf-000276.png appears 5 times, so it's on screen for 5/30ths of a second, or 166.667 milliseconds. The next image, tandf-000281.png, shows up 4 times, so it lasts 4/30ths of a second (133.333 milliseconds), and so on.
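Generating a list like tandf.files is mechanical. Here's a hedged Python sketch (the function name and filename pattern are mine, matching the examples above): given the sorted frame numbers at which new images were rendered, repeat each filename until the next image takes over.

```python
def frame_list(image_frames, total_frames):
    """image_frames: sorted frame numbers at which a new image was
    rendered (e.g. [276, 281, 285, ...]).  Returns one filename per
    output frame, repeating each image until the next one appears."""
    lines = []
    for i, start in enumerate(image_frames):
        end = image_frames[i + 1] if i + 1 < len(image_frames) else total_frames
        lines.extend(["tandf-%06d.png" % start] * (end - start))
    return lines

# With a hypothetical final frame of 296, this reproduces the excerpt:
# 276 five times, 281 four times, 285 eleven times.
lines = frame_list([276, 281, 285], 296)
```

(For simplicity this starts at the first image; the real file also has to cover the leading frames before anything happens.)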

At some point I might write an article about my software MIDI synthesizer (rose2, now in development) but suffice it to say that it reads MIDI files. The present program, midipov, also reads the same MIDI files, so that's how it knew which frames to put where.

Let me put that another way.

In the Toccata and Fugue, the very first note is pressed 9.2 seconds into the recording (I don't know why they left 9.2 seconds of silence before the first note; I'm not a musician, LOL!).

9.2 seconds is 276/30ths of a second, which is why the first image is called tandf-000276.png; the 276 part is the frame number. Same with the tandf-000281.png file: it represents an event that happened 281/30ths of a second after the start of the recording.
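The time-to-filename mapping boils down to a couple of lines. A sketch (helper names are mine, the arithmetic and filename pattern are from the text):

```python
FPS = 30

def frame_number(seconds):
    """Round an event time to the nearest 30 fps frame number."""
    return round(seconds * FPS)

def frame_filename(seconds, prefix="tandf"):
    """Zero-padded six-digit frame number, as in the file list above."""
    return "%s-%06d.png" % (prefix, frame_number(seconds))

print(frame_filename(9.2))  # tandf-000276.png
```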

By now you've guessed why there are 16,375 lines in the tandf.files file: I need to tell mencoder which picture to display for each and every frame of the nine-minute movie.

Audio

The audio part is rendered by rose2 — it takes MIDI files as input and simulates all of the instruments in the virtual orchestra, performs post-processing for echo and room effects, and writes a WAVE output file.

The Foundation

So, in spite of the cheap graphics and the crappy square wave synthesis, what this does is set up a framework for me. I can now take a MIDI input file and split it into two streams: an audio stream that is processed by rose2, and a video stream that is processed by midipov. midipov doesn't do much actual “processing;” instead, it writes out scene files for the POVRay ray tracer.

Here's what one of those files looks like:

#include "colors.inc"
background { color Cyan }
camera {
    location <50, 30, -90>
    look_at <50, 40, 0>
}
cylinder { <26, 0, 0>, <26, 0.31348, 0>, 0.4 pigment { Blue } }
cylinder { <31, 0, 0>, <31, 0.31348, 0>, 0.4 pigment { Blue } }
cylinder { <34, 0, 0>, <34, 0.31348, 0>, 0.4 pigment { Blue } }
cylinder { <38, 0, 0>, <38, 0.31348, 0>, 0.4 pigment { Blue } }
cylinder { <43, 0, 0>, <43, 0.31348, 0>, 0.4 pigment { Blue } }
cylinder { <46, 0, 0>, <46, 0.31348, 0>, 0.4 pigment { Blue } }
cylinder { <50, 0, 0>, <50, 0.31348, 0>, 0.4 pigment { Blue } }
cylinder { <53, 0, 0>, <53, 0.626959, 0>, 0.4 pigment { Blue } }
cylinder { <55, 0, 0>, <55, 0.31348, 0>, 0.4 pigment { Blue } }
cylinder { <57, 0, 0>, <57, 0.31348, 0>, 0.4 pigment { Red } }
cylinder { <58, 0, 0>, <58, 0.31348, 0>, 0.4 pigment { Blue } }
cylinder { <60, 0, 0>, <60, 0.31348, 0>, 0.4 pigment { Blue } }
cylinder { <62, 0, 0>, <62, 0.626959, 0>, 0.4 pigment { Blue } }
light_source { < 64, 4, -3> color White }

Consult the official POVRay site to understand exactly what the various commands mean. As a quick summary, though, this is a late-stage file (tandf-016449.pov, in fact) with 13 cylinders (“notes”) left to play. Doing the math on the filename, 16,449/30ths of a second lands ten frames into 09:08 in the video.

If I were a better artist (let me rephrase that: if I had any artistic talent whatsoever), I'd be in a position to generate a much more creative landscape and do much more interesting animations.

But, as with all things, it's a start.

So where do I want to go from here? The stuff that Animusic does is way out of my league (due to, in order of importance, artistic talent, implementation time, and CPU budget), but that doesn't bother me. I'm mainly interested in the “how,” and this lets me explore that world.

My next step is to finish up rose2, the C++ rewrite of the C-based rose (“Rob's Own Synthesizer Emulator”).