Audio and Music Compression
A few short years ago,
mentioning audio compression to most people would have resulted in a
blank stare at best or a bad joke about flattening cassette tapes at
worst. A song was a song, and the amount of space it took up on tape
or CD was dictated by nothing more than how long it was. Long songs
took up lots of space; it was as simple as that.
With the massive explosion of the Internet as a content
distribution channel and the invention of downloadable digital
music, the world changed forever. Thanks to portable players, DVD
players and even mobile phones, your technophobe neighbour is likely
to understand what you mean – especially if you drop the word MP3
into the conversation.
The Internet is a great way to get music. You can download
officially sanctioned tracks from all sorts of places, without the
hassle of going to the shops or waiting for a CD to come in the
post. Outlets such as Apple’s iTunes Music Store are a massive success for
both Windows and Mac users, and Napster has just relaunched as a
legitimate service. The snag is obvious, though – a CD-quality audio
track is huge. Even with a fat broadband link, downloading a single
song takes too long.
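To put a rough number on 'huge', here is a back-of-envelope sketch in Python. The CD figures (44.1kHz, 16-bit, stereo) are standard; the four-minute track length and the 512kbps link speed are assumptions picked purely for illustration, and real downloads carry extra overhead.

```python
# Back-of-envelope size of uncompressed CD-quality audio.
SAMPLE_RATE_HZ = 44_100      # CD audio samples per second, per channel
BITS_PER_SAMPLE = 16
CHANNELS = 2                 # stereo


def cd_audio_bytes(seconds: float) -> float:
    """Bytes needed for `seconds` of raw CD-quality audio."""
    bits_per_second = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS
    return bits_per_second / 8 * seconds


def download_seconds(size_bytes: float, link_kbps: float) -> float:
    """Ideal transfer time over a link of `link_kbps` kilobits per second."""
    return size_bytes * 8 / (link_kbps * 1000)


song = cd_audio_bytes(4 * 60)                # assumed four-minute track
print(f"{song / 1_000_000:.1f} MB")          # 42.3 MB
print(f"{download_seconds(song, 512) / 60:.1f} minutes at 512 kbps")
```

In other words, a single uncompressed song runs to tens of megabytes, which is the problem the rest of this feature sets out to solve.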
Shrink that data
This is where audio compression comes in. Compression is simply a method of
reducing the size of digital information so that it takes less time
to transmit and less space to store. Just like video compression,
audio can be crunched in one of two ways: lossless or lossy. Lossless
compression works in
much the same way as the compression algorithms used in your
favourite ZIP package. The compressor searches the bits that
represent the audio data looking for repeated patterns. When a
pattern is found, it’s removed and replaced with a much shorter key
that represents the pattern. Find enough patterns, and you can
reduce the file size by a large amount. When Windows decompresses
the audio, it spots these keys, looks them up in a table and
replaces them with the original information.
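The pattern-and-key idea can be sketched with LZW, a classic dictionary coder. It is used here only because it is short and easy to follow; it is not necessarily the algorithm inside any particular ZIP tool or audio codec. The function names are made up for this example.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Replace repeated byte patterns with integer keys."""
    table = {bytes([i]): i for i in range(256)}   # start with single bytes
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate                   # keep growing the match
        else:
            codes.append(table[current])          # emit key for known pattern
            table[candidate] = len(table)         # remember the new pattern
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the table while reading keys, restoring the exact input."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):                  # pattern defined by this very step
            entry = previous + previous[:1]
        else:
            raise ValueError("corrupt code stream")
        out.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(out)


lyric = b"la la la la la la la la "
keys = lzw_compress(lyric)
print(len(lyric), "bytes in,", len(keys), "keys out")
assert lzw_decompress(keys) == lyric              # nothing was lost
```

The more repetition the input contains, the longer the patterns each key stands for, and the bigger the saving.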
Lossless compression is great for music, as it preserves the
fidelity of the song since no information is discarded. The snag is
that while systems like Huffman encoding work well for normal files,
complex musical data doesn’t contain much repetition and so doesn’t
compress too well. There are some clever lossless codecs about,
though – see ‘More on Lossless Compression’ at the end of this
article for more information.
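A quick experiment shows why general-purpose compression struggles with music. The sketch below uses Python's standard zlib module (the same family of compression used by ZIP tools) on two made-up signals: a perfectly repeating tone, and the same tone with random noise added as a crude stand-in for complex music. Both signals and their amplitudes are assumptions for illustration, not real recordings.

```python
import math
import random
import struct
import zlib


def pcm16(samples):
    """Pack integer samples as little-endian 16-bit PCM bytes."""
    return struct.pack(f"<{len(samples)}h", *samples)


def compressed_fraction(raw: bytes) -> float:
    """Compressed size as a fraction of the original size."""
    return len(zlib.compress(raw, 9)) / len(raw)


# One cycle of a 441 Hz tone at 44.1 kHz is exactly 100 samples long
cycle = [int(16000 * math.sin(2 * math.pi * i / 100)) for i in range(100)]
tone = cycle * 441                        # one second of pure repetition

rng = random.Random(0)                    # fixed seed so runs are repeatable
busy = [s + rng.randint(-13000, 13000) for s in tone]

print(f"repeating tone: {compressed_fraction(pcm16(tone)):.1%} of original")
print(f"noisy signal:   {compressed_fraction(pcm16(busy)):.1%} of original")
```

The repeating tone shrinks to a tiny fraction of its size, while the noisy version hardly shrinks at all. Dedicated lossless audio codecs do better than this on real music because they predict the waveform rather than hunting for exact repeats.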
To make audio compression really practical, we need to be prepared
to throw certain information away, sacrificing a slight amount of
quality for a massive reduction in size. This means songs can
download very quickly, and we can fit more music on to limited
devices like portable players and phones. As you couldn’t fail to
know by now, the current king of audio compression systems is MP3,
which is short for MPEG-1 Audio Layer 3. MP3 is a massively
successful format, and while others have appeared which offer better
quality at even smaller file sizes – notably Microsoft’s Windows
Media Audio 9 and the Advanced Audio Coding (AAC) format used on
some portable players – MP3 remains the most popular. To understand
how MP3 compression works within
Windows, we need to take a look at how the process judges what is
safe to discard.
Virtual ears
Lossy music compression uses what is called psychoacoustic modelling.
The theory here is that the human ear can’t hear every sound a song
contains, so storing those sounds wastes space. When Windows is
compressing a song, it first needs to examine the information and
compare it to a mathematical model that predicts what the ear can
pick up. Any information that falls outside of that range is fair
game to be thrown out, which simplifies the audio and makes it much
easier to compress. Windows does this by using a separate codec,
just as with video compression. A codec is simply a separate piece
of software that bolts on to Windows, adding support for a
particular audio or video format. Modern versions of Windows come
with an MP3 codec, but others are available as shareware and
freeware as well as fully commercial offerings.
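To give a flavour of the 'mathematical model' mentioned above, here is a minimal sketch of just one ingredient: the threshold of hearing in quiet. It uses a widely quoted approximation attributed to Terhardt. Real codecs go much further, for example by modelling how loud sounds mask quieter ones nearby, and the 20dB test level below is an arbitrary assumption.

```python
import math


def hearing_threshold_db(freq_hz: float) -> float:
    """Approximate quietest audible level (dB SPL) at a given frequency."""
    khz = freq_hz / 1000
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)


def audible(freq_hz: float, level_db: float) -> bool:
    """True if a lone tone at this level sits above the threshold."""
    return level_db > hearing_threshold_db(freq_hz)


# Hypothetical frequency components, all at the same assumed level
for freq in (50, 1000, 3300, 18000):
    verdict = "keep" if audible(freq, 20) else "discard"
    print(f"{freq:>6} Hz: threshold {hearing_threshold_db(freq):6.1f} dB -> {verdict}")
```

The ear is most sensitive at a few kilohertz and far less so at the extremes, so very low and very high components at modest levels are the first candidates for the bin.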
Stages of compression
The compression process can be broken down into stages, which don’t
always occur in the same order. First, the audio is broken down into
separate chunks, called frames. It’s easier for Windows to deal with
a small portion of sound data at a time, and it makes compression
more efficient. Once the audio is divided into frames, Windows
performs a spectral analysis to see what frequencies each frame
contains. Next, that frequency information is compared with the
psychoacoustic model. Some codecs have more complex models than
others, which is why not all MP3 files sound the same. The better
the model, the less processed the result sounds.
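The first two stages, framing and spectral analysis, can be sketched in a few lines. This is a toy version under stated assumptions: it uses a plain (and slow) discrete Fourier transform on small non-overlapping frames, whereas a real MP3 encoder works on 1152-sample frames with a filter bank and overlapping transforms. The sample rate, frame size and test tone are all chosen just to keep the example quick.

```python
import cmath
import math


def split_frames(samples, frame_size):
    """Cut the signal into consecutive, non-overlapping frames."""
    return [samples[i:i + frame_size]
            for i in range(0, len(samples) - frame_size + 1, frame_size)]


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (fine for tiny frames)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))) / n
            for k in range(n // 2 + 1)]


SAMPLE_RATE = 8000   # assumed, kept low so the toy example runs quickly
FRAME_SIZE = 64      # assumed; real MP3 frames cover 1152 samples

# Test tone at 1000 Hz: exactly 8 cycles per 64-sample frame
signal = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(256)]

for index, frame in enumerate(split_frames(signal, FRAME_SIZE)):
    spectrum = magnitude_spectrum(frame)
    peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
    print(f"frame {index}: strongest at {peak_bin * SAMPLE_RATE / FRAME_SIZE} Hz")
```

Each frame's list of magnitudes is what would then be handed to the psychoacoustic model to decide what stays and what goes.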
Once any data which falls outside of the psychoacoustic model has
been discarded, Windows allocates bits to the remaining data
according to the bitrate you chose when encoding the file. The
higher the bitrate, the better the sound. Finally, the codec runs
the frames through a traditional lossless Huffman compression
system, reducing the file size a little further. If you’ve ever
wondered why zipping an MP3 barely shrinks it, this is why – the
data has already been compressed about as far as it will go.
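The final Huffman stage gives short codes to common values and long codes to rare ones. The sketch below builds a Huffman code for a made-up list of quantised values in which zero dominates, as tends to happen once the quiet detail has been thrown away. Real MP3 encoders pick from a set of predefined tables rather than building one per file; the table here is built from scratch purely for illustration.

```python
import heapq
from collections import Counter


def huffman_codes(symbols):
    """Return {symbol: bitstring}, shorter strings for commoner symbols."""
    counts = Counter(symbols)
    if len(counts) == 1:                        # degenerate single-symbol input
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)       # two lightest subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


# Hypothetical quantised values for one frame: mostly zeros
values = [0] * 50 + [1] * 20 + [-1] * 20 + [2] * 6 + [-2] * 4
codes = huffman_codes(values)
huffman_bits = sum(len(codes[v]) for v in values)
fixed_bits = 3 * len(values)                    # 3 bits cover five distinct values
print(codes)
print(huffman_bits, "bits with Huffman vs", fixed_bits, "fixed")   # 190 vs 300
```

Because the output of this stage is already packed tightly, a second pass with a ZIP tool has very little repetition left to find.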
Psychoacoustic modelling is part of what is known as perceptual
encoding, the idea being that it’s safe to bin information we’re not
supposed to be able to hear. The practical result, though, is that
some of us can always tell the difference between an MP3 and the
original track. By the very nature of the codec, some information
that can never be replaced is discarded, and many real audiophiles
can spot an MP3 track no matter how high the bitrate. For true sonic
fidelity, the only real way to go is with lossless compression. With
broadband speeds increasing and the price of storage for portables
dropping, we can all look forward to high-quality lossless audio in
the future.
Constant vs variable
An audio file is made up of bits that represent the actual audio
information, and the bitrate is the number of bits used for each
second of sound. The general rule is that the higher the
bitrate, the more faithful to the original sound the compressed
version will be. Listen to almost any song and you’ll hear that
generally the sound levels vary throughout the track. Some parts of
a song will be very quiet while others are loud, and there may be
relatively silent parts compared to rich, complex sections.
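Since bitrate is simply bits per second, the size of a compressed track follows directly from its length. The short sketch below works this out for a handful of common MP3 bitrates; the four-minute track length is an assumption, and the figures ignore small extras such as tags and frame headers.

```python
CD_KBPS = 44_100 * 16 * 2 / 1000     # uncompressed CD audio: 1411.2 kbps


def compressed_size_mb(bitrate_kbps: float, seconds: float) -> float:
    """Approximate audio payload in megabytes (millions of bytes)."""
    return bitrate_kbps * 1000 * seconds / 8 / 1_000_000


for kbps in (64, 128, 192, 320):
    size = compressed_size_mb(kbps, 4 * 60)      # assumed four-minute track
    print(f"{kbps:>3} kbps: {size:5.2f} MB, about 1/{CD_KBPS / kbps:.0f} of CD size")
```

At 128kbps, for example, a four-minute song comes to 3.84MB, roughly one eleventh of the uncompressed original.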
If you encode audio with a constant bitrate, the same number of bits
is used for both simple and complex sections, leading to a large
waste of bits – and hence, a bigger file size than is really
necessary. A technique known as variable bitrate encoding, or VBR,
solves this by examining the song before compression begins and
flagging areas which require more or fewer bits than average. Each
frame of audio can therefore be given its own bitrate, generally
leading to a much more faithful reproduction without compromising
file size. With VBR encoding, you’ll hear the difference most with
music that has strong contrasts between quiet and loud passages,
such as certain classical pieces.
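The difference between the two approaches can be sketched as a per-frame bitrate plan. The bitrate table and frame length below are the standard MPEG-1 Layer 3 values, but the 'complexity score' for each frame is entirely hypothetical: how a real encoder measures complexity, and how it maps that onto a bitrate, is far more involved than this simple lookup.

```python
# MPEG-1 Layer 3 bitrates in kbps
MP3_BITRATES = [32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320]
FRAME_SECONDS = 1152 / 44_100   # one MP3 frame at the CD sample rate


def cbr_plan(complexities, kbps=128):
    """Constant bitrate: every frame gets the same rate."""
    return [kbps] * len(complexities)


def vbr_plan(complexities):
    """Variable bitrate: map each score in [0, 1] onto the bitrate table."""
    top = len(MP3_BITRATES) - 1
    return [MP3_BITRATES[round(min(max(c, 0.0), 1.0) * top)]
            for c in complexities]


def total_kilobytes(plan):
    """Audio payload for a list of per-frame bitrates."""
    return sum(kbps * 1000 * FRAME_SECONDS / 8 for kbps in plan) / 1000


# Hypothetical song: 150 sparse frames, then 50 dense ones
scores = [0.2] * 150 + [0.8] * 50
for name, plan in (("CBR", cbr_plan(scores)), ("VBR", vbr_plan(scores))):
    print(name, round(total_kilobytes(plan), 1), "KB, peak", max(plan), "kbps")
```

In this made-up case the VBR plan spends more bits than CBR on the dense frames and fewer on the sparse ones, and still ends up with the smaller total.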
More on Lossless Compression
We briefly mentioned lossless audio compression in the main feature.
With the price of disk and memory storage falling all the time,
conserving space becomes less of an issue – which of course means we
can concentrate on quality.
Raw, uncompressed digital audio stored at CD quality takes up a lot
of disk space. A 650MB CD-R will hold up to 74 minutes of music –
multiply that figure by how many CDs you have and you’re looking at
serious storage requirements. Archiving using a lossless codec takes
up more space than something like MP3, but much less than
uncompressed audio.
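The sums are worth doing once. A quirk to be aware of is that the '650MB' on a CD-R label is its capacity for computer data; the same 74 minutes of audio takes more bytes than that once ripped to disk, because audio sectors carry more payload than data sectors. The sketch below uses the standard CD sector sizes. The collection size of 100 discs is an assumption, and the lossless ratio is simply the 'around half' figure quoted in this sidebar.

```python
SECTORS_PER_SECOND = 75
AUDIO_BYTES_PER_SECTOR = 2352   # raw audio samples fill the whole sector
DATA_BYTES_PER_SECTOR = 2048    # data discs give up space to extra error correction
MIB = 1024 * 1024


def disc_sectors(minutes: int) -> int:
    """Number of sectors on a disc of the given playing time."""
    return minutes * 60 * SECTORS_PER_SECOND


sectors = disc_sectors(74)
audio_mib = sectors * AUDIO_BYTES_PER_SECTOR / MIB
data_mib = sectors * DATA_BYTES_PER_SECTOR / MIB
print(f"74 min as data:  {data_mib:.0f} MB")    # 650 MB
print(f"74 min as audio: {audio_mib:.0f} MB")   # 747 MB

CDS = 100                 # assumed collection size
LOSSLESS_RATIO = 0.5      # 'around half', the figure quoted in this sidebar
raw_gb = CDS * audio_mib / 1024
print(f"{CDS} full CDs raw: {raw_gb:.0f} GB, "
      f"lossless: {raw_gb * LOSSLESS_RATIO:.0f} GB")
```

Few albums fill the whole disc, so a real collection will come in under these worst-case totals.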
The most recent version of Windows Media Player comes with the
series 9 codecs, which include the new Windows Media Audio Lossless
compressor. Set this as the format of your choice in Media Player
and you can expect each CD to take around half the space of the
original with no quality loss. Microsoft’s isn’t the only solution,
either. Others include Monkey’s Audio (www.monkeysaudio.com), which
is only for Windows, and FLAC, the Free Lossless Audio Codec
(flac.sourceforge.net), which is available for Linux and other
platforms.
This article originally appeared in PC Answers, August 2004.