In 1987, the Fraunhofer IIS-A started work on the EUREKA project EU147, Digital Audio Broadcasting (DAB). In cooperation with the University of Erlangen (Prof. Dieter Seitzer), the Fraunhofer IIS-A eventually devised a very powerful algorithm that is standardized as ISO-MPEG Audio Layer-3 (IS 11172-3 and IS 13818-3), more commonly known as mp3.
Without data reduction, digital audio signals typically consist of 16-bit samples recorded at a sampling rate more than twice the actual audio bandwidth (e.g. 44.1 kHz for Compact Discs). So you end up with more than 1.4 Mbit to represent just one second of stereo music in CD quality. By using MPEG audio coding, the original sound data from a CD can be compressed by a factor of 12 without a perceptible loss of sound quality. Factors of 24 and even more still maintain a sound quality that is significantly better than what you get by just reducing the sampling rate and the resolution of your samples.
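The bitrate figures above follow from simple arithmetic. The sketch below only multiplies out the CD parameters named in the text (44.1 kHz, 16 bits, two channels) and divides by the quoted compression factors; the helper names are illustrative.

```python
# CD audio parameters taken from the text: 44.1 kHz, 16-bit samples, stereo.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2


def pcm_bitrate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Uncompressed PCM bitrate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def compressed_bitrate(pcm_bps: int, factor: float) -> float:
    """Bitrate remaining after dividing the PCM rate by a compression factor."""
    return pcm_bps / factor


cd_bps = pcm_bitrate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS)
print(f"CD audio: {cd_bps} bit/s")  # 1411200 bit/s, i.e. about 1.4 Mbit/s
for factor in (12, 24):
    kbps = compressed_bitrate(cd_bps, factor) / 1000
    print(f"factor {factor}: {kbps:.1f} kbit/s")
```

A factor of 12 leaves about 117.6 kbit/s and a factor of 24 about 58.8 kbit/s, which places the "CD quality" case close to the common 112 and 128 kbit/s settings.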
This is achieved by perceptual coding techniques that exploit the way the human ear perceives sound waves. For example, while a strong sound is playing, you do not hear much weaker sounds (especially at nearby frequencies): they are masked. It is therefore not necessary to code these weaker sounds, because they would not be heard anyway. This is the first property the MP3 format exploits to save space. For this, the MP3 encoder uses a psychoacoustic model that models the behavior of the human ear.
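The masking idea can be illustrated with a deliberately crude toy model. The sketch assumes that every spectral component casts a "shadow" that starts a fixed number of dB below its own level and falls off linearly with distance in frequency bins. The offset (10 dB) and slope (15 dB per bin) are hypothetical round numbers; real psychoacoustic models use asymmetric spreading functions on a Bark scale.

```python
import math


def masking_threshold(levels_db, offset_db=10.0, slope_db_per_bin=15.0):
    """Toy masking threshold for each component from all *other* components.

    offset_db and slope_db_per_bin are hypothetical parameters, not values
    from any MP3 psychoacoustic model.
    """
    thresholds = []
    for j in range(len(levels_db)):
        thresholds.append(max(
            (levels_db[i] - offset_db - slope_db_per_bin * abs(i - j)
             for i in range(len(levels_db)) if i != j),
            default=-math.inf,  # a lone component is never masked
        ))
    return thresholds


def audible(levels_db, **kwargs):
    """True where a component rises above the masking caused by its neighbours."""
    thresholds = masking_threshold(levels_db, **kwargs)
    return [level > t for level, t in zip(levels_db, thresholds)]


levels = [60.0, 45.0, 18.0, 0.0, 50.0]  # dB, one value per frequency bin
print(audible(levels))  # [True, True, False, False, True]
```

In this toy spectrum the 18 dB and 0 dB components sit under the shadows of their loud neighbours, so an encoder following this model would not spend bits on them.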
Using MPEG audio, one may achieve a typical data reduction of … while still maintaining the original CD sound quality.
By exploiting stereo effects and by limiting the audio bandwidth, the coding schemes may achieve an acceptable sound quality at even lower bitrates. MPEG Layer-3 (mp3) is the most powerful member of the MPEG audio coding family: for a given sound quality level, it requires the lowest bitrate, or, for a given bitrate, it achieves the highest sound quality.
Some typical performance data of MPEG Layer-3 are:
The filter bank used in MPEG Layer-3 is a hybrid filter bank which consists of a polyphase filter bank and a Modified Discrete Cosine Transform (MDCT).
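The MDCT stage can be sketched on its own. In Layer-3 it is applied to the output of each of the 32 polyphase subbands, with long blocks of 36 samples producing 18 coefficients (576 spectral lines in total). The sketch below assumes N = 18, a sine window, and the direct O(N²) definition; the 2/N scaling of the inverse is a choice made here so that windowed overlap-add reconstructs the input exactly. The polyphase stage, block switching and the aliasing-reduction step of a real encoder are omitted.

```python
import math
import random


def sine_window(length):
    """Sine window; satisfies w[n]^2 + w[n + N]^2 = 1 (Princen-Bradley)."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block):
    """Direct O(N^2) MDCT: 2N time samples -> N spectral coefficients."""
    n = len(block) // 2
    n0 = 0.5 + n / 2
    return [sum(block[i] * math.cos(math.pi / n * (i + n0) * (k + 0.5))
                for i in range(2 * n))
            for k in range(n)]


def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples (scaled by 2/N)."""
    n = len(coeffs)
    n0 = 0.5 + n / 2
    return [2.0 / n * sum(coeffs[k] * math.cos(math.pi / n * (i + n0) * (k + 0.5))
                          for k in range(n))
            for i in range(2 * n)]


def analyze_synthesize(signal, n=18):
    """Windowed MDCT/IMDCT with 50% overlap-add; time-domain aliasing cancels."""
    window = sine_window(2 * n)
    tail = (-len(signal)) % n  # pad so the signal fills whole hops
    padded = [0.0] * n + list(signal) + [0.0] * (tail + n)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n + 1, n):
        block = [padded[start + i] * window[i] for i in range(2 * n)]
        restored = imdct(mdct(block))
        for i in range(2 * n):
            out[start + i] += restored[i] * window[i]
    return out[n:n + len(signal)]


random.seed(1)
signal = [random.uniform(-1.0, 1.0) for _ in range(90)]
error = max(abs(a - b) for a, b in zip(signal, analyze_synthesize(signal)))
print(f"max reconstruction error: {error:.1e}")  # floating-point rounding only
```

Each block of 36 samples yields only 18 coefficients, yet nothing is lost: the time-domain aliasing introduced by one block is cancelled by its overlapping neighbour.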
The perceptual model largely determines the quality of a given encoder implementation. It either uses a separate filter bank or combines the calculation of energy values (for the masking calculations) with the main filter bank. The output of the perceptual model consists of values for the masking threshold or the allowed noise for each coder partition. If the quantization noise can be kept below the masking threshold, the compressed result should be indistinguishable from the original signal.
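The "noise below the masking threshold" criterion amounts to a per-partition comparison. The sketch below measures quantization-noise energy in each partition and expresses it as a noise-to-mask ratio in dB, where values at or below 0 dB mean the noise should be masked. The partition boundaries and allowed-noise values are hypothetical inputs standing in for the perceptual model's output.

```python
import math


def band_noise(original, decoded, band_edges):
    """Quantization-noise energy per partition; band_edges are bin boundaries."""
    return [
        sum((original[i] - decoded[i]) ** 2 for i in range(lo, hi))
        for lo, hi in zip(band_edges, band_edges[1:])
    ]


def noise_to_mask_db(noise, allowed):
    """NMR per partition in dB; <= 0 dB means the noise stays below the mask."""
    return [10 * math.log10(n / a) if n > 0 else -math.inf
            for n, a in zip(noise, allowed)]


def is_masked(noise, allowed):
    """True if every partition's noise is within its allowed noise."""
    return all(n <= a for n, a in zip(noise, allowed))


original = [1.0, 2.0, 3.0, 4.0]
decoded = [1.0, 2.0, 3.5, 4.0]
edges = [0, 2, 4]      # two hypothetical partitions
allowed = [0.1, 0.5]   # hypothetical allowed noise from a perceptual model
noise = band_noise(original, decoded, edges)
print(noise)  # [0.0, 0.25]
print(noise_to_mask_db(noise, allowed))
print(is_masked(noise, allowed))  # True
```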
The reservoir of bytes
Some passages of a musical piece cannot be coded at the given bitrate without degrading the musical quality. MP3 therefore uses a small reservoir of bytes (the bit reservoir) as a buffer: passages that can be coded with fewer bits than the nominal rate leave their unused capacity in the reservoir, and later, more demanding passages can draw on it.
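The buffering behaviour can be sketched as a per-frame allocator. The sketch assumes a constant nominal budget per frame and a hypothetical cap on how many saved bits the reservoir may hold. The example numbers are plausible assumptions rather than values taken from the text: about 3344 bits per 1152-sample frame at 128 kbit/s and 44.1 kHz, and a 4088-bit (511-byte) cap, the look-back limit usually cited for MPEG-1 Layer-3.

```python
def allocate_with_reservoir(demands, mean_bits, max_reservoir):
    """Grant each frame's bit demand as far as the reservoir allows.

    demands: bits each frame would like to use.
    mean_bits: bits per frame at the nominal constant bitrate.
    max_reservoir: hypothetical cap on the number of saved bits.
    """
    reservoir = 0
    granted = []
    for demand in demands:
        available = mean_bits + reservoir  # this frame's share plus savings
        use = min(demand, available)
        granted.append(use)
        # Unused bits go into the reservoir, which never exceeds its cap.
        reservoir = min(reservoir + mean_bits - use, max_reservoir)
    return granted


# Two easy frames let the demanding third frame exceed the nominal 3344 bits.
print(allocate_with_reservoir([2000, 2000, 5000, 3344], 3344, 4088))
```

Because bits can only be saved, never borrowed from the future, the total granted never exceeds the total nominal budget, so the average bitrate stays constant.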
The minimal audition threshold
The minimal audition threshold of the human ear is not the same at all frequencies. The ear is most sensitive between roughly 2 kHz and 5 kHz; at lower and higher frequencies a sound must be considerably louder to be heard at all. It is therefore not necessary to code sounds whose level lies below this threshold, because they will not be audible.
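A frequency-dependent threshold of this kind is often approximated with Terhardt's formula for the threshold in quiet, which is widely used in perceptual-coding literature. The sketch below evaluates that approximation; whether a particular MP3 encoder uses this exact curve is an assumption, not something the text states.

```python
import math


def threshold_in_quiet_db(freq_hz):
    """Terhardt's approximation of the absolute threshold of hearing (dB SPL)."""
    f = freq_hz / 1000.0  # the formula is written in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


for f in (100, 1000, 3300, 10000, 15000):
    print(f"{f:>6} Hz: {threshold_in_quiet_db(f):6.1f} dB")
```

The curve dips below 0 dB around 3 to 4 kHz and rises steeply toward both ends of the spectrum, so a component at 100 Hz or 15 kHz must be much louder than one at 3.3 kHz before it needs to be coded.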
The Joint Stereo
In the case of a stereophonic signal, the MP3 format can use a few more tools, referred to as Joint Stereo (JS) coding, to further shrink the compressed file size.
Many mid-range hi-fi systems have a single subwoofer. However, you usually do not have the feeling that the sound comes from this woofer, but rather from the satellite speakers. Indeed, at very low and very high frequencies, the human ear is no longer able to locate the spatial origin of sounds with full accuracy. The mp3 format can therefore (optionally) exploit the same effect by using what is called Intensity Stereo (IS): some frequencies are recorded as a single monophonic signal, followed by a small amount of additional information that restores a minimum of spatial impression.
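The principle of Intensity Stereo can be shown with a simplified sketch: one band is transmitted as a single summed signal plus two gains chosen so that each decoded channel keeps its original band energy. This energy-ratio scheme is an illustration chosen here, not the exact intensity-position quantization defined for Layer-3.

```python
import math


def intensity_encode(left, right):
    """Collapse one band to a mono signal plus per-channel gains."""
    mono = [l + r for l, r in zip(left, right)]
    e_mono = sum(m * m for m in mono)
    if e_mono == 0.0:
        return mono, 0.0, 0.0  # nothing left to scale
    g_left = math.sqrt(sum(l * l for l in left) / e_mono)
    g_right = math.sqrt(sum(r * r for r in right) / e_mono)
    return mono, g_left, g_right


def intensity_decode(mono, g_left, g_right):
    """Rebuild both channels as scaled copies of the mono signal."""
    return [g_left * m for m in mono], [g_right * m for m in mono]


# A source panned toward the left: the right channel is half the left one.
left = [1.0, -2.0, 0.5]
right = [0.5 * v for v in left]
mono, gl, gr = intensity_encode(left, right)
print(intensity_decode(mono, gl, gr))
```

For a source that is simply panned between the speakers the band is restored exactly; in general only each channel's energy survives, and any waveform differences between left and right are lost, which is why this trick is reserved for frequencies where the ear cannot localize precisely.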
The second joint stereo tool is called Mid/Side (M/S) stereo. When the left and right channels are quite similar, a middle (L+R) and a side (L-R) channel are encoded instead of left and right. Because the side channel is then small, it needs fewer bits, which reduces the final file size. During playback, the MP3 decoder reconstructs the left and right channels from the middle and side channels.
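The mid/side transform and its inverse are short enough to show directly. The sketch assumes the orthonormal 1/√2 scaling of (L+R) and (L-R), which keeps the total energy unchanged and makes decoding the same operation as encoding; with plain L+R and L-R the decoder would instead have to divide by 2.

```python
import math

SQRT_HALF = math.sqrt(0.5)


def ms_encode(left, right):
    """Mid = (L+R)/sqrt(2), Side = (L-R)/sqrt(2)."""
    mid = [(l + r) * SQRT_HALF for l, r in zip(left, right)]
    side = [(l - r) * SQRT_HALF for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Inverse transform: L = (M+S)/sqrt(2), R = (M-S)/sqrt(2)."""
    left = [(m + s) * SQRT_HALF for m, s in zip(mid, side)]
    right = [(m - s) * SQRT_HALF for m, s in zip(mid, side)]
    return left, right


# Two nearly identical channels: almost all energy ends up in the mid channel.
left = [1.0, 0.9, -0.5]
right = [0.98, 0.91, -0.52]
mid, side = ms_encode(left, right)
print("side energy:", sum(s * s for s in side))
print("mid energy:", sum(m * m for m in mid))
```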
Quantization and Coding
A system of two nested iteration loops is the common solution for quantization and coding in a Layer-3 encoder.
Quantization is done via a power-law quantizer. In this way, larger values are automatically coded with less accuracy and some noise shaping is already built into the quantization process.
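A power-law quantizer can be sketched in a few lines. The sketch assumes the 3/4 exponent used in Layer-3 and a step size that grows by a factor of 2^(1/4) per global-gain unit; the zero point of `step_from_gain` is hypothetical, and real encoders also apply a slightly different rounding offset.

```python
import math


def quantize(x, step):
    """Power-law quantizer: |x|/step is raised to 3/4 before rounding."""
    return [int(math.copysign(round((abs(v) / step) ** 0.75), v)) for v in x]


def dequantize(q, step):
    """Inverse mapping: |q| is raised to 4/3 and rescaled by the step size."""
    return [math.copysign(abs(v) ** (4 / 3), v) * step for v in q]


def step_from_gain(global_gain):
    """Step size grows by 2**(1/4) per gain unit (hypothetical zero point)."""
    return 2 ** (global_gain / 4)


spectrum = [0.3, -1.0, 5.0, 40.0, -300.0]
step = step_from_gain(4)  # four gain units double the step size
q = quantize(spectrum, step)
print(q)
print(dequantize(q, step))
```

Since the reconstruction levels are |q|^(4/3), the spacing between neighbouring levels widens as the magnitude grows: large values are represented more coarsely, which is the built-in noise shaping the text describes.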
The quantized values are coded by Huffman coding. As a form of entropy coding, Huffman coding is lossless; this stage is therefore called noiseless coding, because no noise is added to the audio signal.
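The lossless nature of Huffman coding can be demonstrated with a small example. Layer-3 uses predefined Huffman tables from the standard rather than building a code for each file; the sketch instead builds a code from the frequencies of some made-up quantized values, only to show that frequent (small) values get short code words and that decoding restores the input exactly.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_code(values):
    """Build a prefix-free code from how often each quantized value occurs."""
    freq = Counter(values)
    if not freq:
        return {}
    tiebreak = count()  # keeps heap entries comparable when counts are equal
    heap = [(n, next(tiebreak), symbol) for symbol, n in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}  # a lone symbol still needs one bit
    while len(heap) > 1:
        n1, _, first = heapq.heappop(heap)
        n2, _, second = heapq.heappop(heap)
        heapq.heappush(heap, (n1 + n2, next(tiebreak), (first, second)))
    code = {}

    def assign(node, prefix):
        if isinstance(node, tuple):  # internal node: (left, right)
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:  # leaf: a quantized value
            code[node] = prefix

    assign(heap[0][2], "")
    return code


def encode(values, code):
    return "".join(code[v] for v in values)


def decode(bits, code):
    inverse = {word: symbol for symbol, word in code.items()}
    out, word = [], ""
    for bit in bits:
        word += bit
        if word in inverse:  # prefix-free, so the first match is a symbol
            out.append(inverse[word])
            word = ""
    return out


quantized = [0] * 20 + [1] * 8 + [-1] * 7 + [2] * 3 + [5]
code = build_huffman_code(quantized)
bits = encode(quantized, code)
assert decode(bits, code) == quantized  # lossless: exact round trip
print(len(bits), "bits; code for 0:", code[0], "code for 5:", code[5])
```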
The process to find the optimum gain and scalefactors for a given block, bit-rate and output from the perceptual model is usually done by two nested iteration loops in an analysis-by-synthesis way:
Inner iteration loop (rate loop)
The Huffman code tables assign shorter code words to (more frequent) smaller quantized values. If the number of bits resulting from the coding operation exceeds the number of bits available to code a given block of data, this can be corrected by adjusting the global gain to result in a larger quantization step size, leading to smaller quantized values. This operation is repeated with different quantization step sizes until the resulting bit demand for Huffman coding is small enough. The loop is called the rate loop because it modifies the overall coder rate until it is small enough.
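The rate loop can be sketched as follows. Two assumptions are made: `bit_cost` is a hypothetical stand-in for looking up Huffman code lengths (zeros cost one bit, larger values cost more), and the gain starts at a hypothetical zero point with each unit enlarging the step size by 2^(1/4).

```python
import math


def quantize(x, step):
    """Power-law quantizer (exponent 3/4)."""
    return [int(math.copysign(round((abs(v) / step) ** 0.75), v)) for v in x]


def bit_cost(q):
    """Hypothetical stand-in for counting Huffman bits: small values are cheap."""
    return sum(1 if v == 0 else 2 * abs(v).bit_length() + 2 for v in q)


def rate_loop(x, budget, start_gain=0):
    """Raise the gain (coarser steps) until the quantized block fits the budget."""
    if budget < len(x):
        raise ValueError("budget is below the cost of an all-zero block")
    gain = start_gain
    while bit_cost(quantize(x, 2 ** (gain / 4))) > budget:
        gain += 1  # each unit multiplies the step size by 2**0.25
    return quantize(x, 2 ** (gain / 4)), gain


spectrum = [12.0, -7.5, 3.2, 1.1, -0.6, 0.3, 0.05, 0.0]
q, gain = rate_loop(spectrum, budget=30)
print(gain, q, bit_cost(q))
```

The loop always terminates: a large enough step size quantizes every value to zero, and an all-zero block costs one bit per value, which the initial check guarantees fits the budget.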
Outer iteration loop (noise control/distortion loop)
To shape the quantization noise according to the masking threshold, scalefactors are applied to each scalefactor band. The system starts with a default factor of 1.0 for each band. If the quantization noise in a given band is found to exceed the masking threshold (allowed noise) as supplied by the perceptual model, the scalefactor for this band is adjusted to reduce the quantization noise. Since achieving a smaller quantization noise requires a larger number of quantization steps and thus a higher bitrate, the rate adjustment loop has to be repeated every time new scalefactors are used. In other words, the rate loop is nested within the noise control loop. The outer (noise control) loop is executed until the actual noise (computed from the difference between the original spectral values and the quantized spectral values) is below the masking threshold for every scalefactor band (which roughly corresponds to a critical band).
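Both loops together can be sketched as an analysis-by-synthesis search. The sketch makes several assumptions: `bit_cost` is the same hypothetical Huffman stand-in as in a plain rate loop, each scalefactor step amplifies its band by √2 before quantization (so the band is effectively quantized more finely), and `max_iterations` is a hypothetical guard, since raising scalefactors can push the rate loop to coarser steps and the search is not guaranteed to converge. Real encoders use additional stop conditions.

```python
import math


def quantize(x, step):
    return [int(math.copysign(round((abs(v) / step) ** 0.75), v)) for v in x]


def dequantize(q, step):
    return [math.copysign(abs(v) ** (4 / 3), v) * step for v in q]


def bit_cost(q):
    """Hypothetical stand-in for a Huffman table lookup."""
    return sum(1 if v == 0 else 2 * abs(v).bit_length() + 2 for v in q)


def rate_loop(x, budget):
    """Inner loop: raise the gain until the quantized block fits the budget."""
    if budget < len(x):
        raise ValueError("budget is below the cost of an all-zero block")
    gain = 0
    while bit_cost(quantize(x, 2 ** (gain / 4))) > budget:
        gain += 1
    step = 2 ** (gain / 4)
    return quantize(x, step), step


def distortion_loop(x, band_edges, allowed_noise, budget, max_iterations=32):
    """Outer loop: amplify bands whose noise exceeds the allowed noise."""
    bands = list(zip(band_edges, band_edges[1:]))
    scalefactors = [0] * len(bands)
    for _ in range(max_iterations):
        amp = [1.0] * len(x)
        for b, (lo, hi) in enumerate(bands):
            for i in range(lo, hi):
                amp[i] = 2 ** (scalefactors[b] / 2)  # assumed sqrt(2) per step
        # The rate loop is re-run every time the scalefactors change.
        q, step = rate_loop([v * a for v, a in zip(x, amp)], budget)
        decoded = [v / a for v, a in zip(dequantize(q, step), amp)]
        noise = [sum((x[i] - decoded[i]) ** 2 for i in range(lo, hi))
                 for lo, hi in bands]
        too_noisy = [b for b, (n, a) in enumerate(zip(noise, allowed_noise))
                     if n > a]
        if not too_noisy:
            break  # every band is at or below its allowed noise
        for b in too_noisy:
            scalefactors[b] += 1
    return q, step, scalefactors, noise


x = [8.0, -3.0, 1.5, 0.4, 0.2, -0.1]
q, step, sf, noise = distortion_loop(x, [0, 2, 6], [0.5, 0.01], budget=60)
print("scalefactors:", sf, "noise per band:", noise)
```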