A Free Audio Compression Format?

Related Links:

The MUS420 project.
Ogg Vorbis: A new, completely free format from the creators of cdparanoia.
The FAAC project: An open source AAC implementation.

Based on what I've learned working with LAME and GPSYCHO, I believe it would not be too difficult to develop an independent audio codec of slightly better quality than LAME (and thus comparable to the best commercial MP3 codecs). Yes, it won't be as good as AAC, but think of this: how many people would use a proprietary compression tool that was only slightly better than gzip?

The following is an outline of such an audio codec. It is based on general published ideas which form the basis of several encoders (MP3, MPEG4-AAC, AT&T PAC). It includes most of the ideas used by the MP3 format, but removes many of the components MP3 inherited from Layers I and II. I have also added some newer ideas used by AAC and PAC. Some of the more sophisticated ideas, such as temporal noise shaping and predictive coding, are not included.

The problem with all of this: many of these fundamental ideas are patented in countries which allow patents on algorithms. There is still a difference between ideas and algorithms, so it may be possible to implement this codec using different algorithms for the same ideas. Making this determination will require a significant amount of legal work. If your interpretation does not coincide with the patent holder's, you could be sued and the courts will decide. If you don't have the money for such a lawsuit, then that is the end of the project!

Just a cursory patent search will yield dozens of patents on every aspect of audio compression. Below I have referenced some of these patents along with my uninformed interpretation of what they claim.

Frame/Window types

1024- and 128-sample windows (the short windows are for pre-echo). MP3 uses 576- and 192-sample windows. AAC uses 1024- and 128-sample windows (Brandenburg & Stoll 1994, Bosi et al. 1997 in References).
Spectral coefficients computed directly by an MDCT of overlapping windows (lossless, i.e. perfectly invertible before quantization). MP3 applies the MDCT only after first splitting the signal into 32 subbands with a polyphase filterbank; AAC (apart from its SSR profile) applies the MDCT directly.
Pre-echo detection from the GPSYCHO algorithm. The GPSYCHO pre-echo detection algorithm is truly original, although it is such a simple concept that I'm sure someone has patented it.
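The "lossless" property of the MDCT is time-domain aliasing cancellation: each block of 2N windowed samples yields N coefficients, and overlap-adding the inverse transforms of consecutive blocks (hop N) recovers the input exactly. The Python sketch below is a direct O(N²) transcription of the textbook MDCT/IMDCT pair. The sine window is an assumption, chosen because it satisfies the Princen-Bradley condition w[n]² + w[n+N]² = 1; the function names are hypothetical, and a real encoder would use an FFT-based algorithm with N = 1024 or 128.

```python
import math

def sine_window(length):
    """Sine window of length 2N (assumed choice); satisfies w[n]**2 + w[n+N]**2 == 1."""
    return [math.sin(math.pi * (i + 0.5) / length) for i in range(length)]

def mdct(block, window):
    """2N input samples -> N MDCT coefficients (direct O(N^2) form, for clarity only)."""
    n = len(block) // 2
    x = [b * w for b, w in zip(block, window)]
    return [sum(x[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                for i in range(2 * n))
            for k in range(n)]

def imdct(coeffs, window):
    """N coefficients -> 2N windowed samples containing time-domain aliasing,
    which cancels when consecutive outputs are overlap-added."""
    n = len(coeffs)
    y = [(2.0 / n) * sum(coeffs[k] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                         for k in range(n))
         for i in range(2 * n)]
    return [v * w for v, w in zip(y, window)]

def analysis_synthesis(signal, n):
    """Hypothetical helper: transform in 2N-sample blocks with hop N, invert, overlap-add.
    Assuming len(signal) is a multiple of n, samples n .. len(signal) - n are
    reconstructed exactly (up to rounding); the first and last n are not."""
    window = sine_window(2 * n)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n + 1, n):
        coeffs = mdct(signal[start:start + 2 * n], window)
        for i, v in enumerate(imdct(coeffs, window)):
            out[start + i] += v
    return out
```

Only samples covered by two blocks are reconstructed; an encoder handles the edges by padding the signal.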

The very concept of applying spectral transforms to frames of PCM samples seems to be patented (US5579430). But I believe spectral transforms (or filterbanks) must be used, because psychoacoustic information is given in terms of spectral coefficients (the frequency domain). Most of the compression comes from allocating bits among different frequency bands based on psychoacoustic information.

The concept of window switching to reduce pre-echo effects is patented in US5285498.
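The GPSYCHO detector itself is not reproduced here. As a stand-in, the sketch below shows a generic energy-based attack detector of the kind often used to trigger window switching: split the frame into eight sub-blocks (matching eight 128-sample short windows in a 1024-sample frame) and switch to short windows when any sub-block's energy jumps well above the mean of the sub-blocks before it. The jump ratio of 10, the silence floor, and the function name are assumptions, not values taken from LAME.

```python
def needs_short_windows(frame, n_sub=8, ratio=10.0, floor=1e-9):
    """Hypothetical attack detector (not GPSYCHO): return True if some sub-block's
    energy exceeds `ratio` times the mean energy of the sub-blocks preceding it."""
    size = len(frame) // n_sub
    energies = [sum(s * s for s in frame[j * size:(j + 1) * size]) for j in range(n_sub)]
    for j in range(1, n_sub):
        previous = sum(energies[:j]) / j
        # `floor` keeps an onset after digital silence from dividing by zero
        if energies[j] > ratio * max(previous, floor):
            return True
    return False
```

A sharp onset late in the frame is exactly the case where a long window would smear quantization noise backwards in time, producing pre-echo.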

Critical Bands

Group coefficients into critical bands. MP3 uses 21 for long windows and 12 for short; AAC uses 49 for long windows and 14 for short.
Allow the option of mid/side encoding for each critical band. MP3 does not allow mid/side encoding on a band-by-band basis; AT&T PAC does (Johnston & Ferreira 1992 in References).
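Per-band mid/side selection can be sketched as follows. The band edges are hypothetical (real encoders use nonuniform tables that widen with frequency), and the decision rule, which uses M/S wherever the side signal carries less than 10% of the mid signal's energy, is an assumed heuristic, not the rule used by PAC. With M = (L + R)/2 and S = (L - R)/2, the decoder recovers L = M + S and R = M - S.

```python
def band_energies(coeffs, band_edges):
    """Sum of squared coefficients in each band [band_edges[b], band_edges[b + 1])."""
    return [sum(c * c for c in coeffs[lo:hi]) for lo, hi in zip(band_edges, band_edges[1:])]

def choose_ms_bands(left, right, band_edges, threshold=0.1):
    """Per-band mid/side decision (hypothetical heuristic): True means code the band
    as M/S, chosen when side energy < threshold * mid energy (correlated channels)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    e_mid = band_energies(mid, band_edges)
    e_side = band_energies(side, band_edges)
    return [es < threshold * em for em, es in zip(e_mid, e_side)]
```

For example, with left = [1.0, 0.8, 0.5, 0.3, 1.0, -0.5, 0.2, 0.9], right = [1.0, 0.79, 0.51, 0.3, -1.0, 0.5, 0.7, -0.9] and edges [0, 4, 8], the result is [True, False]: the first band is nearly identical in both channels, while the second is mostly anti-phase, so L/R coding is kept there.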

Critical bands group frequencies in a way that better mimics the response of the human ear. The concept is old, but there may be patents on the use of critical bands for audio compression. The concept of mid/side encoding is patented in US5481614.
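A common closed-form approximation of the critical-band (Bark) scale is Zwicker and Terhardt's z(f) = 13·arctan(0.00076·f) + 3.5·arctan((f/7500)²). The sketch below uses it to group MDCT bins into bands one Bark wide. The helper names, the bin-center convention, and the one-Bark width are assumptions; MP3 and AAC instead use fixed scale-factor-band tables from their standards.

```python
import math

def hz_to_bark(f):
    """Zwicker & Terhardt approximation of the critical-band rate in Bark."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)

def bark_band_edges(n_coeffs, sample_rate, bark_width=1.0):
    """Hypothetical helper: start index of each group of MDCT bins spanning
    `bark_width` Bark, plus a final end sentinel equal to n_coeffs.
    Bin k is assumed to be centred at (k + 0.5) * sample_rate / (2 * n_coeffs)."""
    def band_of(k):
        f = (k + 0.5) * sample_rate / (2 * n_coeffs)
        return int(hz_to_bark(f) / bark_width)

    edges = [0]
    for k in range(1, n_coeffs):
        if band_of(k) != band_of(k - 1):
            edges.append(k)
    edges.append(n_coeffs)
    return edges
```

At 44.1 kHz with 1024 coefficients this gives about 25 bands, narrow (a few bins) at low frequencies and hundreds of bins wide at the top; hz_to_bark(1000) is roughly 8.5.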

Quantization of MDCT coefficients

Each critical band has an associated scale factor. The larger the scale factor, the more bits are allocated to that critical band.
Multiply the MDCT coefficients by the scale factor and truncate to integers. This is all that is meant by quantization.
Choose scale factors so quantization distortion in each critical band is less than the masking computed by the psycho-acoustic model.
If more compression is desired (at the cost of some distortion), choose scale factors with the GPSYCHO algorithm. Compression can be controlled to produce either a given bitrate or a given quality.
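The quantization step and the search for a scale factor can be sketched in Python. Scale factors are restricted to powers of 2^(1/4), an assumption borrowed from the step size of MP3's global gain. The search returns the smallest scale factor (and hence roughly the fewest bits) whose truncation noise stays under the masking level; `allowed_noise` stands for the psychoacoustic model's output for the band. All names are hypothetical, and GPSYCHO's own rate-control loop is not shown.

```python
def quantize(coeffs, scalefactor):
    """Truncate coeffs * scalefactor to integers (int() truncates toward zero)."""
    return [int(c * scalefactor) for c in coeffs]

def dequantize(q, scalefactor):
    """Decoder-side reconstruction of the coefficients."""
    return [v / scalefactor for v in q]

def distortion(coeffs, scalefactor):
    """Energy of the quantization error within one band."""
    rec = dequantize(quantize(coeffs, scalefactor), scalefactor)
    return sum((c - r) ** 2 for c, r in zip(coeffs, rec))

def choose_scalefactor(coeffs, allowed_noise, max_steps=60):
    """Smallest scale factor 2**(s/4), s = 0, 1, ..., whose distortion is at most
    `allowed_noise` (assumed to be the band's masking level)."""
    for s in range(max_steps + 1):
        sf = 2.0 ** (s / 4.0)
        if distortion(coeffs, sf) <= allowed_noise:
            return sf
    return 2.0 ** (max_steps / 4.0)  # masking unreachable: fall back to the finest step
```

A larger masking level lets the search stop earlier, at a coarser step with smaller integers to encode; this is how the psychoacoustic model steers bits between bands.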

The use of scale factors to control the allocation of bits between scale factor bands is patented. Even worse, just the concept of allocating bits among critical bands based on any set of external requirements is patented (US5579430).

Lossless compression of quantized MDCT coefficients

Some type of lossless compression and encoding of the quantized data. MP3 uses Huffman coding with a set of precomputed tables, each identified by a unique code.

The type of Huffman coding used in MP3 is patented (US5579430). Are there other types of Huffman coding which we could use? Is the concept of precomputed tables patentable, or are just the tables themselves patented? A version of the algorithm used in gzip, optimized for audio frames, would probably be best. But even the very idea of using optimized encoding is claimed to be patented (US5579430)!
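One way to realise "precomputed tables, each identified by a unique code" is sketched below: build Huffman code lengths offline from training statistics, then for each block of quantized values pick whichever table gives the fewest bits and transmit its index. The training counts and function names are hypothetical, only code lengths (not the codewords themselves) are computed, and values missing from a table, which MP3 handles with escape codes, simply rule that table out here.

```python
import heapq

def huffman_code_lengths(counts):
    """Code length in bits for each symbol of a Huffman code built from `counts`."""
    if len(counts) == 1:
        return {sym: 1 for sym in counts}
    # Heap entries: (weight, unique tie-breaker, symbols in this subtree)
    heap = [(w, i, [sym]) for i, (sym, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in counts}
    tie = len(heap)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        for sym in s1 + s2:
            lengths[sym] += 1  # each merge adds one bit to every symbol beneath it
        heapq.heappush(heap, (w1 + w2, tie, s1 + s2))
        tie += 1
    return lengths

def best_table(values, tables):
    """(index, bit cost) of the precomputed table that codes `values` most cheaply,
    skipping tables that lack one of the symbols; None if no table fits."""
    best = None
    for idx, table in enumerate(tables):
        if all(v in table for v in values):
            cost = sum(table[v] for v in values)
            if best is None or cost < best[1]:
                best = (idx, cost)
    return best
```

With a "narrow" table trained on counts {0: 50, 1: 20, -1: 20, 2: 5, -2: 5} and a "wide" table with equal counts for -7..7, the values [0, 0, 1, -1, 0, 2, 0, 0] cost 14 bits with the narrow table, so best_table returns (0, 14); a block containing 5 can only use the wide table.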

Psycho-acoustic model (output used during quantization step)

Masking given by a linear function expressed in critical bands.
Strength of masking given from tonality of signal.
Tonality estimated by a measure of the predictability of the signal.
Johnston (1988) and Brandenburg & Johnston (1990) in References.
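The predictability measure can be sketched per spectral line: extrapolate magnitude and phase linearly from the previous two frames and measure how far the actual value lands from the prediction. This follows the "unpredictability" measure of ISO/IEC 11172-3 psychoacoustic model 2. The mapping constants -0.299 and -0.43 are the ones usually quoted for that model and should be checked against the standard; the model also averages the measure over partitions before mapping, a step omitted here. Function names are hypothetical, and the inputs are assumed to be complex FFT bins from three consecutive frames.

```python
import cmath
import math

def unpredictability(x2, x1, x0):
    """Unpredictability of one spectral line from complex bins two frames ago (x2),
    one frame ago (x1) and now (x0): 0 = perfectly predicted, 1 = unpredictable."""
    r_pred = 2 * abs(x1) - abs(x2)                       # linear magnitude extrapolation
    phi_pred = 2 * cmath.phase(x1) - cmath.phase(x2)     # linear phase extrapolation
    predicted = r_pred * cmath.exp(1j * phi_pred)
    denom = abs(x0) + abs(r_pred)
    return abs(x0 - predicted) / denom if denom > 0 else 1.0

def tonality(c):
    """Map unpredictability to a tonality index in [0, 1] (assumed model-2 constants)."""
    c = max(c, 1e-12)  # avoid log(0) for a perfectly predicted line
    return min(1.0, max(0.0, -0.299 - 0.43 * math.log(c)))
```

A steady sinusoid advances by a constant phase step each frame, so it is predicted exactly and scores 1 (tonal); a line whose phase jumps unexpectedly scores near 0 (noise-like).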

The algebraic formulas for these quantities are in the published literature. The tonality formula is patented in US5040217.
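Putting the pieces together, the sketch below turns per-band energies and tonalities into a masking threshold. Each masker's contribution falls off linearly in dB per Bark (the "linear function expressed in critical bands"), contributions are added in power, and the result is lowered by an offset of the form used by Johnston (1988), O_i = α(14.5 + i) + (1 - α)·5.5 dB, where α is the tonality of band i. The slopes of 27 dB/Bark toward lower bands and 10 dB/Bark toward higher ones, the one-Bark band spacing, zero-based band indices, and the function name are assumptions for illustration.

```python
import math

def masking_threshold(band_energy_db, band_tonality, lower_slope=27.0, upper_slope=10.0):
    """Per-band masking threshold in dB, assuming bands spaced one Bark apart.
    Masking spreads with slope `upper_slope` (dB/Bark) toward higher bands and the
    steeper `lower_slope` toward lower bands; contributions add in power."""
    thresholds = []
    for i in range(len(band_energy_db)):
        power = 0.0
        for j, level in enumerate(band_energy_db):
            dz = i - j  # distance in Bark from masker j to maskee i
            spread_db = -upper_slope * dz if dz >= 0 else lower_slope * dz
            power += 10.0 ** ((level + spread_db) / 10.0)
        # Tonal bands mask less: offset interpolates 14.5 + i dB (tone) and 5.5 dB (noise)
        t = band_tonality[i]
        offset_db = t * (14.5 + i) + (1.0 - t) * 5.5
        thresholds.append(10.0 * math.log10(power) - offset_db)
    return thresholds
```

A 60 dB noise-like band thus masks noise up to 54.5 dB within itself, but only 45.5 dB if it is tonal; the quantizer then compares each band's distortion against these values.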