A Free Audio Compression Format?

Based on what I've learned working with LAME and GPSYCHO, I believe it would not be too difficult to develop an independent audio codec of slightly better quality than LAME (and thus comparable to the best commercial MP3 codecs). Yes, it won't be as good as AAC, but think of this: how many people would use a proprietary compression program which was only slightly better than gzip?

The following is an outline of such an audio codec. It is based on general published ideas which form the basis of several encoders (MP3, MPEG4-AAC, AT&T PAC). It includes most of the ideas that were used by the MP3 format, but removes many of the components in MP3 which are inherited from layer I and layer II. I have also added some newer ideas that were utilized by AAC and PAC. Some of the more sophisticated ideas such as temporal noise shaping and predictive coding are not included.

The problem with all of this: many of these fundamental ideas are patented in countries which allow patents on algorithms. There is still a difference between ideas and algorithms, so it may be possible to implement this codec using different algorithms for the same ideas. It will require a significant amount of legal work to make this determination. If your beliefs do not coincide with the patent holder's beliefs, you could be sued and the courts will decide. If you don't have the money for such a lawsuit, then that is the end of the project!

Just a cursory patent search will yield dozens of patents on every aspect of audio compression. Below I have referenced some of these patents along with my uninformed interpretation of what they claim.

Frame/Window types

The very concept of using spectral transforms applied to frames of PCM samples seems to be patented (US5579430). But I believe spectral transforms (or filterbanks) must be used because psycho-acoustic information is given in terms of spectral coefficients (the frequency domain). The majority of audio compression comes from allocating bits between different frequency bands based on psycho-acoustic information.
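To make "spectral transforms applied to frames of PCM samples" concrete, here is a minimal sketch of one such transform, assuming an MDCT with a sine window and 50% overlapping frames. The text above does not commit to a particular transform or window, so both are assumptions, and the function names are hypothetical. The direct O(N^2) sums are for clarity only; a real encoder would use a fast algorithm.

```python
import math

def sine_window(frame_len):
    # Sine window; it satisfies w[n]^2 + w[n + N]^2 = 1 (assumed choice).
    return [math.sin(math.pi * (n + 0.5) / frame_len) for n in range(frame_len)]

def mdct(frame):
    # Forward MDCT: 2N windowed PCM samples -> N spectral coefficients.
    frame_len = len(frame)
    half = frame_len // 2
    w = sine_window(frame_len)
    return [
        sum(w[n] * frame[n]
            * math.cos(math.pi / half * (n + 0.5 + half / 2.0) * (k + 0.5))
            for n in range(frame_len))
        for k in range(half)
    ]

def imdct(coeffs):
    # Inverse MDCT: N coefficients -> 2N windowed, time-aliased samples.
    half = len(coeffs)
    frame_len = 2 * half
    w = sine_window(frame_len)
    return [
        w[n] * (2.0 / half)
        * sum(coeffs[k]
              * math.cos(math.pi / half * (n + 0.5 + half / 2.0) * (k + 0.5))
              for k in range(half))
        for n in range(frame_len)
    ]

def analyze_and_resynthesize(pcm, half):
    # Transform overlapping frames (hop = N) and overlap-add the inverses.
    out = [0.0] * len(pcm)
    for start in range(0, len(pcm) - 2 * half + 1, half):
        rebuilt = imdct(mdct(pcm[start:start + 2 * half]))
        for i, value in enumerate(rebuilt):
            out[start + i] += value
    return out

pcm = [math.sin(0.3 * i) + 0.5 * math.cos(1.1 * i) for i in range(32)]
out = analyze_and_resynthesize(pcm, 8)
```

Every sample covered by two overlapping frames (indices 8 to 23 here) is reconstructed up to rounding error, because the aliasing terms of neighbouring frames cancel. In a codec, the quantization step sits between `mdct` and `imdct`.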

The concept of window switching to reduce pre-echo effects is patented in US5285498.
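As a rough illustration of the decision behind window switching, the sketch below flags a frame for short windows when the energy jumps sharply from one sub-block to the next, which is when pre-echo is most likely. The detector, its threshold, and the names `choose_window`, `sub_blocks`, and `attack_ratio` are all hypothetical; this is not the method of any particular encoder or patent.

```python
import math

def choose_window(frame, sub_blocks=4, attack_ratio=8.0):
    # Split the frame into equal sub-blocks and compare their energies.
    size = len(frame) // sub_blocks
    energies = [
        sum(x * x for x in frame[i * size:(i + 1) * size]) + 1e-12
        for i in range(sub_blocks)
    ]
    # A sudden rise in energy suggests an attack: use short windows so the
    # quantization noise cannot spread far ahead of the attack in time.
    for previous, current in zip(energies, energies[1:]):
        if current / previous > attack_ratio:
            return "short"
    return "long"

steady = [math.sin(2.0 * math.pi * n / 8.0) for n in range(64)]
attack = [0.0] * 48 + [1.0] * 16
decisions = (choose_window(steady), choose_window(attack))
```

Here the steady tone keeps the long window, while the frame that is silent and then suddenly loud is switched to short windows.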

Critical Bands

Critical bands are a way to group frequency bands which better mimics the response of the human ear. The concept is old, but there may be patents on the use of critical bands for audio compression. The concept of mid/side encoding is patented in US5481614.
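The idea of mid/side encoding can be sketched in a few lines. This assumes the simple sum/difference convention with a factor of 1/2; other normalizations exist, and the function names are hypothetical.

```python
def to_mid_side(left, right):
    # Mid carries what the two channels share, side carries the difference.
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side

def from_mid_side(mid, side):
    # Exact inverse of to_mid_side (up to floating-point rounding).
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

left = [0.5, 0.25, -0.75]
right = [0.5, 0.25, -0.5]
mid, side = to_mid_side(left, right)
restored_left, restored_right = from_mid_side(mid, side)
```

When the two channels are nearly identical, the side channel is close to zero and needs very few bits, which is where the gain comes from.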

Quantization of MDCT coefficients

The use of scale factors to control the allocation of bits between scale factor bands is patented. Even worse, just the concept of allocating bits among critical bands based on any set of external requirements is patented (US5579430).
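A minimal sketch of how scale factors steer bits between bands, assuming a plain uniform quantizer whose step size is 2^(scalefactor/4) per band. This is a simplification for illustration (MP3 itself uses a non-uniform quantizer), and the names `quantize_frame`, `dequantize_frame`, and `band_edges` are hypothetical.

```python
def step_size(scalefactor):
    # Larger scale factor -> larger step -> coarser quantization, fewer bits.
    return 2.0 ** (scalefactor / 4.0)

def quantize_frame(coeffs, band_edges, scalefactors):
    # band_edges[i]:band_edges[i + 1] delimits band i of the coefficients.
    levels = []
    for band, scalefactor in enumerate(scalefactors):
        step = step_size(scalefactor)
        for c in coeffs[band_edges[band]:band_edges[band + 1]]:
            levels.append(int(round(c / step)))
    return levels

def dequantize_frame(levels, band_edges, scalefactors):
    # The decoder rebuilds approximate coefficients from the integer levels.
    coeffs = []
    for band, scalefactor in enumerate(scalefactors):
        step = step_size(scalefactor)
        for q in levels[band_edges[band]:band_edges[band + 1]]:
            coeffs.append(q * step)
    return coeffs

coeffs = [12.3, -7.9, 5.1, 3.3, 1.2, -0.8, 0.4, 0.1]
band_edges = [0, 4, 8]
scalefactors = [0, 8]  # fine steps in band 0, coarse steps in band 1
levels = quantize_frame(coeffs, band_edges, scalefactors)
rebuilt = dequantize_frame(levels, band_edges, scalefactors)
```

With these made-up values the first band keeps its detail while the second band quantizes entirely to zero, so almost all of the bits go to the first band. In a real encoder the psycho-acoustic model decides how much noise each band can tolerate, and the scale factors are chosen accordingly.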

Lossless compression of quantized MDCT coefficients

The quantized data needs some type of lossless compression and encoding. MP3 uses Huffman coding with precomputed tables, each assigned a unique code.

The type of Huffman coding used in MP3 is patented (US5579430). Are there other types of Huffman coding which we could use? Is the concept of precomputed tables patentable? Or are just the tables themselves patented? A version of the algorithm in gzip, optimized for audio frames, would probably be the best. But the very fact of using optimized encoding is claimed to be patented (US5579430)!
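For reference, here is a sketch of generic textbook Huffman coding applied to one frame of quantized values. It builds the code table from the data itself rather than choosing among precomputed tables, so it is neither MP3's scheme nor gzip's, and it says nothing about what any patent covers. The function names are hypothetical.

```python
import heapq
from collections import Counter

def build_huffman_table(symbols):
    # Map each symbol to a bit string; frequent symbols get shorter codes.
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, unique tiebreak, {symbol: partial code}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, low = heapq.heappop(heap)
        w2, _, high = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in low.items()}
        merged.update({s: "1" + code for s, code in high.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

def encode(symbols, table):
    return "".join(table[s] for s in symbols)

def decode(bits, table):
    # Codes are prefix-free, so the first match while scanning is the symbol.
    inverse = {code: s for s, code in table.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out

quantized = [0, 0, 0, 0, 1, 0, -1, 0, 0, 2, 0, 0, 1, 0, 0, 0]
table = build_huffman_table(quantized)
bits = encode(quantized, table)
decoded = decode(bits, table)
```

Quantized audio frames are dominated by zeros and small values, so the zero symbol gets the shortest code and the frame takes fewer bits than a fixed 2-bit code per value would. A per-frame table must also be sent to the decoder, which is one reason a format might prefer precomputed tables.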

Psycho-acoustic model (output used during quantization step)

The algebraic formulas for these quantities are in the published literature. The tonality formula is patented in US5040217.