GPSYCHO - Mid/Side Stereo
There are two stereo modes in GPsycho: "stereo" and "jstereo". "stereo" is just the normal independent coding of left & right channels. "jstereo" means the frames may use normal stereo, stereo, mid/side stereo or intensity stereo. FhG only seems to use intensity stereo at bitrates of 80kbs and lower. LAME does not have intensity stereo capability. In jstereo mode, the encoder has to decide for each frame if it should be encoded stereo or mid/side stereo.
Mid/side stereo encodes the mid and side channels instead of left and right. It allocates more bits to the mid channel than the side channel. For signals without a lot of stereo seperation, there will be very little information in the side channel and this trick will improve bandwidth. If the left & right channels differ by a lot, then the side channel will contain a lot of information. Errors encoding this information will show up as noise in both the left and right channels after decoding.
The LAME mid/side switching criterion, and mid/side masking thresholds are taken from Johnston and Ferreira, Sum-Difference Stereo Transform Coding, Proc. IEEE ICASSP (1992) p 569-571.
The MPEG AAC standard claims to use mid/side encoding based on this paper.
LAME Mid/Side switching criterion
The new ms_stereo switch uses mid/side stereo only when the difference between L & R masking thresholds (averaged over all scalefactors) is less then 5db. In several test samples it does an amazing job mimicking the FhG encoder (see below).
I believe the idea behind this is the following: If one channel has much less noise masking in a certain band, than masked noise in one channel that is spread to the other channel (by mid/side stereo) may no longer be masked. If both channels have the same masking, then the noise spread between both channels will be equally well masked.
regular stereo frames: Fools.wav: (1180 frames) FhG frames 793-804,902 new LAME frames 793-803,869,902,966,1017 old LAME over 500 frames used regular stereo IfYouCould.wav: (80 frames) FhG 43,51,60 new LAME 42,43,51,60 (like FhG, 1 extra) old LAME 33,62,65,66 (completely unlike FhG) mstest.wav: (156 frames) FhG: 138 frames use regular stereo new LAME 137 frames use regular stereo old LAME 8 frames use regular stereo t1.wav: (160 frames) FhG: 39-42, 80-83, 121-124, 144-150 new LAME: 38-41, 79-82, 120-124, old LAME: constant inappropriate toggling of ms_stereo track7.wav (146 frames) FhG: 0, 2-15, 21-66, 69-80, 83-146 new LAME: Castanets.wav: (253 frames) All encoders use all ms_stereo for all frames else3.wav: 217 frames All encoders use all ms_stereo for all frames
Mid/Side Stereo Masking Thresholds
There is a problem for true jstereo, where you need to turn ms_stereo on and
off on a frame by frame basis. Some frames will need masking thresholds from
L/R channels, and some for Mid/Side channels. But since the masking thresholds
depend on previous (and following) frames, you can only compute the masking
for a given granule if you've computed it for the 2 previous granules. Thus to
implement Mid/Side masking into the jstereo mode, we would need to compute, for
all frames, L,R, Mid and Side masking thresholds in l3psycho_anal. This would
not be as expensive as it sounds since the FFTs only need to be called on the
L & R channels. The energy and phase from Mid & Side channels can be
computed f rom the L & R FFT output. But it would be a major code change.
(Note: this is now done in LAME 3.21 with the -h option. It will
eventually become the default).
What's done right now? Without the -h option, LAME jstereo only computes L & R masking thresholds. If it is encoding a non ms_stereo frame, no problem. If it is encoding Mid & Side channels, then we have to be a little careful. We are quantizing Mid/Side channels, but the masking (allowed distortion) is given on L & R channels. Thus the computation of the audible distortion has to be done on the L & R channels too. This just involves reforming the L/R MDCT coefficients and the de-quantized L/R coefficients, and is done in calc_noise2.