As mentioned above, the scheme for joint-coding of audio source signals, shown in Figure 9.2, is based on transmission of the sum of the audio source signals,

    s(n) = \sum_{i=1}^{M} s_i(n),

where M is the number of source signals and s_i(n) are the individual source signals.
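As a minimal sketch of the sum signal described above, the following Python snippet adds M equal-length source signals sample by sample. The function name `sum_signal` and the toy data are hypothetical and serve only as illustration. Practical concerns such as scaling the sum to avoid clipping are not addressed here.

```python
from typing import List, Sequence


def sum_signal(sources: Sequence[Sequence[float]]) -> List[float]:
    """Return the sample-wise sum of M equal-length source signals."""
    if not sources:
        raise ValueError("at least one source signal is required")
    length = len(sources[0])
    if any(len(src) != length for src in sources):
        raise ValueError("all source signals must have the same length")
    # zip(*sources) yields, for each time index n, the tuple of the
    # M source samples at that index; summing it gives the sum signal.
    return [sum(samples) for samples in zip(*sources)]


# Hypothetical toy data: M = 3 sources with 4 samples each.
s1 = [1.0, 0.0, -1.0, 0.0]
s2 = [0.5, 0.5, 0.5, 0.5]
s3 = [0.0, 0.25, 0.0, -0.25]

s = sum_signal([s1, s2, s3])
print(s)
```

Only this single sum signal would be transmitted as an audio waveform in such a scheme, so its length equals that of each individual source regardless of M.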
Similar to spatial audio coding techniques, this method relies on the assumption that the perceived auditory spatial image is largely determined by the inter-channel time difference (ICTD), inter-channel level difference (ICLD), and inter-channel coherence (ICC) between the rendered audio channels. Therefore, instead of requiring 'clean' source signals s_i(n) as mixer input in Figure 9.1, it suffices to supply signals ŝ_i(n) that yield ICTD, ICLD, and ICC at the mixer output similar to those obtained when the real source signals s_i(n) are supplied to the mixer. There are three goals for the generation of ŝ_i(n):