9.1 Introduction

In a conventional audio production and transmission chain, the audio content is first produced for playback on a specific reproduction system (for example, two-loudspeaker stereophony), and is subsequently encoded, transmitted or stored, and decoded. Because the mix is fixed before encoding and decoding take place, it is very difficult to let the user interactively modify the ‘mix’ produced by, for example, a studio engineer.

There are, however, several applications that may benefit from user control over mixing and rendering parameters. For example, in a teleconferencing application, individual users may want to control the spatial position and the loudness of each talker. For radio and television broadcasts, users may want to raise the level of a voice-over to maximize speech intelligibility. Younger people may want to make a ‘re-mix’ of a music track they recently acquired, with control over the various instruments and vocals present in the mix.

Conventional ‘object-based’ audio systems require storage or transmission of the individual audio source signals so that they can be mixed at the decoder side as desired. Wave field synthesis systems are also often driven with audio source signals. ISO/IEC MPEG-4 [139, 146, 230] addresses a general object-based coding scenario. It defines the scene description (that is, the mixing parameters) and uses a separate mono audio coder for each (‘natural’) source signal. However, when a complex scene with many sources is to be coded, the bitrate becomes high ...
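
As a rough illustration of decoder-side mixing in an object-based system, the sketch below renders separately decoded mono source signals to stereo from per-source mixing parameters. It is a minimal sketch under stated assumptions, not the MPEG-4 scene description: the function name `render_stereo`, the `(gain, pan)` parameter layout, and the constant-power pan law are all hypothetical choices made for this example.

```python
import math

def render_stereo(sources, params):
    """Mix mono source signals to stereo using per-source gain and pan.

    sources: list of equal-length lists of float samples (one per object).
    params:  list of (gain, pan) tuples; pan runs from -1.0 (left) to 1.0 (right).
    The parameter layout and the constant-power pan law are assumptions
    made for illustration only.
    """
    n = len(sources[0])
    left = [0.0] * n
    right = [0.0] * n
    for signal, (gain, pan) in zip(sources, params):
        angle = (pan + 1.0) * math.pi / 4.0  # map [-1, 1] to [0, pi/2]
        g_left = gain * math.cos(angle)
        g_right = gain * math.sin(angle)
        for i, sample in enumerate(signal):
            left[i] += g_left * sample
            right[i] += g_right * sample
    return left, right

# Two hypothetical talkers: the listener places one hard left and one hard
# right, and halves the level of the second.
talker_a = [1.0, 0.5, -0.5]
talker_b = [0.2, 0.2, 0.2]
left, right = render_stereo([talker_a, talker_b], [(1.0, -1.0), (0.5, 1.0)])
```

Because the listener supplies `params` at playback time, the mix can be changed without re-encoding anything. The cost is that every source signal must be available separately at the decoder, which is why the bitrate grows with the number of sources.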
