1.2 A GENERAL PERCEPTUAL AUDIO CODING ARCHITECTURE

Over the last few years, researchers have proposed several efficient signal models (e.g., transform-based, subband-filter structures, wavelet-packet) and compression standards (Table 1.1) for high-quality digital audio reproduction. Most of these algorithms are based on the generic architecture shown in Figure 1.1.

These coders typically segment the input signal into quasi-stationary frames of 2 to 50 ms duration. A time-frequency analysis section then estimates the temporal and spectral components of each frame. The time-frequency mapping is usually, though not always, matched to the analysis properties of the human auditory system. Either way, the ultimate objective is to extract from the input audio a set of time-frequency parameters that is amenable to quantization according to a perceptual distortion metric. Depending on the overall design objectives, the time-frequency analysis section usually contains one of the following:

  • Unitary transform
  • Time-invariant bank of critically sampled, uniform/nonuniform bandpass filters
  • Time-varying (signal-adaptive) bank of critically sampled, uniform/nonuniform bandpass filters
  • Harmonic/sinusoidal analyzer
  • Source-system analysis (LPC and multipulse excitation)
  • Hybrid versions of the above.

Figure 1.1. A generic perceptual audio encoder.
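
As a rough illustration of the framing step and of the first option listed (a unitary transform), the following Python sketch splits a signal into fixed-length frames and applies an orthonormal DCT-II to each one. The choice of the DCT, the non-overlapping framing, the 20 ms frame length, the 8 kHz sampling rate, and all function names are assumptions made for this example only; practical coders generally use overlapping, windowed frames and more elaborate transforms or filter banks.

```python
import math


def segment_frames(signal, sample_rate_hz, frame_ms):
    """Split a signal into non-overlapping frames of frame_ms milliseconds.

    Trailing samples that do not fill a whole frame are dropped.
    """
    frame_len = int(round(sample_rate_hz * frame_ms / 1000.0))
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    n_frames = len(signal) // frame_len
    return [signal[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def dct_orthonormal(frame):
    """Orthonormal DCT-II of one frame (a real-valued unitary transform)."""
    n = len(frame)
    coeffs = []
    for k in range(n):
        # Scaling that makes the transform matrix orthonormal.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                  for i, x in enumerate(frame))
        coeffs.append(scale * acc)
    return coeffs


def energy(values):
    """Sum of squared values."""
    return sum(v * v for v in values)


# Hypothetical test signal: a 1 kHz tone sampled at 8 kHz for 100 ms.
sample_rate_hz = 8000
signal = [math.sin(2 * math.pi * 1000 * t / sample_rate_hz) for t in range(800)]

frames = segment_frames(signal, sample_rate_hz, frame_ms=20)
coeffs = [dct_orthonormal(f) for f in frames]

# A unitary transform preserves energy, so the two printed values
# should agree up to floating-point rounding.
print(len(frames), energy(frames[0]), energy(coeffs[0]))
```

Because the transform is unitary, the energy of each frame is preserved in its coefficients, so quantization error introduced in the transform domain carries over to the time domain with the same total energy.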

The choice of time-frequency analysis methodology always involves a fundamental ...
