The H.264 Advanced Video Compression Standard, Second Edition

Book Description

H.264 Advanced Video Coding, also known as MPEG-4 Part 10, is fundamental to a growing range of markets such as high-definition broadcasting, internet video sharing, mobile video and digital surveillance. This book reflects the growing importance and implementation of H.264 video technology. Offering a detailed overview of the system, it explains the syntax, tools and features of H.264 and equips readers with practical advice on how to get the most out of the standard.

  • Packed with clear examples and illustrations to explain H.264 technology in an accessible and practical way.

  • Covers basic video coding concepts, video formats and visual quality.

  • Explains how to measure and optimise the performance of H.264 and how to balance bitrate, computation and video quality.

  • Analyses recent work on scalable and multi-view versions of H.264, case studies of H.264 codecs and new technological developments such as the popular High Profile extensions.

  • An invaluable companion for developers, broadcasters, system integrators, academics and students who want to master this burgeoning state-of-the-art technology.

"[This book] unravels the mysteries behind the latest H.264 standard and delves deeper into each of the operations in the codec. The reader can implement (simulate, design, evaluate, optimize) the codec with all profiles and levels. The book ends with extensions and directions (such as SVC and MVC) for further research." Professor K. R. Rao, The University of Texas at Arlington, co-inventor of the Discrete Cosine Transform

Table of Contents

  1. Copyright
  2. About the Author
  3. Preface
  4. Glossary
  5. 1. Introduction
    1. 1.1. A change of scene
    2. 1.2. Driving the change
    3. 1.3. The role of standards
    4. 1.4. Why H.264 Advanced Video Coding is important
    5. 1.5. About this book
    6. Reference
  6. 2. Video formats and quality
    1. 2.1. Introduction
    2. 2.2. Natural video scenes
    3. 2.3. Capture
      1. 2.3.1. Spatial sampling
      2. 2.3.2. Temporal sampling
      3. 2.3.3. Frames and fields
    4. 2.4. Colour spaces
      1. 2.4.1. RGB
      2. 2.4.2. YCrCb
      3. 2.4.3. YCrCb sampling formats
    5. 2.5. Video formats
      1. 2.5.1. Intermediate formats
      2. 2.5.2. Standard Definition
      3. 2.5.3. High Definition
    6. 2.6. Quality
      1. 2.6.1. Subjective quality measurement
        1. 2.6.1.1. Factors influencing subjective quality
        2. 2.6.1.2. ITU-R 500
      2. 2.6.2. Objective quality measurement
        1. 2.6.2.1. PSNR
        2. 2.6.2.2. Other objective quality metrics
    7. 2.7. Summary
    8. References
  7. 3. Video coding concepts
    1. 3.1. Introduction
    2. 3.2. Video CODEC
    3. 3.3. Prediction model
      1. 3.3.1. Temporal prediction
        1. 3.3.1.1. Prediction from the previous video frame
        2. 3.3.1.2. Changes due to motion
        3. 3.3.1.3. Block-based motion estimation and compensation
        4. 3.3.1.4. Motion compensated prediction of a macroblock
          1. 3.3.1.4.1. Motion estimation:
          2. 3.3.1.4.2. Motion compensation:
        5. 3.3.1.5. Motion compensation block size
        6. 3.3.1.6. Sub-pixel motion compensation
      2. 3.3.2. Spatial model: intra prediction
    4. 3.4. Image model
      1. 3.4.1. Predictive image coding
      2. 3.4.2. Transform coding
        1. 3.4.2.1. Overview
        2. 3.4.2.2. DCT
        3. 3.4.2.3. Wavelet
      3. 3.4.3. Quantization
        1. 3.4.3.1. Scalar quantization
        2. 3.4.3.2. Vector quantization
      4. 3.4.4. Reordering and zero encoding
        1. 3.4.4.1. DCT
          1. 3.4.4.1.1. Coefficient distribution
          2. 3.4.4.1.2. Scan
          3. 3.4.4.1.3. Run-Level Encoding
        2. 3.4.4.2. Wavelet
          1. 3.4.4.2.1. Coefficient distribution
          2. 3.4.4.2.2. Zerotree encoding
    5. 3.5. Entropy coder
      1. 3.5.1. Predictive coding
      2. 3.5.2. Variable-length coding
        1. 3.5.2.1. Huffman coding
          1. 3.5.2.1.1. Example 1: Huffman coding, Sequence 1 motion vectors
          2. 3.5.2.1.2. Example 2: Huffman coding, sequence 2 motion vectors
        2. 3.5.2.2. Pre-calculated Huffman-based coding
          1. 3.5.2.2.1. Transform Coefficients (TCOEF)
          2. 3.5.2.2.2. Motion Vector Difference (MVD)
        3. 3.5.2.3. Other variable-length codes
      3. 3.5.3. Arithmetic coding
        1. 3.5.3.1.
          1. 3.5.3.1.1. Decoding procedure
        2. 3.5.3.2. Context-based Arithmetic Coding
    6. 3.6. The hybrid DPCM/DCT video CODEC model
      1. 3.6.1.
        1. 3.6.1.1.
          1. 3.6.1.1.1. Encoder data flow
          2. 3.6.1.1.2. Decoder data flow
    7. 3.7. Summary
    8. References
  8. 4. What is H.264?
    1. 4.1. Introduction
    2. 4.2. What is H.264?
      1. 4.2.1. A video compression format
      2. 4.2.2. An industry standard
      3. 4.2.3. A toolkit for video compression
      4. 4.2.4. Better video compression
    3. 4.3. How does an H.264 codec work?
      1. 4.3.1. Encoder processes
        1. 4.3.1.1. Prediction
        2. 4.3.1.2. Transform and quantization
        3. 4.3.1.3. Bitstream encoding
      2. 4.3.2. Decoder processes
        1. 4.3.2.1. Bitstream decoding
        2. 4.3.2.2. Rescaling and inverse transform
        3. 4.3.2.3. Reconstruction
    4. 4.4. The H.264/AVC Standard
    5. 4.5. H.264 Profiles and Levels
    6. 4.6. The H.264 Syntax
    7. 4.7. H.264 in practice
      1. 4.7.1. Performance
      2. 4.7.2. Applications
    8. 4.8. Summary
    9. References
  9. 5. H.264 syntax
    1. 5.1. Introduction
      1. 5.1.1. A note about syntax examples
    2. 5.2. H.264 syntax
    3. 5.3. Frames, fields and pictures
      1. 5.3.1. Decoding order
      2. 5.3.2. Display order
      3. 5.3.3. Reference picture lists
        1. 5.3.3.1. Default reference picture list order
        2. 5.3.3.2. Changing the reference picture list order
      4. 5.3.4. Frame and field coding
        1. 5.3.4.1. Coding pictures in frame or field mode
        2. 5.3.4.2. Coding macroblocks in frame or field mode (MBAFF)
    4. 5.4. NAL unit
    5. 5.5. Parameter Sets
    6. 5.6. Slice layer
      1. 5.6.1. Slice types
      2. 5.6.2. Slice header
      3. 5.6.3. Slice data
    7. 5.7. Macroblock layer
      1. 5.7.1. Overview
      2. 5.7.2. The Intra PCM mode
      3. 5.7.3. Macroblock prediction
      4. 5.7.4. Residual data
      5. 5.7.5. Macroblock syntax examples
    8. 5.8. Summary
    9. References
  10. 6. H.264 Prediction
    1. 6.1. Introduction
    2. 6.2. Macroblock prediction
    3. 6.3. Intra prediction
      1. 6.3.1. 4 × 4 luma prediction modes
      2. 6.3.2. 16 × 16 luma prediction modes
      3. 6.3.3. Chroma prediction modes
      4. 6.3.4. 8 × 8 luma prediction, High profiles
      5. 6.3.5. Signalling intra prediction modes
        1. 6.3.5.1. 4 × 4 or 8 × 8 luma prediction
          1. 6.3.5.1.1. 16 × 16 luma prediction or chroma prediction
    4. 6.4. Inter prediction
      1. 6.4.1. Reference pictures
      2. 6.4.2. Interpolating reference pictures
        1. 6.4.2.1. Generating interpolated sub-pixel samples
          1. 6.4.2.1.1. Luma component
          2. 6.4.2.1.2. Chroma components
      3. 6.4.3. Macroblock partitions
      4. 6.4.4. Motion vector prediction
        1. 6.4.4.1. Bipredicted macroblock motion vector prediction
        2. 6.4.4.2. Direct mode motion vector prediction
      5. 6.4.5. Motion compensated prediction
        1. 6.4.5.1. One reference
        2. 6.4.5.2. Two references : biprediction
        3. 6.4.5.3. Weighted prediction
        4. 6.4.5.4. Frame / field prediction
      6. 6.4.6. Inter prediction examples
      7. 6.4.7. Prediction structures
        1. 6.4.7.1. Low delay, minimal storage
        2. 6.4.7.2. 'Classic' Group of Pictures structure
        3. 6.4.7.3. Multiple reference frames
        4. 6.4.7.4. Hierarchical prediction structures
    5. 6.5. Loop filter
      1. 6.5.1. Boundary strength
      2. 6.5.2. Filter decision
      3. 6.5.3. Filter implementation
      4. 6.5.4. Loop filter example
    6. 6.6. Summary
    7. References
  11. 7. H.264 transform and coding
    1. 7.1. Introduction
    2. 7.2. Transform and quantization
      1. 7.2.1. The H.264 transforms
      2. 7.2.2. Transform processes
        1. 7.2.2.1. Overview of transform processes
        2. 7.2.2.2. Luma transform processes
        3. 7.2.2.3. Chroma transform processes
      3. 7.2.3. Integer transform and quantization : 4 × 4 blocks
        1. 7.2.3.1. Developing the forward transform and quantization process
        2. 7.2.3.2. Developing the rescaling and inverse transform process
        3. 7.2.3.3. Developing Cf4 and Sf4 : 4 × 4 blocks
        4. 7.2.3.4. Developing Ci4 and Si4 : 4 × 4 blocks
        5. 7.2.3.5. Developing Vi4
        6. 7.2.3.6. The complete 4 × 4 inverse transform and scaling process
        7. 7.2.3.7. Deriving Mf4
        8. 7.2.3.8. The complete 4 × 4 forward transform and scaling process
        9. 7.2.3.9. 4 × 4 Transform and quantization: Examples
          1. 7.2.3.9.1. Core transform
          2. 7.2.3.9.2. Scaling and quantization, QP = 6
          3. 7.2.3.9.3. Scaling and quantization, QP = 12
          4. 7.2.3.9.4. Scaling and quantization, QP = 18
          5. 7.2.3.9.5. Scaling and quantization, QP = 30
          6. 7.2.3.9.6. Comparison with 4 × 4 DCT using floating-point arithmetic
      4. 7.2.4. Integer transform and quantization : 8 × 8 blocks
        1. 7.2.4.1. Forward transform Cf8 : 8 × 8 blocks
        2. 7.2.4.2. Inverse transform Ci8: 8 × 8 blocks
        3. 7.2.4.3. Inverse quantization and scaling: 8 × 8 blocks
        4. 7.2.4.4. Forward quantization and scaling : 8 × 8 blocks
      5. 7.2.5. DC transforms
      6. 7.2.6. Transform and quantization extensions in the High profiles
        1. 7.2.6.1.
          1. 7.2.6.1.1. Frequency dependent quantization, High profiles
          2. 7.2.6.1.2. Lossless predictive coding, High 4:4:4 profiles
          3. 7.2.6.1.3. Colour plane coding, High 4:4:4 profiles
    3. 7.3. Block scan orders
    4. 7.4. Coding
      1. 7.4.1. Exp-Golomb Coding
      2. 7.4.2. Context Adaptive Variable Length Coding, CAVLC
      3. 7.4.3. Context Adaptive Binary Arithmetic Coding, CABAC
        1. 7.4.3.1. The coding process
        2. 7.4.3.2. The context models
        3. 7.4.3.3. The arithmetic coding engine
    5. 7.5. Summary
    6. References
  12. 8. H.264 conformance, transport and licensing
    1. 8.1. Introduction
    2. 8.2. Conforming to the Standard
      1. 8.2.1. Profiles
        1. 8.2.1.1. Baseline, Constrained Baseline, Extended and Main Profiles
        2. 8.2.1.2. High Profiles
        3. 8.2.1.3. Intra Profiles
      2. 8.2.2. Levels
      3. 8.2.3. Hypothetical Reference Decoder
      4. 8.2.4. Conformance testing
        1. 8.2.4.1. Testing a bitstream
        2. 8.2.4.2. Testing a decoder
        3. 8.2.4.3. Testing an encoder
    3. 8.3. H.264 coding tools for transport support
      1. 8.3.1. Redundant slices
      2. 8.3.2. Arbitrary Slice Order (ASO)
      3. 8.3.3. Slice Groups / Flexible Macroblock Order (FMO)
      4. 8.3.4. SP and SI slices
      5. 8.3.5. Data partitioned slices
    4. 8.4. Transport of H.264 data
      1. 8.4.1. Encapsulation in RBSPs, NALUs and packets
      2. 8.4.2. Transport protocols
      3. 8.4.3. File formats
      4. 8.4.4. Coding and transport issues
    5. 8.5. Supplemental Information
      1. 8.5.1. Supplemental Enhancement Information (SEI)
      2. 8.5.2. Video Usability Information (VUI)
    6. 8.6. Licensing H.264/AVC
      1. 8.6.1. Video coding patents
      2. 8.6.2. Video coding standards and patents
      3. 8.6.3. Licensing H.264/AVC patents
    7. 8.7. Summary
    8. References
  13. 9. H.264 performance
    1. 9.1. Introduction
    2. 9.2. Experimenting with H.264
      1. 9.2.1. The JM Reference Software
        1. 9.2.1.1. Overview
        2. 9.2.1.2. File formats
        3. 9.2.1.3. Basic operation
        4. 9.2.1.4. Advanced operation
        5. 9.2.1.5. Trace file
      2. 9.2.2. Other software encoders/decoders
      3. 9.2.3. H.264 stream analysis
    3. 9.3. Performance comparisons
      1. 9.3.1. Performance criteria
      2. 9.3.2. Performance examples: Foreman sequence, QCIF resolution
        1. 9.3.2.1. 'Low complexity' and 'Basic'
        2. 9.3.2.2. 'Basic' configuration plus options
        3. 9.3.2.3. Baseline and Main Profile
      3. 9.3.3. Performance examples: Foreman and Container sequences
      4. 9.3.4. Performance examples: Inter prediction structures
      5. 9.3.5. Performance example: H.264 vs. MPEG-4 Visual
    4. 9.4. Rate control
      1. 9.4.1. Rate control in the JM reference encoder
        1. 9.4.1.1. GOP level rate control
        2. 9.4.1.2. Frame and/or basic unit rate control
    5. 9.5. Mode selection
      1. 9.5.1. Rate Distortion Optimized mode selection
    6. 9.6. Low complexity coding
      1. 9.6.1. Approximating the cost function
      2. 9.6.2. Reducing the set of tested modes
      3. 9.6.3. Early termination
    7. 9.7. Summary
    8. References
  14. 10. Extensions and directions
    1. 10.1. Introduction
    2. 10.2. Scalable Video Coding
      1. 10.2.1. Simulcast transmission
      2. 10.2.2. Scalable transmission
      3. 10.2.3. Applications of Scalable Video Coding
      4. 10.2.4. Scalable video coding in H.264
      5. 10.2.5. Temporal scalability
      6. 10.2.6. Quality scalability: overview
      7. 10.2.7. Spatial scalability: overview
      8. 10.2.8. Spatial scalability in detail
      9. 10.2.9. Quality scalability in detail
      10. 10.2.10. Combined scalability
      11. 10.2.11. SVC performance
    3. 10.3. Multiview Video Coding
      1. 10.3.1. H.264 Multiview Video Coding
    4. 10.4. Configurable Video Coding
      1. 10.4.1. MPEG Reconfigurable Video Coding
      2. 10.4.2. Fully Configurable Video Coding
    5. 10.5. Beyond H.264/AVC
    6. 10.6. Summary
    7. References