Techniques for Noise Robustness in Automatic Speech Recognition

Book Description

Automatic speech recognition (ASR) systems are finding increasing use in everyday life. Many of the everyday environments where these systems are used are noisy: for example, users may call a voice search system from a busy cafeteria or a street. This can degrade the speech recordings and adversely affect the performance of speech recognition systems. As the use of ASR systems grows, knowledge of the state of the art in techniques for dealing with such problems becomes critical for the system and application engineers and researchers who work with or on ASR technologies. This book presents a comprehensive survey of the state of the art in techniques used to improve the robustness of speech recognition systems to these degrading external influences.

Key features:

  • Reviews all the main noise-robust ASR approaches, including signal separation, voice activity detection, robust feature extraction, model compensation and adaptation, missing-data techniques, and recognition of reverberant speech.

  • Provides a timely exposition of the topic, given the expected wider use of ASR technology in challenging environments.

  • Addresses robustness and signal degradation, both key concerns for ASR practitioners.

  • Includes contributions from top ASR researchers at leading research units in the field.

Table of Contents

  1. Cover
  2. Title Page
  3. Copyright
  4. List of Contributors
  5. Acknowledgments
  6. Chapter 1: Introduction
    1. 1.1 Scope of the Book
    2. 1.2 Outline
    3. 1.3 Notation
  7. Part One: Foundations
    1. Chapter 2: The Basics of Automatic Speech Recognition
      1. 2.1 Introduction
      2. 2.2 Speech Recognition Viewed as Bayes Classification
      3. 2.3 Hidden Markov Models
      4. 2.4 HMM-Based Speech Recognition
    2. Chapter 3: The Problem of Robustness in Automatic Speech Recognition
      1. 3.1 Errors in Bayes Classification
      2. 3.2 Bayes Classification and ASR
      3. 3.3 External Influences on Speech Recordings
      4. 3.4 The Effect of External Influences on Recognition
      5. 3.5 Improving Recognition under Adverse Conditions
  8. Part Two: Signal Enhancement
    1. Chapter 4: Voice Activity Detection, Noise Estimation, and Adaptive Filters for Acoustic Signal Enhancement
      1. 4.1 Introduction
      2. 4.2 Signal Analysis and Synthesis
      3. 4.3 Voice Activity Detection
      4. 4.4 Noise Power Spectrum Estimation
      5. 4.5 Adaptive Filters for Signal Enhancement
      6. 4.6 ASR Performance
      7. 4.7 Conclusions
    2. Chapter 5: Extraction of Speech from Mixture Signals
      1. 5.1 The Problem with Mixtures
      2. 5.2 Multichannel Mixtures
      3. 5.3 Single-Channel Mixtures
      4. 5.4 Variations and Extensions
      5. 5.5 Conclusions
    3. Chapter 6: Microphone Arrays
      1. 6.1 Speaker Tracking
      2. 6.2 Conventional Microphone Arrays
      3. 6.3 Conventional Adaptive Beamforming Algorithms
      4. 6.4 Spherical Microphone Arrays
      5. 6.5 Spherical Adaptive Algorithms
      6. 6.6 Comparative Studies
      7. 6.7 Comparison of Linear and Spherical Arrays for DSR
      8. 6.8 Conclusions and Further Reading
  9. Part Three: Feature Enhancement
    1. Chapter 7: From Signals to Speech Features by Digital Signal Processing
      1. 7.1 Introduction
      2. 7.2 The Speech Signal
      3. 7.3 Spectral Processing
      4. 7.4 Cepstral Processing
      5. 7.5 Influence of Distortions on Different Speech Features
      6. 7.6 Summary and Further Reading
    2. Chapter 8: Features Based on Auditory Physiology and Perception
      1. 8.1 Introduction
      2. 8.2 Some Attributes of Auditory Physiology and Perception
      3. 8.3 “Classic” Auditory Representations
      4. 8.4 Current Trends in Auditory Feature Analysis
      5. 8.5 Summary
      6. Acknowledgments
    3. Chapter 9: Feature Compensation
      1. 9.1 Life in an Ideal World
      2. 9.2 MMSE-SPLICE
      3. 9.3 Discriminative SPLICE
      4. 9.4 Model-Based Feature Enhancement
      5. 9.5 Switching Linear Dynamic System
      6. 9.6 Conclusion
    4. Chapter 10: Reverberant Speech Recognition
      1. 10.1 Introduction
      2. 10.2 The Effect of Reverberation
      3. 10.3 Approaches to Reverberant Speech Recognition
      4. 10.4 Feature Domain Model of the Acoustic Impulse Response
      5. 10.5 Bayesian Feature Enhancement
      6. 10.6 Experimental Results
      7. 10.7 Conclusions
      8. Acknowledgment
  10. Part Four: Model Enhancement
    1. Chapter 11: Adaptation and Discriminative Training of Acoustic Models
      1. 11.1 Introduction
      2. 11.2 Acoustic Model Adaptation and Noise Robustness
      3. 11.3 Maximum A Posteriori Reestimation
      4. 11.4 Maximum Likelihood Linear Regression
      5. 11.5 Discriminative Training
      6. 11.6 Conclusion
    2. Chapter 12: Factorial Models for Noise Robust Speech Recognition
      1. 12.1 Introduction
      2. 12.2 The Model-Based Approach
      3. 12.3 Signal Feature Domains
      4. 12.4 Interaction Models
      5. 12.5 Inference Methods
      6. 12.6 Efficient Likelihood Evaluation in Factorial Models
      7. 12.7 Current Directions
    3. Chapter 13: Acoustic Model Training for Robust Speech Recognition
      1. 13.1 Introduction
      2. 13.2 Traditional Training Methods for Robust Speech Recognition
      3. 13.3 A Brief Overview of Speaker Adaptive Training
      4. 13.4 Feature-Space Noise Adaptive Training
      5. 13.5 Model-Space Noise Adaptive Training
      6. 13.6 Noise Adaptive Training using VTS Adaptation
      7. 13.7 Discussion
      8. 13.8 Conclusion
  11. Part Five: Compensation for Information Loss
    1. Chapter 14: Missing-Data Techniques: Recognition with Incomplete Spectrograms
      1. 14.1 Introduction
      2. 14.2 Classification with Incomplete Data
      3. 14.3 Energetic Masking
      4. 14.4 Meta-Missing Data: Dealing with Mask Uncertainty
      5. 14.5 Some Perspectives on Performance
    2. Chapter 15: Missing-Data Techniques: Feature Reconstruction
      1. 15.1 Introduction
      2. 15.2 Missing-Data Techniques
      3. 15.3 Correlation-Based Imputation
      4. 15.4 Cluster-Based Imputation
      5. 15.5 Class-Conditioned Imputation
      6. 15.6 Sparse Imputation
      7. 15.7 Other Feature-Reconstruction Methods
      8. 15.8 Experimental Results
      9. 15.9 Discussion and Conclusion
      10. Acknowledgments
    3. Chapter 16: Computational Auditory Scene Analysis and Automatic Speech Recognition
      1. 16.1 Introduction
      2. 16.2 Auditory Scene Analysis
      3. 16.3 Computational Auditory Scene Analysis
      4. 16.4 CASA Strategies
      5. 16.5 Integrating CASA with ASR
      6. 16.6 Concluding Remarks
      7. Acknowledgment
    4. Chapter 17: Uncertainty Decoding
      1. 17.1 Introduction
      2. 17.2 Observation Uncertainty
      3. 17.3 Uncertainty Decoding
      4. 17.4 Feature-Based Uncertainty Decoding
      5. 17.5 Model-Based Joint Uncertainty Decoding
      6. 17.6 Noisy CMLLR
      7. 17.7 Uncertainty and Adaptive Training
      8. 17.8 In Combination with Other Techniques
      9. 17.9 Conclusions
  12. Index