CHAPTER 27


DISCRIMINANT ACOUSTIC PROBABILITY ESTIMATION

27.1 INTRODUCTION

In the previous chapters we introduced the notion of trainable statistical models for speech recognition, in particular focusing on the set of methods and constraints associated with hidden Markov models (HMMs). In both training and recognition phases, the key values that must be estimated from the acoustics are the emission probabilities, also referred to as the acoustic likelihoods. These values are used to derive likelihoods for each model of a complete utterance, in combination with statistical information about the a priori probability of word sequences. In other words, the probabilities that the local acoustic measurements were generated by each hypothesized state are ultimately integrated into a global probability that a complete utterance is generated by a complete HMM (either by considering all possible state sequences associated with a model, or by considering only the most likely).
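The integration described above can be sketched numerically. The toy example below (all numbers are made up for illustration; the emission, transition, and initial probabilities are assumptions, not values from the text) contrasts the two alternatives named in the parenthetical: summing over all state sequences (the forward algorithm) versus keeping only the most likely sequence (Viterbi).

```python
import numpy as np

# Toy HMM: N states, an utterance of T acoustic frames.
# b[t, j] is the emission probability (acoustic likelihood) of
# frame t given state j -- the local values this chapter is
# concerned with estimating.  All numbers here are illustrative.
rng = np.random.default_rng(0)
T, N = 5, 3
b = rng.uniform(0.1, 1.0, size=(T, N))   # p(x_t | q_t = j), assumed values
A = np.full((N, N), 1.0 / N)             # uniform transition probabilities
pi = np.full(N, 1.0 / N)                 # uniform initial state probabilities

# Forward algorithm: integrate over ALL state sequences.
alpha = pi * b[0]
for t in range(1, T):
    alpha = (alpha @ A) * b[t]           # sum over predecessor states
total_likelihood = alpha.sum()           # p(utterance | model)

# Viterbi: keep only the single MOST LIKELY state sequence.
delta = pi * b[0]
for t in range(1, T):
    delta = (delta[:, None] * A).max(axis=0) * b[t]  # best predecessor
best_path_likelihood = delta.max()

# The sum over all paths always dominates the single best path.
assert 0 < best_path_likelihood <= total_likelihood
```

Both recursions consume the same local emission probabilities `b[t, j]`; they differ only in whether predecessor states are summed or maximized, which is why the quality of the local acoustic probability estimates matters for either decoding criterion.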

In Chapter 26 we provided examples of two common approaches to the estimation of these acoustic probabilities: codebook tables associated with vector quantized features, giving probabilities for each feature value conditioned on the state; and Gaussians or mixtures of Gaussians associated with one or more states. For both of these examples, EM training is used to maximize the likelihood of the acoustic feature sequence's ...
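The two estimators recalled from Chapter 26 can be sketched side by side. In this minimal sketch, the codebook table, codeword index, mixture weights, means, and variances are all assumed illustrative values, not parameters from the text.

```python
import numpy as np

# (1) Discrete (VQ) emission probabilities: a codebook table whose
# rows are states and whose columns are codewords.  The vector
# quantizer maps a feature vector to a codeword index, and the
# emission probability is a simple table lookup.
codebook_probs = np.array([[0.7, 0.2, 0.1],    # state 0 (assumed values)
                           [0.1, 0.3, 0.6]])   # state 1
codeword = 2                                   # index from the quantizer
p_discrete = codebook_probs[:, codeword]       # p(codeword | state), per state

# (2) A diagonal-covariance Gaussian mixture for a single state,
# evaluated on a continuous feature vector x.
weights = np.array([0.6, 0.4])                 # mixture weights (assumed)
means = np.array([[0.0, 0.0], [1.0, 1.0]])
variances = np.array([[1.0, 1.0], [0.5, 0.5]])
x = np.array([0.5, 0.5])                       # a 2-d acoustic feature vector

def mixture_likelihood(x, weights, means, variances):
    """p(x | state) = sum_m w_m * N(x; mu_m, diag(var_m))."""
    diff2 = ((x - means) ** 2 / variances).sum(axis=1)
    norm = np.prod(2 * np.pi * variances, axis=1) ** -0.5
    return float(np.sum(weights * norm * np.exp(-0.5 * diff2)))

p_gauss = mixture_likelihood(x, weights, means, variances)
```

In EM training, it is exactly these table entries (or mixture weights, means, and variances) that are re-estimated to maximize the likelihood of the training feature sequences.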
