The incorporation of linguistic constraints into speech recognition is the major focus of this chapter. Thus far, we have described ASR as a pattern-recognition problem, requiring signal representations, distance or probability estimators, and temporal integration. We have largely ignored linguistic structure, except where it was required to describe the classification units (Chapter 23). However, the Bayes rule formulation of ASR requires an estimate of the prior probability of a hypothesized sequence of words. Since we rarely have enough examples of any given complete utterance to estimate its likelihood accurately, we must be concerned with strategies for training word-sequence probability estimators from insufficient data. Finally, it is also necessary to represent the pronunciation of words as a succession of smaller linguistic units such as phones.
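To make the sparse-data point concrete, the following is a minimal sketch (not the chapter's own formulation) of a bigram language model with add-alpha (Laplace) smoothing: rather than estimating the probability of whole utterances directly, the word-sequence prior is factored into word-pair probabilities, and smoothing keeps sequences unseen in training from receiving zero probability. The function name, boundary markers, and toy corpus are illustrative assumptions.

```python
from collections import defaultdict
import math


def train_bigram_lm(sentences, alpha=1.0):
    """Train a smoothed bigram language model from a list of
    word-list sentences; returns a log-probability function.
    (Illustrative sketch; names and interface are assumptions.)"""
    unigrams = defaultdict(int)
    bigrams = defaultdict(int)
    vocab = set()
    for words in sentences:
        # Pad with sentence-boundary markers so start/end are modeled too.
        padded = ["<s>"] + words + ["</s>"]
        vocab.update(padded)
        for w in padded:
            unigrams[w] += 1
        for a, b in zip(padded, padded[1:]):
            bigrams[(a, b)] += 1
    vocab_size = len(vocab)

    def log_prob(sentence):
        """Add-alpha smoothed log-probability of a word sequence:
        every bigram count is incremented by alpha, so unseen pairs
        get small but nonzero probability."""
        padded = ["<s>"] + sentence + ["</s>"]
        lp = 0.0
        for a, b in zip(padded, padded[1:]):
            lp += math.log((bigrams[(a, b)] + alpha) /
                           (unigrams[a] + alpha * vocab_size))
        return lp

    return log_prob


# Toy usage: a sequence seen in training scores higher than a
# scrambled or partly unseen one, but none score zero.
lm = train_bigram_lm([["the", "cat", "sat"], ["the", "dog", "sat"]])
```

Even this tiny example shows the core trade-off of later sections: the smoothing constant `alpha` redistributes probability mass from observed word pairs to unobserved ones, trading fit to the training data for coverage of novel sequences.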
We show how these aspects are incorporated in the decoding process for recognition. We also discuss a number of aspects of complete system integration, including one example of a speech-understanding system, that is, a system that includes a functional interpretation of the recognized word sequences (for a limited task).
For the purposes of this chapter, we establish some simple definitions: