CHAPTER 18

HUMAN SPEECH RECOGNITION

18.1 INTRODUCTION

How do people recognize and understand speech? As with other aspects of perception that we have touched on, this question is the focus of many books and articles. Our task is further complicated by the fact that, despite the profusion of articles on the subject, very little is well understood in this area; at least, there is very little that experts agree on.

Here we can only hope to introduce a few key concepts and, in particular, to lay the groundwork for the reader to think about aspects of human recognition that differ from the common approaches to artificial speech recognizers. For this purpose, we focus on two particular bodies of work: the perception of consonant–vowel–consonant (CVC) syllables in decades-long studies directed by Harvey Fletcher of Bell Labs (and later reexamined by Jont Allen [1]); and the direct comparison of human and machine “listeners” on tasks of current interest for speech-recognition research, as described by Richard Lippmann of Lincoln Labs [10].

18.2 THE ARTICULATION INDEX AND HUMAN RECOGNITION

In the 1990s, Jont Allen from AT&T revived interest in a body of work done at Bell Labs in the 1920s by a group headed by Harvey Fletcher; [1] is an insightful summary of Allen's perspective on this work. Here we describe only a few key points from that paper.

18.2.1 The Big Idea

A principal proposal of this paper is ...
