Voice User Interface Design

Book Description

This book is a comprehensive and authoritative guide to voice user interface (VUI) design. The VUI is perhaps the most critical factor in the success of any automated speech recognition (ASR) system, determining whether the user experience will be satisfying or frustrating, or even whether the customer will remain one. This book describes a practical methodology for creating an effective VUI design. The methodology is scientifically based on principles in linguistics, psychology, and language technology, and is illustrated here by examples drawn from the authors' work at Nuance Communications, the market leader in ASR development and deployment.

The book begins with an overview of VUI design issues and a description of the technology. The authors then introduce the major phases of their methodology. They first show how to specify requirements and make high-level design decisions during the definition phase. They next cover, in great detail, the design phase, with clear explanations and demonstrations of each design principle and its real-world applications. Finally, they examine problems unique to VUI design in system development, testing, and tuning. Key principles are illustrated with a running sample application.

A companion Web site provides audio clips for each example: www.VUIDesign.org

The cover photograph depicts the first ASR system, Radio Rex: a toy dog who sits in his house until the sound of his name calls him out. Produced in 1911, Rex was among the few commercial successes in the early days of speech recognition. Voice User Interface Design reveals the design principles and practices that produce commercial success in an era when effective ASRs are not toys but competitive necessities.



Table of Contents

  1. Copyright
    1. Dedication
  2. Praise for Voice User Interface Design
  3. About the Authors and Radio Rex
  4. Preface
    1. Organization of the Book
    2. Audience
    3. Web Site
    4. Acknowledgments
  5. I. Introduction
    1. 1. Introduction to Voice User Interfaces
      1. 1.1. What Is a Voice User Interface?
        1. 1.1.1. Auditory Interfaces
        2. 1.1.2. Spoken Language Interfaces
      2. 1.2. Why Speech?
      3. 1.3. Where Do We Go from Here?
    2. 2. Overview of Spoken Language Technology
      1. 2.1. Architecture of a Spoken Language System
        1. 2.1.1. Elements of a Spoken Language System
        2. 2.1.2. Recognition
          1. Acoustic Models
          2. Dictionary
          3. Grammar
          4. Recognition Search
          5. Confidence Measures
          6. N-Best Processing
          7. Barge-in
        3. 2.1.3. Other Speech Technologies
          1. Text-to-Speech Synthesis
          2. Speaker Verification
      2. 2.2. The Impact of Speech Technology on Design Decisions
        1. 2.2.1. Performance Challenges
          1. Ambiguity
          2. Limited Acoustic Information
          3. Noise
        2. 2.2.2. Problem Solving
        3. 2.2.3. Definition Files
          1. Grammar
          2. Dictionary
          3. Acoustic Models
          4. Configuration Files
      3. 2.3. Conclusion
    3. 3. Overview of the Methodology
      1. 3.1. Methodological Principles
        1. 3.1.1. User Input
        2. 3.1.2. Integrated Business and User Needs
        3. 3.1.3. Thorough Early Work
        4. 3.1.4. Conversational Design
        5. 3.1.5. Context
      2. 3.2. Steps of the Methodology
        1. 3.2.1. Requirements Definition
        2. 3.2.2. High-Level Design
        3. 3.2.3. Detailed Design
        4. 3.2.4. Development
        5. 3.2.5. Testing
        6. 3.2.6. Tuning
      3. 3.3. Applying the Methodology to Real-World Applications
        1. 3.3.1. Coordination of Phases
        2. 3.3.2. Dealing with Real-World Budget and Time Constraints
      4. 3.4. Conclusion
  6. II. Definition Phase: Requirements Gathering and High-Level Design
    1. 4. Requirements and High-Level Design Methodology
      1. 4.1. Requirements Definition
        1. 4.1.1. Understanding the Business
          1. Evaluating Other Company Systems and Customer Touchpoints
          2. Meeting with Company Personnel
          3. Evaluating Competitive Systems
        2. 4.1.2. Understanding the Users
          1. Meeting with Company Personnel
          2. Observational Studies
          3. Interviewing Customer Service Representatives
          4. Focus Groups
          5. Individual Interviews
          6. Surveys
        3. 4.1.3. Understanding the Application
          1. Evaluating Other Company Systems
          2. Meeting with Company Personnel
      2. 4.2. High-Level Design
        1. 4.2.1. Key Design Criteria
        2. 4.2.2. Dialog Strategy and Grammar Type
        3. 4.2.3. Pervasive Dialog Elements
        4. 4.2.4. Recurring Terminology
        5. 4.2.5. Metaphor
        6. 4.2.6. Persona
        7. 4.2.7. Nonverbal Audio
          1. Look and Feel
          2. Usability
          3. Communication
          4. General Considerations
      3. 4.3. Conclusion
    2. 5. High-Level Design Elements
      1. 5.1. Dialog Strategy and Grammar Type
      2. 5.2. Pervasive Dialog Elements
        1. 5.2.1. Error Recovery Strategies
          1. Escalating Detail
          2. Rapid Reprompt
          3. Variation on Rapid Reprompt
          4. No-Speech Timeouts
          5. State-Specific and Global Error Counts
        2. 5.2.2. Universals
        3. 5.2.3. Login
      3. 5.3. Conclusion
    3. 6. Creating Persona, by Design
      1. 6.1. What Is Persona?
      2. 6.2. Where Does Persona Come From?
      3. 6.3. A Checklist for Persona Design
        1. 6.3.1. Metaphor and Role
        2. 6.3.2. Brand and Image
        3. 6.3.3. End Users
          1. Target Audience
          2. Frequency of System Use
          3. Mind-Set of the User
        4. 6.3.4. Application
          1. Content
          2. Task-Related Issues
      4. 6.4. Persona Definition
      5. 6.5. Conclusion
    4. 7. Sample Application: Requirements and High-Level Design
      1. 7.1. Lexington Brokerage
      2. 7.2. Requirements Definition
        1. 7.2.1. Understanding the Business Goals and Context
        2. 7.2.2. Understanding the Caller
        3. 7.2.3. Understanding the Application
      3. 7.3. High-Level Design
        1. 7.3.1. Key Design Criteria
        2. 7.3.2. Dialog Strategy and Grammar Type
        3. 7.3.3. Pervasive Dialog Elements
        4. 7.3.4. Recurring Terminology
        5. 7.3.5. Metaphor
        6. 7.3.6. Persona
          1. Metaphor and Role
          2. Brand and Image
          3. End Users
          4. Application
        7. 7.3.7. Nonverbal Audio
      4. 7.4. Conclusion
  7. III. Design Phase: Detailed Design
    1. 8. Detailed Design Methodology
      1. 8.1. Anatomy of a Dialog State
      2. 8.2. Call Flow Design
      3. 8.3. Prompt Design
        1. 8.3.1. Conversational Design
        2. 8.3.2. Auditory Design
      4. 8.4. User Testing
        1. 8.4.1. Formal Usability Testing
          1. Basic Approaches
          2. Task Design and Measurements
          3. Selecting and Recruiting Participants
          4. Running the Test
          5. Analysis of Data
        2. 8.4.2. Card Sorting
      5. 8.5. Design Principles
      6. 8.6. Conclusion
    2. 9. Minimizing Cognitive Load
      1. 9.1. Conceptual Complexity
        1. 9.1.1. Constancy
        2. 9.1.2. Consistency
        3. 9.1.3. Context Setting
      2. 9.2. Memory Load
        1. 9.2.1. Menu Size
        2. 9.2.2. Recency
        3. 9.2.3. Instruction
          1. Tutorials
          2. Just-in-Time Instruction
      3. 9.3. Attention
      4. 9.4. Conclusion
    3. 10. Designing Prompts
      1. 10.1. Conversation as Discourse
      2. 10.2. Cohesion
        1. 10.2.1. Pronouns and Time Adverbs
        2. 10.2.2. Discourse Markers
          1. Now
          2. By the Way
          3. Oh
          4. Actually
          5. Otherwise
          6. Okay
          7. Sorry
      3. 10.3. Information Structure
      4. 10.4. Spoken Versus Written English
        1. 10.4.1. Pointer Words
        2. 10.4.2. Contraction
        3. 10.4.3. Must and May
        4. 10.4.4. Will Versus Going To
        5. 10.4.5. “Romans Perspire, Anglo-Saxons Sweat”
      5. 10.5. Register and Consistency
      6. 10.6. Jargon
      7. 10.7. The Cooperative Principle
      8. 10.8. Conclusion
    4. 11. Planning Prosody
      1. 11.1. What Is Prosody?
      2. 11.2. Functions of Prosody
      3. 11.3. Stress
      4. 11.4. Intonation
        1. 11.4.1. Basic Intonation Contours
          1. Rising-Falling, Final
          2. Rising
          3. Rising-Falling, Nonfinal
        2. 11.4.2. Contours in Context
          1. Lists
          2. Yes/No Questions
          3. Wh- Questions
          4. Either/Or Questions
      5. 11.5. Concatenating Phone Numbers
        1. 11.5.1. The Prosodic Structure of Phone Numbers
        2. 11.5.2. Concatenation Digit-by-Digit
        3. 11.5.3. Concatenation by Groups
      6. 11.6. Minimizing Concatenation Splices
      7. 11.7. Pauses
        1. Concatenation Plan
      8. 11.8. TTS Guidelines
        1. 11.8.1. Analyze Application Usage
        2. 11.8.2. Choose an Appropriate Voice
        3. 11.8.3. When Possible, Use Audio Recordings
        4. 11.8.4. Make Content Easy to Understand
        5. 11.8.5. Use Appropriate Formats
        6. 11.8.6. Mark Up Text for Naturalness
      9. 11.9. Conclusion
    5. 12. Maximizing Efficiency and Clarity
      1. 12.1. Efficiency
        1. 12.1.1. Don't Lose Work
        2. 12.1.2. Make Frequent Tasks Efficient
        3. 12.1.3. Provide Shortcuts
        4. 12.1.4. Use Caller Modeling to Save Steps
      2. 12.2. Clarity
        1. 12.2.1. Mental Models for Natural Language Understanding
        2. 12.2.2. Navigational Clarity Through Landmarking
      3. 12.3. Balancing Efficiency and Clarity
        1. 12.3.1. Stress Clarity in Individual Prompts
        2. 12.3.2. Taper Prompts
        3. 12.3.3. Use Barge-In
      4. 12.4. Conclusion
    6. 13. Optimizing Accuracy and Recovering from Errors
      1. 13.1. Measuring Accuracy
      2. 13.2. Dialog Design Guidelines for Maximizing Accuracy
      3. 13.3. Recovering from Errors
        1. 13.3.1. Confirmation Strategies
          1. When to Confirm
          2. How to Confirm
          3. Avoiding Repeat Errors
        2. 13.3.2. Recovering from Rejects and Timeouts
      4. 13.4. Conclusion
    7. 14. Sample Application: Detailed Design
      1. 14.1. Call Flow Design
        1. 14.1.1. The Login Subdialog
        2. 14.1.2. The Quotes Subdialog
        3. 14.1.3. The Trading Subdialog
      2. 14.2. Prompt Design
      3. 14.3. User Testing
      4. 14.4. Conclusion
  8. IV. Realization Phase: Development, Testing, and Tuning
    1. 15. Development, Testing, and Tuning Methodology
      1. 15.1. Development
        1. 15.1.1. Application Development
        2. 15.1.2. Grammar Development
        3. 15.1.3. Audio Production
      2. 15.2. Testing
        1. 15.2.1. Application Testing
          1. Dialog Traversal Test
          2. System QA Test
          3. Load Test
        2. 15.2.2. Recognition Testing
        3. 15.2.3. Evaluative Usability Testing
      3. 15.3. Tuning
        1. 15.3.1. Dialog Tuning
          1. Call Monitoring
          2. Call Log Analysis
          3. User Experience Research
        2. 15.3.2. Recognition Tuning
          1. Dictionary Tuning
          2. Grammar Probabilities
      4. 15.4. Conclusion
    2. 16. Creating Grammars
      1. 16.1. Grammar Development
        1. 16.1.1. Developing Rule-Based Grammars
        2. 16.1.2. Developing Grammars for Statistical Language Models
        3. 16.1.3. Developing Robust Natural Language Grammars
        4. 16.1.4. Developing Statistical Natural Language Grammars
      2. 16.2. Grammar Testing
        1. 16.2.1. Testing Rule-Based Grammars
          1. Coverage Testing
          2. Overcoverage Testing
          3. Natural Language Testing
          4. Ambiguity Testing
          5. Spelling Testing
          6. Pronunciation Testing
        2. 16.2.2. Testing Statistical Language Models
        3. 16.2.3. Testing Robust Natural Language Grammars
        4. 16.2.4. Testing Statistical Natural Language Grammars
      3. 16.3. Grammar Tuning
        1. 16.3.1. Tuning Rule-Based Grammars
        2. 16.3.2. Tuning Statistical Language Models
        3. 16.3.3. Tuning Robust Natural Language Grammars
        4. 16.3.4. Tuning Statistical Natural Language Grammars
      4. 16.4. Conclusion
    3. 17. Working with Voice Actors
      1. 17.1. Scripting for Success
        1. 17.1.1. An Introductory Case Study
        2. 17.1.2. Scripting Tips
          1. Create Useful Direction Notes
          2. Group Related Items and Contextualize
          3. Indicate Contrastive Stress
          4. Use Punctuation Wisely
          5. Follow Practical Guidelines
      2. 17.2. Choosing Your Voice Actor
        1. 17.2.1. Professionalism and Experience
        2. 17.2.2. Coachability
        3. 17.2.3. Fit with Persona
        4. 17.2.4. Demo Tapes (or CDs) and Auditions
      3. 17.3. Running a Recording Session
        1. 17.3.1. Procedural Considerations
          1. Prepare the Voice Actor
          2. Recording Procedure
          3. Planning Tips
        2. 17.3.2. Voice Coaching
      4. 17.4. Conclusion
    4. 18. Sample Application: Development, Testing, and Tuning
      1. 18.1. Development
        1. 18.1.1. Application Development
        2. 18.1.2. Grammar Development
        3. 18.1.3. Audio Production
      2. 18.2. Testing
        1. 18.2.1. Evaluative Usability Testing
      3. 18.3. Tuning
        1. 18.3.1. Dialog Tuning
        2. 18.3.2. Recognition Tuning
        3. 18.3.3. Grammar Tuning
        4. 18.3.4. User Survey
    5. 19. Conclusion
    6. APPENDIX
  9. Bibliography
    1. Works Cited
    2. Works Consulted