Machine Audition: Principles, Algorithms and Systems

Book Description

Machine Audition: Principles, Algorithms and Systems presents advances in algorithm development, theoretical frameworks, and experimental research findings. The book is useful for professionals who want a better understanding of how to design algorithms for automatic analysis of audio signals, construct computing systems that understand sound, and build advanced human-computer interactive systems.

Table of Contents

  1. Cover
  2. Title Page
  3. Copyright Page
  4. Editorial Advisory Board and List of Reviewers
    1. EDITORIAL ADVISORY BOARD
    2. LIST OF REVIEWERS
  5. Preface
    1. OUTLINE AND SUBJECT OF THIS BOOK
    2. OBJECTIVES, MISSIONS AND THE SCHOLARLY VALUE
    3. TARGET AUDIENCE
    4. ORGANIZATION OF THE BOOK
  6. Acknowledgment
  7. Chapter 1: Unstructured Environmental Audio
    1. ABSTRACT
    2. INTRODUCTION
    3. BACKGROUND
    4. ONLINE AUDIO BACKGROUND MODELING AND EVENT DETECTION USING SEMI-SUPERVISED LEARNING
    5. CONCLUSION
  8. Chapter 2: Modeling Grouping Cues for Auditory Scene Analysis Using a Spectral Clustering Formulation
    1. ABSTRACT
    2. INTRODUCTION
    3. RELATED WORK
    4. SPECTRAL CLUSTERING IN AUDIO ANALYSIS
    5. EXPERIMENTAL VALIDATION AND APPLICATIONS
    6. SOFTWARE IMPLEMENTATION
    7. SUMMARY
    8. ACKNOWLEDGMENT
  9. Chapter 3: Cocktail Party Problem
    1. ABSTRACT
    2. INTRODUCTION
    3. BACKGROUND FOR AUDIO SOURCES
    4. COMPUTATIONAL AUDITORY SCENE ANALYSIS
    5. BLIND SOURCE SEPARATION
    6. MODEL BASED APPROACHES
    7. NON NEGATIVE MATRIX/TENSOR FACTORIZATION
    8. SPARSE REPRESENTATION AND COMPRESSED SENSING
    9. A MULTISTAGE APPROACH
    10. RELATIONS TO OTHER METHODS
    11. APPLICATION AREAS
    12. CONCLUSION AND FUTURE RESEARCH
  10. Chapter 4: Audition
    1. ABSTRACT
    2. INTRODUCTION
    3. REAL WORLD DEMANDS
    4. THE FUNCTIONAL ROLE OF THE AUDITORY SYSTEM
    5. ESTIMATING TASK-RELEVANCE
    6. VISUAL AND AUDITORY GIST
    7. THE PHYSICAL CHARACTER OF AUDITORY KNOWLEDGE
    8. CONCLUSION
    9. THE FUTURE OF MACHINE AUDITION
  11. Chapter 5: A Multimodal Solution to Blind Source Separation of Moving Sources
    1. ABSTRACT
    2. INTRODUCTION
    3. THE SYSTEM MODEL
    4. SOURCE SEPARATION
    5. EXPERIMENTS AND RESULTS
    6. CONCLUSION
  12. Chapter 6: Sound Source Localization
    1. ABSTRACT
    2. INTRODUCTION
    3. PROBLEM DEFINITION USING A SIMPLE MICROPHONE ARRAY
    4. CONVENTIONAL SOURCE LOCALIZATION METHODS
    5. FACTORS AFFECTING AUTOMATIC LOCALIZATION
    6. INTENSITY MEASUREMENT FOR SOURCE LOCALIZATION
    7. ANALYSIS OF INTENSITY VECTOR DIRECTIONS FOR SOURCE LOCALIZATION
    8. FUTURE RESEARCH DIRECTIONS
    9. CONCLUSION
  13. Chapter 7: Probabilistic Modeling Paradigms for Audio Source Separation
    1. ABSTRACT
    2. INTRODUCTION
    3. SOURCE SEPARATION VIA LINEAR MODELING
    4. SOURCE SEPARATION VIA VARIANCE MODELING
    5. OBJECTIVE PERFORMANCE EVALUATION
    6. DISCUSSION AND FUTURE RESEARCH DIRECTIONS
    7. CONCLUSION
    8. APPENDIX: STANDARD PARAMETRIC PROBABILITY DISTRIBUTIONS
  14. Chapter 8: Tensor Factorization with Application to Convolutive Blind Source Separation of Speech
    1. ABSTRACT
    2. INTRODUCTION
    3. CONVOLUTIVE BLIND SOURCE SEPARATION
    4. PARALLEL FACTOR ANALYSIS
    5. APPLICATION OF PARAFAC TO CONVOLUTIVE MIXTURES
    6. EXPERIMENTAL RESULTS
    7. DISCUSSION AND CONCLUSION
  15. Chapter 9: Multi-Channel Source Separation
    1. ABSTRACT
    2. INTRODUCTION
    3. SEPARATION TAXONOMY
    4. SIGNAL MODEL
    5. FREQUENCY DOMAIN REPRESENTATION
    6. MASK-BASED, NON-LINEAR APPROACHES
    7. SOFT MASKS
    8. LINEAR APPROACHES
    9. EXPERIMENTAL EVALUATION
    10. POST-PROCESSING
    11. CONCLUSION
  16. Chapter 10: Audio Source Separation Using Sparse Representations
    1. ABSTRACT
    2. INTRODUCTION
    3. SOURCE SEPARATION
    4. SPARSE COMPONENT ANALYSIS BASED ON LAPPED ORTHOGONAL TRANSFORMS
    5. SPARSE COMPONENT ANALYSIS BASED ON A LEARNED DICTIONARY
    6. CONCLUSION
  17. Chapter 11: Itakura-Saito Nonnegative Factorizations of the Power Spectrogram for Music Signal Decomposition
    1. ABSTRACT
    2. INTRODUCTION
    3. NMF WITH THE ITAKURA-SAITO DIVERGENCE
    4. BAYESIAN EXTENSIONS TO ITAKURA-SAITO NMF
    5. MULTICHANNEL IS-NMF
    6. CONCLUSION
    7. APPENDIX A: STANDARD DISTRIBUTIONS
  18. Chapter 12: Music Onset Detection
    1. ABSTRACT
    2. INTRODUCTION
    3. ALGORITHMS
    4. PERFORMANCE EVALUATION
    5. FUTURE RESEARCH DIRECTIONS
  19. Chapter 13: On the Inherent Segment Length in Music
    1. ABSTRACT
    2. INTRODUCTION
    3. FEATURE ESTIMATION
    4. SEGMENTATION
    5. BEST SEGMENT SIZE
    6. ACTUAL INHERENT SEGMENT BOUNDARIES
    7. CONCLUSION
  20. Chapter 14: Automatic Tagging of Audio
    1. ABSTRACT
    2. INTRODUCTION
    3. FRAMEWORK
    4. AUDIO REPRESENTATION
    5. OBTAINING LABELED DATA
    6. MACHINE LEARNING METHODS
    7. EVALUATION
    8. THREE PUBLISHED IMPLEMENTATIONS
    9. FUTURE RESEARCH DIRECTIONS
    10. CONCLUSION
  21. Chapter 15: Instantaneous vs. Convolutive Non-Negative Matrix Factorization
    1. ABSTRACT
    2. INTRODUCTION
    3. INSTANTANEOUS NMF
    4. CONVOLUTIVE NMF
    5. CONVERGENCE ANALYSIS OF THE CONVOLUTIVE NMF ALGORITHM
    6. APPLICATIONS TO AUDIO PATTERN SEPARATION
    7. APPLICATIONS TO MUSIC ONSET DETECTION
    8. EVALUATIONS OF THE CONVOLUTIVE NMF ALGORITHMS
    9. FUTURE RESEARCH DIRECTIONS
    10. APPENDIX A
  22. Chapter 16: Musical Information Dynamics as Models of Auditory Anticipation
    1. ABSTRACT
    2. GENERAL INTRODUCTION
    3. STRUCTURE OF THIS CHAPTER
    4. EXPECTANCY IN MUSIC
    5. MUSIC AS AN INFORMATION SOURCE
    6. SURFACE VERSUS STRUCTURE: LONG AND SHORT TERM INFORMATION DYNAMICS
    7. INFORMATION DYNAMICS IN AUDIO SIGNALS
    8. INFORMATION GAP
    9. RELATIONS BETWEEN DIFFERENT INFORMATION DYNAMICS MEASURES
    10. CONCLUSION: APPLICATIONS OF INFORMATION DYNAMICS
    11. APPENDIX A
  23. Chapter 17: Multimodal Emotion Recognition
    1. ABSTRACT
    2. INTRODUCTION
    3. METHODOLOGY
    4. FUSION TECHNIQUES
    5. FUTURE RESEARCH DIRECTIONS
    6. CONCLUSION
  24. Chapter 18: Machine Audition of Acoustics
    1. ABSTRACT
    2. INTRODUCTION
    3. ACOUSTIC TRANSMISSION CHANNELS AND ACOUSTIC PARAMETERS
    4. EXTRACTION OF REVERBERATION TIME FROM DISCRETE UTTERANCES
    5. ESTIMATION OF SPEECH TRANSMISSION INDEX FROM RUNNING SPEECH
    6. ESTIMATION OF REVERBERATION TIME FROM RUNNING SPEECH
    7. BLIND ESTIMATION USING EIGENVALUES AS A FEATURE SPACE
    8. USING MUSIC AS STIMULI
    9. BLIND ESTIMATION WITH MAXIMUM LIKELIHOOD ESTIMATION
    10. CONCLUDING REMARKS
  25. Chapter 19: Neuromorphic Speech Processing
    1. ABSTRACT
    2. INTRODUCTION
    3. BACKGROUND
    4. NEUROMORPHIC SPEECH PROCESSING
    5. SOME SELECTED RESULTS
    6. FUTURE RESEARCH DIRECTIONS
    7. CONCLUSION
  26. Compilation of References
  27. About the Contributors