Multilingual Natural Language Processing Applications: From Theory to Practice

Book Description

Multilingual Natural Language Processing Applications is the first comprehensive single-source guide to building robust and accurate multilingual NLP systems. Edited by two leading experts, it integrates cutting-edge advances with practical solutions drawn from extensive field experience.

Part I introduces the core concepts and theoretical foundations of modern multilingual natural language processing, presenting today’s best practices for understanding word and document structure, analyzing syntax, modeling language, recognizing entailment, and detecting redundancy.

Part II thoroughly addresses the practical considerations associated with building real-world applications, including information extraction, machine translation, information retrieval/search, summarization, question answering, distillation, processing pipelines, and more.

This book contains important new contributions from leading researchers at IBM, Google, Microsoft, Thomson Reuters, BBN, CMU, University of Edinburgh, University of Washington, University of North Texas, and others.

Coverage includes

  • Core NLP problems, and today’s best algorithms for attacking them

  • Processing the diverse morphologies present in the world’s languages

  • Uncovering syntactical structure, parsing semantics, using semantic role labeling, and scoring grammaticality

  • Recognizing inferences, subjectivity, and opinion polarity

  • Managing key algorithmic and design tradeoffs in real-world applications

  • Extracting information via mention detection, coreference resolution, and events

  • Building large-scale systems for machine translation, information retrieval, and summarization

  • Answering complex questions through distillation and other advanced techniques

  • Creating dialog systems that leverage advances in speech recognition, synthesis, and dialog management

  • Constructing common infrastructure for multiple multilingual text processing applications

This book will be invaluable for all engineers, software developers, researchers, and graduate students who want to process large quantities of text in multiple languages, in any environment: government, corporate, or academic.

Table of Contents

    1. Title Page
    2. Copyright Page
    3. Register Your Book
      1. Contact us
    4. Dedication
    5. Contents
    6. Preface
    7. Acknowledgments
    8. About the Authors
    9. Part I. In Theory
      1. Chapter 1. Finding the Structure of Words
        1. 1.1. Words and Their Components
        2. 1.2. Issues and Challenges
        3. 1.3. Morphological Models
        4. 1.4. Summary
        5. Acknowledgment
        6. Bibliography
      2. Chapter 2. Finding the Structure of Documents
        1. 2.1. Introduction
        2. 2.2. Methods
        3. 2.3. Complexity of the Approaches
        4. 2.4. Performances of the Approaches
        5. 2.5. Features
        6. 2.6. Processing Stages
        7. 2.7. Discussion
        8. 2.8. Summary
        9. Bibliography
      3. Chapter 3. Syntax
        1. 3.1. Parsing Natural Language
        2. 3.2. Treebanks: A Data-Driven Approach to Syntax
        3. 3.3. Representation of Syntactic Structure
        4. 3.4. Parsing Algorithms
        5. 3.5. Models for Ambiguity Resolution in Parsing
        6. 3.6. Multilingual Issues: What Is a Token?
        7. 3.7. Summary
        8. Acknowledgments
        9. Bibliography
      4. Chapter 4. Semantic Parsing
        1. 4.1. Introduction
        2. 4.2. Semantic Interpretation
        3. 4.3. System Paradigms
        4. 4.4. Word Sense
        5. 4.5. Predicate-Argument Structure
        6. 4.6. Meaning Representation
        7. 4.7. Summary
        8. Bibliography
      5. Chapter 5. Language Modeling
        1. 5.1. Introduction
        2. 5.2. n-Gram Models
        3. 5.3. Language Model Evaluation
        4. 5.4. Parameter Estimation
        5. 5.5. Language Model Adaptation
        6. 5.6. Types of Language Models
        7. 5.7. Language-Specific Modeling Problems
        8. 5.8. Multilingual and Crosslingual Language Modeling
        9. 5.9. Summary
        10. Bibliography
      6. Chapter 6. Recognizing Textual Entailment
        1. 6.1. Introduction
        2. 6.2. The Recognizing Textual Entailment Task
        3. 6.3. A Framework for Recognizing Textual Entailment
        4. 6.4. Case Studies
        5. 6.5. Taking RTE Further
        6. 6.6. Useful Resources
        7. 6.7. Summary
        8. Bibliography
      7. Chapter 7. Multilingual Sentiment and Subjectivity Analysis
        1. 7.1. Introduction
        2. 7.2. Definitions
        3. 7.3. Sentiment and Subjectivity Analysis on English
        4. 7.4. Word- and Phrase-Level Annotations
        5. 7.5. Sentence-Level Annotations
        6. 7.6. Document-Level Annotations
        7. 7.7. What Works, What Doesn’t
        8. 7.8. Summary
        9. Acknowledgments
        10. Bibliography
    10. Part II. In Practice
      1. Chapter 8. Entity Detection and Tracking
        1. 8.1. Introduction
        2. 8.2. Mention Detection
        3. 8.3. Coreference Resolution
        4. 8.4. Summary
        5. Bibliography
      2. Chapter 9. Relations and Events
        1. 9.1. Introduction
        2. 9.2. Relations and Events
        3. 9.3. Types of Relations
        4. 9.4. Relation Extraction as Classification
        5. 9.5. Other Approaches to Relation Extraction
        6. 9.6. Events
        7. 9.7. Event Extraction Approaches
        8. 9.8. Moving Beyond the Sentence
        9. 9.9. Event Matching
        10. 9.10. Future Directions for Event Extraction
        11. 9.11. Summary
        12. Bibliography
      3. Chapter 10. Machine Translation
        1. 10.1. Machine Translation Today
        2. 10.2. Machine Translation Evaluation
        3. 10.3. Word Alignment
        4. 10.4. Phrase-Based Models
        5. 10.5. Tree-Based Models
        6. 10.6. Linguistic Challenges
        7. 10.7. Tools and Data Resources
        8. 10.8. Future Directions
        9. 10.9. Summary
        10. Bibliography
      4. Chapter 11. Multilingual Information Retrieval
        1. 11.1. Introduction
        2. 11.2. Document Preprocessing
        3. 11.3. Monolingual Information Retrieval
        4. 11.4. CLIR
        5. 11.5. MLIR
        6. 11.6. Evaluation in Information Retrieval
        7. 11.7. Tools, Software, and Resources
        8. 11.8. Summary
        9. Acknowledgments
        10. Bibliography
      5. Chapter 12. Multilingual Automatic Summarization
        1. 12.1. Introduction
        2. 12.2. Approaches to Summarization
        3. 12.3. Evaluation
        4. 12.4. How to Build a Summarizer
        5. 12.5. Competitions and Data Sets
        6. 12.6. Summary
        7. Bibliography
      6. Chapter 13. Question Answering
        1. 13.1. Introduction and History
        2. 13.2. Architectures
        3. 13.3. Source Acquisition and Preprocessing
        4. 13.4. Question Analysis
        5. 13.5. Search and Candidate Extraction
        6. 13.6. Answer Scoring
        7. 13.7. Crosslingual Question Answering
        8. 13.8. A Case Study
        9. 13.9. Evaluation
        10. 13.10. Current and Future Challenges
        11. 13.11. Summary and Further Reading
        12. Acknowledgments
        13. Bibliography
      7. Chapter 14. Distillation
        1. 14.1. Introduction
        2. 14.2. An Example
        3. 14.3. Relevance and Redundancy
        4. 14.4. The Rosetta Consortium Distillation System
        5. 14.5. Other Distillation Approaches
        6. 14.6. Evaluation and Metrics
        7. 14.7. Summary
        8. Bibliography
      8. Chapter 15. Spoken Dialog Systems
        1. 15.1. Introduction
        2. 15.2. Spoken Dialog Systems
        3. 15.3. Forms of Dialog
        4. 15.4. Natural Language Call Routing
        5. 15.5. Three Generations of Dialog Applications
        6. 15.6. Continuous Improvement Cycle
        7. 15.7. Transcription and Annotation of Utterances
        8. 15.8. Localization of Spoken Dialog Systems
        9. 15.9. Summary
        10. Bibliography
      9. Chapter 16. Combining Natural Language Processing Engines
        1. 16.1. Introduction
        2. 16.2. Desired Attributes of Architectures for Aggregating Speech and NLP Engines
        3. 16.3. Architectures for Aggregation
        4. 16.4. Case Studies
        5. 16.5. Lessons Learned
        6. 16.6. Summary
        7. 16.7. Sample UIMA Code
        8. Bibliography
    11. Index