The Text Mining Handbook

Book Description

Text mining is a new and exciting area of computer science research that tries to solve the crisis of information overload by combining techniques from data mining, machine learning, natural language processing, information retrieval, and knowledge management. Similarly, link detection, a rapidly evolving approach to the analysis of text that shares and builds upon many of the key elements of text mining, also provides new tools for people to better leverage their burgeoning textual data resources. The Text Mining Handbook presents a comprehensive discussion of the state of the art in text mining and link detection. In addition to providing an in-depth examination of core text mining and link detection algorithms and operations, the book examines advanced preprocessing techniques, knowledge representation considerations, and visualization approaches. Finally, the book explores current real-world, mission-critical applications of text mining and link detection in such varied fields as M&A business intelligence, genomics research, and counter-terrorism activities.

Table of Contents

  1. Cover page
  2. The Text Mining Handbook
  3. Title page
  4. Copyright page
  5. Dedication
  6. Contents
  7. Preface
  8. I. Introduction to Text Mining
    1. I.1 Defining Text Mining
    2. I.2 General Architecture of Text Mining Systems
  9. II. Core Text Mining Operations
    1. II.1 Core Text Mining Operations
    2. II.2 Using Background Knowledge for Text Mining
    3. II.3 Text Mining Query Languages
  10. III. Text Mining Preprocessing Techniques
    1. III.1 Task-Oriented Approaches
    2. III.2 Further Reading
  11. IV. Categorization
    1. IV.1 Applications of Text Categorization
    2. IV.2 Definition of the Problem
    3. IV.3 Document Representation
    4. IV.4 Knowledge Engineering Approach to TC
    5. IV.5 Machine Learning Approach to TC
    6. IV.6 Using Unlabeled Data to Improve Classification
    7. IV.7 Evaluation of Text Classifiers
    8. IV.8 Citations and Notes
  12. V. Clustering
    1. V.1 Clustering Tasks in Text Analysis
    2. V.2 The General Clustering Problem
    3. V.3 Clustering Algorithms
    4. V.4 Clustering of Textual Data
    5. V.5 Citations and Notes
  13. VI. Information Extraction
    1. VI.1 Introduction to Information Extraction
    2. VI.2 Historical Evolution of IE: The Message Understanding Conferences and Tipster
    3. VI.3 IE Examples
    4. VI.4 Architecture of IE Systems
    5. VI.5 Anaphora Resolution
    6. VI.6 Inductive Algorithms for IE
    7. VI.7 Structural IE
    8. VI.8 Further Reading
  14. VII. Probabilistic Models for Information Extraction
    1. VII.1 Hidden Markov Models
    2. VII.2 Stochastic Context-Free Grammars
    3. VII.3 Maximal Entropy Modeling
    4. VII.4 Maximal Entropy Markov Models
    5. VII.5 Conditional Random Fields
    6. VII.6 Further Reading
  15. VIII. Preprocessing Applications Using Probabilistic and Hybrid Approaches
    1. VIII.1 Applications of HMM to Textual Analysis
    2. VIII.2 Using MEMM for Information Extraction
    3. VIII.3 Applications of CRFs to Textual Analysis
    4. VIII.4 TEG: Using SCFG Rules for Hybrid Statistical–Knowledge-Based IE
    5. VIII.5 Bootstrapping
    6. VIII.6 Further Reading
  16. IX. Presentation-Layer Considerations for Browsing and Query Refinement
    1. IX.1 Browsing
    2. IX.2 Accessing Constraints and Simple Specification Filters at the Presentation Layer
    3. IX.3 Accessing the Underlying Query Language
    4. IX.4 Citations and Notes
  17. X. Visualization Approaches
    1. X.1 Introduction
    2. X.2 Architectural Considerations
    3. X.3 Common Visualization Approaches for Text Mining
    4. X.4 Visualization Techniques in Link Analysis
    5. X.5 Real-World Example: The Document Explorer System
  18. XI. Link Analysis
    1. XI.1 Preliminaries
    2. XI.2 Automatic Layout of Networks
    3. XI.3 Paths and Cycles in Graphs
    4. XI.4 Centrality
    5. XI.5 Partitioning of Networks
    6. XI.6 Pattern Matching in Networks
    7. XI.7 Software Packages for Link Analysis
    8. XI.8 Citations and Notes
  19. XII. Text Mining Applications
    1. XII.1 General Considerations
    2. XII.2 Corporate Finance: Mining Industry Literature for Business Intelligence
    3. XII.3 A “Horizontal” Text Mining Application: Patent Analysis Solution Leveraging a Commercial Text Analytics Platform
    4. XII.4 Life Sciences Research: Mining Biological Pathway Information with Geneways
  20. Appendix A: DIAL: A Dedicated Information Extraction Language for Text Mining
    1. A.1 What Is the DIAL Language?
    2. A.2 Information Extraction in the DIAL Environment
    3. A.3 Text Tokenization
    4. A.4 Concept and Rule Structure
    5. A.5 Pattern Matching
    6. A.6 Pattern Elements
    7. A.7 Rule Constraints
    8. A.8 Concept Guards
    9. A.9 Complete DIAL Examples
  21. Bibliography
  22. Index