Textual Information Access: Statistical Models

Book Description

This book presents statistical models recently developed within several research communities to access the information contained in text collections. The problems considered are linked to applications that facilitate information access:

- information extraction and retrieval;

- text classification and clustering;

- opinion mining;

- comprehension aids (automatic summarization, machine translation, visualization).

To give the reader as complete a picture as possible, the focus is placed on the probabilistic models used in the applications concerned, highlighting the relationship between models and applications and illustrating the behavior of each model on real collections.

Textual Information Access is organized around four themes: information retrieval and ranking models; classification and clustering (logistic regression, kernel methods, Markov fields, etc.); multilingualism and machine translation; and emerging applications such as information exploration.

Contents

Part 1: Information Retrieval

1. Probabilistic Models for Information Retrieval, Stéphane Clinchant and Eric Gaussier.

2. Learnable Ranking Models for Automatic Text Summarization and Information Retrieval, Massih-Réza Amini, David Buffoni, Patrick Gallinari, Tuong Vinh Truong and Nicolas Usunier.

Part 2: Classification and Clustering

3. Logistic Regression and Text Classification, Sujeevan Aseervatham, Eric Gaussier, Anestis Antoniadis, Michel Burlet and Yves Denneulin.

4. Kernel Methods for Textual Information Access, Jean-Michel Renders.

5. Topic-Based Generative Models for Text Information Access, Jean-Cédric Chappelier.

6. Conditional Random Fields for Information Extraction, Isabelle Tellier and Marc Tommasi.

Part 3: Multilingualism

7. Statistical Methods for Machine Translation, Alexandre Allauzen and François Yvon.

Part 4: Emerging Applications

8. Information Mining: Methods and Interfaces for Accessing Complex Information, Josiane Mothe, Kurt Englmeier and Fionn Murtagh.

9. Opinion Detection as a Topic Classification Problem, Juan-Manuel Torres-Moreno, Marc El-Bèze, Patrice Bellot and Frédéric Béchet.

Table of Contents

  1. Cover
  2. Title Page
  3. Copyright
  4. Introduction
  5. Part 1: Information Retrieval
    1. Chapter 1: Probabilistic Models for Information Retrieval
      1. 1.1. Introduction
        1. 1.1.1. Heuristic retrieval constraints
      2. 1.2. 2-Poisson models
      3. 1.3. Probability ranking principle (PRP)
        1. 1.3.1. Reformulation
          1. 1.3.1.1. Binary independence model
        2. 1.3.2. BM25
      4. 1.4. Language models
        1. 1.4.1. Smoothing methods
          1. 1.4.1.1. Jelinek-Mercer smoothing
          2. 1.4.1.2. Dirichlet smoothing
        2. 1.4.2. The Kullback-Leibler model
        3. 1.4.3. Noisy channel model
        4. 1.4.4. Some remarks
      5. 1.5. Informational approaches
        1. 1.5.1. DFR models
          1. 1.5.1.1. Frequency normalization
          2. 1.5.1.2. Inf1 model
          3. 1.5.1.3. Prob2 model
          4. 1.5.1.4. Combination of Inf1 and Prob2 models
          5. 1.5.1.5. Critiques
        2. 1.5.2. Information-based models
          1. 1.5.2.1. Two instances
      6. 1.6. Experimental comparison
      7. 1.7. Tools for information retrieval
      8. 1.8. Conclusion
      9. 1.9. Bibliography
    2. Chapter 2: Learnable Ranking Models for Automatic Text Summarization and Information Retrieval
      1. 2.1. Introduction
        1. 2.1.1. Ranking of instances
          1. 2.1.1.1. Formalism
            1. 2.1.1.1.1. Score function
            2. 2.1.1.1.2. Error function
          2. 2.1.1.2. Classification of critical pairs
          3. 2.1.1.3. Application with a linear model
            1. 2.1.1.3.1. Learning and inference complexities
          4. 2.1.1.4. Ranking induced by the output of a classifier
          5. 2.1.1.5. Other criteria
          6. 2.1.1.6. Special cases: bipartite ranking
            1. 2.1.1.6.1. Area under the ROC curve and ranking
        2. 2.1.2. Ranking of alternatives
          1. 2.1.2.1. Formalism
          2. 2.1.2.2. Linear model for ranking of alternatives
            1. 2.1.2.2.1. Joint representation
            2. 2.1.2.2.2. Learning and error functions
            3. 2.1.2.2.3. Classification of critical pairs
            4. 2.1.2.2.4. Algorithmic complexity
        3. 2.1.3. Relation to existing frameworks
          1. 2.1.3.1. Ordinal regression
          2. 2.1.3.2. Preference relationship learning
      2. 2.2. Application to automatic text summarization
        1. 2.2.1. Presentation of the application
          1. 2.2.1.1. Summary format
        2. 2.2.2. Automatic summary and learning
      3. 2.3. Application to information retrieval
        1. 2.3.1. Application presentation
        2. 2.3.2. Search engines and learning
          1. 2.3.2.1. Constitution of a learning base
          2. 2.3.2.2. Favoring the top of the list of documents returned
        3. 2.3.3. Experimental results
      4. 2.4. Conclusion
      5. 2.5. Bibliography
  6. Part 2: Classification and Clustering
    1. Chapter 3: Logistic Regression and Text Classification
      1. 3.1. Introduction
      2. 3.2. Generalized linear model
      3. 3.3. Parameter estimation
      4. 3.4. Logistic regression
        1. 3.4.1. Multinomial logistic regression
      5. 3.5. Model selection
        1. 3.5.1. Ridge regularization
        2. 3.5.2. LASSO regularization
        3. 3.5.3. Selected Ridge regularization
      6. 3.6. Logistic regression applied to text classification
        1. 3.6.1. Problem statement
        2. 3.6.2. Data pre-processing
        3. 3.6.3. Experimental results
          1. 3.6.3.1. Experiment on Reuters-21578
          2. 3.6.3.2. Ohsumed experiment
          3. 3.6.3.3. 20-Newsgroups experiment
          4. 3.6.3.4. DMOZ experiments
      7. 3.7. Conclusion
      8. 3.8. Bibliography
    2. Chapter 4: Kernel Methods for Textual Information Access
      1. 4.1. Kernel methods: context and intuitions
      2. 4.2. General principles of kernel methods
      3. 4.3. General problems with kernel choices (kernel engineering)
      4. 4.4. Kernel versions of standard algorithms: examples of solvers
        1. 4.4.1. Kernel logistic regression
        2. 4.4.2. Support vector machines
        3. 4.4.3. Principal component analysis
        4. 4.4.4. Other methods
      5. 4.5. Kernels for text entities
        1. 4.5.1. “Bag-of-words” kernels
        2. 4.5.2. Semantic kernels
        3. 4.5.3. Diffusion kernels
        4. 4.5.4. Sequence kernels
          1. 4.5.4.1. “p-spectrum” kernels
          2. 4.5.4.2. “All subsequence” kernels
          3. 4.5.4.3. “Fixed-length subsequence” kernels
        5. 4.5.5. Tree kernels
        6. 4.5.6. Graph kernels
        7. 4.5.7. Kernels derived from generative models
          1. 4.5.7.1. Marginalized conditional independence kernels
          2. 4.5.7.2. Marginalized latent variable kernels
          3. 4.5.7.3. Fisher kernels
      6. 4.6. Summary
      7. 4.7. Bibliography
    3. Chapter 5: Topic-Based Generative Models for Text Information Access
      1. 5.1. Introduction
        1. 5.1.1. Generative versus discriminative models
        2. 5.1.2. Text models
        3. 5.1.3. Estimation, prediction and smoothing
        4. 5.1.4. Terminology and notations
      2. 5.2. Topic-based models
        1. 5.2.1. Fundamental principles
        2. 5.2.2. Illustration
        3. 5.2.3. General framework
        4. 5.2.4. Geometric interpretation
        5. 5.2.5. Application to text categorization
      3. 5.3. Topic models
        1. 5.3.1. Probabilistic Latent Semantic Indexing
          1. 5.3.1.1. Model
          2. 5.3.1.2. Illustration
          3. 5.3.1.3. Limitations
        2. 5.3.2. Latent Dirichlet Allocation
          1. 5.3.2.1. Model
          2. 5.3.2.2. Geometric interpretation of LDA
          3. 5.3.2.3. Variational inference
          4. 5.3.2.4. Gibbs sampling inference
          5. 5.3.2.5. Estimation of the meta-parameters
          6. 5.3.2.6. Prediction
        3. 5.3.3. Conclusion
          1. 5.3.3.1. Link between PLSI and LDA
          2. 5.3.3.2. Other topic models
      4. 5.4. Term models
        1. 5.4.1. Limitations of the multinomial
        2. 5.4.2. Dirichlet compound multinomial
        3. 5.4.3. DCM–LDA
      5. 5.5. Similarity measures between documents
        1. 5.5.1. Language models
        2. 5.5.2. Similarity between topic distributions
        3. 5.5.3. Fisher kernels
      6. 5.6. Conclusion
      7. 5.7. Appendix: topic model software
      8. 5.8. Bibliography
    4. Chapter 6: Conditional Random Fields for Information Extraction
      1. 6.1. Introduction
      2. 6.2. Information extraction
        1. 6.2.1. The task
        2. 6.2.2. Variants
        3. 6.2.3. Evaluations
        4. 6.2.4. Approaches not based on machine learning
      3. 6.3. Machine learning for information extraction
        1. 6.3.1. Usage and limitations
        2. 6.3.2. Some applicable machine learning methods
        3. 6.3.3. Annotating to extract
      4. 6.4. Introduction to conditional random fields
        1. 6.4.1. Formalization of a labelling problem
        2. 6.4.2. Maximum entropy model approach
        3. 6.4.3. Hidden Markov model approach
        4. 6.4.4. Graphical models
      5. 6.5. Conditional random fields
        1. 6.5.1. Definition
        2. 6.5.2. Factorization and graphical models
        3. 6.5.3. Junction tree
        4. 6.5.4. Inference in CRFs
        5. 6.5.5. Inference algorithms
        6. 6.5.6. Training CRFs
      6. 6.6. Conditional random fields and their applications
        1. 6.6.1. Linear conditional random fields
        2. 6.6.2. Links between linear CRFs and hidden Markov models
        3. 6.6.3. Interests and applications of CRFs
        4. 6.6.4. Beyond linear CRFs
        5. 6.6.5. Existing libraries
      7. 6.7. Conclusion
      8. 6.8. Bibliography
  7. Part 3: Multilingualism
    1. Chapter 7: Statistical Methods for Machine Translation
      1. 7.1. Introduction
        1. 7.1.1. Machine translation in the age of the Internet
        2. 7.1.2. Organization of the chapter
        3. 7.1.3. Terminological remarks
      2. 7.2. Probabilistic machine translation: an overview
        1. 7.2.1. Statistical machine translation: the standard model
        2. 7.2.2. Word-based models and their limitations
          1. 7.2.2.1. Lexical ambiguities
          2. 7.2.2.2. A word for a word
          3. 7.2.2.3. Word order issues
        3. 7.2.3. Phrase-based models
      3. 7.3. Phrase-based models
        1. 7.3.1. Building word alignments
          1. 7.3.1.1. IBM model 1
          2. 7.3.1.2. Computing alignments with hidden Markov models
          3. 7.3.1.3. Modeling fertility, IBM model 3 and beyond
          4. 7.3.1.4. Symmetrization
        2. 7.3.2. Word alignment models: a summary
        3. 7.3.3. Extracting bisegments
          1. 7.3.3.1. Consistent bisegments
          2. 7.3.3.2. Bisegment extraction
          3. 7.3.3.3. Scoring bisegments
      4. 7.4. Modeling reorderings
        1. 7.4.1. The space of possible reorderings
          1. 7.4.1.1. Local permutations
          2. 7.4.1.2. “IBM” constraints
          3. 7.4.1.3. Distortion-based reordering
          4. 7.4.1.4. Hierarchical reordering
          5. 7.4.1.5. Reordering segments
        2. 7.4.2. Evaluating permutations
          1. 7.4.2.1. Modeling the target language
          2. 7.4.2.2. Distortion
          3. 7.4.2.3. Lexical reordering
      5. 7.5. Translation: a search problem
        1. 7.5.1. Combining models
          1. 7.5.1.1. The problem
          2. 7.5.1.2. Minimum error rate training
        2. 7.5.2. The decoding problem
        3. 7.5.3. Exact search algorithms
          1. 7.5.3.1. Monotone translations
          2. 7.5.3.2. Translating with local reorderings
        4. 7.5.4. Heuristic search algorithms
          1. 7.5.4.1. “Best first” search
            1. 7.5.4.1.1. Expanding hypotheses
            2. 7.5.4.1.2. Managing hypotheses
            3. 7.5.4.1.3. Pruning
          2. 7.5.4.2. Greedy search and local exploration
        5. 7.5.5. Decoding: a solved problem?
      6. 7.6. Evaluating machine translation
        1. 7.6.1. Subjective evaluations
          1. 7.6.1.1. Automatic evaluation
        2. 7.6.2. The BLEU metric
        3. 7.6.3. Alternatives to BLEU
        4. 7.6.4. Evaluating machine translation: an open problem
      7. 7.7. State-of-the-art and recent developments
        1. 7.7.1. Using source context
          1. 7.7.1.1. Using the micro-context
          2. 7.7.1.2. Using the macro-context
        2. 7.7.2. Hierarchical models
        3. 7.7.3. Translating with linguistic resources
          1. 7.7.3.1. Bilingual terminologies and dictionaries
          2. 7.7.3.2. Morphological analysis in MT
          3. 7.7.3.3. Modeling syntactic congruences
      8. 7.8. Useful resources
        1. 7.8.1. Bibliographic data and online resources
        2. 7.8.2. Parallel corpora
        3. 7.8.3. Tools for statistical machine translation
          1. 7.8.3.1. Evaluation of machine translation
      9. 7.9. Conclusion
      10. 7.10. Acknowledgments
      11. 7.11. Bibliography
  8. Part 4: Emerging Applications
    1. Chapter 8: Information Mining: Methods and Interfaces for Accessing Complex Information
      1. 8.1. Introduction
      2. 8.2. The multidimensional visualization of information
        1. 8.2.1. Accessing information based on the knowledge of the structured domain
        2. 8.2.2. Visualization of a set of documents via their content
        3. 8.2.3. OLAP principles applied to document sets
      3. 8.3. Domain mapping via social networks
      4. 8.4. Analyzing the variability of searches and data merging
        1. 8.4.1. Analysis of IR engine results
        2. 8.4.2. Use of data unification
      5. 8.5. The seven types of evaluation measures used in IR
      6. 8.6. Conclusion
      7. 8.7. Acknowledgments
      8. 8.8. Bibliography
    2. Chapter 9: Opinion Detection as a Topic Classification Problem
      1. 9.1. Introduction
      2. 9.2. The TREC and TAC evaluation campaigns
        1. 9.2.1. Opinion detection by question–answering
        2. 9.2.2. Automatic summarization of opinions
        3. 9.2.3. The text mining challenge of opinion classification (DEFT, DÉfi Fouille de Textes)
          1. 9.2.3.1. Integration of systems
          2. 9.2.3.2. First results using integration
      3. 9.3. Cosine weights - a second glance
      4. 9.4. Which components for opinion vectors?
        1. 9.4.1. How to pass from words to terms?
      5. 9.5. Experiments
        1. 9.5.1. Performance, analysis, and visualization of the results on the IMDB corpus
          1. 9.5.1.1. IMDB performance
          2. 9.5.1.2. Presentation and analysis of IMDB example 857_17527
      6. 9.6. Extracting opinions from speech: automatic analysis of phone polls
        1. 9.6.1. France Télécom opinion investigation corpus
        2. 9.6.2. Automatic recognition of spontaneous speech in opinion corpora
          1. 9.6.2.1. Segmentation of the opinion support messages
          2. 9.6.2.2. Segmentation of messages with conditional random fields
          3. 9.6.2.3. Language models specific to opinion expressions
          4. 9.6.2.4. Opinion classification
        3. 9.6.3. Evaluation
      7. 9.7. Conclusion
      8. 9.8. Bibliography
  9. Appendix A: Probabilistic Models: An Introduction
    1. A. 1. Introduction
    2. A. 2. Supervised categorization
      1. A. 2.1. Filtering documents
      2. A. 2.2. The Bernoulli model
        1. A. 2.2.1. Representing documents
        2. A. 2.2.2. The Bernoulli model
        3. A. 2.2.3. Parameter estimation
        4. A. 2.2.4. Summary
      3. A. 2.3. The multinomial model
        1. A. 2.3.1. Parameter estimation
      4. A. 2.4. Evaluating categorization systems
      5. A. 2.5. Extensions
      6. A. 2.6. A first summary
    3. A. 3. Unsupervised learning: the multinomial mixture model
      1. A. 3.1. Mixture models
      2. A. 3.2. Parameter estimation
        1. A. 3.2.1. A generic approach: the EM algorithm
          1. A. 3.2.1.1. Optimizing the auxiliary function
        2. A. 3.2.2. The EM algorithm: complements
      3. A. 3.3. Applications
        1. A. 3.3.1. Exploratory analysis of the document collections
        2. A. 3.3.2. Conclusions and additional remarks
    4. A. 4. Markov models: statistical models for sequences
      1. A. 4.1. Modeling sequences
      2. A. 4.2. Estimating a Markov model
      3. A. 4.3. Language models
        1. A. 4.3.1. Estimating language models
        2. A. 4.3.2. Some applications of language models
          1. A. 4.3.2.1. Language identification
          2. A. 4.3.2.2. Assessing grammaticality
    5. A. 5. Hidden Markov models
      1. A. 5.1. The model
      2. A. 5.2. Algorithms for hidden Markov models
        1. A. 5.2.1. Marginal probability of an observation sequence
          1. A. 5.2.1.1. Forward recursions
          2. A. 5.2.1.2. Backward recursions
        2. A. 5.2.2. Optimal decoding
          1. A. 5.2.2.1. The Viterbi algorithm
          2. A. 5.2.2.2. Locally optimal decoding
        3. A. 5.2.3. Supervised parameter estimation
        4. A. 5.2.4. Unsupervised parameter estimation
          1. A. 5.2.4.1. An iterative method, again
          2. A. 5.2.4.2. The auxiliary function and its optimization
          3. A. 5.2.4.3. Complements
        5. A. 5.2.5. An application: thematic segmentation of documents
    6. A. 6. Conclusion
    7. A. 7. A primer of probability theory
      1. A. 7.1. Probability space, event
      2. A. 7.2. Conditional independence and probability
        1. A. 7.2.1. Three fundamental formulas
      3. A. 7.3. Random variables, moments
        1. A. 7.3.1. Moments
        2. A. 7.3.2. Entropy and related notions
      4. A. 7.4. Some useful distributions
        1. A. 7.4.1. Bernoulli distribution
        2. A. 7.4.2. Binomial distribution
        3. A. 7.4.3. The Poisson distribution
        4. A. 7.4.4. Multinomial distribution
    8. A. 8. Bibliography
  10. List of Authors
  11. Index