Textual Information Access: Statistical Models by Francois Yvon, Eric Gaussier

Chapter 5

Topic-based Generative Models for Text Information Access

5.1. Introduction

This chapter presents generative models of text documents. These models can be used either to classify texts into classes (labels) known a priori, or to cluster them into groups that are not known a priori. As presented in Appendix A, the only difference between classification (or categorization) and clustering lies in the data available for learning. In classification, the learning data consist of (document, class) pairs; this is called supervised learning. In clustering, only the documents themselves are available; this is called unsupervised learning. There is also semi-supervised learning, in which only part of the learning data is associated with a class [CHA 06b, ZHU 09]. From here on, the generic term “categorization” will be used for all of these situations.
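The three learning settings can be sketched in terms of the data each one provides. The following is a minimal Python illustration only: the toy documents, the class names, and the helper `labeled_fraction` are assumptions invented for this example and do not come from the chapter.

```python
# Hypothetical toy corpus; documents and class names are invented for illustration.
documents = [
    "the striker scored a late goal",
    "parliament passed the budget bill",
    "the goalkeeper saved a penalty",
    "the senate debated the new bill",
]
labels = ["sport", "politics", "sport", "politics"]

# Supervised learning (classification): every document comes with its class.
supervised = list(zip(documents, labels))

# Unsupervised learning (clustering): documents only, no classes.
unsupervised = list(documents)

# Semi-supervised learning: only part of the data carries a class.
# None marks a document whose class is unknown.
partial_labels = ["sport", None, None, "politics"]
semi_supervised = list(zip(documents, partial_labels))


def labeled_fraction(pairs):
    """Return the share of (document, class) pairs whose class is known."""
    return sum(label is not None for _, label in pairs) / len(pairs)
```

Here the supervised set is fully labeled, while only half of the semi-supervised set carries a class; the unsupervised set has no class information at all, so any grouping must be inferred from the documents alone.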

Numerous generative models exist for text categorization [SEB 02, ZHO 05]. Here we focus on the most successful of the recent models (those of the last decade): topic models, also known as “latent semantic-based models” or “discrete principal component analysis” [BUN 06, STE 07, BLE 09].

5.1.1. Generative versus discriminative models

Generative and discriminative models (see Chapters 4 and 6) share the same framework, which can be described in general terms by two random variables X and Y: one (X) is observed, and the other (Y) is assumed hidden, or latent. These models differ, however, in their objective: generative ...
