Chapter 5 Topic-based Generative Models for Text Information Access ¹

5.1. Introduction

This chapter presents generative models of text documents. They can be used either to classify texts (into classes, or labels, known a priori) or to cluster them (into groups not known a priori). As presented in Appendix A, the only difference between classification (or categorization) and clustering lies in the data available for learning. In classification, (document, class) pairs are considered; this is called supervised learning. In clustering, only the documents themselves are considered; this is called unsupervised learning. Semi-supervised learning also exists, where only a subset of the learning data is associated with a class [CHA 06b, ZHU 09]. From here on, the generic term “categorization” will be used for all of these situations.
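The three learning settings differ only in the shape of the training data, which can be sketched in a few lines of Python. This is an illustration only: the toy documents, the class names and the variable names are all invented here and do not come from the chapter or from Appendix A.

```python
from typing import List, Optional, Tuple

# Hypothetical toy corpus: documents and classes are invented for illustration.
documents: List[str] = [
    "stock markets fell sharply",
    "the team won the final",
    "central bank raises rates",
    "striker scores twice in derby",
]
classes: List[str] = ["finance", "sport", "finance", "sport"]

# Supervised learning (classification): every document comes with its class.
supervised: List[Tuple[str, str]] = list(zip(documents, classes))

# Unsupervised learning (clustering): documents only, no classes.
unsupervised: List[str] = list(documents)

# Semi-supervised learning: only a subset of the documents carries a class.
# Here the first two keep their class; None marks an unknown class.
semi_supervised: List[Tuple[str, Optional[str]]] = [
    (doc, cls if i < 2 else None)
    for i, (doc, cls) in enumerate(supervised)
]

# Count how many documents in the semi-supervised set are labeled.
labeled_count = sum(1 for _, cls in semi_supervised if cls is not None)
```

In all three cases the documents are the same; what changes is how much class information accompanies them.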

Numerous generative models exist for text categorization [SEB 02, ZHO 05], but here we focus on the most successful of the recent models (those of the last decade): topic models, also known as “latent semantic-based models” or “discrete principal component analysis” [BUN 06, STE 07, BLE 09].

5.1.1. Generative versus discriminative models

Generative and discriminative models (see Chapters 4 and 6) share the same framework, which can be described in general terms by two random variables X and Y, one of which (X) is observed, while the other (Y) is assumed to be hidden (latent). These models differ, however, in their objective: generative ...

Textual Information Access: Statistical Models by
