Mining the Web by Soumen Chakrabarti

CHAPTER 6 SEMISUPERVISED LEARNING

We have seen two extreme learning paradigms so far. The setting in Chapter 4 was unsupervised: only a collection of documents was provided, without any labels, and the system was supposed to propose a grouping of the documents based on similarity. In contrast, Chapter 5 considered the completely supervised setting, where each object was tagged with a class. Real-life applications lie somewhere in between. It is generally easy to collect unsupervised data: every time Google completes a crawl, a collection of over a billion documents is created. On the other hand, labeling is a laborious job, which explains why the size and reach of Yahoo! and the Open Directory lag behind the size of the Web.

Consider a document ...