Summary

This chapter was devoted to learning about topic models and after sentiment analysis on movie reviews, this was our second foray into working with real-life text data. This time, our predictive task was classifying the topics of news articles on the Web. The primary technique for topic modeling on which we focused was Latent Dirichlet Allocation (LDA). This derives its name from the fact that it assumes that the topic and word distributions that can be found inside a document arise from hidden multinomial distributions that are sampled from Dirichlet priors. We saw that the generative process of sampling words and topics from these multinomial distributions mirrors many of the natural intuitions that we have about this domain; however, ...

Get Mastering Predictive Analytics with R now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.