Topic modeling with Latent Dirichlet Allocation in Spark 2.0

In this recipe, we demonstrate topic model generation, using Latent Dirichlet Allocation (LDA) to infer topics from a collection of documents.

We covered LDA in previous chapters as it applies to clustering and topic modeling. In this chapter, we work through a more elaborate example that applies it to text analytics on larger, more realistic datasets.

We also apply NLP techniques such as stemming and stop-word removal to make the LDA workflow more realistic. The goal is to discover a set of latent factors (that is, factors different from the original) that can solve and describe the solution in a more efficient way in a reduced ...
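
To make the preprocessing idea concrete, here is a minimal, dependency-free sketch of stop-word removal followed by stemming, the two steps applied to documents before they reach LDA. Everything in it is an assumption for illustration: the tiny `STOP_WORDS` set, the `crude_stem` suffix stripper (a stand-in for a real stemmer such as Porter's), and the sample documents are hypothetical and are not the recipe's own code or data.

```python
import re

# Assumption: a tiny illustrative stop-word list, not the one used in the recipe.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}

# Assumption: crude suffix stripping stands in for a real stemming algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, tokenize on runs of letters, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


# Hypothetical sample documents.
docs = [
    "The cats are chasing the mice",
    "Clustering documents in Spark",
]
for doc in docs:
    print(preprocess(doc))
```

The resulting token lists are the kind of input that would then be turned into term-count vectors for a topic model. In a Spark pipeline the stop-word step would more likely be handled by the built-in `StopWordsRemover` transformer than by hand-written code like this.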
