There's more...

Please see the recipe LDA to classify documents and text into topics in Chapter 8, Unsupervised Clustering with Apache Spark 2.0, for a more detailed explanation of the LDA algorithm itself.

The following paper from the Journal of Machine Learning Research (JMLR) provides a comprehensive treatment for readers who want to study LDA in depth. It is well written, and a reader with a basic background in statistics and mathematics should be able to follow it.

The paper is available at http://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf; an alternative link is https://www.cs.colorado.edu/~mozer/Teaching/syllabi/ProbabilisticModels/readings/BleiNgJordan2003.pdf.
