References
- Learning Spark, by Holden Karau, Andy Konwinski, Patrick Wendell, and Matei Zaharia, O'Reilly, provides a much more complete introduction to Spark than this chapter can provide. I thoroughly recommend it.
- If you are interested in learning more about information theory, I recommend David MacKay's book Information Theory, Inference, and Learning Algorithms.
- Information Retrieval, by Manning, Raghavan, and Schütze, describes how to analyze textual data (including lemmatization and stemming). An online
- On the Ling-Spam dataset, and how to analyze it: http://www.aueb.gr/users/ion/docs/ir_memory_based_antispam_filtering.pdf.
- This blog post delves into the Spark Web UI in more detail. https://databricks.com/blog/2015/06/22/understanding-your-spark-application-through-visualization.html ...