Working with Scala and Spark Notebooks

Often the most frequent values or five-number summary are not sufficient to get the first understanding of the data. The term descriptive statistics is very generic and may refer to very complex ways to describe the data. Quantiles, a Paretto chart or, when more than one attribute is analyzed, correlations are also examples of descriptive statistics. When sharing all these ways to look at the data aggregates, in many cases, it is also important to share the specific computations to get to them.

Scala or Spark Notebook https://github.com/Bridgewater/scala-notebook, https://github.com/andypetrella/spark-notebook record the whole transformation path and the results can be shared as a JSON-based *.snb file. The ...

Get Mastering Scala Machine Learning now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.