Data preprocessing

Given the goals of data preparation, we chose Scala because it offers an easy, interactive way to manipulate data:

import org.apache.spark.sql.SparkSession

val priceDataFileName: String = "bitstampUSD_1-min_data_2012-01-01_to_2017-10-20.csv"

// Local Spark session that uses all available cores
val spark = SparkSession
    .builder()
    .master("local[*]")
    .config("spark.sql.warehouse.dir", "E:/Exp/")
    .appName("Bitcoin Preprocessing")
    .getOrCreate()

// Read the CSV file, treating its first line as the header
val data = spark.read.format("com.databricks.spark.csv").option("header", "true").load(priceDataFileName)
data.show(10)

>>>

Figure 5: A glimpse of the Bitcoin historical price dataset

println((data.count(), data.columns.size))

>>>

(3045857, 8)
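
As a side note on the load step, a CSV read with only the header option set gives every column the string type, so numeric work usually needs either schema inference or explicit casts. The following is a minimal sketch of that difference, not the book's own code. It assumes a tiny hypothetical sample file: the column names Timestamp, Open, and Close and the three rows are made up for illustration, and the helper names writeSample and shape are likewise hypothetical.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object CsvShapeSketch {

  // Writes a hypothetical three-row sample to a temp file and returns its path.
  // The column names and values are assumptions, not taken from the real dataset.
  def writeSample(): String = {
    val sample = Seq(
      "Timestamp,Open,Close",
      "1325317920,4.39,4.39",
      "1325317980,4.39,4.41",
      "1325318040,4.41,4.40"
    )
    val path = Files.createTempFile("price-sample", ".csv")
    Files.write(path, sample.mkString("\n").getBytes("UTF-8"))
    path.toString
  }

  // Same shape check as in the text: (number of rows, number of columns).
  def shape(spark: SparkSession, path: String): (Long, Int) = {
    val typed = spark.read
      .option("header", "true")
      .option("inferSchema", "true") // extra pass over the file to guess column types
      .csv(path)
    (typed.count(), typed.columns.length)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("CSV shape sketch")
      .getOrCreate()

    val path = writeSample()

    // Header only: every column is read as a string.
    spark.read.option("header", "true").csv(path).printSchema()

    // Header plus inferSchema: numeric columns get numeric types.
    spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(path)
      .printSchema()

    println(shape(spark, path))
    spark.stop()
  }
}
```

Schema inference costs an additional pass over the file, which may matter for a dataset with about three million rows; casting only the columns that are needed is a common alternative.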

In the preceding code, we load data ...
