Creating matrices
A matrix is simply a table that represents multiple feature vectors. A matrix that can be stored on one machine is called a local matrix, and one that can be distributed across the cluster is called a distributed matrix.
Local matrices have integer-based indices, while distributed matrices have long-based indices. Both store their values as doubles.
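As a minimal sketch of a local matrix, the following builds a small dense matrix with `Matrices.dense` from `org.apache.spark.mllib.linalg`. The values are made up for illustration and are not from the recipe. Note that the values array is read in column-major order.

```scala
import org.apache.spark.mllib.linalg.{Matrix, Matrices}

// Dense 3x2 local matrix. Values are supplied column by column, so this is:
//   1.0  2.0
//   3.0  4.0
//   5.0  6.0
// (illustrative values only)
val dense: Matrix = Matrices.dense(3, 2, Array(1.0, 3.0, 5.0, 2.0, 4.0, 6.0))

// Dimensions and indices of a local matrix are Ints.
val rows: Int = dense.numRows
val cols: Int = dense.numCols

// Entry at row 0, column 1; values are Doubles.
val topRight: Double = dense(0, 1)
```

A local matrix does not need a running cluster, so these lines can be tried in the Spark shell without touching the `SparkContext`.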
There are three types of distributed matrices:
- RowMatrix: This has each row as a feature vector.
- IndexedRowMatrix: This also has row indices.
- CoordinateMatrix: This is simply a matrix of MatrixEntry. A MatrixEntry represents a single entry in the matrix, identified by its row and column index.
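The three distributed types can be sketched as follows. This is an illustration rather than the recipe's own code: it assumes a Spark shell session in which `sc` (the `SparkContext`) is already defined, and the values are made up.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, IndexedRow, IndexedRowMatrix, MatrixEntry, RowMatrix}

// Assumption: `sc` is the SparkContext provided by spark-shell.

// RowMatrix: an RDD of feature vectors, with no meaningful row indices.
val rowMatrix = new RowMatrix(sc.parallelize(Seq(
  Vectors.dense(1.0, 2.0, 3.0),
  Vectors.dense(4.0, 5.0, 6.0))))

// IndexedRowMatrix: each row carries an explicit Long index.
val indexedRowMatrix = new IndexedRowMatrix(sc.parallelize(Seq(
  IndexedRow(0L, Vectors.dense(1.0, 2.0, 3.0)),
  IndexedRow(1L, Vectors.dense(4.0, 5.0, 6.0)))))

// CoordinateMatrix: an RDD of (row, column, value) entries.
val coordinateMatrix = new CoordinateMatrix(sc.parallelize(Seq(
  MatrixEntry(0L, 0L, 1.0),
  MatrixEntry(1L, 2L, 6.0))))

// Dimensions of distributed matrices are Longs.
val nRows: Long = rowMatrix.numRows()
val nCols: Long = rowMatrix.numCols()
```

For a `CoordinateMatrix` built this way, the dimensions are inferred from the largest row and column index present in the entries.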
How to do it…
- Start the Spark shell:
$ spark-shell
- Import the matrix-related classes:
scala> import org.apache.spark.mllib.linalg.{Vectors,Matrix, ...