Creating matrices

A matrix is simply a table that represents multiple feature vectors. A matrix that can be stored on one machine is called a local matrix, and one that can be distributed across the cluster is called a distributed matrix.
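As a minimal sketch, the following builds a small local matrix from three two-feature vectors using MLlib's `Matrices.dense`. It assumes the Spark MLlib library is on the classpath (as it is in the Spark shell). The object name `LocalMatrixExample` and the data values are illustrative only. Note that `Matrices.dense` takes its values in column-major order.

```scala
import org.apache.spark.mllib.linalg.{Matrices, Matrix}

// Hypothetical example object; the data is made up for illustration.
object LocalMatrixExample {
  // Three feature vectors (rows), two features each:
  //   row 0: (1.0, 2.0)
  //   row 1: (3.0, 4.0)
  //   row 2: (5.0, 6.0)
  // Values are listed column by column: column 0 first, then column 1.
  val dm: Matrix = Matrices.dense(3, 2, Array(1.0, 3.0, 5.0, 2.0, 4.0, 6.0))

  def main(args: Array[String]): Unit = {
    println(s"rows = ${dm.numRows}, cols = ${dm.numCols}")
    // Element access is dm(row, column).
    println(s"row 2, column 1 = ${dm(2, 1)}")
    println(dm)
  }
}
```

Because the storage is column-major, the entry at row 2, column 1 is the sixth value in the array, 6.0.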

Local matrices have integer-based indices, while distributed matrices have long-based indices. Both store their values as doubles.
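The difference in index types shows up directly in the MLlib API. In this sketch (again assuming MLlib on the classpath, with a hypothetical object name), a local `Matrix` is indexed with `Int` row and column positions. A `MatrixEntry`, the building block of distributed matrices, carries `Long` indices, so its row index can exceed `Int.MaxValue`. Both hold `Double` values.

```scala
import org.apache.spark.mllib.linalg.{Matrices, Matrix}
import org.apache.spark.mllib.linalg.distributed.MatrixEntry

// Hypothetical example object; values are illustrative.
object IndexTypesExample {
  // Local matrix: 2 x 2, column-major, indexed with Int.
  val local: Matrix = Matrices.dense(2, 2, Array(1.0, 2.0, 3.0, 4.0))
  val i: Int = 1
  val j: Int = 0
  val localValue: Double = local(i, j) // row 1, column 0 -> 2.0

  // Distributed-matrix entry: row and column indices are Long,
  // so a row index larger than Int.MaxValue is allowed.
  val entry: MatrixEntry = MatrixEntry(5000000000L, 7L, 2.5)

  def main(args: Array[String]): Unit = {
    println(s"local($i, $j) = $localValue")
    println(s"entry at row ${entry.i}, column ${entry.j} = ${entry.value}")
  }
}
```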

There are three types of distributed matrices:

  • RowMatrix: Each row is a feature vector; the rows have no meaningful indices.
  • IndexedRowMatrix: Like RowMatrix, but each row also carries a row index.
  • CoordinateMatrix: This is simply a collection of MatrixEntry objects. A MatrixEntry represents a single entry in the matrix, identified by its row and column index together with its value.
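The three distributed types can be sketched as follows, each holding the same 3 x 2 data. This assumes Spark and MLlib on the classpath. `SparkContext.getOrCreate` reuses the Spark shell's existing context if one is running; otherwise it starts one with the `local[*]` master assumed here. The object name and data are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.{
  CoordinateMatrix, IndexedRow, IndexedRowMatrix, MatrixEntry, RowMatrix
}

// Hypothetical example object; all three matrices hold the same 3 x 2 data.
object DistributedMatricesExample {
  // Reuse an existing SparkContext (e.g. the shell's) or create a local one.
  val sc: SparkContext = SparkContext.getOrCreate(
    new SparkConf().setMaster("local[*]").setAppName("DistributedMatrices"))

  // RowMatrix: an RDD of feature vectors, one per row.
  val rowMat = new RowMatrix(sc.parallelize(Seq(
    Vectors.dense(1.0, 2.0),
    Vectors.dense(3.0, 4.0),
    Vectors.dense(5.0, 6.0))))

  // IndexedRowMatrix: each row is paired with a Long row index.
  val indexedMat = new IndexedRowMatrix(sc.parallelize(Seq(
    IndexedRow(0L, Vectors.dense(1.0, 2.0)),
    IndexedRow(1L, Vectors.dense(3.0, 4.0)),
    IndexedRow(2L, Vectors.dense(5.0, 6.0)))))

  // CoordinateMatrix: one MatrixEntry(row, column, value) per stored entry.
  val coordMat = new CoordinateMatrix(sc.parallelize(Seq(
    MatrixEntry(0L, 0L, 1.0), MatrixEntry(0L, 1L, 2.0),
    MatrixEntry(1L, 0L, 3.0), MatrixEntry(1L, 1L, 4.0),
    MatrixEntry(2L, 0L, 5.0), MatrixEntry(2L, 1L, 6.0))))

  // Coordinate entries can be regrouped into indexed rows, then plain rows.
  val fromCoords: RowMatrix = coordMat.toIndexedRowMatrix().toRowMatrix()

  def main(args: Array[String]): Unit = {
    // Distributed matrices report their dimensions as Long.
    println(s"RowMatrix: ${rowMat.numRows()} x ${rowMat.numCols()}")
    println(s"IndexedRowMatrix: ${indexedMat.numRows()} x ${indexedMat.numCols()}")
    println(s"CoordinateMatrix: ${coordMat.numRows()} x ${coordMat.numCols()}")
    println(s"Converted: ${fromCoords.numRows()} x ${fromCoords.numCols()}")
  }
}
```

A CoordinateMatrix stores only the entries you give it, so it suits very sparse data. A RowMatrix is the simplest choice when the row order carries no meaning.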

How to do it…

  1. Start the Spark shell:
    $ spark-shell
    
  2. Import the matrix-related classes:
    scala> import org.apache.spark.mllib.linalg.{Vectors,Matrix, ...
