How it works...

In our experience, most of the difficulties with SparseMatrix come from not understanding the difference between Compressed Row Storage (CRS, also known as CSR) and Compressed Column Storage (CCS, also known as CSC). We highly recommend that the reader research this topic in depth to clearly understand the differences.
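
To make the distinction concrete, here is a minimal plain-Python sketch (it does not use Spark; the helper name compress and the sample matrix are made up for illustration) that encodes one small dense matrix both ways. In CCS the pointer array has one entry per column plus one, while in CRS it has one entry per row plus one:

```python
def compress(dense, by_column):
    """Return (pointers, indices, values) for a dense list-of-rows matrix.

    by_column=True  -> Compressed Column Storage (CCS/CSC)
    by_column=False -> Compressed Row Storage (CRS/CSR)
    """
    num_rows, num_cols = len(dense), len(dense[0])
    # "outer" is the dimension we compress along, "inner" the one we index.
    outer, inner = (num_cols, num_rows) if by_column else (num_rows, num_cols)
    pointers, indices, values = [0], [], []
    for o in range(outer):
        for i in range(inner):
            v = dense[i][o] if by_column else dense[o][i]
            if v != 0.0:
                indices.append(i)   # row index in CCS, column index in CRS
                values.append(v)
        pointers.append(len(values))  # where the next column/row starts
    return pointers, indices, values


# Hypothetical 3 x 2 example matrix:
#   1.0  0.0
#   0.0  3.0
#   2.0  4.0
dense = [[1.0, 0.0],
         [0.0, 3.0],
         [2.0, 4.0]]

ccs = compress(dense, by_column=True)
crs = compress(dense, by_column=False)
print("CCS:", ccs)  # ([0, 2, 4], [0, 2, 1, 2], [1.0, 2.0, 3.0, 4.0])
print("CRS:", crs)  # ([0, 1, 2, 4], [0, 1, 0, 1], [1.0, 3.0, 2.0, 4.0])
```

Note that the two encodings hold the same four non-zero values, but in a different order and with different pointer and index arrays, which is why mixing them up produces a silently wrong matrix.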

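In short, Spark stores a SparseMatrix in CCS format by default; when the matrix is flagged as transposed, the same arrays are read as the CCS encoding of the transposed target matrix, which is equivalent to CRS for the target matrix itself: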

  1. There are two distinct signatures for the SparseMatrix constructor:
    • SparseMatrix(int numRows, int numCols, int[] colPtrs, int[] rowIndices, double[] values)
    • SparseMatrix(int numRows, int numCols, int[] colPtrs, int[] rowIndices, double[] values, boolean isTransposed)

In option number two, we are indicating that the matrix is declared as transposed already, ...
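
The effect of the flag can be sketched in plain Python. The function to_dense below is a hypothetical helper, not part of Spark's API; it only mimics, under our reading of the Spark documentation, how the same three arrays are interpreted with and without isTransposed. The sample arrays are made up for illustration:

```python
def to_dense(num_rows, num_cols, col_ptrs, row_indices, values,
             is_transposed=False):
    """Expand compressed arrays into a dense list-of-rows matrix.

    is_transposed=False: arrays are CCS (col_ptrs walks the columns).
    is_transposed=True:  arrays are CRS (col_ptrs acts as row pointers and
                         row_indices acts as column indices).
    """
    dense = [[0.0] * num_cols for _ in range(num_rows)]
    outer = num_rows if is_transposed else num_cols
    if len(col_ptrs) != outer + 1:
        raise ValueError("pointer array needs one entry per column/row plus one")
    for o in range(outer):
        for k in range(col_ptrs[o], col_ptrs[o + 1]):
            i = row_indices[k]
            if is_transposed:
                dense[o][i] = values[k]  # o is a row, i is a column
            else:
                dense[i][o] = values[k]  # o is a column, i is a row
    return dense


# CCS arrays for a hypothetical 3 x 2 matrix [[1, 0], [0, 3], [2, 4]]
a = to_dense(3, 2, [0, 2, 4], [0, 2, 1, 2], [1.0, 2.0, 3.0, 4.0])

# CRS arrays for the same matrix, flagged as transposed
b = to_dense(3, 2, [0, 1, 2, 4], [0, 1, 0, 1], [1.0, 3.0, 2.0, 4.0],
             is_transposed=True)

print(a == b)  # True: both describe the same 3 x 2 matrix
```

In both calls numRows and numCols describe the target matrix; only the meaning of the pointer and index arrays changes, so the pointer array has numCols + 1 entries in the first call and numRows + 1 entries in the second.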
