Chapter 5. Contingency Tables Using Sparse Coordinate Matrices

I like sparseness. There’s something about that minimalist feel that can make something have an immediate impact and make it unique. I’ll probably always work with that formula. I just don’t know how.

Britt Daniel, lead singer of Spoon

Many real-world matrices are sparse, which means that most of their values are zero.

Using NumPy arrays to manipulate sparse matrices wastes a lot of time and energy multiplying many, many values by 0. Instead, we can use SciPy’s sparse module to solve these efficiently, examining only nonzero values. In addition to helping solve these “canonical” sparse matrix problems, sparse can be used for problems that are not obviously related to sparse matrices.
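To make the cost difference concrete, here is a minimal sketch that is not taken from the book. The matrix size and the three nonzero entries are made up for illustration. It stores the same mostly-zero matrix once as a NumPy array and once in SciPy's CSR format, then checks that a matrix-vector product gives the same answer either way.

```python
import numpy as np
from scipy import sparse

# A hypothetical 1000 x 1000 matrix in which only three entries are nonzero.
dense = np.zeros((1000, 1000))
dense[0, 1] = 2.0
dense[10, 500] = -1.0
dense[999, 999] = 4.0

# CSR keeps only the nonzero values together with their positions.
sp = sparse.csr_matrix(dense)

v = np.ones(1000)
dense_result = dense @ v    # multiplies all 1,000,000 entries, mostly zeros
sparse_result = sp @ v      # works only on the stored nonzero entries

print("stored values:", sp.nnz, "of", dense.size)
print("same result:", np.allclose(dense_result, sparse_result))
```

The dense array holds a million floats, while the sparse version keeps three values plus their index bookkeeping, so the gap widens as the matrix grows.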

One such problem is the comparison of image segmentations. (Review Chapter 3 for a definition of segmentation.)

The code sample motivating this chapter uses sparse matrices twice. First, we use code nominated by Andreas Mueller to compute a contingency matrix that counts the correspondence of labels between two segmentations. Then, with suggestions from Jaime Fernández del Río and Warren Weckesser, we use that contingency matrix to compute the variation of information, which measures the differences between segmentations.

import numpy as np
from scipy import sparse


def variation_of_information(x, y):
    # compute contingency matrix, aka joint probability matrix
    n = x.size
    Pxy = sparse.coo_matrix((np.full(n, 1/n), (x.ravel(), y.ravel())),
                            dtype=float).tocsr()

    # compute ...
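The excerpt stops before the rest of the function, so the sketch below does not try to reproduce it. It only illustrates the two ideas described above on a made-up pair of six-pixel segmentations: building the joint probability (contingency) matrix with a COO matrix, where repeated (row, column) pairs are summed on conversion, and computing the variation of information from it using the standard definition VI = H(X|Y) + H(Y|X). The helper names `contingency` and `dense_vi` are hypothetical, and the dense computation is a small-input cross-check under those assumptions, not the authors' sparse implementation.

```python
import numpy as np
from scipy import sparse


def contingency(x, y):
    """Joint probability matrix of two integer label arrays (hypothetical helper)."""
    n = x.size
    # Each pixel adds 1/n at (label in x, label in y); duplicate
    # coordinates are summed when the COO matrix is converted to CSR.
    return sparse.coo_matrix((np.full(n, 1 / n), (x.ravel(), y.ravel())),
                             dtype=float).tocsr()


def dense_vi(joint):
    """Variation of information from a small dense joint matrix (illustrative only)."""
    px = joint.sum(axis=1)            # marginal over the labels of x
    py = joint.sum(axis=0)            # marginal over the labels of y
    rows, cols = np.nonzero(joint)    # skip zero entries, where p * log(p) is 0
    p = joint[rows, cols]
    h_x_given_y = -np.sum(p * np.log2(p / py[cols]))
    h_y_given_x = -np.sum(p * np.log2(p / px[rows]))
    return h_x_given_y + h_y_given_x


# Two made-up segmentations of the same six pixels.
x = np.array([0, 0, 1, 1, 2, 2])
y = np.array([0, 0, 0, 1, 1, 1])

joint = contingency(x, y).toarray()
print(joint)
print("VI(x, y):", dense_vi(joint))
print("VI(x, x):", dense_vi(contingency(x, x).toarray()))
```

Comparing a segmentation with itself gives a variation of information of zero, which is a quick sanity check for any implementation.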


Elegant SciPy by Juan Nunez-Iglesias, Stéfan van der Walt, Harriet Dashnow
