Chapter 3

Singular Value Decomposition, Clustering, and Indexing for Similarity Search for Large Data Sets in High-Dimensional Spaces

Alexander Thomasian

Abstract

A popular paradigm for similarity search represents objects such as images by their feature vectors, treats those vectors as points in a high-dimensional space, and retrieves the k-nearest neighbors (k-NNs) of a target image according to the distances between the points. We discuss a combination of singular value decomposition (SVD), clustering, and indexing to reduce the cost of processing k-NN queries for large data sets with high-dimensional data. We first review dimensionality reduction methods with emphasis on SVD and related methods, followed by a survey of clustering and indexing methods for high-dimensional ...
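
As a rough illustration of the first ingredient named in the abstract, the sketch below reduces dimensionality with SVD and then answers a k-NN query in the reduced space. It is a minimal example under stated assumptions, not the chapter's method: the data are synthetic, the function names `svd_reduce` and `knn` are hypothetical, the number of retained dimensions is chosen by hand, and a brute-force scan stands in for the clustering and indexing steps.

```python
import numpy as np

def svd_reduce(X, m):
    """Project the rows of X onto the top-m right singular vectors of the centered data."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Thin SVD: Xc = U @ diag(s) @ Vt, with singular values in descending order.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    basis = Vt[:m].T  # shape (d, m), orthonormal columns
    return Xc @ basis, mean, basis, s

def knn(points, query, k):
    """Brute-force k-NN by Euclidean distance; returns indices, nearest first."""
    dists = np.linalg.norm(points - query, axis=1)
    return np.argsort(dists)[:k]

# Synthetic data (an assumption for illustration): 1000 points in 64 dimensions
# whose variance is concentrated in 8 directions, plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 8))
mixing = rng.normal(size=(8, 64))
X = latent @ mixing + 0.01 * rng.normal(size=(1000, 64))

# Reduce 64 dimensions to 8, then map the query with the same mean and basis.
Z, mean, basis, s = svd_reduce(X, m=8)
query = X[0]
q_reduced = (query - mean) @ basis

# k-NN search in the reduced space; the query point itself is its own nearest neighbor.
neighbors = knn(Z, q_reduced, k=5)
```

Because the basis is orthonormal, a distance computed in the reduced space never exceeds the corresponding distance in the original space, so reduced-space distances can serve as lower bounds when candidates are later checked against the full vectors.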
