Mahout in Action by Ellen Friedman, Ted Dunning, Robin Anil, Sean Owen

Chapter 8. Representing data

This chapter covers

  • Representing data as a Vector
  • Converting text documents into Vector form
  • Normalizing data representations

To get good clustering, you need to understand the techniques of vectorization: the process of representing objects as Vectors. A Vector is a simplified representation of data that helps clustering algorithms understand an object and compute its similarity with other objects. This chapter explores various ways of converting different kinds of objects into Vectors.
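As a rough illustration of what vectorization means, the sketch below turns two objects into fixed-order lists of numbers and measures how far apart they are. It is written in plain Python and does not use Mahout's own Vector classes. The feature names, the sample objects, and the helper functions `to_vector` and `euclidean_distance` are all hypothetical, chosen only for this example.

```python
import math

# Hypothetical feature order shared by every vector in this sketch.
FEATURES = ("weight_g", "height_cm", "width_cm")


def to_vector(obj):
    """Map a dict of numeric attributes to a fixed-order list of floats."""
    return [float(obj[name]) for name in FEATURES]


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Two made-up objects described by the same three attributes.
small_box = {"weight_g": 100, "height_cm": 3, "width_cm": 4}
large_box = {"weight_g": 100, "height_cm": 6, "width_cm": 8}

v1 = to_vector(small_box)
v2 = to_vector(large_box)

# A smaller distance means the two objects are more similar.
distance = euclidean_distance(v1, v2)
print(distance)
```

Each object keeps the same feature order, so position i in one vector is always comparable with position i in another. That shared layout is what lets a distance measure, and therefore a clustering algorithm, compare objects at all.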

In the last chapter, you got a taste of clustering. Books were clustered together based on the similarity of their words, and points in a two-dimensional plane were clustered together based on the distances between ...
