
R: Data Analysis and Visualization by Ágnes Vidovics-Dancs, Kata Váradi, Tamás Vadász, Ágnes Tuza, Balázs Árpád Szucs, Julia Molnár, Péter Medvegyev, Balázs Márkus, István Margitai, Péter Juhász, Dániel Havran, Gergely Gabler, Barbara Dömötör, Gergely Daróczi, Ádám Banai, Milán Badics, Ferenc Illés, Edina Berlinger, Bater Makhabel, Hrishi V. Mittal, Jaynal Abedin, Brett Lantz, Tony Fischetti


K-means clustering on big data

Data frames and matrices are easy-to-use objects in R, and typical manipulations on them execute quickly on reasonably sized datasets. However, problems can arise when the user needs to handle larger datasets. In this section, we will illustrate how the bigmemory and biganalytics packages can solve the problem of datasets that are too large to be handled by data frames or data tables.
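As a rough illustration of how bigmemory sidesteps R's in-memory limits, the sketch below creates a file-backed `big.matrix`, whose values are memory-mapped from a file on disk rather than copied into R's heap. This is a minimal sketch, assuming the bigmemory package is installed; the backing and descriptor file names and the matrix dimensions are hypothetical and chosen only for illustration.

```r
library(bigmemory)

# Hypothetical backing/descriptor file names, placed in the session's temp dir
dir  <- tempdir()
bin  <- basename(tempfile("kmeans_data_", fileext = ".bin"))
desc <- sub("\\.bin$", ".desc", bin)

# A file-backed big.matrix: the values live in 'bin' on disk and are
# memory-mapped, so R holds only a small handle instead of a full copy
bm <- big.matrix(nrow = 1e5, ncol = 3, type = "double", init = 0,
                 backingfile = bin, descriptorfile = desc,
                 backingpath = dir)

# Familiar matrix-style indexing works on big.matrix objects
bm[, 1] <- rnorm(1e5)
dim(bm)
is.filebacked(bm)
```

The descriptor file is what later allows another R session (or process) to attach to the same data without reading it all into memory.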

Note

The latest versions of the bigmemory, biganalytics, and biglm packages were not available on Windows at the time of writing this chapter. The examples shown here assume R version 2.15.3, which was then the current version of R for Windows.

In the following example, we will perform K-means clustering on large datasets. ...
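The core step can be sketched with `bigkmeans()` from biganalytics, a memory-efficient K-means implementation that accepts `big.matrix` objects. This is a minimal sketch, not the chapter's own example: the two-blob synthetic data is hypothetical and far smaller than a real big dataset, and the `iter.max` and `nstart` values are arbitrary choices.

```r
library(bigmemory)
library(biganalytics)

set.seed(42)
# Hypothetical synthetic data: two well-separated Gaussian blobs in 2-D,
# 1,000 rows each (a stand-in for a genuinely large dataset)
x <- rbind(matrix(rnorm(2000, mean = 0), ncol = 2),
           matrix(rnorm(2000, mean = 10), ncol = 2))

# Convert to a big.matrix; truly large data would instead be loaded
# file-backed (e.g. with read.big.matrix) so it never sits fully in RAM
bm <- as.big.matrix(x, type = "double")

# nstart > 1 reruns the algorithm from several random starts and keeps
# the solution with the lowest total within-cluster sum of squares
fit <- bigkmeans(bm, centers = 2, iter.max = 20, nstart = 5)

fit$size     # number of rows assigned to each cluster
fit$centers  # cluster centroids, one row per cluster
```

The returned object mirrors the structure of `stats::kmeans()` output (`cluster`, `centers`, `withinss`, `size`), so downstream code written for ordinary K-means results can usually be reused.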
