Summary

In this chapter, we explored the qualifiers of large datasets, their common characteristics, the problems of repetition, and the reasons for the hyper-growth in data volumes; in short, the big data context.

The need to apply conventional machine learning algorithms to large datasets has given rise to new challenges for machine learning practitioners. Traditional machine learning libraries do not fully support processing huge datasets. Parallelization using modern parallel computing frameworks, such as MapReduce, has gained popularity and adoption; this has resulted in the birth of new libraries that are built on top of these frameworks.
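As a reminder of the MapReduce idea mentioned above, the following is a minimal single-process sketch in plain Python, not the API of any particular framework. It computes a mean over partitioned data: each partition is summarized independently (the map step), and the partial summaries are then merged (the reduce step). The names `map_partition`, `combine`, and `distributed_mean`, along with the sample partitions, are hypothetical and chosen only for illustration.

```python
from functools import reduce


def map_partition(partition):
    """Map step: summarize one partition as a (sum, count) pair."""
    return (sum(partition), len(partition))


def combine(a, b):
    """Reduce step: merge two partial (sum, count) summaries."""
    return (a[0] + b[0], a[1] + b[1])


def distributed_mean(partitions):
    """Compute the mean of all values across partitions.

    Assumes at least one value overall. In a real framework, each
    call to map_partition could run on a separate worker; here it
    runs sequentially.
    """
    partials = map(map_partition, partitions)
    total, count = reduce(combine, partials, (0.0, 0))
    return total / count


# Illustrative data, split into three partitions.
partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
mean = distributed_mean(partitions)
```

Because `combine` is associative, the partial summaries can be merged in any grouping, which is the property that lets this kind of computation be spread across many machines.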

The focus was on methods that are suitable for massive data and have potential for the parallel ...
