Introduction

Data, data, data. You can't have escaped the headlines, reports, white papers, and even television coverage on the rise of Big Data and data science. The push is to learn, synthesize, and act upon all the data that comes out of social media, our phones, our hardware devices (otherwise known as “The Internet of Things”), sensors, and basically anything that can generate data.

The emphasis of most of this marketing is on data volume and the velocity at which data arrives. Prophets of the data flood tell us we can't process this data fast enough, and the marketing machine will continue to hawk the services we supposedly need to buy to achieve such speed. To some degree they are right, but it's worth stopping for a second and having a proper think about the task at hand.

Data mining and machine learning have been around for a number of years already, and the huge media push surrounding Big Data is mostly about data volume. Looked at closely, the machine learning algorithms being applied aren't any different from what they were years ago; what is new is that they are applied at scale. And the organizations actually creating data at that scale are, in my opinion, the minority: Google, Facebook, Twitter, Netflix, and a small handful of others get the majority of mentions in the headlines with a mixture of algorithmic learning and the tools that enable them to scale. So, the real question you should ask is, “How does all this ...
