Foreword

“Every 25 milliseconds, a turbine emits 10 distinct data points…” began almost every customer conversation about big data and advanced analytics that I’ve been a part of over the last six years. A simple story about the data needs of a wind farm highlighted the evolving size, speed, and shape of data that is representative of customers across industries. Over time, the technology names, the integration scenarios, and the guidance would evolve, but a few things remained consistent despite the ever-increasing pace of change:

  • Customers are faced with a rapidly expanding amount of data, in a variety of shapes and sizes, generated and stored throughout their environment.
  • Deep understanding of customers, of purchase patterns, of machine performance, of transaction streams, and more, is fast becoming table stakes as competitors are doing the same.
  • The pace of innovation from vendors, and more importantly from the ecosystem, is at what feels like a record high.

The value that customers get from advanced analytics, big data, and machine learning can transform businesses, but there are still a lot of pieces that need to come together. I’ve been fortunate to have had such an immensely exciting, rewarding, and simply fun time building products customers can use to solve these challenges. These technologies have, in many cases, enabled people to build solutions that simply weren’t possible 5 or 10 years ago.

The addition of the Azure cloud in these scenarios has given customers an entirely new level of flexibility. Cloud services such as HDInsight make it faster, easier, and cheaper to experiment with a wide range of software and hardware combinations, and make it possible to finely tune the consumption of cloud resources to the specifics of a given project and to scale up and down as required. Additionally, the economic model of the cloud is fundamentally different from acquiring and operating these tools on premises, which enables scenarios that are simply not possible on premises. We’ve seen Azure customers scale out to a large number of GPU-enabled machines to conduct training using the latest deep learning libraries, and then take that output and deploy it to their web services (as well as to devices running anywhere), paying only for the few dollars’ worth of compute they used when they did so. Now, with this flexibility comes the need to manage and orchestrate across these systems, which can quickly become a key challenge.

This book takes you through the same workflow you’ll see when implementing an analytics project in the real world: building a data pipeline. By first walking through ingesting and storing data, you’ll set the stage in Azure for a rich set of insights to derive from that data. Once you’ve ingested the data, processing can occur in real time or in offline batch scenarios, using tools and languages that you’re familiar with. The next stage is acting on the insights gained, whether through dashboards or further integration into other applications and services. Oftentimes, the analysis that we want to be able to do may also involve machine learning to bring structure or predictions to the data. It is said that most machine learning projects are 80% acquiring and processing the data prior to performing any machine learning, and the tools shown throughout this book can be used for this. Finally, we must deal with a set of very real operational aspects of any production data pipeline, such as security and data governance, which need to be considered throughout any project.

Zoiner’s perspective on this space is one crafted through years of hard work, walking hand in hand with customers who are looking to transform their businesses with the power of data. Zoiner and I met nearly 10 years ago while we were both working in the distributed systems space, where we shared a passion for orchestration engines and messaging layers. Since then, I have always appreciated his ability to work with fantastically complicated technologies and distill the key choices and aspects of a solution into simple guidance that anyone can understand. I’m excited to see him applying that same approach to a topic that’s so near to me, and I’m excited to see what readers can do with the knowledge they will gain.

