Chapter 2. Getting Data into Azure

In this chapter, we focus on approaches for transferring data from the data source to Azure. We divide the discussion between approaches that typically transfer large quantities of data in a single effort (bulk data loading) and approaches that transfer individual data records (stream loading), and we investigate the protocols and tools relevant to each.

Using our Azure analytics pipeline as a guide, this chapter focuses on the items highlighted by the red dashed borders in Figure 2-1.

Ingest Loading Layer

To perform analytics in Azure, you first need to get data into Azure. That is the purpose of the ingest phase. Ultimately, the goal is to move data from a source location (e.g., on premises or another cloud) into either file-based or queue-based storage within Azure. In this context, we will look at the client tooling, processes, and protocols used to get the data to its destination in Azure.

To help put this layer in context, let's refer back to the Blue Yonder Airlines scenario. The company has historical flight delay data, historical weather data, and smart building telemetry on which it wants to perform analytics. The first two data sets are candidates for bulk loading, which we will discuss next. The last data set, the smart building telemetry, is a candidate for streaming ingest, which we will examine later in the chapter.

The next chapter will dive into details of how data is stored once it lands in Azure, while ...
