Hands-On Data Warehousing with Azure Data Factory

Book Description

Leverage the power of Microsoft Azure Data Factory v2 to build hybrid data solutions

About This Book
  • Combine the power of Azure Data Factory v2 and SQL Server Integration Services
  • Design and enhance performance and scalability of a modern ETL hybrid solution
  • Interact with the loaded data in the data warehouse and data lake using Power BI
Who This Book Is For

This book is for you if you are a software professional who develops and implements ETL solutions using Microsoft SQL Server or Azure cloud. It will be an added advantage if you are a software engineer, DW/ETL architect, or ETL developer, and know how to create a new ETL implementation or enhance an existing one with ADF or SSIS.

What You Will Learn
  • Understand the key components of an ETL solution using Azure Data Factory and Integration Services
  • Design the architecture of a modern ETL hybrid solution
  • Implement ETL solutions for both on-premises and Azure data
  • Improve the performance and scalability of your ETL solution
  • Gain thorough knowledge of new capabilities and features added to Azure Data Factory and Integration Services
In Detail

ETL is one of the essential techniques in data processing. Because data is everywhere, ETL will remain a vital process for handling data from different sources.

Hands-On Data Warehousing with Azure Data Factory starts with the basic concepts of data warehousing and the ETL process. You will use Azure Data Factory and SSIS to understand the key components of an ETL solution. With the help of practical examples, you will go through the Azure services that ADF and SSIS can use, such as Azure Data Lake Analytics, Machine Learning, and Databricks Spark. You will explore, step by step, how to design and implement hybrid ETL solutions using different integration services. Once you get to grips with all this, you will use Power BI to interact with data coming from different sources in order to reveal valuable insights.

By the end of this book, you will not only have learned how to build your own ETL solutions but also know how to address the key challenges faced while building them.

Style and approach

A step-by-step guide to developing data movement code using SSIS, Azure Data Factory, and database stored procedures for implementing intelligent BI solutions.

Downloading the example code for this book

You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

Table of Contents

  1. Title Page
  2. Copyright and Credits
    1. Hands-On Data Warehousing with Azure Data Factory
  3. Packt Upsell
    1. Why subscribe?
    2. PacktPub.com
  4. Contributors
    1. About the authors
    2. About the reviewer
    3. Packt is searching for authors like you
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Download the color images
      3. Conventions used
    4. Get in touch
      1. Reviews
  6. The Modern Data Warehouse
    1. The need for a data warehouse
      1. Driven by IT
      2. Self-service BI
      3. Cloud-based BI – big data and artificial intelligence
    2. The modern data warehouse
      1. Main components of a data warehouse
        1. Staging area
        2. Data warehouse
          1. Cubes
        3. Consumption layer – BI and analytics
        4. What is Azure Data Factory
      2. Limitations of ADF V1.0
    3. What's new in V2.0?
      1. Integration runtime
      2. Linked services
      3. Datasets
      4. Pipelines
        1. Activities
      5. Parameters
      6. Expressions
      7. Controlling the flow of activities
      8. SSIS package deployment in Azure
      9. Spark cluster data store
    4. Summary
  7. Getting Started with Our First Data Factory
    1. Resource group
    2. Azure Data Factory
      1. Datasets
        1. Linked services
        2. Integration runtimes
      2. Activities
      3. Monitoring the data factory pipeline runs
    3. Azure Blob storage
      1. Blob containers
    4. Types of blobs
      1. Block blobs
      2. Page blobs
      3. Replication of storage
      4. Creating an Azure Blob storage account
    5. SQL Azure database
      1. Creating the Azure SQL Server
      2. Attaching the BACPAC to our database
      3. Copying data using our data factory
    6. Summary
  8. SSIS Lift and Shift
    1. SSIS in ADF
      1. Sample setup
        1. Sample databases
        2. SSIS components
          1. Integration services catalog setup
          2. Sample solution in Visual Studio
          3. Deploying the project on-premises
    2. Leveraging our package in ADF V2
      1. Integration runtimes
        1. Azure integration runtime
        2. Self-hosted runtime
        3. SSIS integration runtime
      2. Adding an SSIS integration runtime to the factory
      3. SSIS execution from a pipeline
    3. Summary
  9. Azure Data Lake
    1. Creating and configuring Data Lake Store
      1. Next Steps
        1. Ways to copy/import data from a database to the Data Lake
          1. Ways to store imported data in files in the Data Lake
          2. Easily moving data to the Data Lake Store
        2. Ways to directly copy files into the Data Lake
        3. Prerequisites for the next steps
    2. Creating a Data Lake Analytics resource
    3. Using the data factory to manipulate data in the Data Lake
      1. Task 1 – copy/import data from SQL Server to a blob storage file using data factory
      2. Task 2 – run a U-SQL task from the data factory pipeline to summarize data
        1. Service principal authentication
    4. Run U-SQL from a job in the Data Lake Analytics
    5. Summary
  10. Machine Learning on the Cloud
    1. Machine learning overview
      1. Machine learning algorithms
        1. Supervised learning
        2. Unsupervised learning
        3. Reinforcement learning
    2. Machine learning tasks
      1. Making predictions with regression algorithms
      2. Automated classification using machine learning
      3. Identifying groups using clustering methods
      4. Dimensionality reduction to improve performance
        1. Feature selection
        2. Feature extraction
    3. Azure Machine Learning Studio
      1. Azure Machine Learning Studio account
      2. Azure Machine Learning Studio experiment
        1. Dataset
        2. Module
        3. Work area
    4. Breast cancer detection
      1. Get the data
      2. Prepare the data
      3. Train the model
      4. Score and evaluate the model
    5. Summary
  11. Introduction to Azure Databricks
    1. Azure Databricks setup
    2. Prepare the data to ingest
      1. Setting up the folder in the Azure storage account
      2. Self-hosted integration runtime
      3. Linked service setup
      4. Datasets setup
        1. SQL Server dataset
        2. Blob storage dataset
          1. Linked service
          2. Dataset
    3. Copy data from SQL Server to sales-data
      1. Publish and trigger the copy activity
    4. Databricks notebook
    5. Calling Databricks notebook execution in ADF
    6. Summary
  12. Reporting on the Modern Data Warehouse
    1. Different types of BI
      1. Self-service – personal
      2. Team BI – sharing personal BI data
      3. Corporate BI
        1. Power BI Premium
        2. Power BI Report Server
    2. Power BI consumption
    3. Creating our Power BI reports
      1. Reporting with on-premise data sources
    4. Incorporating Spark data
    5. Summary