Mastering Azure Analytics, 1st Edition

Book Description

Microsoft Azure has over 20 platform-as-a-service (PaaS) offerings that can support a big data analytics solution. So which one is right for your project? This practical book helps you understand the breadth of Azure services by organizing them into a reference framework you can use when crafting your own big data analytics solution.

You’ll not only be able to determine which service best fits the job, but also learn how to implement a complete solution that scales, provides human fault tolerance, and supports future needs.

  • Understand the fundamental patterns of the data lake and lambda architecture
  • Recognize the canonical steps in the analytics data pipeline and learn how to use Azure Data Factory to orchestrate them
  • Implement data lakes and lambda architectures, using Azure Data Lake Store, Data Lake Analytics, HDInsight (including Spark), Stream Analytics, SQL Data Warehouse, and Event Hubs
  • Understand where Azure Machine Learning fits into your analytics pipeline
  • Gain experience using these services on real-world data that has real-world problems, with scenarios ranging from aviation to Internet of Things (IoT)

Table of Contents

  1. Foreword
  2. Preface
    1. Conventions Used in This Book
    2. Using Code Examples
    3. O’Reilly Safari
    4. How to Contact Us
    5. Acknowledgments
  3. 1. Enterprise Analytics Fundamentals
    1. The Analytics Data Pipeline
    2. Data Lakes
    3. Lambda Architecture
    4. Kappa Architecture
    5. Choosing Between Lambda and Kappa
    6. The Azure Analytics Pipeline
    7. Introducing the Analytics Scenarios
    8. Example Code and Example Data Sets
    9. What You Will Need
      1. Broadband Internet Connectivity
      2. Azure Subscription
      3. Visual Studio 2015 with Update 1
      4. Azure SDK 2.8 or Later
    10. Summary
  4. 2. Getting Data into Azure
    1. Ingest Loading Layer
    2. Bulk Data Loading
      1. Disk Shipping
      2. End User Tools
      3. Network-Oriented Approaches
    3. Stream Loading
      1. Stream Loading with Event Hubs
    4. Summary
  5. 3. Storing Ingested Data in Azure
    1. File-Oriented Storage
      1. Blob Storage
      2. Azure Data Lake Store
      3. HDFS
    2. Queue-Oriented Storage
      1. Blue Yonder Scenario: Smart Buildings
      2. Event Hubs
      3. IoT Hub
    3. Summary
  6. 4. Real-Time Processing in Azure
    1. Stream Processing
      1. Consuming Messages from Event Hubs
    2. Tuple-at-a-Time Processing in Azure
      1. Introducing HDInsight
      2. Storm on HDInsight
      3. EventProcessorHost
      4. Azure Machine Learning
    3. Summary
  7. 5. Real-Time Micro-Batch Processing in Azure
    1. Micro-Batch Processing in Azure
      1. Spark Streaming on HDInsight
      2. Storm on HDInsight
      3. Azure Stream Analytics
    2. Summary
  8. 6. Batch Processing in Azure
    1. Batch Processing with MapReduce on HDInsight
      1. Apache Hadoop MapReduce
    2. Batch Processing with Hive on HDInsight
      1. Internal and External Tables
      2. Partitioning Tables
      3. Views
      4. Indexes
      5. Databases
      6. Using Hive on HDInsight
      7. Storage on HDInsight
      8. Batch Processing Blue Yonder Airports Data
      9. Creating an External Table
      10. Creating an Internal Table
    3. Batch Processing with Pig on HDInsight
    4. Batch Processing with Spark on HDInsight
      1. Batch Processing Blue Yonder Airports Data
      2. Creating an External Table
    5. Batch Processing with SQL Data Warehouse
      1. Using SQL Data Warehouse
      2. Batch Processing Blue Yonder Airports Data
      3. Storing the Credentials to Azure Storage
    6. Batch Processing with Data Lake Analytics
      1. Using Data Lake Analytics
      2. Batch Processing Blue Yonder Airports Data
      3. Processing with U-SQL
    7. Batch Processing with Azure Batch
    8. Orchestrating Batch Processing Pipelines with Azure Data Factory
    9. Summary
  9. 7. Interactive Querying in Azure
    1. Interactive Querying with Azure SQL Data Warehouse
      1. Partitions and Distributions
      2. Indexes
      3. Interactive Exploration of the Blue Yonder Airports Data
    2. Interactive Querying with Hive and Tez
      1. Indexes
      2. Partitions
      3. Interactive Exploration of the Blue Yonder Airports Data
    3. Interactive Querying with Spark SQL
      1. Indexes
      2. Partitions
      3. Interactive Exploration of the Blue Yonder Airports Data
    4. Interactive Querying with U-SQL
      1. Interactive Exploration of the Blue Yonder Airports Data
    5. Summary
  10. 8. Hot and Cold Path Serving Layer in Azure
    1. Azure Redis Cache
      1. Redis in the Speed Serving Layer
    2. Document DB
      1. Document DB in the Speed Serving Layer
      2. Document DB in the Batch Serving Layer
    3. SQL Database
      1. SQL Database in the Speed Serving Layer
      2. SQL Database in the Batch Serving Layer
    4. SQL Data Warehouse
    5. HBase on HDInsight
    6. Azure Search
    7. Summary
  11. 9. Intelligence and Machine Learning
    1. Azure Machine Learning
    2. R Server on HDInsight
    3. SQL R Services
    4. Microsoft Cognitive Services
    5. Summary
  12. 10. Managing Metadata in Azure
    1. Managing Metadata with Azure Data Catalog
      1. Data Catalog in the Blue Yonder Airports Scenario
      2. Add an Azure Data Lake Store Asset
      3. Add Azure Storage Blobs
      4. Add a SQL Data Warehouse
    2. Summary
  13. 11. Protecting Your Data in Azure
    1. Identity and Access Management
    2. Data Protection
    3. Auditing
    4. Summary
  14. 12. Performing Analytics
    1. Analytics with Power BI
      1. Real-Time Power BI in the Blue Yonder Scenario
    2. Batch Analytics Reporting with Power BI in the Blue Yonder Scenario
    3. A Look Ahead
      1. Real Time
      2. Lower Batch Latencies
      3. IoT
      4. Security
      5. More Linux
  15. Index