Microsoft Big Data Solutions

Book Description

Tap the power of Big Data with Microsoft technologies

Big Data is here, and Microsoft's new Big Data platform is a valuable tool to help your company get the very most out of it. This timely book shows you how to use HDInsight along with HortonWorks Data Platform for Windows to store, manage, analyze, and share Big Data throughout the enterprise. Focusing primarily on Microsoft and HortonWorks technologies but also covering open source tools, Microsoft Big Data Solutions explains best practices, covers on-premises and cloud-based solutions, and features valuable case studies.

Best of all, it helps you integrate these new solutions with technologies you already know, such as SQL Server and Hadoop.

  • Walks you through how to integrate Big Data solutions in your company using Microsoft's HDInsight Server, HortonWorks Data Platform for Windows, and open source tools

  • Explores both on-premises and cloud-based solutions

  • Shows how to store, manage, analyze, and share Big Data throughout the enterprise

  • Covers topics such as Microsoft's approach to Big Data, installing and configuring HortonWorks Data Platform for Windows, integrating Big Data with SQL Server, visualizing data with Microsoft and HortonWorks BI tools, and more

  • Helps you build and execute a Big Data plan

  • Includes contributions from the Microsoft and HortonWorks Big Data product teams

If you need a detailed roadmap for designing and implementing a fully deployed Big Data solution, you'll want Microsoft Big Data Solutions.

    Table of Contents

    1. Cover Page
    2. Title Page
    3. Copyright
    4. Dedication
    5. Acknowledgments
    6. About the Author
    7. About the Technical Editors
    8. Contents
    9. Introduction
      1. Our Team
      2. All Kidding Aside
      3. Who Is This Book For?
      4. What You Need to Use This Book
      5. Chapter Overview
      6. Features Used in This Book
    10. Part I: What Is Big Data?
      1. CHAPTER 1: Industry Needs and Solutions
        1. What's So Big About Big Data?
        2. A Brief History of Hadoop
        3. What Is Hadoop?
        4. Summary
      2. CHAPTER 2: Microsoft's Approach to Big Data
        1. A Story of “Better Together”
        2. Competition in the Ecosystem
        3. SQL on Hadoop Today
        4. Deploying Hadoop
        5. Summary
    11. Part II: Setting Up for Big Data with Microsoft
      1. CHAPTER 3: Configuring Your First Big Data Environment
        1. Getting Started
        2. Getting the Install
        3. Running the Installation
        4. Validating Your New Cluster
        5. Common Post-setup Tasks
        6. Summary
    12. Part III: Storing and Managing Big Data
      1. CHAPTER 4: HDFS, Hive, HBase, and HCatalog
        1. Exploring the Hadoop Distributed File System
        2. Explaining the HDFS Architecture
        3. Exploring Hive: The Hadoop Data Warehouse Platform
        4. Exploring HCatalog: HDFS Table and Metadata Management
        5. Exploring HBase: An HDFS Column-oriented Database
        6. Summary
      2. CHAPTER 5: Storing and Managing Data in HDFS
        1. Understanding the Fundamentals of HDFS
        2. Using Common Commands to Interact with HDFS
        3. Moving and Organizing Data in HDFS
        4. Summary
      3. CHAPTER 6: Adding Structure with Hive
        1. Understanding Hive's Purpose and Role
        2. Creating and Querying Basic Tables
        3. Using Advanced Data Structures with Hive
        4. Summary
      4. CHAPTER 7: Expanding Your Capability with HBase and HCatalog
        1. Using HBase
        2. Managing Data with HCatalog
        3. Creating Partitions
        4. Integrating HCatalog with Pig and Hive
        5. Using HBase or Hive as a Data Warehouse
        6. Summary
    13. Part IV: Working with Your Big Data
      1. CHAPTER 8: Effective Big Data ETL with SSIS, Pig, and Sqoop
        1. Combining Big Data and SQL Server Tools for Better Solutions
        2. Working with SSIS and Hive
        3. Configuring Your Packages
        4. Transferring Data with Sqoop
        5. Using Pig for Data Movement
        6. Choosing the Right Tool
        7. Summary
      2. CHAPTER 9: Data Research and Advanced Data Cleansing with Pig and Hive
        1. Getting to Know Pig
        2. Using Hive
        3. Summary
    14. Part V: Big Data and SQL Server Together
      1. CHAPTER 10: Data Warehouses and Hadoop Integration
        1. State of the Union
        2. Challenges Faced by Traditional Data Warehouse Architectures
        3. Hadoop's Impact on the Data Warehouse Market
        4. Introducing Parallel Data Warehouse (PDW)
        5. Project Polybase
        6. Summary
      2. CHAPTER 11: Visualizing Big Data with Microsoft BI
        1. An Ecosystem of Tools
        2. Self-service Big Data with PowerPivot
        3. Rapid Big Data Exploration with Power View
        4. Spatial Exploration with Power Map
        5. Summary
      3. CHAPTER 12: Big Data Analytics
        1. Data Science, Data Mining, and Predictive Analytics
        2. Introduction to Mahout
        3. Building a Recommendation Engine
        4. Summary
      4. CHAPTER 13: Big Data and the Cloud
        1. Defining the Cloud
        2. Exploring Big Data Cloud Providers
        3. Setting Up a Big Data Sandbox in the Cloud
        4. Storing Your Data in the Cloud
        5. Summary
      5. CHAPTER 14: Big Data in the Real World
        1. Common Industry Analytics
        2. Operational Analytics
        3. Summary
    15. Part VI: Moving Your Big Data Forward
      1. CHAPTER 15: Building and Executing Your Big Data Plan
        1. Gaining Sponsor and Stakeholder Buy-in
        2. Identifying Technical Challenges
        3. Identifying Operational Challenges
        4. Going Forward
        5. Summary
      2. CHAPTER 16: Operational Big Data Management
        1. Ongoing Data Integration with Cloud and On-premise Solutions
        2. Integration Thoughts for Big Data
        3. Backups and High Availability in Your Big Data Environment
        4. Big Data Solution Governance
        5. Creating Operational Analytics
        6. Summary
    16. Index