Microsoft SQL Server 2012 with Hadoop

Book Description

Getting SQL Server talking to Hadoop is a smooth process when you follow this tutorial. Learn all the tools and techniques you need to integrate the data and then extract powerful business insights from the merged result.

  • Integrate data from unstructured (Hadoop) and structured (SQL Server 2012) sources

  • Configure and install connectors for a bi-directional transfer of data

  • Full of illustrations, diagrams, and tips with clear, step-by-step instructions and practical examples

In Detail

    With the explosion of data, the open source Apache Hadoop framework is gaining traction, thanks to the large ecosystem that has grown up around its core components, the Hadoop Distributed File System (HDFS) and MapReduce. Getting SQL Server to talk to Hadoop has become increasingly important because the two are complementary: petabytes of unstructured data can be stored in Hadoop, where queries may take hours, while terabytes of structured data can be stored in SQL Server 2012 and queried in seconds. This creates the need to transfer and integrate data between Hadoop and SQL Server.

    Microsoft SQL Server 2012 with Hadoop is aimed at SQL Server developers. It quickly shows you how to get Hadoop activated on SQL Server 2012 (it ships with this version). Once this is done, the book focuses on how to manage big data with Hadoop and how to use Hadoop Hive to query the data. It also covers topics such as using SQL Server's in-memory functions and using BI tools with big data.

    Microsoft SQL Server 2012 with Hadoop focuses on data integration techniques between relational (SQL Server 2012) and non-relational (Hadoop) worlds. It will walk you through different tools for the bi-directional movement of data with practical examples.

    You will learn to use open source connectors such as Sqoop to import and export data between SQL Server 2012 and Hadoop, and to work with leading in-memory BI tools to create ETL solutions using the Hive ODBC driver for your data movement projects. Finally, this book gives you a glimpse of present-day self-service BI tools such as Excel and Power View, which consume Hadoop data and provide powerful insights into it.

    Table of Contents

    1. Microsoft SQL Server 2012 with Hadoop
      1. Table of Contents
      2. Microsoft SQL Server 2012 with Hadoop
      3. Credits
      4. About the Author
      5. About the Reviewer
      6. www.PacktPub.com
        1. Support files, eBooks, discount offers and more
          1. Why Subscribe?
          2. Free Access for Packt account holders
          3. Instant Updates on New Packt Books
      7. Preface
        1. What this book covers
        2. What you need for this book
        3. Who this book is for
        4. Conventions
        5. Reader feedback
        6. Customer support
          1. Errata
          2. Piracy
          3. Questions
      8. 1. Introduction to Big Data and Hadoop
        1. Big Data – what's the big deal?
        2. The Apache Hadoop framework
          1. HDFS
          2. MapReduce
            1. NameNode
            2. Secondary NameNode
            3. DataNode
            4. JobTracker
            5. TaskTracker
          3. Hive
          4. Pig
          5. Flume
          6. Sqoop
          7. Oozie
          8. HBase
          9. Mahout
        3. Summary
      9. 2. Using Sqoop – The SQL Server Hadoop Connector
        1. The SQL Server-Hadoop Connector
          1. Installation prerequisites
            1. A Hadoop cluster on Linux
            2. Installing and configuring Sqoop
            3. Setting up the Microsoft JDBC driver
        2. Downloading the SQL Server-Hadoop Connector
        3. Installing the SQL Server-Hadoop Connector
        4. The Sqoop import tool
          1. Importing the tables in Hive
        5. The Sqoop export tool
          1. Data types
        6. Summary
      10. 3. Using the Hive ODBC Driver
        1. The Hive ODBC Driver
        2. SQL Server Integration Services (SSIS)
          1. SSIS as an ETL – extract, transform, and load tool
        3. Developing the package
          1. Creating the project
          2. Creating the Data Flow
          3. Creating the source Hive connection
          4. Creating the destination SQL connection
          5. Creating the Hive source component
          6. Creating the SQL destination component
          7. Mapping the columns
          8. Running the package
        4. Summary
      11. 4. Creating a Data Model with SQL Server Analysis Services
        1. Configuring the SQL Linked Server to Hive
          1. The Linked Server script
          2. Using OpenQuery
          3. Creating a view
        2. Creating an SSAS data model
        3. Summary
      12. 5. Using Microsoft's Self-Service Business Intelligence Tools
        1. PowerPivot enhancements
        2. Power View for Excel
        3. Summary
      13. Index