
Effective Big Data ETL with SSIS, Pig, and Sqoop

WHAT YOU WILL LEARN IN THIS CHAPTER:

  • Moving Data Between SQL Server and Hadoop
  • Using SSIS to Integrate
  • Using Sqoop for Importing and Exporting
  • Using Pig to Transform Data
  • Choosing the Right Tool

A number of tools are available to help you move data between your Hadoop environment and SQL Server. This chapter covers three common ones: SQL Server Integration Services, Sqoop, and Pig.

SQL Server Integration Services (SSIS) is used in many SQL Server environments to import, export, and transform data. It can integrate with many different data systems, not just SQL Server, and it provides a number of built-in transformations. You can also extend it with custom transformations to cover cases that are not supported “out of the box.” This extensibility enables it to work with Hive as both a source of data and a destination.

Sqoop is a tool designed to move data between Hadoop and relational databases. Although it does not offer the full range of transformation capabilities that SSIS does, it is quick and easy to set up and use.

Pig enables users to analyze large data sets. It supports a number of built-in transformations for the data, and additional transformations can be added as user-defined functions through custom coding. It was originally developed as a way to reduce the complexity of writing MapReduce jobs, but ...
