Talend for Big Data

Book Description

If you want to start working on big data projects fast, this is the guide you’ve been looking for. Delve deep into Talend and discover just how easily you can revolutionize your data handling and presentation.

In Detail

Talend, a successful open source data integration solution, accelerates the adoption of new big data technologies and efficiently integrates them into your existing IT infrastructure. It is able to do this because of its intuitive graphical language, its multiple connectors to the Hadoop ecosystem, and its array of tools for data integration, quality, management, and governance.

This is a concise, pragmatic book that will guide you through designing and implementing big data transfers and running big data analytics jobs using Hadoop technologies such as HDFS, HBase, Hive, Pig, and Sqoop. You will learn how to write complex processing jobs and how to leverage the power of Hadoop projects by designing graphical Talend jobs with the business modeler, the metadata repository, and a palette of configurable components.

You will start by understanding how to process large amounts of data using Talend big data components, and then learn how to write job procedures in HDFS. You will then look at how to use Hadoop projects to process data and how to export the data to your favourite relational database system.

You will learn how to implement Hive ELT jobs, Pig aggregation and filtering jobs, and simple Sqoop jobs using the Talend big data component palette. You will also learn the basics of Twitter sentiment analysis and how to format data with Apache Hive.

Talend for Big Data will enable you to start working on big data projects immediately, from simple processing projects to complex projects using common big data patterns.

What You Will Learn

  • Discover the structure of the Talend Unified Platform
  • Work with Talend HDFS components
  • Implement ELT processing jobs using Talend Hive components
  • Load, filter, aggregate, and store data using Talend Pig components
  • Integrate HDFS with RDBMS using Sqoop components
  • Use the streaming pattern for big data
  • Learn to reuse the partitioning pattern for big data

Downloading the Example Code

You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

    Table of Contents

    1. Talend for Big Data
      1. Table of Contents
      2. Talend for Big Data
      3. Credits
      4. About the Author
      5. About the Reviewers
      6. www.PacktPub.com
        1. Support files, eBooks, discount offers and more
          1. Why Subscribe?
          2. Free Access for Packt account holders
      7. Preface
        1. What this book covers
        2. What you need for this book
        3. Who this book is for
        4. Conventions
        5. Reader feedback
        6. Customer support
          1. Downloading the color images of this book
          2. Errata
          3. Piracy
          4. Questions
      8. 1. Getting Started with Talend Big Data
        1. Talend Unified Platform presentation
        2. Knowing about the Hadoop ecosystem
        3. Prerequisites for running examples
        4. Downloading Talend Open Studio for Big Data
        5. Installing TOSBD
        6. Running TOSBD for the first time
        7. Summary
      9. 2. Building Our First Big Data Job
        1. TOSBD – the development environment
        2. A simple HDFS writer job
        3. Checking the result in HDFS
        4. Summary
      10. 3. Formatting Data
        1. Twitter Sentiment Analysis
        2. Writing the tweets in HDFS
        3. Setting our Apache Hive tables
        4. Formatting tweets with Apache Hive
        5. Summary
      11. 4. Processing Tweets with Apache Hive
        1. Extracting hashtags
        2. Extracting emoticons
        3. Joining the dots
        4. Summary
      12. 5. Aggregate Data with Apache Pig
        1. Knowing about Pig
        2. Extracting the top Twitter users
        3. Extracting the top hashtags, emoticons, and sentiments
        4. Summary
      13. 6. Back to the SQL Database
        1. Linking HDFS and RDBMS with Sqoop
        2. Exporting and importing data to a MySQL database
        3. Summary
      14. 7. Big Data Architecture and Integration Patterns
        1. The streaming pattern
        2. The partitioning pattern
        3. Summary
      15. A. Installing Your Hadoop Cluster with Cloudera CDH VM
        1. Downloading Cloudera CDH VM
        2. Launching the VM for the first time
        3. Basic required configuration
        4. Summary
      16. Index