SAP Data Services 4.x Cookbook

Book Description

Delve into the SAP Data Services environment to efficiently prepare, implement, and develop ETL processes

About This Book

  • Install and configure the SAP Data Services environment

  • Develop ETL techniques in the Data Services environment

  • Work through real-life examples of Data Services usage, with step-by-step instructions for specific ETL development tasks

Who This Book Is For

This book is for IT technical engineers who want to get familiar with the EIM solutions provided by SAP for ETL development and data quality management. The book requires familiarity with basic programming concepts and basic knowledge of the SQL language.

What You Will Learn

  • Install, configure, and administer the SAP Data Services components

  • Run through the ETL design basics

  • Maximize the performance of your ETL with the advanced patterns in Data Services

  • Learn methods for extracting data from various databases and systems

  • Get familiar with the transformation methods available in SAP Data Services

  • Learn methods for loading data into various databases and systems

  • Code with the Data Services scripting language

  • Validate and cleanse your data, applying the data quality methods of Information Steward

In Detail

Want to cost-effectively deliver trusted information to all of your crucial business functions? SAP Data Services delivers one enterprise-class solution for data integration, data quality, data profiling, and text data processing. It boosts productivity with a single solution for data quality and data integration. SAP Data Services also enables you to move, improve, govern, and unlock big data.

This book will lead you through the SAP Data Services environment to efficiently develop ETL processes. To begin with, you’ll learn to install, configure, and prepare the ETL development environment. You will become familiar with the concepts of developing ETL processes with SAP Data Services. Starting from the smallest unit of work, the data flow, the chapters will lead you to the highest organizational unit, the Data Services job, revealing the advanced techniques of ETL design.

You will learn to import XML files by creating and implementing real-time jobs. The book will then guide you through the ETL development patterns that enable the most effective performance when extracting, transforming, and loading data. You will also find out how to create validation functions and transforms.

Finally, the book will show you the benefits of data quality management with the help of another SAP solution, Information Steward.

Style and approach

This book is an easy-to-follow guide with step-by-step instructions to perform specific ETL development tasks.

Downloading the example code for this book: You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to receive the code files.

Table of Contents

    1. SAP Data Services 4.x Cookbook
      1. Table of Contents
      2. SAP Data Services 4.x Cookbook
      3. Credits
      4. About the Author
      5. About the Reviewers
      6. www.PacktPub.com
        1. Support files, eBooks, discount offers, and more
          1. Why subscribe?
          2. Free access for Packt account holders
          3. Instant updates on new Packt books
      7. Preface
        1. What this book covers
        2. What you need for this book
        3. Who this book is for
        4. Sections
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        5. Conventions
        6. Reader feedback
        7. Customer support
          1. Downloading the example code
          2. Downloading the color images of this book
          3. Errata
          4. Piracy
          5. Questions
      8. 1. Introduction to ETL Development
        1. Introduction
        2. Preparing a database environment
          1. Getting ready
          2. How to do it…
          3. How it works…
        3. Creating a source system database
          1. How to do it…
          2. How it works…
          3. There's more…
        4. Defining and creating staging area structures
          1. How to do it…
            1. Flat files
            2. RDBMS tables
          2. How it works…
        5. Creating a target data warehouse
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
      9. 2. Configuring the Data Services Environment
        1. Introduction
        2. Creating IPS and Data Services repositories
          1. Getting ready…
          2. How to do it…
          3. How it works…
          4. See also
        3. Installing and configuring Information Platform Services
          1. Getting ready…
          2. How to do it…
          3. How it works…
        4. Installing and configuring Data Services
          1. Getting ready…
          2. How to do it…
          3. How it works…
        5. Configuring user access
          1. Getting ready…
          2. How to do it…
          3. How it works…
        6. Starting and stopping services
          1. How to do it…
          2. How it works…
          3. See also
        7. Administering tasks
          1. How to do it…
          2. How it works…
          3. See also
        8. Understanding the Designer tool
          1. Getting ready…
          2. How to do it…
          3. How it works…
            1. Executing ETL code in Data Services
            2. Validating ETL code
            3. Template tables
            4. Query transform basics
            5. The HelloWorld example
      10. 3. Data Services Basics – Data Types, Scripting Language, and Functions
        1. Introduction
        2. Creating variables and parameters
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        3. Creating a script
          1. How to do it…
          2. How it works…
        4. Using string functions
          1. How to do it…
            1. Using string functions in the script
          2. How it works…
          3. There's more…
        5. Using date functions
          1. How to do it…
            1. Generating current date and time
            2. Extracting parts from dates
          2. How it works…
          3. There's more…
        6. Using conversion functions
          1. How to do it…
          2. How it works…
          3. There's more…
        7. Using database functions
          1. How to do it…
            1. key_generation()
            2. total_rows()
            3. sql()
          2. How it works…
        8. Using aggregate functions
          1. How to do it…
          2. How it works…
        9. Using math functions
          1. How to do it…
          2. How it works…
          3. There's more…
        10. Using miscellaneous functions
          1. How to do it…
          2. How it works…
        11. Creating custom functions
          1. How to do it…
          2. How it works…
          3. There's more…
      11. 4. Dataflow – Extract, Transform, and Load
        1. Introduction
        2. Creating a source data object
          1. How to do it…
          2. How it works…
          3. There's more…
        3. Creating a target data object
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        4. Loading data into a flat file
          1. How to do it…
          2. How it works…
          3. There's more…
        5. Loading data from a flat file
          1. How to do it…
          2. How it works…
          3. There's more…
        6. Loading data from table to table – lookups and joins
          1. How to do it…
          2. How it works…
        7. Using the Map_Operation transform
          1. How to do it…
          2. How it works…
        8. Using the Table_Comparison transform
          1. Getting ready
          2. How to do it…
          3. How it works…
        9. Exploring the Auto correct load option
          1. Getting ready
          2. How to do it…
          3. How it works…
        10. Splitting the flow of data with the Case transform
          1. Getting ready
          2. How to do it…
          3. How it works…
        11. Monitoring and analyzing dataflow execution
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
      12. 5. Workflow – Controlling Execution Order
        1. Introduction
        2. Creating a workflow object
          1. How to do it…
          2. How it works…
        3. Nesting workflows to control the execution order
          1. Getting ready
          2. How to do it…
          3. How it works…
        4. Using conditional and while loop objects to control the execution order
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There is more…
        5. Using the bypassing feature
          1. Getting ready…
          2. How to do it…
          3. How it works…
          4. There is more…
        6. Controlling failures – try-catch objects
          1. How to do it…
          2. How it works…
        7. Use case example – populating dimension tables
          1. Getting ready
          2. How to do it…
          3. How it works…
            1. Mapping
              1. Dependencies
              2. Development
              3. Execution order
              4. Testing ETL
                1. Preparing test data to populate DimSalesTerritory
                2. Preparing test data to populate DimGeography
            2. Using a continuous workflow
              1. How to do it…
              2. How it works…
              3. There is more…
            3. Peeking inside the repository – parent-child relationships between Data Services objects
              1. Getting ready
              2. How to do it…
              3. How it works…
                1. Get a list of object types and their codes in the Data Services repository
                2. Display information about the DF_Transform_DimGeography dataflow
                3. Display information about the SalesTerritory table object
                4. See the contents of the script object
      13. 6. Job – Building the ETL Architecture
        1. Introduction
        2. Projects and jobs – organizing ETL
          1. Getting ready
          2. How to do it…
          3. How it works…
            1. Hierarchical object view
            2. History execution log files
            3. Executing/scheduling jobs from the Management Console
        3. Using object replication
          1. How to do it…
          2. How it works…
        4. Migrating ETL code through the central repository
          1. Getting ready
          2. How to do it…
          3. How it works…
            1. Adding objects to and from the Central Object Library
            2. Comparing objects between the Local and Central repositories
          4. There is more…
        5. Migrating ETL code with export/import
          1. Getting ready
          2. How to do it…
            1. Import/Export using ATL files
            2. Direct export to another local repository
          3. How it works…
        6. Debugging job execution
          1. Getting ready…
          2. How to do it…
          3. How it works…
        7. Monitoring job execution
          1. Getting ready
          2. How to do it…
          3. How it works…
        8. Building an external ETL audit and audit reporting
          1. Getting ready…
          2. How to do it…
          3. How it works…
        9. Using built-in Data Services ETL audit and reporting functionality
          1. Getting ready
          2. How to do it…
          3. How it works…
        10. Auto Documentation in Data Services
          1. How to do it…
          2. How it works…
      14. 7. Validating and Cleansing Data
        1. Introduction
        2. Creating validation functions
          1. Getting ready
          2. How to do it…
          3. How it works…
        3. Using validation functions with the Validation transform
          1. Getting ready
          2. How to do it…
          3. How it works…
        4. Reporting data validation results
          1. Getting ready
          2. How to do it…
          3. How it works…
        5. Using regular expression support to validate data
          1. Getting ready
          2. How to do it…
          3. How it works…
        6. Enabling dataflow audit
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        7. Data Quality transforms – cleansing your data
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
      15. 8. Optimizing ETL Performance
        1. Introduction
        2. Optimizing dataflow execution – push-down techniques
          1. Getting ready
          2. How to do it…
          3. How it works…
        3. Optimizing dataflow execution – the SQL transform
          1. How to do it…
          2. How it works…
        4. Optimizing dataflow execution – the Data_Transfer transform
          1. Getting ready
          2. How to do it…
          3. How it works…
            1. Why we used a second Data_Transfer transform object
            2. When to use Data_Transfer transform
          4. There's more…
        5. Optimizing dataflow readers – lookup methods
          1. Getting ready
          2. How to do it…
            1. Lookup with the Query transform join
            2. Lookup with the lookup_ext() function
            3. Lookup with the sql() function
          3. How it works…
            1. Query transform joins
            2. lookup_ext()
            3. sql()
            4. Performance review
        6. Optimizing dataflow loaders – bulk-loading methods
          1. How to do it…
          2. How it works…
            1. When to enable bulk loading?
        7. Optimizing dataflow execution – performance options
          1. Getting ready
          2. How to do it…
            1. Dataflow performance options
            2. Source table performance options
            3. Query transform performance options
            4. lookup_ext() performance options
            5. Target table performance options
      16. 9. Advanced Design Techniques
        1. Introduction
        2. Change Data Capture techniques
          1. Getting ready
            1. No history SCD (Type 1)
            2. Limited history SCD (Type 3)
            3. Unlimited history SCD (Type 2)
          2. How to do it…
          3. How it works…
            1. Source-based ETL CDC
            2. Target-based ETL CDC
            3. Native CDC
        3. Automatic job recovery in Data Services
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        4. Simplifying ETL execution with system configurations
          1. Getting ready
          2. How to do it…
          3. How it works…
        5. Transforming data with the Pivot transform
          1. Getting ready
          2. How to do it…
          3. How it works…
      17. 10. Developing Real-time Jobs
        1. Introduction
        2. Working with nested structures
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There is more…
        3. The XML_Map transform
          1. Getting ready
          2. How to do it…
          3. How it works…
        4. The Hierarchy_Flattening transform
          1. Getting ready
          2. How to do it…
            1. Horizontal hierarchy flattening
            2. Vertical hierarchy flattening
          3. How it works…
            1. Querying result tables
        5. Configuring Access Server
          1. Getting ready
          2. How to do it…
          3. How it works…
        6. Creating real-time jobs
          1. Getting ready
            1. Installing SoapUI
          2. How to do it…
          3. How it works…
      18. 11. Working with SAP Applications
        1. Introduction
        2. Loading data into SAP ERP
          1. Getting ready
          2. How to do it…
          3. How it works…
            1. IDoc
            2. Monitoring IDoc load on the SAP side
            3. Post-load validation of loaded data
          4. There is more…
      19. 12. Introduction to Information Steward
        1. Introduction
        2. Exploring Data Insight capabilities
          1. Getting ready
          2. How to do it…
            1. Creating a connection object
            2. Profiling the data
            3. Viewing profiling results
            4. Creating a validation rule
            5. Creating a scorecard
          3. How it works…
            1. Profiling
            2. Rules
            3. Scorecards
          4. There is more…
        3. Performing Metadata Management tasks
          1. Getting ready
          2. How to do it…
          3. How it works…
        4. Working with the Metapedia functionality
          1. How to do it…
          2. How it works…
        5. Creating a custom cleansing package with Cleansing Package Builder
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There is more…
      20. Index