Book description
Getting familiar with Talend Open Studio will greatly enhance your data handling and integration capabilities. This reference book suits beginner and intermediate users, with a host of practical recipes that clarify even complex features.
- A collection of exercises covering all aspects of development, including schemas, mapping with tMap, databases, and working with files
- Get your code ready for the production environment by using contexts and scheduling jobs in Talend
- Includes exercises for debugging and testing of code
- Many additional hints and tips regarding the exercises and their real-life applications
In Detail
Data integration is a key component of an organization’s technical strategy, yet historically the tools have been very expensive. Talend Open Studio is the world’s leading open source data integration product and has played a huge part in making open source data integration a popular choice for businesses worldwide.
This book is a welcome addition to the small but growing library of Talend Open Studio resources. From working with schemas to creating and validating test data, to scheduling your Talend code, you will get acquainted with the various Talend database handling techniques. Each recipe is designed to provide the key learning point in a short, simple and effective manner.
This comprehensive guide provides practical exercises that cover all areas of the Talend development lifecycle including development, testing, debugging and deployment. The book delivers design patterns, hints, tips, and advice in a series of short and focused exercises that can be approached as a reference for more seasoned developers or as a series of useful learning tutorials for the beginner.
The book covers the basics in terms of schema usage and mappings, along with dedicated sections that will allow you to get more from tMap, files, databases and XML.
Geared towards the whole lifecycle, the Talend Open Studio Cookbook shows readers effective ways to handle everyday tasks and provides insight into every stage of the development cycle, including coding, testing, and debugging, for start-to-finish coverage of the product.
Table of contents
- Talend Open Studio Cookbook
- Table of Contents
- Talend Open Studio Cookbook
- Credits
- About the Author
- About the Reviewers
- www.PacktPub.com
- Preface
- 1. Introduction and General Principles
- 2. Metadata and Schemas
- 3. Validating Data
  - Introduction
  - Enabling and disabling reject flows
  - Gathering all rejects prior to killing a job
  - Validating against the schema
  - Rejecting rows using tMap
  - Checking a column against a list of allowed values
  - Checking a column against a lookup
  - Creating validation rules for more complex requirements
  - Creating binary error codes to store multiple test results
- 4. Mapping Data
  - Introduction
  - Simple mapping and tMap time savers
  - Creating tMap expressions
  - Using the ternary operator for conditional logic
  - Using intermediate variables in tMap
  - Filtering input rows
  - Splitting an input row into multiple outputs based on input conditions
  - Joining data using tMap
  - Hierarchical joins using tMap
  - Using reload at each row to process real-time / near real-time data
- 5. Using Java in Talend
  - Introduction
  - Performing one-off pieces of logic using tJava
  - Setting the context and globalMap variables using tJava
  - Adding complex logic into a flow using tJavaRow
  - Creating pseudo components using tJavaFlex
  - Creating custom functions using code routines
  - Importing JAR files to allow use of external Java classes
- 6. Managing Context Variables
  - Introduction
  - Creating a context group
  - Adding a context group to your job
  - Adding contexts to a context group
  - Using tContextLoad to load contexts
  - Using implicit context loading to load contexts
  - Turning implicit context loading on and off in a job
  - Setting the context file location in the operating system
- 7. Working with Databases
  - Introduction
  - Setting up a database connection
  - Importing the table schemas
  - Reading from database tables
  - Using context and globalMap variables in SQL queries
  - Printing your input query
  - Writing to a database table
  - Printing your output query
  - Managing database sessions
  - Passing a session to a child job
  - Selecting different fields and keys for insert, update, and delete
  - Capturing individual rejects and errors
  - Database and table management
  - Managing surrogate keys for parent and child tables
  - Rewritable lookups using an in-process database
- 8. Managing Files
  - Introduction
  - Appending records to a file
  - Reading rows using a regular expression
  - Using temporary files
  - Storing intermediate data in the memory using tHashMap
  - Reading headers and trailers using tMap
  - Reading headers and trailers with no identifiers
  - Using the information in the header and trailer
  - Adding a header and trailer to a file
  - Moving, copying, renaming, and deleting files and folders
  - Capturing file information
  - Processing multiple files at once
  - Processing control/validation files
  - Creating and writing files depending on the input data
- 9. Working with XML, Queues, and Web Services
- 10. Debugging, Logging, and Testing
  - Introduction
  - Find the location of compilation errors using the Problems tab
  - Locating execution errors from the console output
  - Using the Talend debug mode – row-by-row execution
  - Using the Java debugger to debug Talend jobs
  - Using tLogRow to show data in a row
  - Using tJavaRow to display row information
  - Using tJava to display status messages and variables
  - Printing out the context
  - Dumping the console output to a file from within a job
  - Creating simple test data using tRowGenerator
  - Creating complex test data using tRowGenerator, tFlowToIterate, tMap, and sequences
  - Creating random test data using lookups
  - Creating test data using Excel
  - Testing logic – the most-used pattern
  - Killing a job from within tJavaRow
- 11. Deploying and Scheduling Talend Code
  - Introduction
  - Creating compiled executables
  - Using a different context
  - Adding command-line context parameters
  - Managing job dependencies
  - Capturing and acting on different return codes
  - Returning codes from a child job without tDie
  - Passing parameters to a child job
  - Executing non-Talend objects and operating system commands
- 12. Common Mistakes and Other Useful Hints and Tips
  - Introduction
  - My tab is missing
  - Finding the code routine
  - Finding a new context variable
  - Reloads going missing at each row global variable
  - Dragging component globalMap variables
  - Some complex date formats
  - Capturing tMap rejects
  - Adding job name, project name, and other job specific information
  - Printing tMap variables
  - Stopping memory errors in Talend
- A. Common Type Conversions
- B. Management of Contexts
- Index
Product information
- Title: Talend Open Studio Cookbook
- Author(s):
- Release date: October 2013
- Publisher(s): Packt Publishing
- ISBN: 9781782167266