Once the data warehouse design has stabilized, a process must be designed to fill the data warehouse with data. We use the general term data integration to describe the collection of activities that result in, or contribute to, filling the data warehouse. Pentaho offers a set of tools, collectively known as Pentaho Data Integration, that are designed to support this task.
This chapter provides some background information about the data integration process in general. We provide an overview of the Pentaho Data Integration tools, and explain in detail how these tools can be used to design and create a data integration solution to load the data warehouse.
In a general sense, the word "integration" denotes a process that forms a whole out of multiple parts. The term "data integration" is usually understood as the process that combines data from different sources to provide a single comprehensible view of all the combined data. A typical example of data integration would be combining the data from a warehouse inventory system with that of the order entry system, so that order fulfilment can be related directly to changes in the inventory. Another example of data integration is merging customer and contact data from separate departmental customer relationship management systems into a corporate customer relationship management system.
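To make the inventory and order entry example concrete, the following is a minimal sketch in plain Python. It does not use Pentaho Data Integration. The record layouts, the field names (sku, on_hand, quantity) and the function integrate are assumptions invented for illustration, and a real solution would read from the actual source systems rather than from in-memory lists.

```python
# Hypothetical records from two separate source systems (illustrative only).
inventory = [
    {"sku": "A-100", "on_hand": 12},
    {"sku": "B-200", "on_hand": 0},
]
orders = [
    {"order_id": 1, "sku": "A-100", "quantity": 5},
    {"order_id": 2, "sku": "B-200", "quantity": 3},
    {"order_id": 3, "sku": "C-300", "quantity": 1},
]


def integrate(orders, inventory):
    """Combine each order with the stock level recorded for its product."""
    # Index the inventory by product code so each order can be matched quickly.
    stock_by_sku = {row["sku"]: row["on_hand"] for row in inventory}

    combined = []
    for order in orders:
        # None means the inventory system does not know this product.
        on_hand = stock_by_sku.get(order["sku"])
        combined.append({
            **order,
            "on_hand": on_hand,
            "can_fulfil": on_hand is not None and on_hand >= order["quantity"],
        })
    return combined


combined_view = integrate(orders, inventory)
for row in combined_view:
    print(row)
```

Each row of the result carries information from both systems, which is the "single view" that the definition refers to. Note that the third order references a product that is missing from the inventory data; handling such mismatches between sources is typically a large part of real data integration work.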
In the introduction to this chapter we stated that data integration comprises those ...