Introduction

More than 50 years ago the first computers for general use emerged, and the scientific and business worlds gradually adopted them. In those early days, most organizations had just one computer with a single display and printer attached to it, so the need to integrate data stored in different systems simply didn't exist. This changed in the late 1970s, when the relational database made inroads into the corporate world. The 1980s saw a further proliferation of both computers and databases, all holding different bits and pieces of an organization's total collection of information. Ultimately, this led to the start of a whole new industry, sparked by IBM researchers Dr. Barry Devlin and Paul Murphy in their seminal paper "An architecture for a business and information system" (first published in 1988 in IBM Systems Journal, Volume 27, Number 1). The paper introduced the concept of a business data warehouse for the first time, describing it as "the single logical storehouse of all the information used to report on the business." Less than five years later, Bill Inmon published his landmark book, Building the Data Warehouse, which further popularized the concepts and technologies needed to build this "logical storehouse."

One of the core themes in all data warehouse–related literature is the concept of integrating data. The term data integration refers to the process of combining data from different sources to provide a single comprehensible view ...
