Chapter 7. Development

"By failing to prepare, you are preparing to fail." Benjamin Franklin

If you have been reading this book in sequence, you now have a detailed model of the challenges you face building your ETL system. We have described the data structures you need (Chapter 2), the range of sources you must connect to (Chapter 3), a comprehensive architecture for cleaning and conforming the data (Chapter 4), and all the target dimension tables and fact tables that constitute your final delivery (Chapters 5 and 6). We certainly hope that you can pick and choose a subset of all this for your ETL system!

By now you should be able to draw a process-flow diagram for your proposed ETL system that identifies, at a reasonable level of detail, the extracting, cleaning, conforming, and delivering modules.

Now it's time to decide what your ETL system development platform is and how to go about the development. If you have the luxury of starting fresh, you have a big fork in the road: either purchase a professional ETL tool suite, or plan on rolling your own with a combination of programming and scripting languages. We tried to give you an even-handed assessment of this choice in Chapter 1. Maybe you should go back and read that again.

Note

PROCESS CHECK Planning & Design:

Requirements/Realities → Architecture → Implementation → Test/Release

Data Flow: Extract → Clean → Conform → Deliver
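For readers leaning toward the hand-coded route, the four-step data flow in the process check can be pictured as a chain of small modules, each consuming the output of the one before it. The following is only a minimal sketch in Python under invented assumptions: the function names, the `customer` field, and the cleaning and conforming rules are hypothetical placeholders, not the design this book prescribes and not the interface of any ETL tool.

```python
from typing import Iterable, Iterator

Row = dict  # hypothetical record type: one source row as a dictionary


def extract(source: Iterable[Row]) -> Iterator[Row]:
    """Extract: read rows from a source (here, any in-memory iterable)."""
    for row in source:
        yield dict(row)  # copy so later stages never mutate the source


def clean(rows: Iterable[Row]) -> Iterator[Row]:
    """Clean: trim the hypothetical 'customer' field and drop empty values."""
    for row in rows:
        name = (row.get("customer") or "").strip()
        if name:
            yield {**row, "customer": name}


def conform(rows: Iterable[Row]) -> Iterator[Row]:
    """Conform: apply one shared convention (upper case, as a stand-in rule)."""
    for row in rows:
        yield {**row, "customer": row["customer"].upper()}


def deliver(rows: Iterable[Row]) -> list:
    """Deliver: materialize the rows; a real system would load target tables."""
    return list(rows)


def run_pipeline(source: Iterable[Row]) -> list:
    """Chain the four stages in the order Extract, Clean, Conform, Deliver."""
    return deliver(conform(clean(extract(source))))
```

Calling `run_pipeline` on a small list of dictionaries runs every row through the four stages in order. Keeping each stage as a separate function mirrors the separate boxes of a process-flow diagram, so a stage can be replaced or tested on its own.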

In the next section, we give you a brief listing of the main ETL tool suites, data-proofing ...
