Pentaho Data Integration Cookbook Second Edition by María Carina Roldán, Adrián Sergio Pulvirenti, Alex Meadows

Data profiling with DataCleaner

Data profiling is often skipped because of time or resource constraints on a project, yet it can save time by catching issues before they surface in your data integration code. For instance, values that don't match expected formats or fall outside expected ranges, misspellings, improperly formatted dates, or strings in a field that is expected to be numeric can all break a transformation.
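
To make the kinds of problems listed above concrete, here is a minimal sketch in plain Python that counts three of them in a small in-memory data set. It is only an illustration of the idea and does not use DataCleaner or Kettle; the function name `profile_rows`, the field names, the date format, and the valid range are all hypothetical choices made for this example.

```python
from datetime import datetime


def profile_rows(rows, numeric_field, date_field,
                 date_format="%Y-%m-%d", valid_range=(0, 1000)):
    """Count simple data quality problems in two fields of each row.

    All names and defaults here are assumptions made for illustration.
    """
    issues = {"non_numeric": 0, "out_of_range": 0, "bad_date": 0}
    low, high = valid_range
    for row in rows:
        # A string in a field that is expected to be numeric
        try:
            value = float(row[numeric_field])
        except (TypeError, ValueError):
            issues["non_numeric"] += 1
        else:
            # A numeric value that falls outside the expected range
            if not low <= value <= high:
                issues["out_of_range"] += 1
        # A date that does not match the expected format
        try:
            datetime.strptime(row[date_field], date_format)
        except (TypeError, ValueError):
            issues["bad_date"] += 1
    return issues


# Hypothetical sample data: one clean row and two problematic rows
sample = [
    {"amount": "19.99", "order_date": "2013-11-05"},
    {"amount": "abc", "order_date": "2013-11-06"},
    {"amount": "5000", "order_date": "11/07/2013"},
]
report = profile_rows(sample, "amount", "order_date")
print(report)
```

A dedicated profiling tool reports far more than these counts, but even a check this small shows which rows would likely cause a transformation to fail before that transformation is ever run.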

DataCleaner is an open source data profiling tool that integrates with Kettle and can profile data while code is in the process of being developed. Additionally, DataCleaner jobs can be integrated into Kettle jobs and run as part of larger processes.

Profiling data shows the meta-information about the data being processed—from ...
