Chapter 20. Identity Mapping and De-Duplicating

Exam objectives in this chapter:

  • Extract and Transform Data

    • Design data flow.

    • Implement data flow.

  • Build Data Quality Solutions

    • Create a data quality project to clean data.

Two of the most challenging problems with maintaining master data are identity mapping and de-duplication.

In an enterprise, you frequently have more than one source of master data. Sometimes you have to import master data from external sources. Different sources can include older applications; relational databases used by OLTP applications; analytical databases; semi-structured data, such as XML data from a web service; and even unstructured data in Microsoft Excel worksheets and Microsoft Word documents. Typically, you do not have unique ...

Get Training Kit (Exam 70-463): Implementing a Data Warehouse with Microsoft SQL Server 2012 now with the O’Reilly learning platform.
