Appendix A

Some Commonly Used ER Comparators

Exact Match and Standardization

The most fundamental and most commonly used similarity is an exact match between two values. Exact match requires both values to be the same according to their data type, i.e. the same numeric value, the same date, or the same string.
In the case of string values, exact match is a demanding requirement. Two strings will only be exactly the same if they comprise exactly the same characters (by internal character code) in exactly the same order. For example, while “JOHN” and “JOHN” are exact string matches, the strings “JOHN” and “John” are not. The character code for an upper case “O” is different from the character code for a lower case “o”. The same is true for the letters ...

Get Entity Information Life Cycle for Big Data now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.