Getting ready

Nearest-neighbor is well suited to address matching. Address matching is a type of record matching in which addresses appear in multiple datasets and we would like to match them up. Two records may differ by a typo in the street address, a different city name, or a different ZIP Code, and still refer to the same address. Applying the nearest-neighbor algorithm across both the numerical and the character components of an address may help us identify addresses that are actually the same.
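As a rough illustration of combining character and numerical components, the sketch below scores a pair of addresses by adding a normalized edit distance on the street string to a scaled absolute difference on the ZIP Code, then picks the closest reference record. It is plain Python rather than TensorFlow, and the function names, the zip_weight and zip_scale parameters, and the sample addresses are all assumptions made for this example, not the recipe's own code.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum inserts, deletes, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def address_distance(query, candidate, zip_weight=0.5, zip_scale=1000.0):
    """Hypothetical combined distance between two (street, zip) records."""
    q_street, q_zip = query
    c_street, c_zip = candidate
    # Character component: edit distance normalized by the longer string.
    longest = max(len(q_street), len(c_street), 1)
    street_term = edit_distance(q_street, c_street) / longest
    # Numerical component: absolute ZIP difference on an assumed scale.
    zip_term = abs(q_zip - c_zip) / zip_scale
    return street_term + zip_weight * zip_term


def nearest_address(query, references, zip_weight=0.5, zip_scale=1000.0):
    """Return the reference record with the smallest combined distance."""
    return min(
        references,
        key=lambda ref: address_distance(query, ref, zip_weight, zip_scale),
    )


# Illustrative data only.
references = [("123 abbey rd", 65535), ("456 baker st", 65438)]
match = nearest_address(("123 abey rd", 65535), references)
```

How the two terms are weighted is a design choice: a larger zip_weight makes a ZIP mismatch count for more relative to a typo in the street string.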

In this example, we will generate two datasets, each comprising a street address and a ZIP Code. One of the two datasets has a high number of typos in the street address. We will take the non-typo dataset as our gold standard, and will return ...
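One way such a pair of datasets could be generated is sketched below: a clean reference list of (street address, ZIP Code) records, plus a copy in which most street addresses have one character replaced at random. The street names, suffixes, ZIP Codes, the typo_rate default, and the helper names make_typo and make_datasets are hypothetical choices for illustration, and plain Python stands in for whatever the recipe itself uses.

```python
import random
import string


def make_typo(text, rng, typo_rate=0.75):
    """With probability typo_rate, overwrite one random character.

    The replacement is a random lowercase letter, so it may occasionally
    equal the original character and leave the text unchanged.
    """
    if not text or rng.random() >= typo_rate:
        return text
    pos = rng.randrange(len(text))
    return text[:pos] + rng.choice(string.ascii_lowercase) + text[pos + 1:]


def make_datasets(n, seed=0):
    """Return (reference, noisy): two lists of (street, zip) records."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    # Hypothetical vocabulary for building addresses.
    street_names = ["abbey", "baker", "canal", "donner", "elm"]
    suffixes = ["rd", "st", "ln", "pass", "ave"]
    zips = [65222, 65331, 65438, 65521, 65633]

    reference = []
    for _ in range(n):
        number = rng.randint(1, 9999)
        street = f"{number} {rng.choice(street_names)} {rng.choice(suffixes)}"
        reference.append((street, rng.choice(zips)))

    # The noisy copy keeps each ZIP Code and perturbs only the street string.
    noisy = [(make_typo(street, rng), zip_code) for street, zip_code in reference]
    return reference, noisy


reference, noisy = make_datasets(10)
```

Because the two lists are index-aligned, the reference list can serve as the ground truth when checking which record a noisy address is matched to.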
