Chapter 14. Simil: an algorithm to look for similar strings

Tom van Stiphout

Are you a perfect speller? Is everyone in your company? How about your business partners? Misspellings are a fact of life. There are also legitimate differences in spelling: what Americans call rumors, the British call rumours. Steven A. Ballmer and Steve Ballmer are two different but accurate forms of the same man’s name. Your database may also contain many legacy values entered before validation at the point of data entry improved.

Overall, chances are your database already contains imperfect textual data, which makes it hard to search. Additionally, the user may not know exactly what to look for. When looking for a number or a date, we could search for a range, ...
