Fixing spelling mistakes

Human-provided data often contains spelling mistakes. This recipe corrects a misspelled word using Peter Norvig's simple heuristic spellchecker, described at http://norvig.com/spell-correct.html.

This recipe is just one approach to a very difficult problem in machine learning. We can use it as a starting point, or as inspiration for a more powerful solution with better results.

Getting ready

Refer to Norvig's spell-correction Python algorithm located at http://norvig.com/spell-correct.html.

The core algorithm works as follows:

  • Transform raw text into lowercase alphabetical words
  • Compute a frequency map of all the words
  • Define functions to produce all strings within an edit distance of one or two
  • Find ...
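
The first three steps above can be sketched in Haskell as follows. This is a minimal illustration rather than the recipe's own code: the names tokenize, frequencies, edits1, and edits2 are assumptions chosen for this example, and the sketch assumes the containers package (which ships with GHC) for Data.Map and Data.Set. The final step is cut off in the list above, so it is not sketched here.

```haskell
import Data.Char (isAsciiLower, toLower)
import Data.List (foldl')
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Step 1: lowercase the text and split it into alphabetical words.
-- Any character outside a-z (after lowercasing) acts as a separator.
tokenize :: String -> [String]
tokenize = words . map normalize
  where
    normalize c
      | isAsciiLower lower = lower
      | otherwise = ' '
      where
        lower = toLower c

-- Step 2: count how many times each word occurs.
frequencies :: [String] -> Map.Map String Int
frequencies = foldl' (\m w -> Map.insertWith (+) w 1 m) Map.empty

-- Step 3a: all strings one edit away (delete, transpose, replace, insert).
-- A Set removes the duplicates that the four edit kinds produce.
edits1 :: String -> Set.Set String
edits1 word = Set.fromList (deletes ++ transposes ++ replaces ++ inserts)
  where
    alphabet = ['a' .. 'z']
    splits = [splitAt i word | i <- [0 .. length word]]
    deletes = [l ++ rs | (l, _ : rs) <- splits]
    transposes = [l ++ (b : a : rs) | (l, a : b : rs) <- splits]
    replaces = [l ++ (c : rs) | (l, _ : rs) <- splits, c <- alphabet]
    inserts = [l ++ (c : r) | (l, r) <- splits, c <- alphabet]

-- Step 3b: all strings reachable by applying one edit twice.
edits2 :: String -> Set.Set String
edits2 word = Set.unions (map edits1 (Set.toList (edits1 word)))
```

In GHCi, tokenize "The cat, the HAT!" evaluates to ["the","cat","the","hat"], and Set.member "the" (edits1 "teh") is True because a single transposition turns "teh" into "the". Returning a Set instead of a list is a design choice for this sketch: the number of candidates grows quickly at distance two, and deduplicating a plain list with nub would take quadratic time.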
