We now have the ability to find the correct spellings of words or mark them as similar. While processing a large corpus, we can extract all unique words and compare each token against every other token.
This would take $O(n^2)$ comparisons, where $n$ is the number of unique tokens in the corpus, which may make the process too slow for a large corpus.
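As a rough illustration of the pairwise approach, the sketch below compares every unique token against every other one and counts the comparisons. The function name `similar_pairs`, the sample tokens, and the `0.8` threshold are hypothetical, and `difflib.SequenceMatcher` from the Python standard library merely stands in for whichever similarity measure you prefer.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_pairs(tokens, threshold=0.8):
    """Compare each unique token with every other unique token.

    Returns the pairs whose similarity ratio reaches the threshold,
    together with the number of comparisons performed.
    """
    unique = sorted(set(tokens))  # n unique tokens
    pairs = []
    comparisons = 0
    # combinations() yields n * (n - 1) / 2 pairs, which grows as n^2.
    for a, b in combinations(unique, 2):
        comparisons += 1
        if SequenceMatcher(None, a, b).ratio() >= threshold:
            pairs.append((a, b))
    return pairs, comparisons


# Hypothetical toy corpus: 5 unique tokens, so 10 comparisons.
corpus_tokens = ["language", "langauge", "corpus", "corpas", "token", "language"]
pairs, comparisons = similar_pairs(corpus_tokens)
print(pairs, comparisons)
```

With 5 unique tokens this is only 10 comparisons, but at 100,000 unique tokens the same loop would need roughly 5 billion.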
The alternative is to use a standard dictionary and extend it with terms specific to your corpus. If the dictionary has $m$ unique words, comparing each corpus token against the dictionary takes $O(m \cdot n)$. Assuming that $m \ll n$, we have $m \cdot n \ll n^2$, so this will be much faster than the previous approach.
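The dictionary-based approach might look like the following minimal sketch. The helper `correct_with_dictionary`, the tiny word list, and the `0.8` cutoff are assumptions made for illustration; `difflib.get_close_matches` is a standard-library function used here in place of whatever matching routine you already have.

```python
from difflib import get_close_matches


def correct_with_dictionary(tokens, dictionary, cutoff=0.8):
    """Map each unknown token to its closest dictionary word, if any.

    Each of the n unique tokens is compared against at most m
    dictionary words, so the work is proportional to m * n.
    """
    known = set(dictionary)  # fast exact-match lookup
    suggestions = {}
    for token in set(tokens):
        if token in known:
            continue  # already spelled correctly, nothing to compare
        matches = get_close_matches(token, dictionary, n=1, cutoff=cutoff)
        if matches:
            suggestions[token] = matches[0]
    return suggestions


# Hypothetical dictionary, extended with words specific to this corpus.
dictionary = ["language", "corpus", "token"]
corpus_tokens = ["language", "langauge", "corpus", "corpas", "token"]
suggestions = correct_with_dictionary(corpus_tokens, dictionary)
print(suggestions)
```

Tokens that already appear in the dictionary are skipped with a set lookup, so only the unknown tokens pay the cost of a scan over the dictionary.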