Lancaster stemming

A very aggressive stemming algorithm, sometimes to a fault. With Porter and Snowball, the stemmed representations are usually fairly intuitive to a reader. That is not the case with Lancaster, where many shorter words become totally obfuscated. It is the fastest of the algorithms covered here and will greatly reduce your working set of words, but if you want finer distinctions between words, this is not the tool to use. An example Lancaster rule is given in the following block:

ies -> y: this rule converts the suffix "ies" at the end of a word into "y".
cries -> cry

So with Lancaster stemming, "cries" is stemmed to "cry", which is a cleaner stem than we would get by simply dropping the suffix.
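To make the rule notation concrete, here is a minimal Python sketch of a Lancaster-style (Paice/Husk) rule engine. It is a toy under stated assumptions, not the real stemmer: the rule list is a small hand-picked subset written in the Paice/Husk notation (reversed suffix, an optional `*` meaning "only if the word is still intact", the number of letters to delete, the letters to append, then `>` to keep stemming or `.` to stop), the acceptability check is simplified, and the names `RULES` and `lancaster_sketch` are hypothetical. Library implementations such as NLTK's `LancasterStemmer` use the full rule table and may return different stems for some words.

```python
import re

# Hypothetical, illustrative subset of rules in Paice/Husk notation.
# This is NOT the full Lancaster rule table. Order matters: the first
# matching, acceptable rule wins on each pass.
RULES = [
    "sei3y>",  # -ies  -> -y   (cries -> cry)
    "dei3y>",  # -ied  -> -y   (cried -> cry)
    "gni3>",   # -ing  -> (nothing)
    "de2>",    # -ed   -> (nothing)
    "ssen4>",  # -ness -> (nothing)
    "ss0.",    # -ss: protect the word and stop
    "s*1>",    # -s    -> (nothing), only if the word is still intact
]

RULE_RE = re.compile(r"^([a-z]+)(\*?)(\d)([a-z]*)([.>])$")
VOWELS = "aeiouy"


def parse_rule(rule):
    """Split a rule into (suffix, intact_only, n_delete, append, keep_going)."""
    m = RULE_RE.match(rule)
    if m is None:
        raise ValueError(f"malformed rule: {rule!r}")
    reversed_suffix, star, n_delete, append, terminator = m.groups()
    return reversed_suffix[::-1], star == "*", int(n_delete), append, terminator == ">"


PARSED_RULES = [parse_rule(r) for r in RULES]


def acceptable(stem):
    """Simplified Paice/Husk acceptability test, applied to the candidate stem.

    Assumption: real implementations differ in details (for example, whether
    appended letters count toward the minimum length).
    """
    if not stem:
        return False
    if stem[0] in VOWELS:
        return len(stem) >= 2
    return len(stem) >= 3 and any(ch in VOWELS for ch in stem)


def lancaster_sketch(word):
    """Apply the toy rules repeatedly until a stop rule fires or nothing matches."""
    original = word = word.lower()
    while True:
        for suffix, intact_only, n_delete, append, keep_going in PARSED_RULES:
            if not word.endswith(suffix):
                continue
            if intact_only and word != original:
                continue  # "*" rules only fire on an untouched word
            stem = word[: len(word) - n_delete] + append
            if not acceptable(stem):
                continue  # stem too short: try the next rule instead
            word = stem
            break
        else:
            return word  # no rule applied on this pass: done
        if not keep_going:
            return word  # a "." rule stops stemming


if __name__ == "__main__":
    for w in ["cries", "cried", "crying", "kindness", "class", "cats", "sings"]:
        print(w, "->", lancaster_sketch(w))
```

Running it prints cries -> cry, cried -> cry, crying -> cry, kindness -> kind, class -> class, cats -> cat and sings -> sing. Because rules keep firing until a stop rule or no match, several surface forms collapse onto one stem, which is how an aggressive stemmer shrinks the working set of words. The acceptability check is what prevents "sings" from being cut past "sing" down to a meaningless "s".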
