Wrapping Up

We’ve now used language processing techniques for several practical tasks. We’ve fetched web pages and extracted the body text from them. We’ve used term extraction to pull keywords from within the text, summarizing its contents.

We created a simple search for this index of keywords, then extended it in two ways to make it easier to use and more forgiving: first by using edit distance to allow for typos and misspellings, and then by matching terms that sounded like the user’s query, so that users can search for words they can’t even spell.
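As a compact reminder of how those two extensions fit together, here is a minimal sketch in plain Ruby. It is not the chapter's own code: the method names (edit_distance, fuzzy_search, soundex, phonetic_search) and the max_distance threshold are hypothetical, the edit distance shown is Levenshtein, and Soundex stands in for whichever phonetic algorithm you prefer.

```ruby
# Levenshtein edit distance: the number of single-character insertions,
# deletions, and substitutions needed to turn a into b.
def edit_distance(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Hypothetical helper: terms within max_distance edits of the query,
# closest first (ties broken alphabetically).
def fuzzy_search(index, query, max_distance: 2)
  q = query.downcase
  index.map { |term| [edit_distance(term.downcase, q), term] }
       .select { |distance, _| distance <= max_distance }
       .sort
       .map { |_, term| term }
end

# Soundex digit for each consonant; vowels, h, w, and y have no code.
SOUNDEX_CODES = {
  "b" => "1", "f" => "1", "p" => "1", "v" => "1",
  "c" => "2", "g" => "2", "j" => "2", "k" => "2",
  "q" => "2", "s" => "2", "x" => "2", "z" => "2",
  "d" => "3", "t" => "3", "l" => "4",
  "m" => "5", "n" => "5", "r" => "6"
}.freeze

# Soundex key: first letter plus up to three digits, padded with zeros.
def soundex(word)
  letters = word.downcase.gsub(/[^a-z]/, "")
  return "" if letters.empty?

  result = letters[0].upcase
  last = SOUNDEX_CODES[letters[0]]
  letters[1..-1].each_char do |c|
    next if c == "h" || c == "w" # h and w do not separate repeated codes
    code = SOUNDEX_CODES[c]
    result << code if code && code != last
    last = code # a vowel resets last to nil, so a repeated code counts again
  end
  result.ljust(4, "0")[0, 4]
end

# Hypothetical helper: terms whose Soundex key matches the query's.
def phonetic_search(index, query)
  key = soundex(query)
  index.select { |term| soundex(term) == key }
end
```

With an index such as %w[ruby rugby parsing regex], fuzzy_search(index, "rubby") finds both "ruby" and "rugby", since each is one edit away from the query. Phonetic matching is coarser: it ignores vowels and collapses similar consonants, so it is best used as a fallback when the edit-distance search returns nothing.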

We’ve barely scratched the surface of language processing, but hopefully you’ve seen what you can achieve by combining tried and true algorithms and using well-established primitives to perform ...
