Text Processing with Ruby by Rob Miller

Example: Extracting Keywords from Articles

Let’s imagine that we have a list of many hundreds of articles, spread across many different web pages. We’d like to analyze the content of these articles and build a searchable database of them. We’d like to store the content of each article, but not any extraneous text from the web page, such as header text or sidebar content. We’d also like to store some keywords for each article, so that we can search against those rather than the whole body of the text. When we’re finished, we’ll be able to list all the terms mentioned in an article and, by extension, all the articles that match a particular term.
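To make the end goal concrete, here is a minimal sketch of the two lookups just described: terms for an article, and articles for a term. Everything in it is an assumption made for illustration. The sample `ARTICLES` data and the method names `keywords`, `terms_by_article`, and `articles_by_term` are hypothetical, and the keyword extraction is a deliberately naive placeholder (lowercased words of four or more letters), not the technique this example goes on to develop.

```ruby
require "set"

# Hypothetical sample data: article title => body text, assumed to be
# already stripped of headers, sidebars, and other page furniture.
ARTICLES = {
  "Ruby Basics"   => "Ruby makes text processing pleasant",
  "Parsing Pages" => "Parsing HTML pages to extract article text"
}

# Placeholder keyword extraction: unique lowercased words of 4+ letters.
def keywords(text)
  text.downcase.scan(/[a-z]{4,}/).uniq
end

# Forward lookup: article title => list of its terms.
def terms_by_article(articles)
  articles.transform_values { |body| keywords(body) }
end

# Reverse lookup: term => set of article titles that mention it.
def articles_by_term(articles)
  index = Hash.new { |hash, term| hash[term] = Set.new }
  articles.each do |title, body|
    keywords(body).each { |term| index[term] << title }
  end
  index
end
```

With this in place, `articles_by_term(ARTICLES)["text"]` would contain both sample titles, since both bodies mention the word. Building the reverse lookup once, up front, is what lets us search by keyword without rescanning every article body.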

This problem neatly covers two areas of language processing. ...