
Pentaho 3.2 Data Integration Beginner's Guide by María Carina Roldán


Time for action—counting frequent words by filtering

Suppose you have some plain text files and you want to know what they say without reading them. One way to get an idea of what the files are about is to count how many times each word appears in the text and look at the most frequent ones.
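To make the idea concrete before building it in a transformation, here is a minimal sketch of the same counting logic in plain Python. It is only an illustration and not part of the tutorial: the function name `most_frequent_words` and the sample sentence are made up for this example, and treating a "word" as a run of letters is an assumption.

```python
import re
from collections import Counter


def most_frequent_words(text, top_n=10):
    """Return the top_n (word, count) pairs found in text, ignoring case."""
    # Assumption: a word is any run of letters a-z after lowercasing.
    words = re.findall(r"[a-z]+", text.lower())
    # Counter tallies each word; most_common sorts by descending count.
    return Counter(words).most_common(top_n)


# Hypothetical sample text, used only to show the shape of the result.
sample = "the cat and the hat and the bat"
print(most_frequent_words(sample, top_n=2))  # [('the', 3), ('and', 2)]
```

The transformation built in the steps below aims at the same kind of result, a list of words with their number of occurrences, using Kettle steps instead of code.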

Note

Before starting, you'll need at least one text file to play with. The text file used in this tutorial is named smcng10.txt and is available for you to download from the Packt website.

Let's work:

  1. Create a new transformation.
  2. Use a Text file input step to read your file. The trick here is to set the separator to a character you don't expect to find in the file, for example |. That way, each entire line is recognized as a single field. Configure ...
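The separator trick in step 2 is not specific to Kettle. As a rough analogy only, the following Python sketch applies the same idea with the standard `csv` module: choosing a delimiter that never occurs in the text makes each line come back as one field. The function name `read_lines_as_single_field` and the sample text are hypothetical, and this is not how the Text file input step is implemented.

```python
import csv
import io


def read_lines_as_single_field(stream, separator="|"):
    """Read a text stream so that every non-empty line becomes one field."""
    # QUOTE_NONE keeps quotation marks in the text from being treated specially.
    reader = csv.reader(stream, delimiter=separator, quoting=csv.QUOTE_NONE)
    # Each row holds a single field because the separator never appears;
    # blank lines produce empty rows and are skipped.
    return [row[0] for row in reader if row]


# Hypothetical sample: commas and semicolons do not split the line.
sample = io.StringIO("Hello, world; this is one line\nA second line, with commas\n")
print(read_lines_as_single_field(sample))
```

If the chosen separator did appear in a line, that line would be split into several fields, which is why the step asks for a character you are not expecting in the file.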
