In the Filtering Data section in Chapter 6, Controlling the Flow of Data, you identified the words found in a text file. On that occasion, you already did some cleaning by removing from the text all the characters that weren't part of valid words, such as parentheses and hyphens. Recall that you used the Replace in String step for this.
There is more cleansing that you can do on this text. For example, if you intend to calculate statistics on geology-related words, you might prefer to discard the many words that are valid English but useless for your work. Let's look at a way to get rid of these:
- Open the Transformation from Chapter 6, Controlling the Flow of Data, ...