Removing repeating characters
In everyday language, people are often not strictly grammatical. They will write things such as "I looooooove it" in order to emphasize the word "love". However, computers don't know that "looooooove" is a variation of "love" unless they are told. This recipe presents a method to remove these annoying repeating characters in order to end up with a proper English word.
Getting ready
As in the previous recipe, we will be making use of the re module, and more specifically, backreferences. A backreference is a way to refer to a previously matched group in a regular expression. This will allow us to match and remove repeating characters.
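To make the idea of a backreference concrete, here is a minimal sketch using only the standard re module. The helper name collapse_repeats is hypothetical and is not the class this recipe goes on to build; it only illustrates how \1 refers back to whatever the first group matched.

```python
import re

# (\w) captures a single word character; \1 must then match that
# exact same character again, so the pattern finds a doubled character.
doubled = re.compile(r'(\w)\1')

print(doubled.search('love'))           # None: no character is repeated
print(doubled.search('loove').group())  # oo

# Hypothetical helper (an assumption for illustration only): collapse
# every run of a repeated character down to a single occurrence.
def collapse_repeats(word):
    return re.sub(r'(\w)\1+', r'\1', word)

print(collapse_repeats('looooooove'))  # love
print(collapse_repeats('goose'))       # gose
```

The last call shows why collapsing blindly is not enough: words with legitimate double letters get damaged, so a practical approach likely needs some extra check on the result.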
How to do it...
We will create a class that has the same form as the RegexpReplacer class ...