Filtering Character Streams

FilterReader is an abstract class that defines a null filter: it reads characters from a specified Reader and returns them unmodified. In other words, its implementations of the Reader methods simply pass each call through to the underlying Reader. A subclass must override at least the two read( ) methods (the no-argument version and the three-argument version) to perform whatever filtering is necessary, and some subclasses may override other methods as well. Example 3-6 shows RemoveHTMLReader, a custom subclass of FilterReader that reads HTML text from a stream and filters out all of the HTML tags from the text it returns.
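To make the subclassing pattern concrete, here is a minimal sketch of a much simpler filter than RemoveHTMLReader. The class name UpperCaseReader is hypothetical and is not one of the book's examples; it only shows the shape of a FilterReader subclass that overrides both read( ) methods.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical filter (not from the book): upper-cases every character it reads.
class UpperCaseReader extends FilterReader {
    UpperCaseReader(Reader in) {
        super(in); // FilterReader keeps the wrapped reader in its protected field 'in'
    }

    // Single-character read: delegate, then transform unless at end of stream.
    @Override
    public int read() throws IOException {
        int c = in.read();
        if (c == -1) return -1;
        return Character.toUpperCase((char) c);
    }

    // Bulk read: let the wrapped reader fill the buffer, then transform in place.
    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = in.read(buf, off, len);
        for (int i = off; i < off + n; i++) { // loop body is skipped when n is -1
            buf[i] = Character.toUpperCase(buf[i]);
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        Reader r = new UpperCaseReader(new StringReader("filter me"));
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = r.read()) != -1) sb.append((char) c);
        System.out.println(sb);
    }
}
```

Because the transformation here is one character in, one character out, the two methods can be written independently. A filter that drops characters, as RemoveHTMLReader does, needs more care.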

In the example, we implement the HTML tag filtration in the three-argument version of read( ), and then implement the no-argument version in terms of that more complicated version. The example also includes a nested Test class with a main( ) method that shows how you might use the RemoveHTMLReader class.
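The approach just described can be sketched roughly as follows. This is an illustrative reconstruction under stated assumptions, not the book's Example 3-6 listing: the class name TagStrippingReader and the inTag field are hypothetical, and the sketch naively treats everything between '<' and '>' as a tag, ignoring comments, scripts, and entities.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Illustrative sketch only; not the book's Example 3-6 listing.
class TagStrippingReader extends FilterReader {
    // True while we are between '<' and '>'. Kept as a field because a tag
    // may straddle two calls to read().
    private boolean inTag = false;

    TagStrippingReader(Reader in) {
        super(in);
    }

    // The filtering lives here: read a chunk, then compact it in place,
    // keeping only the characters that fall outside tags.
    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) return 0;             // avoid looping forever on empty requests
        int kept = 0;
        while (kept == 0) {                 // retry if a whole chunk was tag text
            int n = in.read(buf, off, len);
            if (n == -1) return -1;         // end of the underlying stream
            int last = off;                 // next slot for a kept character
            for (int i = off; i < off + n; i++) {
                if (inTag) {
                    if (buf[i] == '>') inTag = false;
                } else if (buf[i] == '<') {
                    inTag = true;
                } else {
                    buf[last++] = buf[i];
                }
            }
            kept = last - off;
        }
        return kept;
    }

    // The no-argument version is written in terms of the three-argument one.
    @Override
    public int read() throws IOException {
        char[] one = new char[1];
        if (read(one, 0, 1) == -1) return -1;
        return one[0];
    }

    public static void main(String[] args) throws IOException {
        Reader r = new TagStrippingReader(new StringReader("<p>Hello, <b>world</b>!</p>"));
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = r.read()) != -1) sb.append((char) c);
        System.out.println(sb); // prints "Hello, world!"
    }
}
```

The loop matters because a filter that removes characters can consume a whole chunk and keep nothing. Returning 0 in that case would be misleading to callers, so the sketch keeps reading until it has at least one character to return or the underlying stream ends.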

Note that we could also define a RemoveHTMLWriter class by performing the same filtration in a FilterWriter subclass. Or, to filter a byte stream instead of a character stream, we could subclass FilterInputStream or FilterOutputStream. RemoveHTMLReader is only one example of a filter stream. Other possibilities include streams that count the number of characters or bytes processed, convert characters to uppercase, extract URLs, perform search-and-replace operations, convert Unix-style LF line terminators to Windows-style CRLF line terminators, and so on.
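As one illustration of filtering on the output side, here is a minimal sketch of a FilterWriter subclass that performs the LF-to-CRLF conversion mentioned above. The class name CrlfWriter is hypothetical, and the sketch assumes its input uses bare LF terminators: text that already contains CRLF would end up with a doubled carriage return.

```java
import java.io.FilterWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical filter (not from the book): writes CRLF wherever it is given LF.
class CrlfWriter extends FilterWriter {
    CrlfWriter(Writer out) {
        super(out); // FilterWriter keeps the wrapped writer in its protected field 'out'
    }

    // All filtering happens in the single-character method.
    @Override
    public void write(int c) throws IOException {
        if (c == '\n') out.write('\r'); // assumes the input never contains "\r\n" already
        out.write(c);
    }

    // The array and string versions are written in terms of write(int).
    @Override
    public void write(char[] buf, int off, int len) throws IOException {
        for (int i = off; i < off + len; i++) write(buf[i]);
    }

    @Override
    public void write(String s, int off, int len) throws IOException {
        for (int i = off; i < off + len; i++) write(s.charAt(i));
    }

    public static void main(String[] args) throws IOException {
        StringWriter sw = new StringWriter();
        Writer w = new CrlfWriter(sw);
        w.write("one\ntwo\n");
        w.flush();
        System.out.println(sw.toString().equals("one\r\ntwo\r\n")); // prints "true"
    }
}
```

Writing one character at a time keeps the sketch short but is not efficient. A production version would more likely scan each buffer and write the runs between line terminators in bulk.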

Example 3-6. RemoveHTMLReader.java ...
