Reading HTML

Figure 23-11 shows the class hierarchy involved in reading and parsing an HTML document with the HTMLEditorKit.

Figure 23-11. The class hierarchy for parsing HTML via HTMLEditorKit

Document Parsers

The first step in loading and displaying an HTML document is parsing it. The HTMLEditorKit class has hooks for returning a parser to do the job. The classes in the javax.swing.text.html.parser package implement a DTD-based[8] parser for this purpose.
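One way to see the parser hook in action is to subclass the editor kit and widen its protected getParser() method. The following is a minimal sketch, not the book's own example: the class names ParserHookSketch and ExposingKit are hypothetical, and the code simply asks the kit for whatever parser it returns by default.

```java
import javax.swing.text.html.HTMLEditorKit;

class ParserHookSketch {

    // Hypothetical subclass: getParser() is protected in HTMLEditorKit,
    // so we widen it to public to make the hook visible from outside.
    static class ExposingKit extends HTMLEditorKit {
        @Override
        public HTMLEditorKit.Parser getParser() {
            return super.getParser();
        }
    }

    public static void main(String[] args) {
        HTMLEditorKit.Parser parser = new ExposingKit().getParser();
        // Print the concrete class the kit hands back by default.
        System.out.println(parser.getClass().getName());
    }
}
```

A subclass could return a different HTMLEditorKit.Parser implementation from the same method, which is the point of the hook.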

Let’s look at the flow of an incoming HTML document. The editor kit instantiates a parser to read the document. ParserDelegator does what its name implies and delegates the actual parsing duties to another class, DocumentParser in this case. ParserDelegator also handles loading the DTD used to create the real parser. Ostensibly, you could load your own DTD, but this whole process is rather tightly coupled to the HTML DTD supplied by the good folks at Sun. Once the parser is in place, you can send it a document and a ParserCallback instance and start parsing. As the parser finds tokens and data, it passes them off to the callback instance, which does the real work of building the document.
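The flow just described can be sketched outside the editor kit by handing a ParserDelegator a Reader and a callback of our own. This is only an illustration under stated assumptions: the class ParserCallbackSketch and its collectEvents helper are hypothetical names, and the callback merely records the tokens it is given instead of building a document the way the real one does.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class ParserCallbackSketch {

    // Hypothetical helper: parses the given HTML and returns a flat list
    // of the start-tag, end-tag, and text events the parser reported.
    static List<String> collectEvents(String html) {
        List<String> events = new ArrayList<>();

        // The callback receives tokens as the parser finds them.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                events.add("start:" + tag);
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                events.add("end:" + tag);
            }

            @Override
            public void handleText(char[] data, int pos) {
                events.add("text:" + new String(data));
            }
        };

        try (StringReader in = new StringReader(html)) {
            // The last argument (ignoreCharSet) tells the parser to ignore
            // any charset declaration it encounters in the document.
            new ParserDelegator().parse(in, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        // Print each event on its own line, in the order it was reported.
        for (String event : collectEvents("<html><body><p>Hello</p></body></html>")) {
            System.out.println(event);
        }
    }
}
```

Because the callback is the only piece that knows what to do with the tokens, swapping in a different ParserCallback changes what gets built without touching the parser.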

You can display the document as it is built, or you can wait for the entire document to be loaded before displaying it. The tokenThreshold property from HTMLDocument determines exactly when the ...
