Reading HTML
Figure 23-11 shows the hierarchy breakdown for the classes involved in
reading and parsing an HTML document with the HTMLEditorKit.
Figure 23-11. The class hierarchy for parsing HTML via HTMLEditorKit
Document Parsers
The first function involved in loading and displaying an HTML
document is parsing it. The HTMLEditorKit class has hooks for returning
a parser to do the job. The classes in the javax.swing.text.html.parser
package implement a DTD-based[8] parser for this purpose.
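As a rough illustration of that hook, the sketch below subclasses
HTMLEditorKit and widens its protected getParser() method so the parser
the kit hands back can be inspected. The names ParserHookSketch and
ExposedKit are hypothetical, invented for this example; only
HTMLEditorKit, HTMLEditorKit.Parser, and getParser() come from Swing.

```java
import javax.swing.text.html.HTMLEditorKit;

class ParserHookSketch {
    // Hypothetical subclass: getParser() is protected in HTMLEditorKit,
    // so we widen it to public to look at what the kit returns.
    static class ExposedKit extends HTMLEditorKit {
        @Override
        public HTMLEditorKit.Parser getParser() {
            return super.getParser();
        }
    }

    // Returns whatever parser the stock editor kit supplies.
    static HTMLEditorKit.Parser defaultParser() {
        return new ExposedKit().getParser();
    }

    public static void main(String[] args) {
        // Print the concrete class of the parser the kit supplies.
        System.out.println(defaultParser().getClass().getName());
    }
}
```

A subclass could just as well override getParser() to return its own
HTMLEditorKit.Parser implementation, which is presumably what the hook
is meant for.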
But since we’re here, let’s look at the flow of an incoming HTML
document. The editor kit instantiates a parser to read the document.
ParserDelegator does what its name implies and delegates the actual
parsing duties to another class (DocumentParser, in this case).
ParserDelegator also handles loading the DTD used to create the real
parser. Ostensibly, you could load your own DTD, but this whole process
is rather tightly coupled to the HTML DTD supplied by the good folks at
Sun. Once the parser is in place, you can send it a document and a
ParserCallback instance and start parsing.
As the parser finds tokens and data, it passes them off to the
callback instance that does the real work of building the
document.
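The flow just described can be sketched with a small standalone
program. This is only an illustration under stated assumptions: the
CallbackSketch and EventRecorder names are hypothetical, and the
callback here merely records the events it receives rather than
building a document the way the editor kit's own callback does.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class CallbackSketch {
    // Hypothetical callback: logs each event the parser reports.
    static class EventRecorder extends HTMLEditorKit.ParserCallback {
        final List<String> events = new ArrayList<>();

        @Override
        public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
            events.add("start:" + t);
        }

        @Override
        public void handleEndTag(HTML.Tag t, int pos) {
            events.add("end:" + t);
        }

        @Override
        public void handleText(char[] data, int pos) {
            events.add("text:" + new String(data));
        }
    }

    // Parses the given HTML and returns the recorded events in order.
    static List<String> record(String html) {
        EventRecorder recorder = new EventRecorder();
        try (Reader in = new StringReader(html)) {
            // The third argument tells the parser to ignore charset directives.
            new ParserDelegator().parse(in, recorder, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return recorder.events;
    }

    public static void main(String[] args) {
        for (String event : record("<html><body><p>Hello</p></body></html>")) {
            System.out.println(event);
        }
    }
}
```

Running it lists the start tags, text, and end tags in the order the
parser reports them; the parser may also report tags it implies on its
own, so the list can be longer than the markup suggests.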
You can display the document as it is built, or you can wait for
the entire document to be loaded before displaying it. The
tokenThreshold property from HTMLDocument determines exactly when
the ...