11.10. Printing the Text of an HTML Document

The core level-1 DOM API is equally applicable to both HTML and XML documents. As a consequence, programs that rely simply on nodes and node navigation will be very similar in structure.

The following program prints the text content of an HTML document. The DOM building part is HTML specific, but the core tree traversal algorithm would work for XML files as well.

CD-ROM reference=11013.txt
C>type dom7.py """ Print text of a HTML document by navigating DOM nodes. """ #Import core DOM module. from xml.dom import core #Import DOM HTML Builder. from xml.dom.html_builder import HtmlBuilder import string def PrintText(node): """ Function to print the character data of a DOM branch. """ if node.nodeType ...

Get XML Processing with Python now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.