Unit 13 Processing HTML Files

The first type of structured text document you’ll look at is HTML—a markup language commonly used on the web for human-readable representation of information. An HTML document consists of text and predefined tags (enclosed in angle brackets <>) that control the presentation and interpretation of the text. The tags may have attributes. The following table shows some HTML tags and their attributes.

Table 3. Some Frequently Used HTML Tags and Attributes
Tag               Attributes           Purpose
HTML                                   Whole HTML document
HEAD                                   Document header
TITLE                                  Document title
BODY              background, bgcolor  Document body
H1, H2, H3, etc.                       Section headers
I, EM                                  Emphasis
B, STRONG                              Strong emphasis
PRE                                    Preformatted text
P, SPAN, DIV                           Paragraph, span, division
...
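To see how the tags and attributes in Table 3 look to a program, here is a minimal sketch built on Python's standard-library html.parser module. That module is an assumption for illustration, and this unit may well use a different parser. The TagLister class and the sample document are hypothetical. The sketch records each opening tag with its attributes and collects the non-blank text between tags. Note that html.parser lowercases tag and attribute names, so BODY is reported as body. This is harmless because HTML tag names are case-insensitive.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Hypothetical helper: record start tags, their attributes, and text."""

    def __init__(self):
        super().__init__()
        self.tags = []  # (tag, {attribute: value}) pairs in document order
        self.text = []  # non-blank text chunks found between tags

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; names are lowercased
        self.tags.append((tag, dict(attrs)))

    def handle_data(self, data):
        # Skip whitespace-only chunks such as the newlines between tags
        if data.strip():
            self.text.append(data.strip())


# A tiny made-up document that uses several tags from Table 3
doc = """<HTML><HEAD><TITLE>Demo</TITLE></HEAD>
<BODY bgcolor="white"><H1>Results</H1>
<P>This is <EM>important</EM> and <B>bold</B>.</P></BODY></HTML>"""

parser = TagLister()
parser.feed(doc)
parser.close()  # flush any buffered text

for tag, attrs in parser.tags:
    print(tag, attrs)
print(parser.text)
```

Only the BODY tag carries an attribute here (bgcolor), so it is the only entry with a non-empty dictionary. The text list holds the human-readable content with the markup stripped away.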
