HTML, SGML, and XML

HTML is the primary format used for Web documents. As I said earlier, HTML is a simple standard for describing the semantic content of textual data. The idea of describing a text’s semantics rather than its appearance comes from an older standard called the Standard Generalized Markup Language (SGML). Standard HTML is an application of SGML. Charles Goldfarb began developing SGML at IBM in the mid-1970s. SGML is now an International Organization for Standardization (ISO) standard, specifically ISO 8879:1986.

SGML and, by inheritance, HTML are based on the notion of design by meaning rather than design by appearance. You don’t say that you want some text printed in 18-point type; you say that it is a top-level heading (<H1> in HTML). Likewise, you don’t say that a word should be placed in italics; rather, you say that it should be emphasized (<EM> in HTML). It is left to the browser to determine how best to display headings or emphasized text.

The tags used to mark up the text are case-insensitive. Thus <STRONG>, <strong>, <Strong>, and <StrONg> are all the same tag. Some tags have a matching closing tag to define a region of text. A closing tag is the same as the opening tag except that the opening angle bracket is followed by a /. For example: <STRONG>this text is strong</STRONG>; <EM>this text is emphasized</EM>. The entire text from the beginning of the start tag to the end of the end tag is called an element. Thus <STRONG>this text is strong</STRONG> ...
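
To make the rules about tag case and elements concrete, here is a minimal sketch that pulls complete elements out of a string using a case-insensitive regular expression. The class name ElementFinder and the method findElements() are hypothetical names I made up for this illustration; they are not part of any standard library. The sketch also assumes tags with no attributes and no nesting of the same tag, so it is nowhere near a real HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not a real HTML parser.
class ElementFinder {

    // Returns every element with the given tag name, from the beginning
    // of the start tag to the end of the end tag.
    // Assumes attribute-free tags and no nesting of the same tag.
    static List<String> findElements(String html, String tagName) {
        String name = Pattern.quote(tagName);
        // CASE_INSENSITIVE makes <STRONG>, <strong>, and <StrONg> equivalent;
        // DOTALL lets the element's content span several lines.
        Pattern element = Pattern.compile(
            "<" + name + ">.*?</" + name + ">",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        Matcher matcher = element.matcher(html);
        List<String> elements = new ArrayList<>();
        while (matcher.find()) {
            elements.add(matcher.group());
        }
        return elements;
    }

    public static void main(String[] args) {
        String html = "<STRONG>this text is strong</strong>; "
                    + "<em>this text is emphasized</EM>";
        System.out.println(findElements(html, "strong"));
        System.out.println(findElements(html, "EM"));
    }
}
```

Each call in main() should find exactly one element, even though the start and end tags differ in case from each other and from the name passed in. A regular expression is enough for this narrow demonstration, but it breaks down quickly once attributes, comments, or nested elements are involved.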
