2.7. Well-Formed Documents

XML gives you considerable power to choose your own element types and invent your own grammars to create custom-made markup languages. But this flexibility can be dangerous for XML parsers if they don't have some minimal rules to protect them. A parser dedicated to a single markup language, such as the one in an HTML browser, can accept some sloppiness in markup, because the set of tags is small and there isn't much complexity in a web page. Since XML processors have to be prepared for any kind of markup language, a set of ground rules is necessary.

These rules are very simple syntax constraints. All tags must use the proper delimiters; an end tag must follow a start tag; elements can't overlap; and so on. Documents that satisfy these rules are said to be well-formed. Some of these rules are listed here.

The first rule is that an element containing text or elements must have start and end tags.

Good:

<list>
  <listitem>soupcan</listitem>
  <listitem>alligator</listitem>
  <listitem>tree</listitem>
</list>

Bad:

<list>
  <listitem>soupcan
  <listitem>alligator
  <listitem>tree
</list>
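
One way to see this rule enforced is to hand both versions to an XML parser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module as the parser; the helper name is_well_formed is made up for illustration and is not part of any library.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # The parser refuses any document that breaks a well-formedness rule.
        return False

good = """<list>
  <listitem>soupcan</listitem>
  <listitem>alligator</listitem>
  <listitem>tree</listitem>
</list>"""

# Same content, but the listitem elements are never closed.
bad = """<list>
  <listitem>soupcan
  <listitem>alligator
  <listitem>tree
</list>"""

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False
```

The parser rejects the second document as soon as it meets </list> while a listitem is still open. Unlike a forgiving HTML browser, it does not try to guess where the missing end tags belong.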

An empty element's tag must have a slash (/) before the end bracket.

Good:

<graphic filename="icon.png"/>

Bad:

<graphic filename="icon.png">

All attribute values must be in quotes.

Good:

<figure filename="icon.png"/>

Bad:

<figure filename=icon.png/>
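
A parser can also tell you where a document stops being well-formed. The sketch below, again assuming Python's standard xml.etree.ElementTree module, runs the empty-element and attribute-quoting examples above through the parser and reports the position of the first error. The helper name first_error is hypothetical.

```python
import xml.etree.ElementTree as ET

def first_error(text):
    """Return None if text is well-formed, else the (line, column) of the first error."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple.
        return err.position
    return None

samples = [
    '<graphic filename="icon.png"/>',  # empty element with closing slash
    '<graphic filename="icon.png">',   # missing slash: element is never closed
    '<figure filename="icon.png"/>',   # quoted attribute value
    '<figure filename=icon.png/>',     # unquoted attribute value
]

for sample in samples:
    print(sample, "->", first_error(sample))
```

The first and third samples should report None, and the other two a line and column. The exact column reported may vary with the underlying parser, so treat it as a hint about where to look rather than a precise diagnosis.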

Elements may not overlap.

Good:

<a>A good <b>nesting</b>
example.</a>

Bad:

<a>This is <b>a poor</a>
  nesting scheme.</b>
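
The no-overlap rule amounts to saying that tags must close in the reverse of the order in which they opened, which is the behavior of a stack. The following toy sketch illustrates the idea. It is not how a real XML parser is implemented: the regular expression TAG and the function find_overlap are invented for this example, and they ignore comments, CDATA sections, namespaces, and other details a real parser must handle.

```python
import re

# Simplified tag matcher: captures an optional leading slash (end tag),
# the element name, and an optional trailing slash (empty element).
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^>]*?(/?)>")

def find_overlap(text):
    """Return (expected, found) for the first mismatched end tag, else None."""
    stack = []
    for match in TAG.finditer(text):
        closing, name, empty = match.groups()
        if empty:
            continue                 # <x/> opens and closes in one tag
        if not closing:
            stack.append(name)       # start tag: remember that it is open
        elif not stack or stack[-1] != name:
            # End tag does not match the most recently opened element.
            return (stack[-1] if stack else None, name)
        else:
            stack.pop()              # end tag matches: element is closed
    return None

good = "<a>A good <b>nesting</b>\nexample.</a>"
bad = "<a>This is <b>a poor</a>\n  nesting scheme.</b>"

print(find_overlap(good))  # None
print(find_overlap(bad))   # ('b', 'a')
```

In the bad example, the element b is still open when </a> arrives, so the check reports that it expected b to close and found a instead.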

Isolated markup characters may not appear ...
