3.6. Identifiers

In Chapter 10, which discusses fonts and the Web, we shall give a quick introduction to XML (pages 345-349), and we shall discuss tags for elements and entities. The reader will notice that we have carefully refrained from defining the way in which this markup is constructed—a subject that is not necessarily of interest to the XML novice.

A priori, we can regard XML tags as being written with ASCII letters and digits; at least that is what we shall see in all the examples. That is true for good old SGML but not for young, dynamic XML, which proudly proclaims itself "Unicode compatible". We are free to use exotic tags written in scripts other than Latin!
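
As a quick check of this claim, here is a minimal sketch using Python's standard `xml.etree.ElementTree` parser. The choice of tool and the Cyrillic element names are our own illustrative assumptions. The parser accepts non-ASCII element names without complaint, while a name containing a punctuation mark such as `;` is rejected as not well-formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical document whose element names are written in Cyrillic letters.
doc = "<книга><название>Fonts &amp; Encodings</название></книга>"

root = ET.fromstring(doc)            # parses like any ASCII-tagged document
print(root.tag)                      # prints: книга
print(root[0].tag, root[0].text)     # prints: название Fonts & Encodings

# A semicolon is punctuation and may not appear inside an element name.
rejected = False
try:
    ET.fromstring("<a;b/>")
except ET.ParseError as err:
    rejected = True
    print("rejected:", err)
```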

But does that really mean that we can use just any Unicode character in the names of our tags? No. By this point, the reader will certainly be aware that the world's scripts share largely the same structures as ours: letters (or similar), diacritical marks, punctuation marks, etc. We shall therefore treat other scripts as we treat our own: letters will be allowed in tag names; diacritical marks will also be allowed but may not come first; punctuation marks will not be allowed (with a few exceptions).
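
The rule of thumb above can be sketched with Unicode general categories from Python's `unicodedata` module. This is a hypothetical, simplified rule written only to illustrate the principle; it is not the exact Name production of the XML specification, which lists explicit code-point ranges. The function name `is_tag_name` and the particular set of punctuation exceptions (`_`, `:`, `-`, `.`, U+00B7) are assumptions made for this sketch.

```python
import unicodedata

# Hypothetical simplified rule: NOT the exact XML Name production.
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo"}          # letters of any script
LATER_CATEGORIES = START_CATEGORIES | {"Mn", "Mc", "Nd"}   # plus diacritics, digits
START_EXCEPTIONS = {"_", ":"}                              # punctuation allowed first
LATER_EXCEPTIONS = START_EXCEPTIONS | {"-", ".", "\u00B7"} # allowed after the first char


def is_tag_name(name: str) -> bool:
    """Return True if `name` satisfies the simplified letter/diacritic rule."""
    if not name:
        return False
    first, rest = name[0], name[1:]
    # The first character must be a letter (or one of the start exceptions);
    # a combining diacritic (Mn/Mc) or a digit is refused here.
    if first not in START_EXCEPTIONS and unicodedata.category(first) not in START_CATEGORIES:
        return False
    # Later characters may also be diacritics, digits, or a few punctuation marks.
    return all(
        ch in LATER_EXCEPTIONS or unicodedata.category(ch) in LATER_CATEGORIES
        for ch in rest
    )


for candidate in ["книга", "e\u0301te", "\u0301e", "a;b", "x-1", "1x", "名前"]:
    print(repr(candidate), is_tag_name(candidate))
```

Classifying by general category rather than by hand-picked ranges keeps the sketch short and applies the same logic to every script; the price is that it follows whatever the installed Unicode database says, which may differ at the edges from any particular version of the XML grammar.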

But XML is not the only markup system in the world—to say nothing of all the various programming languages, which have not tags but identifiers. Should every markup system and every programming language be allowed to choose the ...
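
Programming languages face the same question for their identifiers. Python 3 serves here purely as an illustration: under PEP 3131 its identifier rules build on Unicode's identifier definitions (UAX #31), and `str.isidentifier()` applies them. The sketch below shows that Python, like the rule for tags, accepts letters from any script and diacritics after the first character, but, unlike XML, it also refuses the hyphen.

```python
# Python is used only as an example of a language with Unicode identifiers.
candidates = [
    "переменная",   # Cyrillic letters only
    "e\u0301te",    # combining acute accent after a letter
    "\u0301e",      # a combining mark cannot come first
    "a-b",          # hyphen-minus is not allowed in Python identifiers
    "名前",          # CJK ideographs are letters (category Lo)
]

# str.isidentifier() checks the lexical rule only; it ignores keywords.
results = {text: text.isidentifier() for text in candidates}
for text, ok in results.items():
    print(repr(text), ok)
```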
