Combining Diacritical Marks
The Combining Diacritical Marks block contains characters that are not used on their
own, such as the accent grave and circumflex. Instead, they are
merged with the preceding character to form a single glyph. For
example, to write the character Ñ, you could type the ASCII letter N
followed by the combining tilde character, like this: N&#x303;. When rendered, this
combination would produce the single glyph Ñ. In Table 27-11, the character
to which the diacritical mark is attached is a dotted circle ◌ (Unicode code point 0x25CC),
but of course it could be any normal character.
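As a rough illustration of how a base letter and a combining mark relate to the single precomposed character, the following Python sketch uses the standard unicodedata module. The variable names are chosen here for illustration only, and the sketch assumes a Python 3 interpreter.

```python
import unicodedata

# Variable names below are illustrative, not taken from the book.
base = "N"
combining_tilde = "\u0303"          # COMBINING TILDE, from the Combining Diacritical Marks block
decomposed = base + combining_tilde

# The sequence is two code points, even though it renders as one glyph.
sequence_length = len(decomposed)

# A nonzero combining class marks a character that attaches to the preceding one.
is_combining = unicodedata.combining(combining_tilde) != 0

# NFC normalization replaces the pair with the precomposed character where one exists.
composed = unicodedata.normalize("NFC", decomposed)

# The dotted circle (0x25CC) can serve as a neutral base for displaying the mark alone.
display_form = "\u25cc" + combining_tilde

print(sequence_length, is_combining, composed, display_form)
```

Any other base letter could be substituted for N; whether NFC yields a single code point depends on whether Unicode defines a precomposed form for that combination.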
For compatibility with legacy character sets, there is often
more than one way to write accented characters. For example, the
letter é, e with accent acute, can either be written as the single
character 0xE9 or as the letter e (0x65) followed by a combining
accent acute (0x301). This can be
a problem for naïve algorithms for searching, sorting, indexing, and
performing other operations on text. It’s also an issue for XML. For
instance, the <resumé> start-tag cannot be matched with a </resumé> end-tag if one uses
character 0xE9 and the other uses 0x65 followed by 0x301. Where such multiple ways of writing the same character exist, the W3C strongly recommends using the precomposed form; that is, you should use the single character instead of the base character followed by a combining diacritical mark. In XML, these marks are primarily intended for forming characters that do not have ...
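The mismatch between the two spellings of the tag name can be reproduced with ordinary string comparison. The sketch below is a minimal illustration in Python, assuming the standard unicodedata module; the helper same_name is a hypothetical function written for this example, not part of any XML API, and it treats NFC normalization as the way to obtain the precomposed form.

```python
import unicodedata

# Two spellings of the same element name (names are illustrative).
precomposed_name = "resum\u00e9"    # ends with the single character 0xE9
decomposed_name = "resume\u0301"    # ends with 0x65 followed by 0x301

# A code-point-by-code-point comparison treats them as different names.
naive_match = precomposed_name == decomposed_name


def same_name(a: str, b: str) -> bool:
    """Hypothetical helper: compare two names after converting both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# After normalization to the precomposed form, the names agree.
normalized_match = same_name(precomposed_name, decomposed_name)

print(naive_match, normalized_match)
```

Normalizing text to the precomposed form before it is written into a document, rather than at comparison time, keeps start-tags and end-tags consistent for any parser that compares names code point by code point.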