The XML Family of Standards

XML was specifically designed to combine the flexibility of SGML with the simplicity of Hypertext Markup Language (HTML). HTML, the markup language upon which the World Wide Web is based, is an application of an older and more complex language known as Standard Generalized Markup Language (SGML). SGML was created to provide a standardized language for complex documents, such as airplane repair manuals and parts lists. HTML, on the other hand, was designed for the specific purpose of creating documents that could be displayed by a variety of different web browsers. As such, HTML provides only a subset of SGML’s functionality and is limited to features that make sense in a web browser. XML takes a broader view.

There are several types of tasks you’ll typically want to perform with XML documents. XML documents can be read into arbitrary data structures, manipulated in memory, and written back out as XML. Existing objects can be written (or serialized, to use the technical term) to a number of different XML formats, including ones that you define, as well as standard serialization formats. The technologies most commonly used to perform these operations are the following:

Input

Before you can work with an XML document in memory, you need to parse it. There are a variety of XML parsers that can be used to read XML, and I discuss the .NET implementation in Chapter 2.
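As a rough illustration of what reading involves, here is a minimal sketch that streams through a small document and collects its items. It uses Python's standard library rather than the .NET parser covered in Chapter 2, and the sample document and the `read_items` helper are invented for this example.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
XML_TEXT = """<inventory>
  <item id="1">Widget</item>
  <item id="2">Gadget</item>
</inventory>"""

def read_items(source):
    """Stream through the document and collect (id, text) for each <item>."""
    items = []
    # iterparse reports each element once its end tag has been read,
    # so the whole document never has to be held as a single string.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "item":
            items.append((elem.get("id"), elem.text))
    return items

items = read_items(io.BytesIO(XML_TEXT.encode("utf-8")))
print(items)  # [('1', 'Widget'), ('2', 'Gadget')]
```

The streaming style shown here is only one way to read XML; a parser may equally well build a complete in-memory tree in one step.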

Output

After either reading XML in or creating an XML representation in memory, you’ll most likely need to write it out to an XML file. This is the flip side of parsing, and it’s covered in Chapter 3.
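The reverse direction might look something like the following sketch, which builds a small tree in memory and writes it out as text. Again this is Python's standard library standing in for the .NET classes of Chapter 3, and the element names and the `build_inventory` helper are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

def build_inventory(names):
    """Create an <inventory> element with one numbered <item> per name."""
    root = ET.Element("inventory")
    for number, name in enumerate(names, start=1):
        item = ET.SubElement(root, "item", id=str(number))
        item.text = name
    return root

root = build_inventory(["Widget", "Gadget"])

# Serialize the in-memory tree to a string of XML markup.
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

To produce a file instead of a string, the same tree could be wrapped in `ET.ElementTree(root)` and written with its `write` method.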

Extension

You can use the same APIs you use to read and write XML to read and write other formats. I explore how this works in Chapter 4.

DOM

Once it has been read into memory, you can manipulate an XML document’s tree structure through the Document Object Model (DOM). The DOM specification was developed to introduce a platform-independent model for XML documents. The DOM is discussed in Chapter 5.
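Because the DOM is platform-independent, its flavor can be sketched in almost any language. The example below uses Python's `xml.dom.minidom`, a lightweight DOM implementation, rather than the .NET classes discussed in Chapter 5; the sample document is invented for illustration.

```python
from xml.dom import minidom

# Hypothetical sample document, invented for this illustration.
doc = minidom.parseString("<inventory><item id='1'>Widget</item></inventory>")
root = doc.documentElement

# Create a new element with an attribute and a text node, then attach it.
new_item = doc.createElement("item")
new_item.setAttribute("id", "2")
new_item.appendChild(doc.createTextNode("Gadget"))
root.appendChild(new_item)

# Remove the original first <item> from the tree.
first_item = root.getElementsByTagName("item")[0]
root.removeChild(first_item)

remaining_ids = [e.getAttribute("id") for e in root.getElementsByTagName("item")]
print(remaining_ids)  # ['2']
```

Method names such as `createElement`, `appendChild`, and `removeChild` come from the DOM specification itself, which is why code against different DOM implementations tends to look similar.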

XPath

You will sometimes want to locate a particular element or attribute in the content of an XML document. The XPath specification provides the mechanism used to navigate an XML document. I talk about XPath in Chapter 6.
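To give a feel for XPath expressions, here is a small sketch that locates nodes by path and by attribute value. It relies on Python's `xml.etree.ElementTree`, which supports only a limited subset of XPath, so treat it as an approximation of what a full XPath engine (such as the .NET one covered in Chapter 6) can do. The document is a made-up sample.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
XML_TEXT = """<inventory>
  <item id="1"><name>Widget</name></item>
  <item id="2"><name>Gadget</name></item>
</inventory>"""

root = ET.fromstring(XML_TEXT)

# Select the <name> child of the <item> whose id attribute equals "2".
match = root.find("./item[@id='2']/name")
print(match.text)  # Gadget

# Select every <name> element anywhere below the root.
names = [node.text for node in root.findall(".//name")]
print(names)  # ['Widget', 'Gadget']
```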

XSLT

Different organizations often develop different markup languages for the same problem domain. In those cases, it can be useful to transform an existing XML document in one format into another document in another format. Extensible Stylesheet Language Transformations (XSLT) was developed to enable you to convert XML documents into other XML and non-XML formats. XSLT is discussed in Chapter 7.

XML Schema

The original XML specification included the Document Type Definition (DTD), which allows you to specify the structure of an XML document. The XML Schema standard allows you to constrain an XML document in a more formal manner than a DTD does. Using an XML Schema, you can ensure that a document's structure and content fit the expected model. I discuss XML Schema in Chapter 8.

Serialization

In addition to the XML technologies listed above, there are specific XML syntaxes used for specific purposes. One such purpose is serializing objects into XML. Objects can be serialized to an arbitrary XML syntax, or they can be serialized to the Simple Object Access Protocol (SOAP). I discuss serialization in Chapter 9.
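Serializing an object to an arbitrary XML syntax can be sketched as follows. This is a hand-rolled illustration in Python, not the .NET serialization machinery of Chapter 9 and not SOAP; the `Item` class, the `serialize` and `deserialize` helpers, and the element layout (one child element per field) are all assumptions chosen for the example.

```python
from dataclasses import dataclass, fields
import xml.etree.ElementTree as ET

@dataclass
class Item:
    # Hypothetical object type, invented for this illustration.
    name: str
    quantity: int

def serialize(obj):
    """Write each field of the object as a child element named after the field."""
    root = ET.Element(type(obj).__name__)
    for field in fields(obj):
        child = ET.SubElement(root, field.name)
        child.text = str(getattr(obj, field.name))
    return ET.tostring(root, encoding="unicode")

def deserialize(xml_text):
    """Rebuild an Item from the XML layout produced by serialize()."""
    root = ET.fromstring(xml_text)
    return Item(name=root.findtext("name"),
                quantity=int(root.findtext("quantity")))

xml_text = serialize(Item("Widget", 3))
print(xml_text)  # <Item><name>Widget</name><quantity>3</quantity></Item>
restored = deserialize(xml_text)
```

A standard serialization format fixes the layout in advance so that both ends agree on it; here the layout is simply whatever `serialize` happens to emit.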

Web Services

Web Services allows for the sharing of resources on a network as if they were local through XML syntaxes such as SOAP, Web Services Description Language (WSDL), and Universal Description, Discovery, and Integration (UDDI). Web Services provides the foundation for .NET remoting, although Web Services is, by its nature, an open framework that is operating system- and hardware-independent. Although Web Services as a topic can fill several volumes, I talk about it briefly in Chapter 10.

Data

Most modern software applications are concerned in some way with storing and accessing data. While XML can itself be used as a rudimentary data store, relational database management systems, such as SQL Server, DB2, and Oracle, are much better at providing quick, reliable access to large amounts of data. Like Web Services, database access is a huge topic; I’ll try to give you a taste for XML-related database access issues in Chapter 11.

Since its invention, XML has gone far beyond HTML's role as a language for web site design. It has acquired a host of related technologies, such as XHTML, XPath, XSLT, XML Schema, SOAP, WSDL, and UDDI. Some of these are syntaxes of XML, some simply add value to XML, and some do both.

I’ve just introduced a lot of acronyms, so look at Figure 1-2 for a visual representation of the relationships between some of these standards.

Figure 1-2. SGML and its progeny
