Document Types and Schemas

When we talk about document types, we mean something very similar to types in a programming language. Programming language types describe structures that can be composed in particular ways, and document types do the same thing. The primitive components and the kinds of composition allowed differ, but the two are conceptually aligned. A document type is commonly referred to as a schema. In many applications the difference between a document type and a database schema is shallow, though the similarity is not always relevant. We often use “schema” to refer to a document type when it is not important how it was defined, because the phrase “document type” has historical associations with a particular schema language.
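To make the analogy with programming language types concrete, here is a minimal sketch of a hand-rolled structural check. It is not a real schema language: the ORDER_SCHEMA mapping, the conforms function, and the order/customer/item element names are all hypothetical, invented for illustration. The sketch only assumes the standard library's xml.etree.ElementTree module for parsing.

```python
import xml.etree.ElementTree as ET

# Hypothetical "document type": each element name maps to the child
# element names it must contain, in order. An empty list means the
# element has no child elements (it may still hold text).
ORDER_SCHEMA = {
    "order": ["customer", "item"],
    "customer": [],
    "item": [],
}


def conforms(element, schema):
    """Return True if element and all its descendants match the schema."""
    if element.tag not in schema:
        return False
    expected = schema[element.tag]
    actual = [child.tag for child in element]
    if actual != expected:
        return False
    return all(conforms(child, schema) for child in element)


good = ET.fromstring(
    "<order><customer>Acme</customer><item>Widget</item></order>"
)
bad = ET.fromstring("<order><item>Widget</item></order>")

print(conforms(good, ORDER_SCHEMA))  # True
print(conforms(bad, ORDER_SCHEMA))   # False
```

Just as a type checker rejects a value whose structure does not match its declared type, this check rejects the second document because its order element lacks the required customer child. A real schema language expresses far richer rules, such as optional and repeated elements, attributes, and data types.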

Schemas are valuable for several reasons, but two dominate: designing them requires critical thinking about the applications and data, and they can be used to help specify how documents should be constructed and interpreted when exchanged across organizational boundaries. The latter can be especially critical in applications such as supply-chain integration, where the automated exchange of dynamically generated documents can incur contractual obligations. It becomes very important that everyone agree on what the documents mean, because misinterpretation can be very costly!

Document types are built on top of data types as well as on top of structuring rules, in which data types are very analogous to the primitive ...
