Chapter 4. Tagging XML in InDesign

The Case for Tagging Content: Why You Need XML

What makes sense for you to import or export as XML should be driven by a business need. The information in the XML should be valuable enough to justify the time and effort to mark up the content as XML and export it, or to create a template to import it.

Business functions such as sales, marketing, manufacturing, and shipping rely on documents of various kinds to transmit information. If you look at a document, you can usually discern the function it serves and who needs to use it. Even in documents that seem fairly “freeform,” such as marketing collateral, a close look turns up discrete pieces of information within the text and images.

Try examining a piece of marketing literature to see what it really contains. The small print typically holds legal disclaimers, copyrights, trademark notices, and the like. Does the business have a way to control the wording and usage of these important pieces of content in all of its printed materials and web pages? Expensive lawsuits can result if they are left off, become outdated, or provide erroneous information.

Now look at the typical contact information and branding: company logos, slogans, phone numbers, and web, email, and street addresses. These can also differ from time to time, and from use to use, if they are retyped each time a marketing piece is created. Without a single source for these items, every new version of the content is an opportunity to omit or misspell them.

If these small bits of scattered information don’t seem like they amount to much, imagine that the company has decided on a complete rebranding, or has just been acquired. How much effort will it take to locate and change every trademark, logo, address, and slogan in every piece of marketing literature, support document, user guide, and other company document?

What if the company needs everything provided in ten languages for the European market? How do you let the translators know the difference between company trade names, software commands, or product names and the more general text in the documents? Someone will have to provide lists of words and phrases marked as “do not translate.” If these are already tagged as XML, it is simple to indicate that the content of <tradename>, <command>, or <prodname> elements should not be translated.
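As a rough illustration of why the tagging pays off, the following Python sketch pulls a "do not translate" list out of tagged content. The sample text, the <para> wrapper, and the names WidgetPro and Acme are invented for the example; only the <tradename>, <command>, and <prodname> element names come from the discussion above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample content; the <para> wrapper and the product and
# company names are assumptions made for this illustration.
SAMPLE = """<para>Open <prodname>WidgetPro</prodname> and choose
<command>Export</command>. <tradename>Acme</tradename> supports this
workflow.</para>"""

# Element names whose content translators should leave untouched.
DO_NOT_TRANSLATE = {"tradename", "command", "prodname"}


def protected_terms(xml_text):
    """Return (tag, text) pairs, in document order, for protected elements."""
    root = ET.fromstring(xml_text)
    return [
        (el.tag, (el.text or "").strip())
        for el in root.iter()
        if el.tag in DO_NOT_TRANSLATE
    ]


terms = protected_terms(SAMPLE)
for tag, text in terms:
    print(f"{tag}: {text}")
```

A list like this could be handed to translators alongside the source text, instead of being compiled by hand for every document.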

Tagging existing document content as XML provides the means to extract it in meaningful ways for use in business processes.

Tagging for Import

The most basic way to create XML imports is to create placeholders in an InDesign template. The key issues are understanding where you want the XML to come into the InDesign layout, how it should look, and what the structure of the incoming XML will be. For more information, see Chapter 3.

In business terms, you are creating a nicely formatted, printable document to meet a business need. You hand out a business card to assist in following up a sales lead, or you give someone a handy quick start guide for a newly purchased product. You provide a set of product specifications so that someone can decide if a product meets his or her needs. All of these are good reasons to use InDesign to deliver an aesthetically pleasing business document. This makes the most sense when you see InDesign as one delivery mechanism among many (web page, phone solicitation, multimedia presentation) that can connect you to customers, suppliers, or partners. Reusing the same content across delivery formats makes the most of the content creation effort, streamlines processes, and reduces errors.

Tagging for Iterative XML Development

InDesign supports only one type of XML content model, the DTD, which doesn’t treat numbers and dates as special types of data. So you can’t easily control whether people enter text, numbers, or dates in a particular form in your XML elements. In database terms, nothing in InDesign enforces the “data typing” that would make the XML valid for calculations or other operations. If you accept this limitation and generally view InDesign as a generator of text content, you will be fine with a database that expects text content in its data fields.

If you really need to constrain the contents of an XML element to a numeric value, a date/time, or some data type other than text, you will probably not be happy with InDesign’s XML limitations in this regard. Within InDesign itself, the values of XML elements are treated as text only. If someone types numbers, they are not true numbers (integer or float) as far as any XML export is concerned.

XSLT would let you perform “casting” from the text values in elements to some other data type. As a post-export process, you could change text numbers into true numeric values, for example. Refer to O’Reilly’s XSLT Cookbook by Sal Mangano for details.
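The same post-export idea can be sketched in Python rather than XSLT, for readers who want to see the principle without a stylesheet. This is only an illustration under stated assumptions: the <product> record, its child element names, and the CASTS mapping are hypothetical, and a real export would need its own mapping.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical exported record; every element name here is an assumption.
EXPORTED = """<product>
  <name>Widget</name>
  <price>19.95</price>
  <quantity>12</quantity>
  <released>2008-03-01</released>
</product>"""

# Assumed mapping from element name to the type its text should become.
CASTS = {"price": float, "quantity": int, "released": date.fromisoformat}


def cast_record(xml_text):
    """Convert the text of each child element to its mapped data type."""
    root = ET.fromstring(xml_text)
    record = {}
    for child in root:
        text = (child.text or "").strip()
        convert = CASTS.get(child.tag, str)  # unmapped elements stay text
        try:
            record[child.tag] = convert(text)
        except ValueError:
            # Keep the raw text so a badly typed value can be reported.
            record[child.tag] = text
    return record


record = cast_record(EXPORTED)
print(record)
```

Because InDesign cannot stop someone from typing "twelve" into a quantity element, the fallback to raw text matters: the conversion step is where such values would first be caught.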

Working Without an Initial DTD

For iterative development without a DTD, you look at the end result that you want, and the type of content that you are creating in InDesign, and design an output that will flow into the next process as simply as possible. This type of process works best for simple content that can be tagged in InDesign and mapped to a fairly shallow set of XML elements in the output.
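When working this way, it can help to check after each iteration how shallow the exported structure really is. The Python sketch below lists every distinct element path in an export and the deepest nesting level. The sample export and its element names (Root, Story, title, para) are assumptions for illustration, not a prescribed structure.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical export; the element names are assumptions for illustration.
EXPORTED = """<Root>
  <Story>
    <title>Quick Start</title>
    <para>Plug in the device.</para>
    <para>Press the power button.</para>
  </Story>
</Root>"""


def element_paths(xml_text):
    """Count how often each element path (e.g. 'Root/Story/para') occurs."""
    counts = Counter()

    def walk(el, prefix):
        path = f"{prefix}/{el.tag}" if prefix else el.tag
        counts[path] += 1
        for child in el:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return counts


paths = element_paths(EXPORTED)
max_depth = max(p.count("/") + 1 for p in paths)

for path, n in sorted(paths.items()):
    print(f"{path}: {n}")
print("deepest level:", max_depth)
```

If the list of paths grows long or the depth keeps increasing between iterations, that may be a sign the content has outgrown the no-DTD approach.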
