Chapter 1. Microsoft Office and XML

Most people who use Microsoft Office see the individual applications as tools for getting their work done, not as general-purpose interfaces to information. Sure, people regularly exchange Word, Excel, and PowerPoint files over email, and there are lots of times when you need to reuse files you created earlier. For the most part, though, information created in Microsoft Office stays in Microsoft Office, arriving from or leaving for other programs largely through cut-and-paste or through file conversions that are often imperfect.

With the latest Windows-based version of Office, Microsoft has taken a risky step, opening up Office quite drastically. Developers, even those who aren’t using Microsoft Office (or even Microsoft Windows), will be able to easily process the information inside Word and Excel files. Instead of just creating Word documents, users will be able to create data files that can be shared with other processes and systems. Excel users will be able to analyze data from a much wider variety of sources, and Access users will be able to exchange information with other databases and programs much more easily than before. Users of the Enterprise Edition of Office will also have a new forms-based interface, InfoPath, for working with other programs.

All of these things are possible because Microsoft has chosen to integrate XML deeply into the core of Microsoft Office.

Why XML?

Extensible Markup Language (XML) defines a text-based format containing labels and structures. XML looks a lot like HTML, the primary language used by web browsers, but XML lets users and developers create their own formats rather than limiting them to a single vocabulary. The XML 1.0 specification appeared in 1998, and a wide variety of applications have added XML functionality or been built around XML since then, from databases to stock tickers to editors to web browsers to inventory systems. While XML still requires readers and writers of documents to have some shared understandings about the documents they create and interpret, it provides a basic format that is easily processed in a wide variety of different environments—it’s even frequently human-readable.
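To make "labels and structures" concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree module. The expense-report vocabulary (expenseReport, item, amount, and so on) is invented for illustration and is not a format used by Office. It stands in for the kind of custom format that two programs might agree on.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: these element and attribute names are made up
# for this example and are not part of any Office format.
DOCUMENT = """<expenseReport employee="J. Smith">
  <item category="travel">
    <description>Train fare</description>
    <amount currency="USD">42.50</amount>
  </item>
  <item category="meals">
    <description>Lunch</description>
    <amount currency="USD">12.00</amount>
  </item>
</expenseReport>"""


def summarize(xml_text):
    """Return (employee, list of categories, total amount) for a report."""
    root = ET.fromstring(xml_text)
    items = root.findall("item")
    categories = [item.get("category") for item in items]
    total = sum(float(item.findtext("amount")) for item in items)
    return root.get("employee"), categories, total


if __name__ == "__main__":
    print(summarize(DOCUMENT))
```

The reader only works because it shares the writer's understanding of what item and amount mean. XML supplies the syntax, not the meaning.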

Tip

If you’ve never worked with XML and need to know the technical details of how to read and create XML documents, you should read Appendix A of this book. This chapter provides a high-level view of what XML makes possible and why it makes sense for Office, not a detailed explanation of what XML is.

Microsoft has been involved with XML for a long time. A Microsoft employee, Jean Paoli (later a product manager for Microsoft Office), was one of the editors of the XML 1.0 specification at the World Wide Web Consortium (W3C). Microsoft has been involved with nearly every XML specification at the W3C since, and has participated in a wide variety of XML-related projects at other organizations as well. Microsoft began work on XML tools before the specification was complete, building the MSXML toolkit into Internet Explorer and then expanding into .NET and Web Services development. More and more Microsoft software has XML at its core, and this latest version of Office joins a large group of Microsoft applications using XML.

XML has been a crucial part of Microsoft’s drive to put its programs in more and more environments. XML makes it possible for Microsoft programs to communicate with programs from IBM, Sun, Oracle, and others, and greatly simplifies the task of integrating new tools with custom code. Developers can build applications around XML, and don’t have to worry about the internal details of components with which they share XML. Equally important, developers using XML don’t have to worry about being locked into a format that’s proprietary to a single vendor, because XML is open by design. The rules for what is and what is not a legitimate XML document are very clear, and while it’s possible to create XML that is difficult to read, a combination of strict grammatical rules and widely shared best practices encourages developers to create formats that are easy to work with. XML also includes features that support internationalization and localization, making it much easier to consistently represent information across language boundaries as well as program boundaries.

By adding XML to the Microsoft Office mix, Microsoft makes it much easier both to integrate Office with Microsoft programs that already understand XML (like SQL Server, SharePoint Server, and the toolkits in Visual Studio) and to create custom combinations of Microsoft Office and other software. This allows Microsoft to connect Office to a much wider variety of software without making users worry about whether they’ll be able to use their information elsewhere. XML also lets users go much further in building custom applications around Microsoft Office.

XML itself is only one piece of a larger XML puzzle. Extensible Stylesheet Language Transformations (XSLT) is an XML-based language for transforming one XML document into another, using templates. XSLT is at the heart of much of the Office XML work, a key ingredient for moving from the XML you have to the XML Office needs and vice versa. Another specification, W3C XML Schema, provides descriptions of document structures that the various Office applications can use as a foundation for their processing. Microsoft refers to this as XML Schema Definition language, or just XSD, but the W3C itself didn’t provide an acronym. Some sources refer to it as WXS (for W3C XML Schema), others as XSD, some as XSDL, and some just as XML Schema. Because Microsoft generally refers to it as XSD, this book will do the same.
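The idea of moving from "the XML you have" to "the XML you need" can be sketched without XSLT itself. The Python example below is not XSLT and is not how Office performs transformations. It is a hand-written stand-in that maps one invented vocabulary (contacts) to another invented vocabulary (addressBook), roughly the job an XSLT template would do declaratively.

```python
import xml.etree.ElementTree as ET

# Both vocabularies here are hypothetical, chosen only for illustration.
SOURCE = """<contacts>
  <contact><name>Ada</name><email>ada@example.com</email></contact>
  <contact><name>Grace</name><email>grace@example.com</email></contact>
</contacts>"""


def transform(xml_text):
    """Rewrite a <contacts> document as an <addressBook> document."""
    source_root = ET.fromstring(xml_text)
    target_root = ET.Element("addressBook")
    # One output <entry> per input <contact>, much as an XSLT template
    # matching "contact" would emit one result element per match.
    for contact in source_root.findall("contact"):
        entry = ET.SubElement(target_root, "entry")
        entry.set("mail", contact.findtext("email"))
        entry.text = contact.findtext("name")
    return ET.tostring(target_root, encoding="unicode")


if __name__ == "__main__":
    print(transform(SOURCE))
```

An XSLT stylesheet expresses the same mapping as data rather than as program code, which is what allows an application to apply it without being reprogrammed.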

One aspect of XML development in particular deserves special mention, because Microsoft has integrated it into Office alongside the more generic XML editing and analysis functions. Web Services, built on the SOAP, WSDL, and UDDI specifications, provide a set of tools for communicating with other programs using XML. You can still read and write files from your local computer, a file server, or a web server, but Web Services expose additional functionality of programs located anywhere on the network.
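Since a Web Services request is itself an XML document, a short sketch can show what one looks like. The Python example below builds a SOAP 1.1 envelope with the standard library. The envelope namespace is the one defined for SOAP 1.1, but the service namespace, the GetQuote operation, and the Symbol parameter are hypothetical, and the example does not send anything over the network.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
SERVICE_NS = "http://example.com/stockquote"  # hypothetical service namespace


def build_request(symbol):
    """Return a SOAP envelope (as text) asking a made-up service for a quote."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    # GetQuote and Symbol are invented names; a real service would
    # describe its operations and parameters in a WSDL document.
    request = ET.SubElement(body, f"{{{SERVICE_NS}}}GetQuote")
    ET.SubElement(request, f"{{{SERVICE_NS}}}Symbol").text = symbol
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    print(build_request("MSFT"))
```

In practice a toolkit usually generates this envelope from the service's WSDL description, so developers rarely assemble it by hand.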
