Convert Microsoft Office Files, Old or New, to XML

Use OpenOffice as a tool to convert Microsoft Office files to XML.

OpenOffice (http://www.openoffice.org/), the free, open source, multiplatform office application suite that provides an alternative to Microsoft Office, uses a documented XML format as its native file format. Put this together with OpenOffice 1.1’s ability to read Word, Excel, and PowerPoint files from Office 97, 2000, and XP, plus Word 6.0 files, Word 95 files, and Excel 4.0, 5.0, and 95 files, and you’ve got a simple way to convert these files to XML.

When you save a document in OpenOffice’s own file format [Hack #65], you create a ZIP file with the extension .sxw if you saved it with the OpenOffice Writer word processor, .sxc if you used the OpenOffice Calc spreadsheet program, or .sxi if you used the OpenOffice Impress presentation program. The six files inside these ZIP files have self-explanatory names: mimetype, content.xml, styles.xml, meta.xml, settings.xml, and manifest.xml (which is stored in a META-INF subdirectory).
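Because these files are ordinary ZIP archives, any ZIP library can open them. The following minimal Python sketch uses the standard zipfile module to read the mimetype entry, list the package members, and copy content.xml out so that other XML tools can work on it. The function names describe_package and extract_content_xml are hypothetical, and the MIME type table assumes the strings OpenOffice 1.x normally writes into the mimetype entry; check them against your own files.

```python
import zipfile

# MIME types assumed to appear in the "mimetype" entry of OpenOffice 1.x files
OOO_MIMETYPES = {
    "application/vnd.sun.xml.writer": "Writer (.sxw)",
    "application/vnd.sun.xml.calc": "Calc (.sxc)",
    "application/vnd.sun.xml.impress": "Impress (.sxi)",
}

def describe_package(path):
    """Return (document kind, list of entry names) for an OpenOffice 1.x file."""
    with zipfile.ZipFile(path) as zf:
        names = zf.namelist()
        mimetype = zf.read("mimetype").decode("ascii").strip()
    return OOO_MIMETYPES.get(mimetype, "unknown: " + mimetype), names

def extract_content_xml(path, dest="content.xml"):
    """Copy content.xml out of the package and return the output file name."""
    with zipfile.ZipFile(path) as zf:
        data = zf.read("content.xml")
    with open(dest, "wb") as out:
        out.write(data)
    return dest
```

For example, describe_package("report.sxw") (a hypothetical file name) should report a Writer document and list entries such as content.xml and META-INF/manifest.xml.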

Unless you’re interested in the inner workings of OpenOffice, content.xml is the file that matters most. Along with the document’s content, it records the use of built-in styles, styles you defined yourself, and even on-the-fly formatting not tied to any defined style, such as text made bold with Ctrl-B. For word-processing files, the XML also identifies bulleted and numbered lists and footnotes. XML versions of spreadsheets ...
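To get a feel for what content.xml records, the sketch below parses it with Python’s xml.etree.ElementTree, counts paragraphs, headings, lists, and footnotes, and reports which automatic (on-the-fly) styles set bold text. The namespace URIs and element names (text:unordered-list, text:ordered-list, text:footnote, style:properties) are assumptions based on the OpenOffice.org 1.x format, not the later OpenDocument format, so compare them with the root element and markup of your own content.xml. The function name summarize_content is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed for OpenOffice.org 1.x documents
NS = {
    "office": "http://openoffice.org/2000/office",
    "style": "http://openoffice.org/2000/style",
    "text": "http://openoffice.org/2000/text",
    "fo": "http://www.w3.org/1999/XSL/Format",
}

def summarize_content(xml_bytes):
    """Count some Writer structures and find automatic styles that set bold."""
    root = ET.fromstring(xml_bytes)
    summary = {
        "paragraphs": len(root.findall(".//text:p", NS)),
        "headings": len(root.findall(".//text:h", NS)),
        "bulleted_lists": len(root.findall(".//text:unordered-list", NS)),
        "numbered_lists": len(root.findall(".//text:ordered-list", NS)),
        "footnotes": len(root.findall(".//text:footnote", NS)),
    }
    # Automatic styles carry formatting applied directly, e.g. with Ctrl-B
    bold = []
    for st in root.findall("office:automatic-styles/style:style", NS):
        props = st.find("style:properties", NS)
        if props is not None and props.get("{%s}font-weight" % NS["fo"]) == "bold":
            bold.append(st.get("{%s}name" % NS["style"]))
    summary["bold_styles"] = bold
    return summary
```

To run it on a real file, pass it the bytes of the package entry, for example summarize_content(zipfile.ZipFile("report.sxw").read("content.xml")) with a hypothetical file name and zipfile imported.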
