3 XML and JSON

XML, the eXtensible Markup Language, is one of the most popular formats for exchanging data over the Web. But it is more than that: it is ubiquitous in our daily lives. As Harold and Means (2004, xiii) note:

XML has become the syntax of choice for newly designed document formats across almost all computer applications. It's used on Linux, Windows, Macintosh, and many other computer platforms. Mainframes on Wall Street trade stocks with one another by exchanging XML documents. Children playing games on their home PCs save their documents in XML. Sports fans receive real-time game scores on their cell phones in XML. XML is simply the most robust, reliable, and flexible document syntax ever invented.

XML looks familiar to anyone with basic knowledge of HTML, because both are markup languages. The two nevertheless serve different purposes: HTML shapes how information is displayed, whereas the main purpose of XML is to store data. Consequently, an XML document does not look much nicer when it is opened in a browser; it is simply data wrapped in user-defined tags. These user-defined tags make XML much more flexible than HTML for storing data. The main goal of this chapter is not to turn you into an XML coding expert, but to familiarize you with the key components of XML documents.
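To make the idea of "data wrapped in user-defined tags" concrete, here is a minimal sketch in Python using the standard library module xml.etree.ElementTree. The document, its tag names (library, book, title, year), and the helper parse_books are hypothetical and chosen only for illustration; they are not the running example of this chapter.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document: none of these tag names are predefined by XML.
# The author of the document chose them to describe the data.
xml_text = """
<library>
  <book id="1">
    <title>First Title</title>
    <year>2001</year>
  </book>
  <book id="2">
    <title>Second Title</title>
    <year>2014</year>
  </book>
</library>
"""


def parse_books(text):
    """Return one dictionary per <book> element found under the root."""
    root = ET.fromstring(text.strip())
    return [
        {
            "id": book.get("id"),                  # attribute of <book>
            "title": book.findtext("title"),       # text of child <title>
            "year": int(book.findtext("year")),    # text of child <year>
        }
        for book in root.findall("book")
    ]


books = parse_books(xml_text)
for book in books:
    print(book)
```

Because the tags carry the meaning of the data rather than instructions for display, a program can retrieve values by name, whereas a browser has no particular way to render them.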

We start with a look at a running XML example (Section 3.1) and continue with an inspection of the XML syntax (Section 3.2). There are several ...

