Why are MIME types important? Why do I keep coming back to them? Three words: draconian error
handling. Browsers have always been “forgiving” with HTML. If you create an HTML page but
forget to give it a <title>, browsers will display the page anyway, even though the <title>
element has always been required in every version of HTML. Certain tags are not allowed
within other tags, but if you create a page that puts them inside anyway,
browsers will just deal with it (somehow) and move on without displaying
an error message.
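To see this kind of forgiveness outside a browser, here is a minimal Python sketch. It uses the standard library's html.parser module as a rough stand-in for a browser's HTML parser (an assumption for illustration only; real browsers apply far more elaborate error recovery). The names TagLogger, BROKEN_PAGE, and parse_forgivingly are hypothetical, invented for this example.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Hypothetical helper: records every start tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


# Deliberately broken markup: no <title>, an unclosed <b>,
# and a <div> nested inside a <p>.
BROKEN_PAGE = "<html><body><p>Hello <b>world<div>oops</div></p></body></html>"


def parse_forgivingly(markup):
    """Feed markup to the lenient parser and return the start tags it saw."""
    parser = TagLogger()
    parser.feed(markup)
    parser.close()
    return parser.start_tags


if __name__ == "__main__":
    # No exception is raised, even though the page is broken.
    print(parse_forgivingly(BROKEN_PAGE))
```

The parser walks straight through the broken page and reports every tag it finds. Nothing in the output tells you that anything was wrong.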
As you might expect, the fact that “broken” HTML markup still worked in web browsers led authors to create broken HTML pages. A lot of broken pages. By some estimates, over 99 percent of HTML pages on the Web today have at least one error in them. But because these errors don’t cause browsers to display visible error messages, nobody ever fixes them.
The W3C saw this as a fundamental problem with the Web, and set out
to correct it. XML, published as a W3C Recommendation in 1998, broke from the
tradition of forgiving clients and mandated that all programs that consumed XML
must treat so-called “well-formedness” errors as fatal. This concept of
failing on the first error became known as “draconian error handling,”
after the Greek leader Draco, who
instituted the death penalty for relatively minor infractions of his laws.
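Here is what failing on the first error looks like in practice, as a small Python sketch. It assumes the standard library's xml.etree.ElementTree as a representative XML parser; the names parse_draconian, WELL_FORMED, and NOT_WELL_FORMED are hypothetical, made up for this illustration.

```python
import xml.etree.ElementTree as ET

WELL_FORMED = "<p>Hello <b>world</b></p>"
NOT_WELL_FORMED = "<p>Hello <b>world</p>"  # the <b> element is never closed


def parse_draconian(markup):
    """Return (True, root tag) if the markup is well-formed XML,
    otherwise (False, the parser's error message)."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError as err:
        # A single well-formedness error aborts the whole parse.
        return False, str(err)
    return True, root.tag


if __name__ == "__main__":
    print(parse_draconian(WELL_FORMED))
    print(parse_draconian(NOT_WELL_FORMED))
```

The second call yields no partial document at all. One missing end tag is enough to make the parser give up and report an error.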
When the W3C reformulated HTML as an XML
vocabulary, the people in charge mandated that all documents served with
the new application/xhtml+xml MIME
type would be subject to draconian error handling. If there was
even a single error in your XHTML page, web browsers
would have no choice but to stop processing and display an error message
to the end user.
This idea was not universally popular. With an estimated error rate
of 99 percent on existing pages, the ever-present possibility of
displaying errors to the end user, and the dearth of new features in
XHTML 1.0 and 1.1 to justify the cost, web authors
basically ignored application/xhtml+xml. But that doesn’t mean
they ignored XHTML altogether. Oh, most definitely not.
Appendix C of the XHTML 1.0 specification gave the web
authors of the world a loophole: “Use something that looks kind of like
XHTML syntax, but keep serving it with the text/html MIME type.” And
that’s exactly what thousands of web developers did: they “upgraded” to
XHTML syntax but kept serving it with a text/html MIME type.
Even today, while many web pages claim to be
XHTML (they start with the XHTML doctype
on the first line, use lowercase tag names, use quotes around attribute
values, and add a trailing slash after empty elements like <br /> and <hr />),
only a tiny fraction of these pages are served with the
application/xhtml+xml MIME type that would trigger XML’s
draconian error handling. Any page served with a MIME
type of text/html, regardless of its
doctype, syntax, or coding style, will be parsed using a “forgiving”
HTML parser, silently ignoring any markup errors and
never alerting end users (or anyone else), even if the page is technically
broken.
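The rule that the MIME type, not the markup, selects the parser can be sketched in a few lines of Python. This is only an illustration under stated assumptions: html.parser and xml.etree.ElementTree stand in for a browser's two parsers, and the names media_type, process, and SLOPPY_XHTML are hypothetical.

```python
import xml.etree.ElementTree as ET
from email.message import Message
from html.parser import HTMLParser

# Looks like XHTML, but the <br> element is never closed.
SLOPPY_XHTML = "<html><body><p>Hello<br></p></body></html>"


def media_type(content_type_header):
    """Extract the bare MIME type from a Content-Type header value,
    dropping parameters such as charset."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    return msg.get_content_type()


def process(markup, content_type_header):
    """Pick a parser based only on the MIME type, never on the markup."""
    if media_type(content_type_header) == "application/xhtml+xml":
        try:
            ET.fromstring(markup)  # draconian: first error is fatal
        except ET.ParseError:
            return "xml: fatal error"
        return "xml: ok"
    parser = HTMLParser()  # forgiving: errors pass silently
    parser.feed(markup)
    parser.close()
    return "html: ok"


if __name__ == "__main__":
    print(process(SLOPPY_XHTML, "text/html; charset=utf-8"))
    print(process(SLOPPY_XHTML, "application/xhtml+xml"))
```

The same bytes produce two different outcomes. Served as text/html the page goes through without complaint; served as application/xhtml+xml it is rejected outright.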
XHTML 1.0 included this loophole, but
XHTML 1.1 closed it, and the never-finalized
XHTML 2.0 continued the tradition of requiring draconian
error handling. And that’s why there are billions of pages that claim to
be XHTML 1.0, and only a handful that claim to be
XHTML 1.1 (or XHTML 2.0). So, are you
really using XHTML? Check your MIME
type. (Actually, if you don’t know what MIME type you’re
using, I can pretty much guarantee that you’re still using text/html.)
Unless you’re serving your pages
with a MIME type of application/xhtml+xml, your so-called
“XHTML” is XML in name only.
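If you want to check your MIME type from a script rather than from your browser's developer tools, something like the following Python sketch should do. It assumes the standard library's urllib.request; the function names served_mime_type and is_really_xhtml are hypothetical, and the URL in the usage note is a placeholder, not a real recommendation.

```python
from urllib.request import urlopen


def served_mime_type(url):
    """Fetch a URL and return the MIME type from its Content-Type header,
    without parameters such as charset."""
    with urlopen(url) as response:
        # Note: this falls back to "text/plain" if the header is missing.
        return response.headers.get_content_type()


def is_really_xhtml(url):
    """True only if the page is served with the XHTML MIME type."""
    return served_mime_type(url) == "application/xhtml+xml"
```

Calling is_really_xhtml("https://example.com/") with one of your own URLs in place of the placeholder tells you which parser a browser would pick for that page. The doctype at the top of the file plays no part in the answer.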