Application/xhtml+xml

"Real" XHTML

Over the last few years, a good portion of the web development community has been moving towards this whole XHTML standards thing; this of course has been talked about before and on a million other web sites, so I won't bother to go into a "why-standards" XHTML-type rant. Suffice it to say, this is related.

XHTML is an application of XML; in effect, "HTML with rules" in that it must be well-formed at parse time in order to be rendered by a browser or other device. Like any XML, if XHTML has errors the device rendering it should fail to display the content and instead show an error.

Sometimes XHTML is just XHTML

... And sometimes it's HTML.

An XHTML document served from your typical installation of Apache will be sent down with a text/html MIME type, which means the browser treats it as plain vanilla HTML - that is to say, it doesn't care about validation, and your carefully-crafted valid markup is treated no differently from HTML 4.0. The MIME type, not the markup, makes the difference.

What's in a MIME type?

MIME types (commonly used on the web) describe what kind of content is being sent down the wire and give the browser an idea of how to parse, render or otherwise deal with it. application/zip, for example, is what's sent by the web server when your browser accesses a ZIP file; image/jpeg is sent for JPG images and video/mpeg for MPEG video. It's all pretty self-explanatory.

XML files have their own document type and associated MIME type, text/xml, so naturally it'd make sense for XHTML files to have their own. That's where application/xhtml+xml comes in.*

I realised a while ago, after doing some reading, that adding an XHTML DTD to your page does not "automagically" bless your site with the full tidings of standards: you also have to send the associated MIME type for proper browsers to parse the page with full standards compliance. By compliance, I mean that a browser will display an error rather than a page if the XHTML markup is not well-formed - just like when you view a broken XML document. (Go ahead, try it.)

Blame the Server

The default configuration of web servers is to send text/html for files with extensions such as .php and .html, as that is the assumed (and common) format of those files. This is also what prevents browsers like Mozilla from rendering valid XHTML documents with full standards compliance in mind: a perfectly valid XHTML/XML document sent as text/html is treated by the browser as little more than "Tag Soup"; it might as well be HTML 4.0. (ie. in this state the browser will render completely invalid markup without complaining.)

The solution depends on your configuration, but the crux of the issue is to send application/xhtml+xml in place of text/html where applicable.

Serving up the correct MIME type

Using .htaccess

Directory-specific directive files such as .htaccess (as used by Apache, for example) can be used to associate a particular MIME type with a given file extension. For example, AddType application/xhtml+xml .xhtml will configure Apache to send .xhtml files with application/xhtml+xml.

Using PHP

If your site content is generated dynamically or otherwise served via PHP, you can use the header function to send the Content-Type header yourself (call it before any output is sent):

header ("Content-type: application/xhtml+xml");

Not for everyone

The amusing (and somewhat frustrating) part of this is that not all browsers recognise application/xhtml+xml, and will fail to render the page or give you a download prompt - the latter being a case of, "I don't know what an 'application/xhtml+xml' file is - what do you want to do with it?"

Internet Explorer is one of the "unsupported" browsers mentioned (as it often is when "the rules" are being discussed), as are earlier versions of Opera and others. The argument could be made: "why bother" with all this? My reasoning is that this is the evolution of browsers. There will always be some who support the latest and greatest and play by the rules, and then there will be everyone else. Provide gracefully-degrading content, and everyone should be happy.

In short, some browsers must be served text/html (ie. the default MIME type) to properly render XHTML documents. There should be no visual difference, but you won't have the benefit of the browser's built-in error checking when developing, nor that geeky proud feeling that you get knowing you're living on the bleeding edge of browser technology. I suppose, however, that as far as geeks and pride are concerned there are some that can do without, and some that have plenty to spare.

Conditional PHP

This bit of code will conditionally handle most browsers (eg. it sends application/xhtml+xml to Mozilla, but not Internet Explorer) - the logic in plain English being, "If the client sent a list of supported ACCEPT types and the MIME type is in it, send it on down."

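// Send application/xhtml+xml only if the client lists it in its Accept header.
if (isset($_SERVER["HTTP_ACCEPT"]) && stristr($_SERVER["HTTP_ACCEPT"], "application/xhtml+xml")) {
    header("Content-type: application/xhtml+xml");
} else {
    // Everyone else (eg. Internet Explorer) gets plain old text/html.
    header("Content-type: text/html");
}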

Apache's mod_rewrite module could likely make an efficient substitute for the PHP method shown, although I didn't look into it.
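
One gap in a plain substring check on the Accept header is that a client may list a MIME type with q=0, which means "don't send me this". As a rough sketch of the same negotiation logic with q-values taken into account, here it is in Javascript (runnable under Node). The name chooseContentType is made up for illustration and is not part of any library:

```javascript
// Hypothetical helper: pick the Content-Type to send for a given Accept header.
// Unlike a substring test, it honours q-values, so "application/xhtml+xml;q=0"
// (an explicit refusal) falls back to text/html.
function chooseContentType(acceptHeader) {
  const XHTML = "application/xhtml+xml";
  const HTML = "text/html";
  if (typeof acceptHeader !== "string" || acceptHeader === "") {
    return HTML; // no Accept header at all: stay with the safe default
  }
  for (const part of acceptHeader.split(",")) {
    const [type, ...params] = part.split(";").map((s) => s.trim().toLowerCase());
    if (type !== XHTML) continue;
    let q = 1; // quality defaults to 1 when no q parameter is given
    for (const p of params) {
      const m = /^q\s*=\s*([0-9.]+)$/.exec(p);
      if (m) q = parseFloat(m[1]);
    }
    return q > 0 ? XHTML : HTML;
  }
  return HTML; // the type was never listed explicitly
}
```

A catch-all */* on its own deliberately doesn't count here, which keeps the behaviour in line with the PHP check above: only clients that name application/xhtml+xml explicitly get it.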

Okay, so what now?

Once XHTML documents are being served correctly, you will need to do the following:

  • Use an applicable DOCTYPE ("DTD" - ie. XHTML 1.0 or XHTML 1.1)
  • Validate XHTML markup (no invalid tags or characters etc.)
  • Migrate legacy Javascript (use DOM methods instead of older objects eg. document.documentElement instead of document.body, etc.)
  • Test using a supported browser (eg. Mozilla)

Once you have solved validation and script errors, you will have reached a milestone in the race to geekdom. (Don't you feel special?)

What I Learned

While researching how to move this site (valid XHTML 1.0 at the time) to being served as application/xhtml+xml, I made the following findings:

It's all about Trust (or lack thereof)

"Real" XHTML (ie. served as application/xhtml+xml) effectively takes away Javascript methods like document.write() and write access to the .innerHTML property, because they can be used to insert invalid code that would break the document structure. (Nice, hey?)

For example, if you could make a call like someDiv.innerHTML = '<p>look! invalid-ness!';, that would invalidate your document and break the page which has already rendered. To get around the lack of innerHTML however, DOM methods such as createElement() and createTextNode() do the trick. A bit more verbose, but valid under this XML-typed document.
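
As a small sketch of the DOM-method approach just described, the helper below builds a paragraph with createElement() and createTextNode() rather than innerHTML. The function name appendParagraph and the id "someDiv" are assumptions for illustration; the document object is passed in as a parameter so the function can also be exercised outside a browser:

```javascript
// Hypothetical helper: append <p>text</p> to a parent node using DOM methods only.
// `doc` is normally the global `document`.
function appendParagraph(doc, parent, text) {
  const p = doc.createElement("p");
  // The text goes in as a text node, so it can never introduce unbalanced markup.
  p.appendChild(doc.createTextNode(text));
  parent.appendChild(p);
  return p;
}

// Browser usage, guarded so the snippet does nothing where no DOM exists.
if (typeof document !== "undefined") {
  const someDiv = document.getElementById("someDiv"); // assumed element id
  if (someDiv) {
    appendParagraph(document, someDiv, "look! valid-ness!");
  }
}
```

Because the text is inserted as a text node, even a string like '<p>oops' ends up as literal characters on the page rather than as broken markup.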

XHTML

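  • Ampersands are evil and are the bane of web developers everywhere: a bare "&" has to be written as "&amp;". Forget that "&nbsp;" entity you probably like so much - it's as good as gone. (The numeric reference "&#160;" is the safe replacement.)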
  • The XHTML DTDs have different recommendations for MIME types depending on version. (See the W3C site for more information.)
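
To make the ampersand point concrete, here is a minimal sketch of escaping text before it goes into an XHTML page, plus swapping the named non-breaking space for its numeric form. Both function names are made up for illustration:

```javascript
// Hypothetical helper: make arbitrary text safe to place inside XHTML element
// content or a double-quoted attribute value.
function escapeXhtml(text) {
  return String(text)
    .replace(/&/g, "&amp;") // must run first, or the escapes below get re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: replace the named &nbsp; entity with the numeric
// character reference, which an XML parser accepts without consulting a DTD.
function replaceNbsp(markup) {
  return String(markup).replace(/&nbsp;/g, "&#160;");
}
```

Note that escapeXhtml is meant for plain text, while replaceNbsp is meant for markup that is already otherwise well-formed; running escapeXhtml over existing markup would escape the tags too.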

CSS

  • Body backgrounds etc. should be applied to the html element, eg. html, body { background-color:#ccc; }

Javascript

  • document.body deprecated in favour of document.documentElement
  • someElement.innerHTML = 'foo'; vs. someElement.appendChild( document.createTextNode('foo') );
  • Accessing some elements via getElementsByTagName() prior to document load (ie. at parse time) seems to be inconsistent
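
One way to sidestep the parse-time inconsistency in the last point is to hold off on element lookups until the page has loaded. The sketch below assumes a standard addEventListener implementation; whenLoaded is a made-up name, and the window object is passed in so the helper can be tried outside a browser:

```javascript
// Hypothetical helper: run `callback` once the page has finished loading.
// `win` is normally the global `window`.
function whenLoaded(win, callback) {
  if (win.document && win.document.readyState === "complete") {
    callback(); // already loaded, so run straight away
  } else {
    win.addEventListener("load", callback, false);
  }
}

// Browser usage, guarded so the snippet does nothing where no DOM exists.
if (typeof window !== "undefined") {
  whenLoaded(window, function () {
    const root = document.documentElement; // rather than document.body
    const paragraphs = document.getElementsByTagName("p");
    root.setAttribute("data-paragraph-count", String(paragraphs.length));
  });
}
```

The data-paragraph-count attribute is only there to give the callback something visible to do; swap in whatever your script actually needs.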

Conclusion

If I've done my homework correctly, this site should be served as valid XHTML 1.1 Strict with a MIME type of application/xhtml+xml to browsers that support it. I don't think it makes much difference or brings noticeable benefit to 99% of the user base who reads this stuff, but I figure at the least I can say I took the challenge and (hopefully) met the bar for compliance, maintaining a funky DHTML-driven and standards-based site - and if for nothing else, just to say I did it.

* At the time of writing (October 2004) I was researching this subject and writing based on findings and experimentation: Some inaccuracies may abound.
