XML

Comparing XHTML and HTML

You probably know that the latest version of HTML is version 4.0 (4.01, to be exact), which is in wide use across the Web. XHTML is a reformulation of HTML 4.0 that plays by the more rigid rules of XML. Fortunately, most of the differences between XHTML and HTML 4.0 are syntactic, which means that they don't dramatically affect the overall structure of HTML documents. Migrating an HTML 4.0 document to XHTML is more a matter of cleaning and tightening up the code than of converting it to a new language. If you have any web pages that were developed using HTML 4.0, you'll find that they can be migrated to XHTML with relative ease. You'll learn how to do this later in the tutorial in the section titled "Migrating HTML to XHTML."

Even though XHTML supports the same elements and attributes as HTML 4.0, there are some significant differences that arise because XHTML is an XML-based language. Given your knowledge of XML, you may already have a pretty good idea of some of these differences, but the following list spells out exactly how XHTML documents differ from HTML documents:

  • XHTML documents must be well formed.

  • Element and attribute names in XHTML must be in lowercase.

  • End tags in XHTML are required for nonempty elements.

  • Empty elements in XHTML must consist of a start-tag/end-tag pair or an empty-element tag (for example, <br></br> or <br />).

  • Attributes in XHTML cannot be used without a value.

  • Attribute values in XHTML must always be quoted.

  • An XHTML namespace must be declared in the root html element.

  • The head and body elements cannot be omitted in XHTML.

  • The head element in XHTML must contain a title element.

  • In XHTML, the contents of script and style elements should be enclosed within CDATA sections whenever they contain characters such as < or &.
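
To make several of these rules concrete, the following Python sketch runs a small XHTML page through an XML parser and reports any problems it finds. The sample page, the check_xhtml_rules function name, and the choice of checks are illustrative assumptions rather than part of any XHTML tool. The sketch covers only well-formedness, lowercase names, the namespace declaration, and the presence of head and body, so it is no substitute for validating a page against an XHTML DTD.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample page that follows the rules in the list above.
SAMPLE_XHTML = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Sample page</title>
    <style type="text/css">/*<![CDATA[*/ p { color: navy; } /*]]>*/</style>
  </head>
  <body>
    <p>First line<br />second line</p>
    <p><input type="checkbox" checked="checked" /></p>
  </body>
</html>"""


def local_name(qualified):
    """Strip the "{namespace}" prefix that ElementTree adds to names."""
    return qualified.split("}")[-1]


def check_xhtml_rules(source):
    """Return a list of rule violations found in an XHTML source string."""
    try:
        # Missing end tags, unquoted attribute values, and attributes
        # without values all surface here as parse errors.
        root = ET.fromstring(source)
    except ET.ParseError as err:
        return [f"not well formed: {err}"]

    problems = []
    if root.tag != f"{{{XHTML_NS}}}html":
        problems.append("root element must be html in the XHTML namespace")

    for elem in root.iter():
        name = local_name(elem.tag)
        if name != name.lower():
            problems.append(f"element name is not lowercase: {name}")
        for attr in elem.attrib:
            attr_name = local_name(attr)
            if attr_name != attr_name.lower():
                problems.append(f"attribute name is not lowercase: {attr_name}")

    if [local_name(child.tag) for child in root] != ["head", "body"]:
        problems.append("html must contain head followed by body")
    return problems


print(check_xhtml_rules(SAMPLE_XHTML))
```

Because the parser rejects anything that is not well formed, HTML habits such as an unclosed p element or a bare checked attribute are reported as a single "not well formed" problem instead of being checked rule by rule.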

These differences between XHTML and HTML 4.0 shouldn't come as too much of a surprise. Fortunately, none of them is difficult to find and fix in HTML documents, which makes the move from HTML 4.0 to XHTML relatively straightforward. However, web pages developed with versions of HTML prior to 4.0 typically require more dramatic changes. This is primarily because HTML 4.0 deprecates some previously popular formatting attributes, such as background, and instead promotes the use of style sheets. Because strict XHTML doesn't support these formatting attributes, you should first convert legacy (pre-4.0) HTML documents to HTML 4.0, which quite often involves replacing formatting attributes with CSS equivalents. Once you get a web page up to par with HTML 4.0, the move to XHTML is pretty straightforward.
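
As a rough illustration of replacing formatting attributes with CSS equivalents, the Python sketch below moves a few presentational attributes from an element into an inline style attribute. The attribute-to-CSS mapping, the replace_formatting_attributes function name, and the sample body element are assumptions made for this example. The sketch also assumes the markup is already well formed enough for an XML parser to read, which legacy HTML often is not.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from legacy formatting attributes to CSS declarations.
CSS_EQUIVALENTS = {
    "bgcolor": "background-color: {}",
    "background": "background-image: url({})",
    "text": "color: {}",
}


def replace_formatting_attributes(element):
    """Move known formatting attributes on one element into its style attribute."""
    declarations = []
    for attr, template in CSS_EQUIVALENTS.items():
        # Remove the legacy attribute and keep its value, if it was present.
        value = element.attrib.pop(attr, None)
        if value is not None:
            declarations.append(template.format(value))
    if declarations:
        existing = element.get("style")
        if existing:
            # Keep any inline style that was already there, ahead of the new rules.
            declarations.insert(0, existing.rstrip("; "))
        element.set("style", "; ".join(declarations))
    return element


# Hypothetical legacy body element that uses formatting attributes.
body = ET.fromstring('<body background="paper.gif" text="#000000"><p>Hello</p></body>')
replace_formatting_attributes(body)
print(ET.tostring(body, encoding="unicode"))
```

In a real migration you would more likely collect such rules in a separate style sheet than in inline style attributes; the inline form is used here only to keep the example short.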