XML

Style Sheets and XML Formatting

Very few XML-based markup languages are designed to accommodate the formatting of the content they describe. This is by design: the whole premise of XML is to provide a way of associating meaning with information while keeping that information separate from its appearance. The appearance of information is very much a secondary issue in XML. Of course, there are situations where it is important to view XML content in a more understandable form than raw XML code (elements and attributes), in which case the content must be formatted for display. Formatting XML content for display primarily involves determining the layout and positioning of the content, the fonts and colors used to render it, and any related graphics that accompany it. XML content is typically formatted for a specific display purpose, such as viewing within a web browser.

As with HTML, XML documents are formatted using special formatting instructions known as styles. A style can be something as simple as a font size or as powerful as a transformation of an XML element into an HTML element or an element in some other XML-based language. The general mechanism used to apply formatting to XML documents is known as a style sheet. I say "general" because there are two different approaches to styling XML documents with style sheets: CSS (Cascading Style Sheets) and XSL (Extensible Stylesheet Language). Although I'd love to jump into a detailed discussion of CSS and XSL, a quick history lesson is in order so that you understand the relevance of style sheets. The next couple of sections provide some background on style sheets as they relate to HTML, along with how they enter the picture with XML. I'll keep it as brief as possible so that you can get down to the business of seeing style sheets in action with XML.

The Need for Style Sheets

If it weren't for the success of HTML, it's unlikely that XML would ever have been created. The concept of using a markup language to code information is nothing new, but the idea of doing it with a simple, compact language is relatively new. HTML was the first widely used markup language that made it possible to code information in a compact format that could be displayed without too much complexity. However, HTML wasn't intended to be a presentation language. Generally speaking, markup languages are designed to add structure and context to information, which usually has nothing to do with how the information is displayed. The idea is that you use markup code to describe the content of a document and then apply styles to that content to render it for display. The problem is that HTML has only recently adopted this approach, because the language evolved so rapidly that presentation elements were added to it without any concern for how they might complicate things.

In its original form, HTML stuck to the notion of being a purely content-based markup language. More specifically, HTML was designed as a markup language that allowed physicists to share technical notes. Early web browsers allowed you to view HTML documents, but the browsers, not the HTML markup, determined the layout of the documents. For example, paragraphs marked up with the <p> tag might have been displayed in a 12-point Arial font in a certain browser. A different browser might have used a 14-point Helvetica font. The point is that the browsers made the presentation decisions, not the documents themselves, which is in keeping with the general concept of a markup language.

As you probably know, things changed quickly for HTML when the popularity of the Web created demand for more visually appealing pages. In fact, HTML quickly turned into something it was never meant to be: a jumbled mess of content and presentation markup. At the time, it made sense to tack presentation elements onto HTML because doing so allowed for better-looking web pages. Another factor that complicated HTML was the "browser wars," which pitted web browser vendors against one another in a game of feature one-upmanship that produced all kinds of new HTML presentation tags. These tags proved extremely problematic for web developers because each was usually supported by only one browser.

To summarize my HTML soapbox speech, we all got a little carried away and tried to turn HTML into something it was never intended to be. No one really thought about what would happen after a few years of tacking presentation tag after presentation tag onto HTML. Fortunately, the web development community took some time to assess the future of the Web and returned to the ideal of separating content from presentation. Style sheets provide the mechanism that makes this separation possible and bring some order to HTML: a style sheet defines layout and formatting rules that tell a browser how to display the different parts of a document. Whereas style sheets are a good idea for HTML documents, they are a necessity for displaying XML documents (more on this in a moment).

Unlike HTML, XML doesn't include any standard elements for describing the appearance of a document. For example, there is no standard <b> tag in XML for making text bold. For this reason, style sheets are an absolute necessity when it comes to displaying XML documents.

By the Way

Technically, you could create your own XML-based markup language and include any presentation-specific tags you wanted, such as <bold>, <big>, <small>, <blurry>, and so on. However, web browsers only know how to display HTML elements and therefore wouldn't inherently understand your presentation tags. This is why style sheets are so important to XML.


Getting to Know CSS and XSL

Style sheets aren't really anything new to web developers, but they were initially slow to take off, primarily because browser support for them was sketchy for quite some time. Cascading Style Sheets, or CSS, represent the HTML approach to style sheets because they were designed specifically to solve the presentation problems inherent in HTML. Because CSS originally targeted HTML, it has been around the longest and has garnered the most support among web developers. Even so, CSS has only recently gained reasonably consistent support in major web browsers; all of them now offer more or less full support for the latest CSS standard, CSS 2.

Extensible Stylesheet Language, or XSL, is a much newer technology than CSS and represents the pure XML approach to styling XML documents. XSL has had a hurdle to clear in terms of browser acceptance, but the latest releases of most major web browsers provide solid support for a subset of XSL known as XSLT (XSL Transformations), which allows you to translate XML documents into HTML. XSLT doesn't tackle the same layout and formatting issues as CSS and therefore isn't really a competing technology. The layout and formatting portion of XSL is known as XSL Formatting Objects, or XSL-FO, and unfortunately isn't as widely supported as XSLT. For the time being, XSL-FO is primarily used to format XML data for printing; in fact, it is commonly used to generate printer-friendly Adobe Acrobat PDF documents from XML documents.

Generally speaking, you can think of XSL's relationship to XML as being similar to CSS's relationship to HTML. The comparison isn't entirely accurate: thanks to XSL-FO, XSL effectively defines a superset of the styling functionality in CSS, and XSLT adds a transformation feature that has no equivalent in CSS. But in very broad terms, you can think of XSL as the pure XML equivalent of CSS. The next couple of sections explain CSS and XSL in more detail.

Cascading Style Sheets (CSS)

As you've learned, CSS is a style sheet language designed to style HTML documents, thereby allowing web developers to separate content from presentation. Prior to CSS, the only options for styling HTML documents beyond the presentation tags built into HTML were scripting languages and hybrid solutions such as Dynamic HTML (DHTML). CSS is much simpler to learn and use than these approaches, which makes it ideal for styling HTML documents, and it doesn't impose any of the security risks associated with scripts. Although CSS was designed for use with HTML, there is nothing stopping you from using it with XML. In fact, it is quite useful for styling XML documents.

When a CSS style sheet is applied to an XML document, it uses the structure of the document as the basis for applying style rules; more specifically, style rules are applied to the hierarchical "tree" of document data. Although this works great in some scenarios, it's sometimes necessary to alter the structure of an XML document before applying style rules. For example, you might want to sort the contents of a document alphabetically before displaying it. CSS is very useful for styling XML data, but it provides no way to collate, sort, or otherwise rearrange document data. This type of task is best suited to a transformation technology such as XSLT. The bottom line is that CSS is better suited to the simple styling of XML documents for display purposes. Of course, you can always transform a document using XSLT and then style it with CSS, which is in some ways the best of both worlds, at least as far as XML and traditional style sheets are concerned.
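To make the sorting limitation concrete, here is a minimal sketch of rearranging a document before any styling happens. It uses Python's standard-library ElementTree module as a stand-in for the XSLT approach the text recommends, simply because it runs without extra software. The <contacts>, <contact>, <name>, and <phone> elements and the sort_contacts helper are hypothetical; the sorted output could then be handed to an ordinary CSS style sheet, which on its own could only render the contacts in their original order.

```python
import xml.etree.ElementTree as ET

# Hypothetical contact list; element names are illustrative only.
SOURCE = """<contacts>
  <contact><name>Zoe</name><phone>555-0103</phone></contact>
  <contact><name>Adam</name><phone>555-0101</phone></contact>
  <contact><name>Maria</name><phone>555-0102</phone></contact>
</contacts>"""


def sort_contacts(xml_text: str) -> str:
    """Return the document with its <contact> elements ordered by <name>."""
    root = ET.fromstring(xml_text)
    contacts = root.findall("contact")
    # Detach each contact, sort the detached list, then re-append it.
    for contact in contacts:
        root.remove(contact)
    contacts.sort(key=lambda c: c.findtext("name", default=""))
    root.extend(contacts)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # The sorted result could then be styled with a CSS style sheet.
    print(sort_contacts(SOURCE))
```

The reordering changes the document tree itself, which is exactly the kind of change a CSS style rule cannot express.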

On behalf of die-hard CSS advocates, I'd like to point out that you can transform an XML document using a scripting language and the Document Object Model (DOM) prior to applying CSS style sheets, which achieves roughly the same effect as using XSLT to transform the document. Although the DOM certainly presents an option for transforming XML documents, there are those of us who would rather use a structured transformation language instead of having to rely on custom scripts. You learn how to use scripts and the DOM with XML in Part IV, "Processing and Managing XML Data."
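As a rough illustration of the script-plus-DOM route, the following sketch uses Python's standard-library xml.dom.minidom module in place of a browser scripting language. The <news>, <story>, <headline>, and <summary> elements, the news_to_html and text_of helpers, and the news.css file name are all hypothetical. The script walks the source DOM, rebuilds the content as an HTML DOM, and links a CSS style sheet, which is roughly what an XSLT style sheet would do declaratively.

```python
from xml.dom import minidom

# Hypothetical news document; element names are illustrative only.
SOURCE = """<news>
  <story><headline>Spring Schedule Posted</headline><summary>Classes start in March.</summary></story>
  <story><headline>Library Hours Extended</headline><summary>Open until 9 p.m.</summary></story>
</news>"""


def text_of(element):
    """Concatenate the text nodes that are direct children of an element."""
    return "".join(
        node.data for node in element.childNodes if node.nodeType == node.TEXT_NODE
    )


def news_to_html(xml_text, css_href="news.css"):
    """Walk the source DOM and build an HTML DOM that links a CSS style sheet.

    css_href names a hypothetical style sheet that would style the result.
    """
    source = minidom.parseString(xml_text)
    html_doc = minidom.getDOMImplementation().createDocument(None, "html", None)
    html = html_doc.documentElement

    head = html_doc.createElement("head")
    link = html_doc.createElement("link")
    link.setAttribute("rel", "stylesheet")
    link.setAttribute("href", css_href)
    head.appendChild(link)
    html.appendChild(head)

    body = html_doc.createElement("body")
    for story in source.getElementsByTagName("story"):
        # Map <headline> to <h2> and <summary> to <p>.
        for src_tag, html_tag in (("headline", "h2"), ("summary", "p")):
            out = html_doc.createElement(html_tag)
            text = text_of(story.getElementsByTagName(src_tag)[0])
            out.appendChild(html_doc.createTextNode(text))
            body.appendChild(out)
    html.appendChild(body)
    return html.toxml()


if __name__ == "__main__":
    print(news_to_html(SOURCE))
```

Every mapping rule lives in custom code here, which is the trade-off the text describes: it works, but a structured transformation language keeps those rules in a declarative style sheet instead.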

Extensible Stylesheet Language (XSL)

Earlier in the tutorial I mentioned that XSL consists of two primary components that address the styling of XML documents: XSLT and XSL-FO. XSLT stands for XSL Transformations and is the component of XSL that allows you to transform an XML document from one language to another. For example, with XSLT you could translate one of your custom ETML training log documents into HTML that can be displayed in a web browser. The other part of XSL is XSL-FO, which stands for XSL Formatting Objects. XSL-FO is something of a supercharged CSS designed specifically for XML. Both XSLT and XSL-FO are implemented as XML-based markup languages. Using these two languages, web developers theoretically have complete control over both the transformation of XML document content and its subsequent display. I say "theoretically" because XSL-FO has yet to catch on as a style sheet language for browser rendering and has so far been relegated to formatting XML documents for printing.

Because both components of XSL are implemented as XML languages, style sheets created with them are XML documents. This allows you to create XSL style sheets using familiar XML syntax, not to mention use XML development tools. You might notice a parallel between XSL and another XML technology, XML Schema. As you may recall from the previous tutorial, XML Schema is implemented as an XML language (XSD) that replaces a pre-XML approach (DTD) for describing the structure of XML documents. XSL is similar in that it, too, employs XML languages to eventually replace a pre-XML approach (CSS) to styling XML documents.
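To illustrate that an XSLT style sheet really is just an XML document, the sketch below parses a small XSLT 1.0 style sheet with Python's standard-library ElementTree parser, exactly as it would parse any other XML, and lists the match pattern of each template. The style sheet targets a hypothetical training log vocabulary (<trainlog>, <session>, <date>, and a type attribute) that stands in for ETML and is not its actual markup; template_patterns is likewise a hypothetical helper. Nothing here runs the transformation itself; that still requires an XSLT processor.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# XSLT 1.0 style sheet for a hypothetical training log vocabulary
# (<trainlog>, <session>, <date>, type attribute); not actual ETML markup.
STYLESHEET = """<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html><body><xsl:apply-templates select="trainlog/session"/></body></html>
  </xsl:template>
  <xsl:template match="session">
    <p><xsl:value-of select="date"/>: <xsl:value-of select="@type"/></p>
  </xsl:template>
</xsl:stylesheet>"""


def template_patterns(xslt_text):
    """Parse a style sheet as plain XML and list each xsl:template match pattern."""
    root = ET.fromstring(xslt_text)
    # ElementTree spells namespaced names as {namespace-uri}local-name.
    return [t.get("match") for t in root.iter(f"{{{XSL_NS}}}template")]


if __name__ == "__main__":
    print(template_patterns(STYLESHEET))
```

Because the style sheet is ordinary XML, the same generic tooling you use for your data documents can inspect, validate, or even generate it.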

Rendering XML with Style Sheets

Although the general premise of style sheets is to provide a means of displaying XML content, it's important to understand that style sheets don't necessarily have complete control over how XML content appears. For example, text that is styled with emphasis in a style sheet might be displayed in italics in a traditional browser, but it could be spoken with emphasis in a browser for the visually impaired. This distinction doesn't necessarily impact the creation of style sheets, but it is worth keeping in mind, especially as new types of web-enabled devices are created. Some of these new devices will render documents in different ways than we're currently accustomed to. On the other hand, it is possible to create style sheets that are very exacting when it comes to how XML data is displayed. For example, using XSL-FO you can specify the exact dimensions of a printed page, including margin sizes and the specific location of XML content on the page. The degree to which you have control over the appearance of styled XML content largely has to do with whether the content is being rendered in a web browser or in some other medium, such as print.

The concept of different devices rendering XML documents in different ways has been referred to as cross-medium rendering, because the devices typically represent different media. Historically, HTML has had to contend with cross-browser rendering, which was caused by different browsers supporting different presentation tags. Although style sheets alleviate the cross-browser problem, they don't always solve the cross-medium problem. To understand what I mean, consider CSS style sheets, which provide a means of applying layout rules to XML so that it can be displayed. CSS's relatively simple styling approach isn't powerful enough to deal with the cross-medium issue because it can't transform an XML document into a different format, which is often required to render a document successfully in a different medium.
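As a rough sketch of why transformation matters for cross-medium rendering, the Python function below converts the same XML paragraph into two different output formats: HTML for a browser, where emphasis becomes <em>, and plain text for a hypothetical text-only device, where emphasis becomes *word*. The <para> and <emph> elements, the render function, and the asterisk convention are all assumptions made for illustration. A CSS style sheet could restyle the paragraph, but it could not turn it into the second format.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source paragraph containing an <emph> element.
SOURCE = "<para>Always <emph>stretch</emph> before running.</para>"


def render(xml_text, medium):
    """Transform one XML paragraph into a medium-specific format.

    "html": emphasis becomes <em> and text is HTML-escaped.
    "text": emphasis becomes *word* (a hypothetical text-device convention).
    Only direct children of <para> are handled; this is a sketch.
    """
    if medium not in ("html", "text"):
        raise ValueError(f"unsupported medium: {medium}")
    para = ET.fromstring(xml_text)
    esc = escape if medium == "html" else (lambda s: s)

    parts = [esc(para.text or "")]
    for child in para:
        inner = esc(child.text or "")
        if child.tag == "emph":
            inner = f"<em>{inner}</em>" if medium == "html" else f"*{inner}*"
        parts.append(inner)
        parts.append(esc(child.tail or ""))
    content = "".join(parts)
    return f"<p>{content}</p>" if medium == "html" else content


if __name__ == "__main__":
    for medium in ("html", "text"):
        print(render(SOURCE, medium))
```

Each medium gets its own output vocabulary from one source document, which is the job the text assigns to transformation rather than styling.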

XSLT addresses the need for transforming XML documents according to a set of highly structured patterns. For display purposes, you can use XSLT to translate an XML document into an HTML document. This is the primary way XML developers are currently using XSL because it doesn't require anything more on the part of browsers than support for XSLT; they don't have to be able to render a document directly from XML. CSS doesn't involve any transformation; it simply provides a means of describing how different parts of a document should be displayed.

Some people incorrectly perceive XSL and CSS as competing technologies, but they really aren't. In fact, it can be very advantageous to use XSLT and CSS together. Competition primarily enters the picture with XSL-FO, which indeed does everything that CSS can do, and much more. Even so, the popularity of CSS as a style sheet technology for web pages will likely prevent XSL-FO from seriously encroaching on it in the near term. For now, we'll likely see CSS continue to be used as the dominant style sheet technology for web-based XML formatting, while XSL-FO will continue to rise in importance for print-based XML formatting.