Document Validation Revisited

As you know by now, the goal of most XML documents is to be valid. Document validity is extremely important because it guarantees that the data within a document conforms to a standard set of guidelines as laid out in a schema (DTD, XSD, or RELAX NG schema). Not all documents have to be valid, which is why I used the word "most" a moment ago. For example, many XML applications use XML to code small chunks of data that really don't require the thorough validation options made possible by a schema. Even in this case, however, all XML documents must be well formed. A well-formed document, as you may recall, is a document that adheres to the fundamental structure of the XML language. Rules for well-formed documents include matching start tags with end tags and setting values for all attributes used, among others.
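To make the idea of a well-formedness check concrete, here is a minimal sketch in Python that uses the standard library's xml.etree.ElementTree module, which parses without consulting any schema. The is_well_formed() helper and the pet sample documents are names I made up for illustration; they aren't part of any particular XML application.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None


# Hypothetical sample documents.
good = '<pet name="Maximillian"><type>pot bellied pig</type></pet>'
mismatched = '<pet name="Maximillian"><type>pot bellied pig</pet>'  # <type> is never closed
no_value = '<pet name><type>pot bellied pig</type></pet>'           # attribute has no value

samples = [
    ("good", good),
    ("mismatched tags", mismatched),
    ("attribute without value", no_value),
]
for label, doc in samples:
    ok, error = is_well_formed(doc)
    print(label, "->", "well formed" if ok else "not well formed: " + error)
```

Notice that no schema is involved anywhere in this code. The parser rejects the second and third documents purely because they break the basic syntax rules of XML.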

An XML application can certainly determine if a document is well formed without any other information, but it requires a schema in order to assess document validity. This schema typically comes in the form of a DTD (Document Type Definition) or XSD (XML Schema Definition), which you learned about in Tutorial 3, "Defining Data with DTD Schemas," and Tutorial 7, "Using XML Schema." To recap, schemas allow you to establish the following ground rules that XML documents must adhere to in order to be considered valid:

  • Establish the elements that can appear in an XML document, along with the attributes that can be used with each

  • Determine whether an element is empty or contains content (text and/or child elements)

  • Determine the number and sequence of child elements within an element

  • Set the default value for attributes
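The four ground rules above can be sketched as a toy rule checker in Python. This is only an illustration under stated assumptions: the RULES dictionary, the check() and validate() functions, and the pet vocabulary are all hypothetical, and a real schema is written as a DTD or XSD rather than as Python data.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules for a tiny <pet> vocabulary.
#   attrs:    allowed attributes, mapped to a default value (None = no default)
#   children: the exact sequence of child elements required
#   text:     whether the element may contain character data
RULES = {
    "pet":   {"attrs": {"vaccinated": "no"}, "children": ["name", "type", "photo"], "text": False},
    "name":  {"attrs": {}, "children": [], "text": True},
    "type":  {"attrs": {}, "children": [], "text": True},
    "photo": {"attrs": {"src": None}, "children": [], "text": False},  # an empty element
}


def check(element, errors):
    """Check one element against RULES, then recurse into its children."""
    rule = RULES.get(element.tag)
    if rule is None:
        errors.append(f"unknown element <{element.tag}>")
        return
    # Rule 1: only declared attributes may appear.
    for attr in element.attrib:
        if attr not in rule["attrs"]:
            errors.append(f"<{element.tag}> has unknown attribute '{attr}'")
    # Rule 4: fill in default values for attributes that were left out.
    for attr, default in rule["attrs"].items():
        if default is not None and attr not in element.attrib:
            element.set(attr, default)
    # Rule 2: character data is allowed only where the rule says so.
    pieces = [element.text] + [child.tail for child in element]
    if any((piece or "").strip() for piece in pieces) and not rule["text"]:
        errors.append(f"<{element.tag}> must not contain text")
    # Rule 3: the child elements must match the required sequence exactly.
    found = [child.tag for child in element]
    if found != rule["children"]:
        errors.append(f"<{element.tag}> expected children {rule['children']}, found {found}")
    for child in element:
        check(child, errors)


def validate(xml_text):
    """Parse xml_text and return (root element, list of rule violations)."""
    root = ET.fromstring(xml_text)
    errors = []
    check(root, errors)
    return root, errors


valid_doc = "<pet><name>Maximillian</name><type>pig</type><photo src='max.jpg'/></pet>"
invalid_doc = "<pet><type>pig</type><name>Maximillian</name><photo src='max.jpg'/></pet>"

root, errors = validate(valid_doc)
print(errors)                   # []
print(root.get("vaccinated"))   # no (filled in from the default)

_, errors = validate(invalid_doc)
print(errors)                   # one error: <name> and <type> are out of sequence
```

Both sample documents are well formed; only the second breaks a rule, because its child elements appear in the wrong order.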

It's probably safe to say that you have a good grasp on the usefulness of schemas, but you might be wondering about the details of how an XML document is actually validated with a schema. This task begins with the XML processor, which is typically a part of an XML application. The job of an XML processor is to process XML documents and somehow make the results available for further processing or display within an application. A modern web browser, such as Internet Explorer, Firefox, Safari, or Opera, includes an XML processor that is capable of processing an XML document and displaying it using a style sheet. The XML processor knows nothing about the style sheet; it just hands over the processed XML content for the browser to render.

The actual processing of an XML document is carried out by a special piece of software known as an XML parser. An XML parser is responsible for the nitty-gritty details of reading the characters in an XML document and resolving them into meaningful tags and relevant data. There are two types of parsers capable of being used during the processing of an XML document:

  • Standard (non-validating) parser

  • Validating parser

A standard XML parser, or non-validating parser, reads and analyzes a document to ensure that it is well formed. A standard parser checks to make sure that you've followed the basic language and syntax rules of XML. Standard XML parsers do not check to see if a document is valid; that's the job of a validating parser. A validating parser picks up where a standard parser leaves off by comparing a document with its schema and making sure it adheres to the rules laid out in the schema. Because a document must be well formed as part of being valid, a standard parser is still used when a document is being validated. In other words, a standard parser first checks to see if a document is well formed, and then a validating parser checks to see if it is valid.

In practice, a validating parser includes a standard parser, so there is technically only one parser that can operate in two different modes.
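The "one parser, two modes" idea can be sketched in Python as a single function with an optional validation step. This is a rough illustration, not how any particular parser is built: parse() and root_must_be_pets() are hypothetical names, and the validator here is a plain Python function standing in for a real DTD or XSD check.

```python
import xml.etree.ElementTree as ET


def parse(xml_text, validator=None):
    """Parse xml_text; if a validator is given, also check validity.

    validator is a hypothetical callable that takes the root element and
    returns a list of error strings (an empty list means "valid").
    """
    try:
        root = ET.fromstring(xml_text)   # stage 1: well-formedness
    except ET.ParseError as err:
        return {"well_formed": False, "valid": None, "errors": [str(err)]}
    if validator is None:                # non-validating mode stops here
        return {"well_formed": True, "valid": None, "errors": []}
    errors = validator(root)             # stage 2: validity
    return {"well_formed": True, "valid": not errors, "errors": errors}


def root_must_be_pets(root):
    """A stand-in 'schema' with one rule: the root element is <pets>."""
    if root.tag == "pets":
        return []
    return [f"root element is <{root.tag}>, expected <pets>"]


# Standard mode: only well-formedness is checked, so validity is unknown.
print(parse("<pets><pet/></pets>"))

# Validating mode: well formed, but the stand-in rule is broken.
print(parse("<animals/>", root_must_be_pets))

# Validating mode: a malformed document never reaches the validator.
print(parse("<pets><pet></pets>", root_must_be_pets))
```

The third call shows why the order matters: a document that fails the well-formedness stage is reported as malformed, and the question of validity is never asked.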

When you begin looking for a means to validate your documents, make sure you find an XML application that includes a validating parser. Without a validating parser, there is no way to validate your documents. You can still see if they are well formed by using a standard parser only, which is certainly important, but it's generally a good idea to go ahead and carry out a full validation.