XML

What Is SAX?

SAX is a programming interface for event-based parsing of XML files. In practical terms, this means that SAX takes a very different approach to parsing XML code than its counterpart, the DOM. If you recall from previous tutorials, XML documents are processed using parsers. The parser reads the XML document; verifies that it is well formed; and, if it's a validating parser, validates it against a schema or DTD. What happens next depends on the parser you're using. In some cases, it might copy the data into a data structure that's native to the programming language you're using. In other cases, it might transform the data into a presentation format or apply styles to it. The SAX parser doesn't do anything to the data other than trigger certain events. It's up to the user of the SAX parser to determine what happens when those events occur.

What I mean when I say that SAX is a programming interface is that it isn't a program; it's a document (a standard) that describes how a SAX parser should be written. It explains which events a compliant SAX parser must support and leaves it up to the implementers to make sure that the parsers they write comply.

An interface is basically a contract offered by someone who writes a program or specifies how a program should work. It says that as long as you implement all of the features specified in the interface, any programs written to use that interface will work as expected. When someone writes a parser that implements the SAX interface, it means that any program that supports all of the events specified in the SAX interface can use that parser.
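To make the contract concrete, here is a minimal sketch in Java. It assumes the JAXP API that ships with the JDK (`javax.xml.parsers.SAXParserFactory`) as the source of a SAX 2.0 parser; the class name `ElementCounter` and its methods are hypothetical. The handler implements only the callback it cares about, and the factory supplies whichever SAX parser the JDK is configured with. Because both sides honor the SAX interface, the handler doesn't need to know which parser it gets.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: counts elements using SAX events only.
public class ElementCounter {

    // Our side of the contract: the parser promises to call these methods,
    // and we decide what they do. Here we only count opening tags.
    static class CountingHandler extends DefaultHandler {
        int elements = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements++;
        }
    }

    public static int countElements(String xml) {
        try {
            // The factory returns whatever SAX parser the JDK is configured with;
            // the handler works unchanged with any compliant implementation.
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            CountingHandler handler = new CountingHandler();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.elements;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countElements("<library><book/><book/></library>")); // prints 3
    }
}
```

Note that the parser builds no tree here: the count is the only state the program keeps, which is what makes the event-based approach economical for large documents.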

A Really Brief History of SAX

Most XML technologies were developed by one standards body or another, but SAX was not. It was developed by members of the xml-dev mailing list to give XML developers a simple, straightforward way to deal with XML documents. One of the lead developers on the list was Dave Megginson, whose name often comes up in discussions related to SAX, and who has resumed maintaining SAX after a hiatus. You can find out more about SAX at http://www.saxproject.org/.

The original version of SAX, 1.0, was released in May 1998. The most recent version is SAX 2.0.2, which was released in April 2004. The SAX API was originally defined as a set of Java interfaces, but a SAX parser can be written in any language, and SAX parsers are available for most popular programming languages. I'm going to talk about the features that were made available in the Java version; you can assume they'll also be available in whatever implementation you choose to use. Let's look at the specifics of the two major releases, SAX 1.0 and SAX 2.0.

SAX 2.0.2 is a fairly minor enhancement of the original SAX 2.0 release that came out back in May 2000. Throughout the remainder of this lesson I generally refer to the latest release of SAX as version 2.0.

SAX 1.0

SAX 1.0 provides support for triggering events on all of the standard content in an XML document. Rather than telling you everything it does support, it's easier to tell you that SAX 1.0 does not support namespaces. A program that uses a SAX 1.0 parser must support the following methods, which are automatically invoked when events occur during the parsing of a document:

  • characters() triggered when character data (text) is found inside an element

  • endDocument() triggered when parsing of the document is complete

  • endElement() triggered when the closing tag for any element is encountered

  • ignorableWhitespace() triggered when ignorable whitespace is encountered between elements (typically whitespace inside element-only content, as declared in a DTD)

  • processingInstruction() triggered when a processing instruction is encountered in the document

  • startDocument() triggered when parsing of the document begins

  • startElement() triggered when the opening tag for an element is encountered

If you don't have a programming background, allow me to clarify that a method is a sequence of programming code that performs a certain task. Methods are very similar to functions in programming languages other than Java.
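The following sketch logs these callbacks in the order the parser fires them. It is an assumption-laden illustration rather than SAX 1.0 code: the original SAX 1.0 interfaces (`DocumentHandler`, `HandlerBase`) are deprecated in Java, so it uses the SAX 2.0 `DefaultHandler` from the JDK, whose callbacks carry the same names. The class `EventLog` is hypothetical. `ignorableWhitespace()` is left out because whether it fires depends on a DTD being present.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: records which callbacks fire, and in what order.
public class EventLog {

    static class LoggingHandler extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void startDocument() { events.add("startDocument"); }

        @Override
        public void endDocument() { events.add("endDocument"); }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            events.add("startElement " + qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("endElement " + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // A parser may deliver one run of text in several chunks, so merge
            // consecutive characters() calls into a single log entry.
            String text = new String(ch, start, length);
            int last = events.size() - 1;
            if (last >= 0 && events.get(last).startsWith("characters ")) {
                events.set(last, events.get(last) + text);
            } else {
                events.add("characters " + text);
            }
        }

        @Override
        public void processingInstruction(String target, String data) {
            events.add("processingInstruction " + target);
        }
    }

    public static List<String> log(String xml) {
        try {
            LoggingHandler handler = new LoggingHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        log("<note><?render fast?><to>Tove</to></note>").forEach(System.out::println);
    }
}
```

For the input in `main`, the log runs startDocument, startElement note, processingInstruction render, startElement to, characters Tove, endElement to, endElement note, endDocument. The merging in `characters()` reflects a general SAX rule: never assume a text node arrives in a single call.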

SAX 1.0 also handles element attributes: rather than firing a separate event for them, it passes them to the document handler's startElement() method when that method is called. SAX 1.0 has been deprecated now that SAX 2.0 is available. In the Java world, most SAX 2.0 libraries (such as Xerces) still support SAX 1.0 so that they'll work with legacy SAX 1.0 applications. But if you're writing a new application that uses SAX, you should use SAX 2.0.
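A minimal sketch of reading attributes inside startElement(), assuming the JDK's built-in parser. In SAX 1.0 the attributes arrived as an `AttributeList`; this example uses the SAX 2.0 replacement, `org.xml.sax.Attributes`, and its `getLength()`, `getQName(int)`, and `getValue(int)` methods. The class `AttributeReader` is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: lists every attribute as "element@name=value".
public class AttributeReader {

    static class AttributeHandler extends DefaultHandler {
        final List<String> found = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // The attributes travel with the startElement() event itself.
            for (int i = 0; i < atts.getLength(); i++) {
                found.add(qName + "@" + atts.getQName(i) + "=" + atts.getValue(i));
            }
        }
    }

    public static List<String> attributes(String xml) {
        try {
            AttributeHandler handler = new AttributeHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.found;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        attributes("<shelf><book id=\"b1\" lang=\"en\"/><book id=\"b2\"/></shelf>")
                .forEach(System.out::println);
    }
}
```

The `Attributes` object is only valid during the startElement() call, so a handler that needs the values later should copy them out, as this one does.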

SAX 2.0

SAX 2.0 is an extension of SAX 1.0 that provides support for namespaces. In addition to the methods listed for SAX 1.0, programs that communicate with a SAX 2.0 parser must support the following methods:

  • startPrefixMapping() triggered when a prefix mapping (the binding of a namespace prefix to a namespace URI) comes into scope

  • endPrefixMapping() triggered when a prefix mapping goes out of scope, after the end of the element that declared it

  • skippedEntity() triggered whenever the parser skips an entity, for example when a non-validating parser has not read the declaration of an external entity
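A minimal sketch showing where the prefix-mapping events fall relative to element events, assuming the JDK's built-in SAX 2.0 parser. Namespace processing is switched on with `SAXParserFactory.setNamespaceAware(true)`; without it, the prefix-mapping callbacks are not reported. The class `NamespaceEvents` and the `urn:example:books` namespace are hypothetical. skippedEntity() is not exercised because triggering it depends on parser configuration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: logs prefix mappings and namespaced element starts.
public class NamespaceEvents {

    static class PrefixHandler extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void startPrefixMapping(String prefix, String uri) {
            events.add("startPrefixMapping " + prefix + " -> " + uri);
        }

        @Override
        public void endPrefixMapping(String prefix) {
            events.add("endPrefixMapping " + prefix);
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // With namespace processing on, each element reports its namespace URI and local name.
            events.add("startElement {" + uri + "}" + localName);
        }
    }

    public static List<String> log(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // prefix-mapping events require namespace processing
            PrefixHandler handler = new PrefixHandler();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        log("<bk:book xmlns:bk=\"urn:example:books\"><bk:title>XML</bk:title></bk:book>")
                .forEach(System.out::println);
    }
}
```

The mapping for `bk` is announced before the `book` element that declares it starts, and withdrawn only after that element ends, so every element in between can rely on the prefix being bound.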