A Complete DTD Example

Admittedly, this tutorial has thrown a great deal of information at you, most of which is quite technical. But there's a method to the madness, and now it's time to see some of the payoff. To help you get some perspective on how elements and attributes fit into a DTD for a new custom markup language, let's work through the design of a DTD for a sports training markup language. This markup language, which we'll call ETML (Endurance Training Markup Language), might come in handy if you ever decide to compete in a marathon or triathlon. It models training data related to endurance sports such as running, swimming, and cycling. The following are the major pieces of information associated with each individual training session:

  • Date: The date and time of the training session

  • Type: The type of training session (running, swimming, cycling, and so on)

  • Heart rate: The average heart rate sustained during the training session

  • Duration: The duration of the training session

  • Distance: The distance covered in the training session (measured in miles or kilometers)

  • Location: The location of the training session

  • Comments: General comments about the training session

Knowing that all of this information must be accounted for within a training session element, can you determine which pieces would be better suited as child elements and which would be better suited as attributes? There really is no single correct answer, but there are a few logical reasons you might separate some of the information into elements and some into attributes. The following is how I would organize this information:

  • Attributes: Date, Type, Heart rate

  • Child elements: Duration, Distance, Location, Comments

The date, type, and heart rate for a training session are particularly well suited for attributes because they all involve short, simple values. The type attribute goes a step further because you can use an enumerated list of predefined values (running, cycling, and so on). The duration and distance of a session could really go either way in terms of being modeled by an element or an attribute. However, by modeling them as elements, you leave room for each of them to have attributes that specify additional information, such as the exact units of measure. The location and comments potentially contain descriptive text, and therefore are also better suited as child elements.
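To make the duration trade-off concrete, here is a hypothetical comparison of two fragments. The first is not part of ETML and would not validate against the DTD in this section; it simply models duration as an attribute. Only the second form, the child element, has its own place to record units:

```xml
<!-- Hypothetical alternative, NOT ETML: duration as an attribute of session.
     The bare value "50" has nowhere of its own to record its units. -->
<session date="11/19/05" type="running" heartrate="158" duration="50"/>

<!-- ETML's approach: duration as a child element, which can carry its own units attribute -->
<duration units="minutes">50</duration>
```

With the attribute form, you would need an additional attribute on session just to hold the units, which spreads one piece of information across two attributes.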

By the Way

A golden rule of XML design is that the more constraints you can impose on a document, the more structured its content will be. In other words, try to create schemas that leave little to chance in terms of how elements and attributes are intended to be used.
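As an illustration of that rule, both of the following declarations define the type attribute of the session element. They are shown together only for comparison; a real DTD would use just one. The first accepts any text at all, typos included. The second, which is the form ETML uses, accepts only the listed values:

```xml
<!-- Loose: CDATA accepts any string, so type="joging" would still validate -->
<!ATTLIST session type CDATA #IMPLIED>

<!-- Tight: an enumerated list, so only these three values validate -->
<!ATTLIST session type (running | swimming | cycling) "running">
```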

With the conceptual design of the DTD in place, you're ready to dive into the code. Listing 3.3 contains the code for the ETML DTD, which is stored in the file etml.dtd.

Listing 3.3. The etml.dtd DTD That Is Used to Validate ETML Documents
 1: <!ELEMENT trainlog (session)+>
 2:
 3: <!ELEMENT session (duration, distance, location, comments)>
 4: <!ATTLIST session
 5:   date CDATA #IMPLIED
 6:   type (running | swimming | cycling) "running"
 7:   heartrate CDATA #IMPLIED
 8: >
 9:
10: <!ELEMENT duration (#PCDATA)>
11: <!ATTLIST duration
12:   units (seconds | minutes | hours) "minutes"
13: >
14:
15: <!ELEMENT distance (#PCDATA)>
16: <!ATTLIST distance
17:   units (miles | kilometers | laps) "miles"
18: >
19:
20: <!ELEMENT location (#PCDATA)>
21:
22: <!ELEMENT comments (#PCDATA)>

You should be able to apply what you've learned throughout this tutorial to understand the ETML DTD. All of the elements and attributes in the DTD flow from the conceptual design that you just completed. The trainlog element (line 1) is the root element for ETML documents and contains one or more session elements, one for each training session. Each session element consists of duration, distance, location, and comments child elements, in that order (line 3), and has date, type, and heartrate attributes (lines 4-7). Notice that the type attribute of the session element (line 6) and the units attributes of the duration and distance elements (lines 12 and 17) are constrained to lists of enumerated values.
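To see those constraints at work, here is a hypothetical ETML document that a validating parser would reject when checking it against etml.dtd. The dates, locations, and other values are made up for illustration, and the comments point out each problem:

```xml
<?xml version="1.0"?>
<!DOCTYPE trainlog SYSTEM "etml.dtd">
<trainlog>
  <!-- Invalid: "walking" is not one of (running | swimming | cycling) -->
  <session date="11/26/05" type="walking" heartrate="120">
    <duration units="minutes">40</duration>
    <distance units="miles">2.5</distance>
    <location>Warner Park</location>
    <comments>Easy recovery walk.</comments>
  </session>

  <!-- Invalid: units="yards" is not one of (miles | kilometers | laps), and
       location appears before distance, breaking the required order
       duration, distance, location, comments.
       Omitting heartrate is fine, though, because it is declared #IMPLIED. -->
  <session date="11/27/05" type="swimming">
    <duration units="minutes">30</duration>
    <location>City pool</location>
    <distance units="yards">1500</distance>
    <comments>Short swim.</comments>
  </session>
</trainlog>
```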

Of course, no DTD is really complete without an XML document to demonstrate its usefulness. Listing 3.4 shows a sample document that is coded in ETML.

Listing 3.4. The Training Log Sample ETML Document
 1: <?xml version="1.0"?>
 2: <!DOCTYPE trainlog SYSTEM "etml.dtd">
 3:
 4: <trainlog>
 5:   <session date="11/19/05" type="running" heartrate="158">
 6:     <duration units="minutes">50</duration>
 7:     <distance units="miles">5.5</distance>
 8:     <location>Warner Park</location>
 9:     <comments>Mid-morning run, a little winded throughout.</comments>
10:   </session>
11:
12:   <session date="11/21/05" type="cycling" heartrate="153">
13:     <duration units="hours">1.5</duration>
14:     <distance units="miles">26.4</distance>
15:     <location>Natchez Trace Parkway</location>
16:     <comments>Hilly ride, felt strong as an ox.</comments>
17:   </session>
18:
19:   <session date="11/24/05" type="running" heartrate="156">
20:     <duration units="hours">2.5</duration>
21:     <distance units="miles">16.8</distance>
22:     <location>Warner Park</location>
23:     <comments>Afternoon run, felt reasonably strong.</comments>
24:   </session>
25: </trainlog>

As you can see, this document strictly adheres to the ETML DTD, both in terms of the elements it uses and the nesting of those elements. The DTD is specified in the document type declaration, which references the file etml.dtd (line 2). Also pay attention to the type attributes (lines 5, 12, and 19) and the units attributes (lines 6-7, 13-14, and 20-21), all of which use values from the lists of choices defined in the DTD. Keep in mind that even though only three training sessions are included in the document, the DTD allows you to include as many as you want; the + in the trainlog content model requires only that there be at least one. So if you're feeling energetic, go sign up for a marathon and start logging training sessions in your new markup language!
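As a hypothetical example of logging another session, the following element (with made-up values) would be valid if added inside trainlog. It uses two enumerated values that the sample doesn't: swimming and laps. It also leaves off the units attribute of duration. A parser that reads the DTD therefore supplies the declared default, minutes:

```xml
<!-- Hypothetical fourth session with made-up values; valid against etml.dtd -->
<session date="11/26/05" type="swimming" heartrate="141">
  <!-- No units attribute here: the DTD's default value "minutes" applies -->
  <duration>45</duration>
  <distance units="laps">60</distance>
  <location>City pool</location>
  <comments>Steady swim, good form.</comments>
</session>
```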

The section "Document Validation Revisited" shows you how to validate an XML document against a DTD.