The !DOCTYPE statement is used to declare a DTD. For an internal DTD, called an internal subset, you can use the following syntax:
<!DOCTYPE DocName [ DTD ]>
The new XML document that combines Help.htm and the DTD would look like this:
<!DOCTYPE html [
  <!ELEMENT html (head, body)>
  <!ELEMENT head (title, base?)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT base EMPTY>
  <!ATTLIST base target CDATA #REQUIRED>
  <!ELEMENT body (basefont?, a?, table)>
  <!ATTLIST body
    alink CDATA #IMPLIED
    text CDATA #IMPLIED
    bgcolor CDATA #IMPLIED
    link CDATA #IMPLIED
    vlink CDATA #IMPLIED>
  <!ELEMENT basefont EMPTY>
  <!ATTLIST basefont size CDATA #REQUIRED>
  <!ELEMENT a (#PCDATA)>
  <!ATTLIST a
    linkid ID #IMPLIED
    href CDATA #IMPLIED
    name CDATA #IMPLIED
    target CDATA #IMPLIED>
  <!ELEMENT table (tr+)>
  <!ATTLIST table
    width CDATA #IMPLIED
    rules CDATA #IMPLIED
    frame CDATA #IMPLIED
    align CDATA 'Center'
    cellpadding CDATA '0'
    border CDATA '0'
    cellspacing CDATA '0'>
  <!ELEMENT tr (td+)>
  <!ATTLIST tr
    bgcolor (Cyan | Lime | Black | White | Maroon) 'White'
    valign (Top | Middle | Bottom) 'Middle'
    align (Left | Right | Center) 'Center'>
  <!ELEMENT td (CellContent)>
  <!ATTLIST td
    bgcolor (Cyan | Lime | Black | White | Maroon) 'White'
    valign (Top | Middle | Bottom) 'Middle'
    align (Left | Right | Center) 'Center'
    rowspan CDATA #IMPLIED
    colspan CDATA #IMPLIED>
  <!ELEMENT CellContent (h1? | p?)+>
  <!ATTLIST CellContent cellname CDATA #REQUIRED>
  <!ELEMENT h1 (#PCDATA)>
  <!ATTLIST h1 align CDATA #IMPLIED>
  <!ELEMENT ImageLink (img, br?)>
  <!ELEMENT p (#PCDATA | font | ImageLink | a | ul | ol)+>
  <!ATTLIST p align CDATA #IMPLIED>
  <!ELEMENT font (#PCDATA | b)*>
  <!ATTLIST font
    color (Cyan | Lime | Black | White | Maroon) 'Black'
    face ('Times New Roman '| Arial)#REQUIRED
    size (2 | 3 | 4 | 5 | 6) '3'>
  <!ELEMENT b (#PCDATA)>
  <!ELEMENT img EMPTY>
  <!ATTLIST img
    width CDATA #IMPLIED
    height CDATA #IMPLIED
    hspace CDATA #IMPLIED
    vspace CDATA #IMPLIED
    src CDATA #IMPLIED
    alt CDATA #IMPLIED
    align CDATA #IMPLIED
    border CDATA #IMPLIED
    lowsrc CDATA #IMPLIED>
  <!ELEMENT br EMPTY>
  <!ATTLIST br clear CDATA #REQUIRED>
  <!ELEMENT ul (font?, li+)>
  <!ATTLIST ul type CDATA #IMPLIED>
  <!ELEMENT li (font? | a?)+>
  <!ELEMENT ol (font?, li+)>
  <!ATTLIST ol
    type CDATA #REQUIRED
    start CDATA #REQUIRED>
]>
<html>
  <head>
    <title>Northwind Traders Help Desk</title>
    <base target=""><!--Default link for page--></base>
  </head>
  <body text="#000000" bgcolor="#FFFFFF" link="#003399" alink="#FF9933" vlink="#996633">
    <!--Default display colors for entire body-->
    <a name="Top"><!--Anchor for top of page--></a>
    <table border="0" frame="" rules="" width="100%" align="" cellspacing="0" cellpadding="0">
      <!--Rules/frame is used with border-->
      <tr valign="Center">
        <td rowspan="" colspan="2" align="Center">
          <!--Either rowspan or colspan can be used, but not both-->
          <!--Valign: top, bottom, middle-->
          <CellContent cellname="Table Header">
            <h1 align="Center">Help Desk</h1>
          </CellContent>
        </td>
      </tr>
      <tr valign="Top">
        <td rowspan="" colspan="" align="Left">
          <CellContent cellname="Help Topic List">
            <p align="">
              <ul type="">
                <font face="" color="" size="3">
                  <b>For First-Time Visitors</b>
                </font>
                <li>
                  <a href="FirstTimeVisitorInfo.htm" target="">First-Time Visitor Information</a>
                </li>
                <li>
                  <a href="SecureShopping.htm" target="">Secure Shopping at Northwind Traders</a>
                </li>
                <li>
                  <a href="FreqAskedQ.htm" target="">Frequently Asked Questions</a>
                </li>
                <li>
                  <a href="NavWeb.htm" target="">Navigating the Web</a>
                </li>
              </ul>
            </p>
          </CellContent>
        </td>
        <td rowspan="" colspan="" align="Left">
          <CellContent cellname="Shipping Links">
            <p align="">
              <ul type="">
                <font face="">
                  <b>Shipping</b>
                </font>
                <li>
                  <a href="Rates.htm" target="">Rates</a>
                </li>
                <li>
                  <a href="OrderCheck.htm" target="">Checking on Your Order</a>
                </li>
                <li>
                  <a href="Returns.htm" target="">Returns</a>
                </li>
              </ul>
            </p>
          </CellContent>
        </td>
      </tr>
    </table>
  </body>
</html>
The marked-up text has remained the same, with one exception: an attribute declared with an enumerated type cannot be set to an empty string (""), because the empty string is not one of the listed values. For example, if a tr element does not use the align attribute, the attribute must be removed from the element. Because the DTD assigns a default value (Center) to the align attribute of the tr element, that default is applied only when the attribute is omitted.
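The rule about enumerated attributes can be seen in a much smaller document. The following is a minimal sketch, not part of the Help.htm example: it reuses the tr declaration for align from the DTD above, but the td content model is simplified to #PCDATA purely for illustration.

```xml
<!DOCTYPE tr [
  <!ELEMENT tr (td+)>
  <!-- Same enumerated type and default as in the chapter's DTD -->
  <!ATTLIST tr align (Left | Right | Center) 'Center'>
  <!-- Assumption: simplified content model, used only in this sketch -->
  <!ELEMENT td (#PCDATA)>
]>
<tr>
  <td>Help Desk</td>
</tr>
<!-- Valid: align is omitted, so a validating parser supplies align="Center". -->
<!-- Writing <tr align=""> instead would fail validation, because the empty
     string is not one of Left, Right, or Center. -->
```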
If you open this document in the browser, you will find that it almost works. However, the closing characters (]>) of the !DOCTYPE statement appear in the browser window, which is not acceptable. To solve this problem, save the original DTD in a file called StandardHTM.dtd, remove the empty attributes that have an enumerated data type, and reference the external file StandardHTM.dtd in the new file named HelpHTM.htm. The format for a reference to an external DTD is as follows:
<!DOCTYPE RootElementName SYSTEM|PUBLIC [Name] DTD-URI>
RootElementName is the name of the root element (in this example, html). Use the SYSTEM keyword when the DTD is unpublished. If the DTD has been published and given a name, you can use the PUBLIC keyword followed by that name. If the parser cannot identify the name, it uses the DTD-URI instead. In DTD-URI, you must specify the location of the DTD as a Uniform Resource Identifier (URI). A URI is a general type of system identifier; one type of URI is the Uniform Resource Locator (URL) you're familiar with from the Internet.
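As an illustration of the PUBLIC form, the declaration below uses the public name and URI that the W3C publishes for the XHTML 1.0 Strict DTD. That DTD is not part of this chapter's example; it is shown only because it is a widely seen published DTD. The first quoted string is the name, and the second is the DTD-URI the parser falls back on.

```xml
<!-- Published DTD: PUBLIC keyword, then the public name, then the DTD-URI -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
```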
For our example, we would need to add the following line of code to the beginning of the document HelpHTM.htm:
<!DOCTYPE html SYSTEM "StandardHTM.dtd">
A browser that does not understand XML will ignore this statement. Thus, by using an external DTD, you not only have an XML document that can be validated, but also one that can be displayed in any browser.
Summary
You now know how to build a DTD that defines a set of rules for validating an XML document. With DTDs, you can develop a standard set of rules for creating standard XML documents. These documents can be exchanged between corporations, or internally within a corporation, and validated against the DTD. A DTD can also be used to create standard documents within a group, such as a group that is building an e-commerce site.
In Chapter 5, we'll look at entities. Entities enable you to create reusable strings within a DTD.