XML

One or More Elements

Instead of using the ANY declaration for the html element, you should define the content so that the html element can be validated. The following is a declaration that specifies the content of the html element and is the same as the one given by XML Authority:

  <!ELEMENT html  (head, body)>

This (head, body) declaration signifies that the html element will have two child elements: head and body. You can list one child element within the parentheses or as many child elements as are required. You must separate each child element in your declaration with a comma.

For the XML document to be valid, the child elements must appear in the document in the same order in which they are declared. The comma that separates the child elements is read as "followed by"; therefore, the preceding declaration states that the html element will have a head child element followed by a body child element. Building on the preceding declaration, the following is valid XML:

  <html><head></head><body/></html>

However, the following statement would not be valid:

  <html><body></body><head/></html>

This statement is invalid because the declaration requires the html element to contain exactly two child elements, head first and body second, with only one instance of each. Here the body element comes before the head element.

The following two statements would also be invalid:

  <html><body></body></html>
  <html><head/><body/><head/><body/></html>

The first statement is missing the head element, and in the second statement the head and body elements are listed twice.
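
To try these cases with a validating parser, the declaration can be placed in an internal DTD subset. The following is a minimal sketch, not the complete DTD: the EMPTY declarations for head and body are assumptions made only to keep the example self-contained, since the real elements have content of their own.

```xml
<?xml version="1.0"?>
<!DOCTYPE html [
  <!-- html must contain exactly one head followed by exactly one body. -->
  <!ELEMENT html (head, body)>
  <!-- Placeholder declarations, assumed for this sketch only. -->
  <!ELEMENT head EMPTY>
  <!ELEMENT body EMPTY>
]>
<html><head/><body/></html>
```

Swapping the two child elements, removing one of them, or repeating either one should cause a validating parser to report an error.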

Reoccurrence

You will want every html element to include one head and one body child element, in the order listed. Other elements, such as the body and table elements, will have child elements that might be included multiple times within the main element or might not be included at all. XML provides three markers that can be used to indicate the reoccurrence of a child element, as shown in the following table:

XML Element Markers

Marker  Meaning
?       The element either does not appear or appears only once (0 or 1).
+       The element must appear at least once (1 or more).
*       The element can appear any number of times, or it might not appear at all (0 or more).

Putting no marker after the child element indicates that the element must be included and that it can appear only one time.
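
The following sketch shows all four cases side by side. The element names (report, title, summary, author, and comment) are hypothetical and are not part of the example DTD. They are declared EMPTY only to keep the sketch short.

```xml
<?xml version="1.0"?>
<!DOCTYPE report [
  <!-- title: exactly once; summary: 0 or 1; author: 1 or more; comment: 0 or more. -->
  <!ELEMENT report (title, summary?, author+, comment*)>
  <!ELEMENT title EMPTY>
  <!ELEMENT summary EMPTY>
  <!ELEMENT author EMPTY>
  <!ELEMENT comment EMPTY>
]>
<!-- Valid: one title, no summary, two authors, no comments. -->
<report><title/><author/><author/></report>
```

Removing both author elements, or adding a second summary element, should make the document invalid.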

The head element contains an optional base child element. To declare this element as optional, write the head declaration as follows:

  <!ELEMENT head  (title, base?)>

The body element contains a basefont element and an a element that are also optional. In our example, the table element is used to format the page, so table should be a required element that appears exactly once in the body element. You can now write the body declaration as follows:

  <!ELEMENT body (basefont?, a?, table)>

The table element can have as many rows as are needed to format the page but must include at least one row. The table element should now be written as follows:

  <!ELEMENT table (tr+)>

The same conditions hold true for the tr element: the row element must have at least one column, as shown here:

  <!ELEMENT tr (td+)>
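
Taken together, the table and tr declarations can be checked with a small document such as the following sketch. The (#PCDATA) declaration for td, which allows plain text only, is an assumption made for this illustration. The cell text is borrowed from the sample page.

```xml
<?xml version="1.0"?>
<!DOCTYPE table [
  <!-- A table needs at least one row, and each row needs at least one cell. -->
  <!ELEMENT table (tr+)>
  <!ELEMENT tr (td+)>
  <!-- Assumed for this sketch: cells hold plain text only. -->
  <!ELEMENT td (#PCDATA)>
]>
<table>
  <tr><td>Best Prices</td><td>Quality</td></tr>
  <tr><td>Fast Service</td></tr>
</table>
```

A table element with no tr children, or a tr element with no td children, should be rejected by a validating parser.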

The a, ul, and ol elements might not be included in the p element, or they might be included many times, as shown here:

  <!ELEMENT p (font+, img, br, a*, ul*, ol*)>

Because the br element formats text around an image, the img and br tags should always be used together.

Grouping Child Elements

Fortunately, XML provides a way to group elements. For example, you can rewrite the p element as follows:

  <!ELEMENT p (font*, (img, br?)*, a*, ul*, ol*)>

This declaration specifies that an img element, optionally followed by a br element, appears zero or more times in the p element. The question mark after br makes the br element optional within each group.
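
The effect of the (img, br?)* group can be seen in the following sketch. All of the child elements are declared EMPTY here, which is an assumption made only so that the example stands on its own. The real elements carry attributes and content.

```xml
<?xml version="1.0"?>
<!DOCTYPE p [
  <!ELEMENT p (font*, (img, br?)*, a*, ul*, ol*)>
  <!-- Placeholder declarations, assumed for this sketch only. -->
  <!ELEMENT font EMPTY>
  <!ELEMENT img EMPTY>
  <!ELEMENT br EMPTY>
  <!ELEMENT a EMPTY>
  <!ELEMENT ul EMPTY>
  <!ELEMENT ol EMPTY>
]>
<!-- Valid: one font, an (img, br) group, an img-only group, then one a. -->
<p><font/><img/><br/><img/><a/></p>
```

Because br can appear only inside the group, a br element that does not immediately follow an img element should be reported as invalid.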

One problem remains in this declaration. As mentioned, the comma separator can be read as "followed by." Thus, any font, img, br, a, ul, and ol child elements in a p element must appear in that order. This is not exactly what you want; instead, you want to be able to use these elements in any order and to use some elements in some paragraphs and other elements in other paragraphs. For example, you would like to be able to write the following code:

  <p>
      <font size="5">
          <b>Three Reasons to Shop Northwind Traders</b>
      </font>
      <ol>
          <li>
              <a href="Best.htm">Best Prices</a>
          </li>
          <li>
              <a href="Quality.htm">Quality</a>
          </li>
          <li>
              <a href="Service.htm">Fast Service</a>
          </li>
      </ol>
      <!--The following img element is not in the correct order.-->
      <img src="Northwind.jpg"></img>
  </p>

As you can see, the img element is not in the correct order: it should precede the ol element, since the declaration imposes a strict ordering on the elements.

NOTE
Also, numerous elements are declared but are not included (for example, ul). The missing elements are not a problem because you have declared each element with an asterisk (*), indicating that there can be zero or more of each element.

To allow a "reordering" of elements, you could rewrite the declaration as follows:

  <!ELEMENT p  (font*, (img, br?)*, a*, ul*, ol*)+>

The plus sign (+) at the very end of the declaration indicates that the whole group of child elements can occur one or more times within a p element.

The preceding XML code could thus be interpreted as two sets of child elements, as shown here:

  <p>
      <!--The elements that follow are the first set of
          (font*, (img, br?)*, a*, ul*, ol*) elements (missing
          the (img, br), a, and ul elements).-->
      <font size="5">
          <b>Three Reasons to Shop Northwind Traders</b>
      </font>
      <ol>
          <li>
              <a href="Best.htm">Best Prices</a>
          </li>
          <li>
              <a href="Quality.htm">Quality</a>
          </li>
          <li>
              <a href="Service.htm">Fast Service</a>
          </li>
      </ol>
      <!--The img element that follows is a second set of
          (font*,(img, br?)*, a*, ul*, ol*) elements containing
          only an img element.-->
      <img src="Northwind.jpg"></img>
  </p>

This new declaration is better, but it is an awkward way to allow elements in any order: a document matches only when it is read as a series of repeated groups. In addition, every member of the group is optional, yet the group itself must still occur at least once (as indicated by the plus sign at the end of the list of elements). There is another option.