XML

No international support

The Internet has created a global community and made the world a much smaller place. Corporations are expanding into this global marketplace and extending their partnerships and operations around the globe, linking everything through the Internet. A few proposals to create an international HTML standard have been put forward, but no standard has actually materialized. There are no HTML tags that can identify what language an HTML document is written in.

Inadequate linking system

When you create HTML documents, links are hard-coded into the document. If a link changes, the Web developer must search through all the HTML documents to find every reference to the link and then update each one. With Web sites that are dynamic and constantly evolving and growing to meet the needs of their users, the lack of a more capable linking system can create substantial problems. We need a much more sophisticated method of linking documents than HTML can provide. HTML does not allow you to attach a link to an arbitrary element, nor does it allow a single link to point to multiple locations, whereas the linking system in XML does provide these features. In Chapter 6, you will learn more about XML's linking capability.
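To make the maintenance cost of hard-coded links concrete, here is a minimal sketch in Python of the search a Web developer would have to perform when a link changes. The page names, the URLs, and the helper find_references are hypothetical and exist only for illustration; the parsing relies on the standard library's html.parser module.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in one HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def find_references(pages, target_url):
    """Return the names of the pages that contain a link to target_url."""
    hits = []
    for name, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        if target_url in collector.links:
            hits.append(name)
    return sorted(hits)


# Hypothetical site: two of the three pages hard-code the same link.
pages = {
    "index.html": '<body><a href="/products/old.html">Products</a></body>',
    "about.html": '<body><a href="/contact.html">Contact</a></body>',
    "news.html": '<body><p>See <a href="/products/old.html">here</a>.</p></body>',
}

stale = find_references(pages, "/products/old.html")
print(stale)  # ['index.html', 'news.html']
```

Every page in the list would then have to be edited by hand or by a further script, which is the burden a richer linking system aims to remove.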

Faulty structure and data storage

HTML does have a structure, but this structure is not very rigid. For example, you can place heading 3 (<h3>) tags before heading 1 (<h1>) tags. Within the <body> tag, you can place any legitimate tag anywhere you want. You can validate HTML documents, but this validation confirms only that you have used the tags properly. Even worse, if you leave off end tags, the browser will try to figure out where the end tags should be and add them in. Thus, you can create HTML code that is not properly written but will still be interpreted properly by the browser.
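The difference between a forgiving HTML processor and a strict XML parser can be sketched in a few lines of Python. In this sketch, the standard library's html.parser stands in for a tolerant browser (it simply reports the tags it finds and does not itself insert missing end tags), and xml.etree.ElementTree stands in for an XML parser. The sample markup and the helper is_well_formed are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Records start and end tags in the order the parser reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


# An <h3> before an <h1>, and two <p> tags that are never closed.
sloppy = "<body><h3>Details</h3><h1>Title</h1><p>One<p>Two</body>"

logger = TagLogger()
logger.feed(sloppy)
logger.close()
html_accepted = True  # feed() returned without raising an error


def is_well_formed(text):
    """Return True if an XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


xml_accepted = is_well_formed(sloppy)
print(html_accepted, xml_accepted)  # True False
```

The HTML parser processes the sloppy markup without complaint, while the XML parser rejects it because the <p> tags are never closed.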

Another problem arises if you try to put data into an HTML document. You will find it very difficult to do so. For example, suppose we are trying to put information from a database into an HTML document. We have a database table named Customer with the following fields: customerID, customerName, and customerAddress. When we create an HTML document with this data, every customer should have a customerID and a customerName value. The customerAddress value is optional. We could present this data in HTML in a table, as follows:

  <body>
  <table border="1" width="100%">
      <tr>
          <th width="33%">Name</th>
          <th width="33%">Address</th>
          <th width="34%">ID</th>
      </tr>
      <tr>
          <td width="33%">John Smith</td>
          <td width="33%">125 Main St. Anytown NY 10001</td>
          <td width="34%">001</td>
      </tr>
      <tr>
          <td width="33%">Jane Doe</td>
          <td width="33%">2 Main St. Anytown NY 10001</td>
          <td width="34%">002</td>
      </tr>
      <tr>
          <td width="33%">Mark Jones</td>
          <td width="33%">35 Main St. Anytown NY 10001</td>
          <td width="34%"></td>
      </tr>
  </table>
  </body>

In a browser, this table would appear as shown in Figure 2-1.

Figure 2-1. Database table created using HTML.

This document is completely valid HTML code. There are no errors in the HTML code for the table; it is syntactically correct. The data, however, is invalid: the third entry, Mark Jones, is missing an ID, even though every customer should have one. Although it is possible to write applications that perform data validation on HTML documents, such applications are complex and inefficient. HTML was never designed for data validation.
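As a rough illustration of what such hand-written validation involves, the following Python sketch walks a simplified copy of the customer table (presentation attributes omitted) and reports the rows whose ID cell is empty. The class RowCollector and the function customers_missing_id are hypothetical names, and the check assumes the Name, Address, ID column order; nothing in the HTML itself enforces that order.

```python
from html.parser import HTMLParser

TABLE = """
<table>
  <tr><th>Name</th><th>Address</th><th>ID</th></tr>
  <tr><td>John Smith</td><td>125 Main St. Anytown NY 10001</td><td>001</td></tr>
  <tr><td>Jane Doe</td><td>2 Main St. Anytown NY 10001</td><td>002</td></tr>
  <tr><td>Mark Jones</td><td>35 Main St. Anytown NY 10001</td><td></td></tr>
</table>
"""


class RowCollector(HTMLParser):
    """Collects the text of each <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = ""

    def handle_data(self, data):
        if self._cell is not None:
            self._cell += data

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append(self._cell.strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip the header row, which has only <th> cells
                self.rows.append(self._row)
            self._row = None


def customers_missing_id(html):
    """Return the names of customers whose ID cell is empty."""
    collector = RowCollector()
    collector.feed(html)
    # Column order (Name, Address, ID) is an assumption taken from the header.
    return [name for name, _address, customer_id in collector.rows
            if not customer_id]


print(customers_missing_id(TABLE))  # ['Mark Jones']
```

Even this small check depends on knowing which column holds the ID, and it would break silently if the table layout changed, which is part of why validating data stored in HTML is fragile.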

HTML was also not designed to store data. The table is the most common way of both presenting and storing data in HTML. You can use <div> tags to create more complex structures to store data, but once again you are left with the task of writing your own data validation code.

What we need instead is something that enables us to put the data in a structured format that can be automatically validated for syntactical correctness and proper content structure. Ideally, the author of the document will want to define both the format of the document and the correct structure of the data. As you will see in Chapters 4 and 5, this is exactly what XML and DTDs do.