XML

Basic Components of an XML Document

The most basic components of an XML document are elements, attributes, and comments. To make it easier to understand how these components work in an XML document, we will look at them using Microsoft XML Notepad (https://microsoft.github.io/XmlNotepad/).

Elements

Elements are used to mark up the sections of an XML document. An XML element has the following form:

  <ElementName>Content</ElementName>

The content is contained within the XML tags.

Although XML tags usually enclose content, an element can also have no content, in which case it is called an empty element. In XML, an empty element can be written as follows:

  <ElementName/>

The <ElementName/> notation is sometimes called a singleton. An empty element can also be written as a start tag immediately followed by its end tag, <ElementName></ElementName>; an XML parser treats the two forms as equivalent.
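
To see how a parser reports an empty element, here is a minimal sketch in Python. The use of Python and its standard xml.etree.ElementTree module is our assumption for illustration; the rest of this section works with XML Notepad. It parses the singleton form and the start-tag/end-tag form and prints what the parser found in each.

```python
import xml.etree.ElementTree as ET

# Parse the two ways of writing an element that has no content.
short_form = ET.fromstring("<ElementName/>")
long_form = ET.fromstring("<ElementName></ElementName>")

# For each form, show the element name, its text, and its number of children.
for element in (short_form, long_form):
    print(element.tag, element.text, len(element))
```

In both cases ElementTree reports the same element name, no text, and no child elements.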

In a patient record XML document, for example, PatientName, PatientAge, PatientIllness, and PatientWeight can all be elements of the XML document, as shown here:

  <PatientName>Abc Xyzh</PatientName>
  <PatientAge>108</PatientAge>
  <PatientWeight>155</PatientWeight>

Here the PatientName element marks the content Abc Xyzh as the patient's name, PatientAge marks the content 108 as the patient's age, and PatientWeight marks the content 155 as the patient's weight. Elements provide information about the content in the document, and computer applications can use them to identify each content section. The application can then process each section according to its own requirements.

In the case of the patient record document, the content sections could be placed into fields for a new record in a patient database or presented to a user in text boxes in a Web browser. The elements determine which field or text box each content section belongs in. For example, the content marked by the PatientName element will go into the PatientName field in the database or into the txtPName text box in the Web browser. Using elements, the presentation, storage, and transfer of data can be automated.
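
The idea of routing each content section to a field can be sketched in a few lines of Python. This is an illustration under assumptions, not a prescribed implementation: we use the standard xml.etree.ElementTree module, we wrap the three elements in a Patient root element because a parser needs a single root, and the fields dictionary stands in for a database record.

```python
import xml.etree.ElementTree as ET

# Assumption: a Patient root element wraps the three elements so that
# the fragment forms a complete XML document.
record_xml = """
<Patient>
    <PatientName>Abc Xyzh</PatientName>
    <PatientAge>108</PatientAge>
    <PatientWeight>155</PatientWeight>
</Patient>
"""

root = ET.fromstring(record_xml)

# Use each element name as the field name and its content as the value.
# The dictionary is a hypothetical stand-in for a database record.
fields = {child.tag: child.text for child in root}
print(fields)
```

Every value arrives as text, so an application that needs a numeric age or weight would still have to convert it.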

Nesting elements

Elements can be nested. For example, if you wanted to group all the patient information under a single Patient element, you might want to rewrite the patient record example as follows:

  <Patient>
      <PatientName>Abc Xyzh</PatientName>
      <PatientAge>108</PatientAge>
      <PatientWeight>155</PatientWeight>
  </Patient>

When nesting elements, you must not overlap tags. The following construction would not be well formed because the </Patient> end tag appears between the tags of one of its nested elements:

  <Patient>
      <PatientName>Abc Xyzh</PatientName>
      <PatientAge>108</PatientAge>
      <PatientWeight>155</Patient>
  </PatientWeight>

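Thus XML elements can contain other elements. However, the elements must be strictly nested: each start tag must have a corresponding end tag, and the end tag of a nested element must appear before the end tag of the element that contains it.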
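
A parser enforces this rule for us. The following sketch, which assumes Python's standard xml.etree.ElementTree module and a helper name of our own choosing (is_well_formed), tries to parse a correctly nested document and one with overlapping tags.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# The nested element closes before its parent does.
nested = "<Patient><PatientWeight>155</PatientWeight></Patient>"

# The parent's end tag appears inside the nested element.
overlapping = "<Patient><PatientWeight>155</Patient></PatientWeight>"

print(is_well_formed(nested))
print(is_well_formed(overlapping))
```

The parser accepts the first document and rejects the second with a ParseError.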

Element naming conventions

Element names must conform to the following rules:

  • Names consist of one or more characters and cannot contain spaces.
  • A name must begin with a letter or an underscore. Letters are not limited to A-Z and a-z; letters from other scripts defined in the Unicode standard (https://unicode.org/standard/standard.html) are also allowed. The XML grammar also permits a colon, but colons are reserved for namespaces.
  • Beyond the first character, a name can contain letters, digits, hyphens, periods, and underscores. Other punctuation characters and spaces are not allowed.
  • Element names are case sensitive; thus, PatientName, PATIENTNAME, and patientname are considered different elements.

For example, the following element names are well formed:

  Fred
  _Fred
  Fredd123
  FredGruß

These element names would not be considered well formed:

  Fred 123
  -Fred
  123

Here the first element name contains a space, the second begins with a dash, and the third begins with a numeral instead of a letter or an underscore.
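
These rules can be approximated with a regular expression. The sketch below is a simplification written in Python, not the full Name grammar from the XML specification: it leaves out the colon, and Python's \w and \d classes do not match the specification's character ranges exactly. The names NAME_PATTERN and looks_like_valid_name are our own.

```python
import re

# First character: a word character that is not a digit (a letter or underscore).
# Remaining characters: letters, digits, underscores, periods, or hyphens.
NAME_PATTERN = re.compile(r"[^\W\d][\w.\-]*")

def looks_like_valid_name(name: str) -> bool:
    """Approximate check of the element naming rules described above."""
    return NAME_PATTERN.fullmatch(name) is not None

for name in ["Fred", "_Fred", "Fredd123", "FredGruß", "Fred 123", "-Fred", "123"]:
    print(repr(name), looks_like_valid_name(name))
```

The first four names pass the check and the last three fail, matching the examples above.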

Attributes

An attribute is a mechanism for adding descriptive information to an element. For example, in our patient record XML document, we have no idea whether the patient's weight is measured in pounds or kilograms. To indicate that PatientWeight is given in pounds, we would add a unit attribute and specify its value as LB:

  <PatientWeight unit="LB">155</PatientWeight>

Attributes can appear only in the start tag of an element (or in an empty-element tag), and like element names, attribute names are case sensitive. Attribute values must be enclosed in quotation marks, either double (") or single (').

Attributes can be used with empty elements, as in the following well-formed example:

  <PatientWeight unit="LB"/>

Here the empty element might mean that the patient's weight is unknown or has not yet been entered into the system.
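
An application reads attributes separately from element content. As a minimal sketch, assuming Python's standard xml.etree.ElementTree module, the following reads the unit attribute from both the element with content and the empty element.

```python
import xml.etree.ElementTree as ET

with_value = ET.fromstring('<PatientWeight unit="LB">155</PatientWeight>')
empty = ET.fromstring('<PatientWeight unit="LB"/>')

# The attribute is available whether or not the element has content.
print(with_value.get("unit"), with_value.text)
print(empty.get("unit"), empty.text)

# Attribute names are case sensitive, so "Unit" is a different name.
print(with_value.get("Unit"))
```

The empty element still carries its unit, but its text is None, and looking up Unit instead of unit also returns None.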

An attribute name can appear only once in an element's start tag. Thus, the following element would not be well formed:

  <PatientWeight unit="LB" unit="KG">155</PatientWeight>

This makes sense because the weight cannot be both kilograms and pounds.
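
A conforming parser rejects the duplicate. The sketch below assumes Python's standard xml.etree.ElementTree module; the variable rejected is ours and simply records whether parsing failed.

```python
import xml.etree.ElementTree as ET

duplicate = '<PatientWeight unit="LB" unit="KG">155</PatientWeight>'

try:
    ET.fromstring(duplicate)
    rejected = False
except ET.ParseError as error:
    # The parser refuses the document because unit appears twice.
    rejected = True
    print("Not well formed:", error)
```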

Comments

Comments are descriptions embedded in an XML document to provide additional information about the document. Comments in XML use the same syntax as HTML comments and are formatted so that they are ignored by the application processing the document, as shown here:

  <!-- Comment text -->
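
To see comments being skipped, here is a minimal sketch that assumes Python's standard xml.etree.ElementTree module with its default parser settings. The comment text and the Patient wrapper are made up for the example.

```python
import xml.etree.ElementTree as ET

document = """<Patient>
    <!-- Weight recorded at admission -->
    <PatientWeight unit="LB">155</PatientWeight>
</Patient>"""

root = ET.fromstring(document)

# With the default parser, the comment does not show up among the children.
tags = [child.tag for child in root]
print(tags)
```

Only PatientWeight is listed, so the application processes the data without ever seeing the comment.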