XML

Types of Nodes

The XPath data model includes seven possible node types: root, element, attribute, namespace, processing instruction, comment, and text. Let's look at each of these node types in detail.

root nodes

The root node is at the root of the tree. It is the parent of the document element. As mentioned, in XPath the root node, not the document element, contains the entire document. The root node has exactly one element node as a child: the document element. Its other children are the processing instructions and comments that occur in the prolog and after the end of the document element. The prolog consists of two optional parts: the XML declaration and a DTD.
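As a rough illustration, the sketch below lists the children of the top-level node for a small, made-up document. It uses Python's xml.dom.minidom, which implements the DOM rather than the XPath data model, so the DOM Document node stands in for the XPath root node here. The sample document and the app-hint target are invented for the example.

```python
from xml.dom import minidom, Node

# Hypothetical sample: a comment and a processing instruction in the prolog,
# the document element, and a comment after the end of the document element.
XML = """<?xml version="1.0"?>
<!-- prolog comment -->
<?app-hint mode="draft"?>
<report><title>Q1</title></report>
<!-- trailing comment -->"""

KINDS = {
    Node.ELEMENT_NODE: "element",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
}

doc = minidom.parseString(XML)

# The DOM Document node plays the role of the XPath root node in this sketch.
root_children = [KINDS.get(node.nodeType, "other") for node in doc.childNodes]
print(root_children)

# Only one of those children is an element: the document element.
element_children = [n for n in doc.childNodes if n.nodeType == Node.ELEMENT_NODE]
print(element_children[0].tagName)
```

Note that the XML declaration does not show up among the children; only the comments, the processing instruction, and the document element do.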

element nodes

Every element in the document has a corresponding element node. The children of an element node can include other element nodes, comment nodes, processing instruction nodes, and text nodes for its character content. In the element node, all internal and external entity references are expanded and all character references are resolved. The descendants of an element node are the children of the element and their descendants.

The value of an element node is the string that results from concatenating, in document order, all the character content in all the element's descendants. Because the document element holds all of the document's character content, the root node and the document element node have the same value. Element nodes are ordered according to the order of the start tags of the elements in the document after expansion of general entities. This ordering is called document order.
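The following sketch shows one way to compute such a value with Python's xml.etree.ElementTree, assuming a small invented document. The helper name string_value is hypothetical; it simply joins the character content that itertext() yields in document order.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document with nested elements and mixed content.
book = ET.fromstring(
    "<book><title>XPath</title> by <author>A. <b>Writer</b></author></book>"
)

def string_value(element):
    """Concatenate all character content of the element's descendants,
    in document order (an approximation of the XPath element value)."""
    return "".join(element.itertext())

book_value = string_value(book)
author_value = string_value(book.find("author"))

# iter() visits elements in the order of their start tags (document order).
document_order = [element.tag for element in book.iter()]

print(book_value)      # XPath by A. Writer
print(author_value)    # A. Writer
print(document_order)  # ['book', 'title', 'author', 'b']
```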

An element node can have a unique identifier, which is the value of an attribute declared in the DTD with type ID. No two elements in the same document can have the same ID value. If two elements have the same ID in the same document, the document is invalid.

attribute nodes

Each element has an associated set of attribute nodes. An attribute that is using a default value is treated the same as an attribute that has a specified value. For an optional attribute (declared as #IMPLIED) that has no default value, if there is no value specified for the attribute, there will be no node for this attribute.

Each attribute node has a name and a string value. The value can be a zero length string ("").
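These three cases (a defaulted attribute, an unspecified #IMPLIED attribute, and a zero-length value) can be sketched with Python's xml.etree.ElementTree. This assumes the parser reads the internal DTD subset and applies its attribute defaults, which the expat-based parser in the standard library does. The element and attribute names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "currency" has a default value, "note" is #IMPLIED,
# and "label" is specified with a zero-length value.
XML = """<!DOCTYPE item [
  <!ATTLIST item
    currency CDATA "USD"
    note     CDATA #IMPLIED>
]>
<item label=""/>"""

item = ET.fromstring(XML)
attributes = dict(item.attrib)

# The defaulted attribute looks exactly like a specified one.
print(attributes.get("currency"))   # USD

# The unspecified #IMPLIED attribute has no entry at all.
print("note" in attributes)         # False

# A zero-length value is still a value, distinct from a missing attribute.
print(repr(attributes.get("label")))  # ''
```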

namespace nodes

Every element has an associated set of namespace nodes, one for each namespace prefix that is in scope for the element and one for the default namespace if it exists. This means that an element has a namespace node for each of the following attributes:

  • Every attribute of the element whose name begins with xmlns:
  • Every attribute of an ancestor element whose name begins with xmlns: (unless the element itself or a nearer ancestor redeclares the prefix)
  • The xmlns attribute of the element or of the nearest ancestor that has one, unless its value is an empty string

Each namespace node has a name, which is a string giving the prefix, and a value, which is the namespace URI.
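The scoping rules above can be sketched as a small function that tracks which declarations are in scope for each element. This is a minimal sketch using xml.etree.ElementTree.iterparse and its "start-ns" events; the function name in_scope_namespaces and the sample URIs are hypothetical, and the sketch leaves out the predeclared xml prefix.

```python
import io
import xml.etree.ElementTree as ET

def in_scope_namespaces(xml_bytes):
    """Return (tag, {prefix: uri}) pairs, one per element, in document order.

    The empty prefix "" stands for the default namespace.
    """
    result = []
    stack = [{}]   # one scope per open element; the bottom entry is empty
    pending = []   # declarations seen since the last start tag
    events = ("start-ns", "start", "end")
    for event, payload in ET.iterparse(io.BytesIO(xml_bytes), events=events):
        if event == "start-ns":
            # Reported before the "start" event of the declaring element.
            pending.append(payload)
        elif event == "start":
            scope = dict(stack[-1])          # inherit from the parent
            for prefix, uri in pending:
                if uri:
                    scope[prefix] = uri      # declare or redeclare
                else:
                    scope.pop(prefix, None)  # xmlns="" removes the default
            pending.clear()
            stack.append(scope)
            result.append((payload.tag, scope))
        else:  # "end": the element's declarations go out of scope
            stack.pop()
    return result

SAMPLE = (
    b'<a xmlns="urn:default" xmlns:p="urn:p">'
    b'<b xmlns:q="urn:q"/>'
    b'<c xmlns=""/>'
    b'</a>'
)

scopes = dict(in_scope_namespaces(SAMPLE))
for tag, scope in scopes.items():
    print(tag, scope)
```

In this sample, b inherits both declarations from a and adds q, while c keeps the p prefix but loses the default namespace because its xmlns attribute is empty.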

processing instruction nodes

An XML parser does not act on processing instructions itself, but they can be used to pass instructions to an XML application. Every processing instruction in the XML document has a corresponding processing instruction node. Currently, processing instructions located within the DTD don't have corresponding processing instruction nodes. A processing instruction node has a name, which is a string equal to the processing instruction's target, and a value, which is a string containing the characters that follow the target, up to but not including the terminating ?> characters.
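To make the name and value concrete, the sketch below reads a processing instruction with Python's xml.dom.minidom, where the DOM properties target and data correspond to the name and value described above. The stylesheet file name is made up for the example.

```python
from xml.dom import minidom

# Hypothetical document with one processing instruction before the
# document element.
doc = minidom.parseString(
    '<?xml-stylesheet type="text/xsl" href="style.xsl"?><doc/>'
)

pi = doc.firstChild

# target corresponds to the node's name, data to its value.
pi_name = pi.target
pi_value = pi.data

print(pi_name)   # xml-stylesheet
print(pi_value)  # type="text/xsl" href="style.xsl"
```

The value looks like a list of attributes, but it is a single uninterpreted string; it is up to the application to parse it.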

comment nodes

Every comment in the XML document has a corresponding comment node. Every comment node has a value, which is a string containing the comment text.

text nodes

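All character content is grouped into text nodes. Each text node holds as much adjacent character content as possible, so a text node never has an immediately preceding or following sibling that is also a text node.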
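By way of contrast, the DOM does allow adjacent text nodes, and its normalize() method merges them into the form the XPath data model requires. The sketch below, using Python's xml.dom.minidom with invented content, builds two adjacent text nodes by hand and then merges them.

```python
from xml.dom import minidom

doc = minidom.parseString("<p/>")
p = doc.documentElement

# Build two adjacent text nodes, which the DOM permits but XPath does not.
p.appendChild(doc.createTextNode("Hello, "))
p.appendChild(doc.createTextNode("world"))
count_before = len(p.childNodes)

# normalize() merges adjacent text nodes into a single text node.
p.normalize()
count_after = len(p.childNodes)
merged_text = p.firstChild.data

print(count_before, count_after)  # 2 1
print(merged_text)                # Hello, world
```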