XML

Overview of Entities

Entities are like macros in the C programming language in that they allow you to associate a string of characters with a name. A reference to this name can then be used in either the DTD or the XML document; the XML parser will replace the reference with the string of characters. An entity declaration consists of three parts: the keyword ENTITY, the name of the entity, and the replacement text (that is, the string of characters that each reference to the entity will be replaced with). The quoted string that supplies this text in the declaration is called the literal entity value. Apart from the five predefined entities (lt, gt, amp, apos, and quot), all entities are declared in either an internal or an external DTD.
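As a minimal sketch of this replacement, the following Python snippet parses a small document whose internal DTD declares one entity. The entity name company, its replacement text, and the memo element are made up for illustration; the parser is the one in Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the internal DTD declares one entity named "company".
# The declaration has the three parts described above: the ENTITY keyword,
# the entity name, and the quoted replacement text.
DOCUMENT = """<!DOCTYPE memo [
  <!ENTITY company "Example Corporation">
]>
<memo>Prepared by &company; staff.</memo>"""


def expand_entities(xml_text: str) -> str:
    """Parse the document and return the root element's text content."""
    root = ET.fromstring(xml_text)
    # By the time the text reaches the application, the parser has already
    # replaced the reference &company; with the replacement text.
    return root.text


print(expand_entities(DOCUMENT))
```

The application never sees the reference itself, only the text that the parser substituted for it.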

Entities come in several types, depending on where their replacement text comes from and where it will be placed. Internal entities get their replacement text from within the DTD, inside their own declaration. External entities get their replacement text from an external file that the declaration points to. Both internal and external entities can be further divided into general entities and parameter entities. General entities are used in XML documents, and parameter entities are used in DTDs.
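The difference between an internal and an external source can be sketched with Python's standard xml.parsers.expat module. This example covers general entities only. The entity names, the file name chap1.xml, and the EXTERNAL_FILES dictionary that stands in for the file system are all assumptions made for illustration.

```python
import xml.parsers.expat

# Hypothetical stand-in for the file system: system identifier -> file content.
EXTERNAL_FILES = {"chap1.xml": "Chapter one text."}

# "title" is an internal entity: its replacement text is inside the declaration.
# "chap" is an external entity: its declaration only points to a file.
DOCUMENT = """<!DOCTYPE book [
  <!ENTITY title "Entity Basics">
  <!ENTITY chap SYSTEM "chap1.xml">
]>
<book>&title;: &chap;</book>"""


def parse_with_external(xml_text: str, files: dict) -> str:
    """Return the character data of the document with both entities expanded."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()

    def external_ref(context, base, system_id, public_id):
        # Called when the parser meets a reference to an external entity.
        # A sub-parser processes the external text in the parent's context.
        sub_parser = parser.ExternalEntityParserCreate(context)
        sub_parser.Parse(files[system_id], True)
        return 1  # non-zero tells the parser to continue

    parser.CharacterDataHandler = chunks.append
    parser.ExternalEntityRefHandler = external_ref
    parser.Parse(xml_text, True)
    return "".join(chunks)


print(parse_with_external(DOCUMENT, EXTERNAL_FILES))
```

The internal entity is expanded by the parser on its own, whereas the external entity needs the application to supply the content that the system identifier names.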

Internal general entities, internal parameter entities, and external parameter entities always contain text that should be parsed. Because external general entities go within the body of a document and because you might want to include a nontext file (such as an image) in the body of the document, external general entities can be parsed or unparsed. External parsed general entities are used to insert XML content from external files into the XML document. External unparsed general entities are used to bring into the XML document information that is not text-based XML and should not be parsed. Thus, we have five basic entity categories: internal general entities, internal parameter entities, external parsed general entities, external unparsed general entities, and external parameter entities.
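The five categories can be told apart from the declaration syntax alone: a percent sign marks a parameter entity, the SYSTEM or PUBLIC keyword marks an external entity, and the NDATA keyword marks an unparsed one. The sketch below is a deliberately simplified classifier, not a DTD parser. It splits on whitespace, so it assumes quoted file names contain no spaces, and every entity and file name in it is hypothetical.

```python
def classify_entity(declaration: str) -> str:
    """Name the category of a single <!ENTITY ...> declaration (simplified)."""
    body = declaration.strip()
    if not (body.startswith("<!ENTITY") and body.endswith(">")):
        raise ValueError("not an entity declaration")
    tokens = body[len("<!ENTITY"):-1].split()

    # A "%" before the name marks a parameter entity.
    is_parameter = tokens[0] == "%"
    after_name = tokens[2] if is_parameter else tokens[1]

    # SYSTEM or PUBLIC right after the name means the text is in an external file.
    is_external = after_name in ("SYSTEM", "PUBLIC")

    # NDATA followed by a notation name marks an unparsed (general) entity.
    is_unparsed = is_external and len(tokens) >= 2 and tokens[-2] == "NDATA"

    if is_parameter:
        return "external parameter entity" if is_external else "internal parameter entity"
    if not is_external:
        return "internal general entity"
    if is_unparsed:
        return "external unparsed general entity"
    return "external parsed general entity"


# Hypothetical declarations, one per category.
EXAMPLES = [
    '<!ENTITY copyright "Copyright 2000">',
    '<!ENTITY % inline "b | i">',
    '<!ENTITY chap1 SYSTEM "chap1.xml">',
    '<!ENTITY logo SYSTEM "logo.gif" NDATA gif>',
    '<!ENTITY % shared SYSTEM "shared.dtd">',
]

for example in EXAMPLES:
    print(classify_entity(example), "<-", example)
```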

Figure 5-1 illustrates the source of the replacement text for each of the entity categories (the closed circles) and where the replacement text will go (the arrows).

Figure 5-1. Source and destination of the replacement text for the five entity categories.