XML

The Components of a Schema Data Type

In a schema, a data type has three parts: a value space, a lexical space, and a set of facets. The value space is the set of acceptable values for a data type. The lexical space is the set of valid literals, that is, the ways in which a value of the data type can be written; for example, 100 and 1.0E2 are two different literals, but both denote the same floating point value. A facet is a single defining characteristic of the data type. A data type can have many facets, and together they distinguish it from other data types and define its value space.
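The difference between the lexical space and the value space can be seen outside of a schema as well. The following is a minimal Python sketch (Python is used here only for illustration and is not part of the schema specification) that reads the two literals mentioned above and compares the values they denote.

```python
# Two different literals from the lexical space of a floating point type.
literals = ["100", "1.0E2"]

# Map each literal to the value it denotes in the value space.
values = [float(text) for text in literals]

# The literals differ as character strings ...
literals_differ = literals[0] != literals[1]
# ... but they denote the same value.
values_equal = values[0] == values[1]

print(literals_differ, values_equal)
```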

There are two kinds of facets: fundamental and constraining. Fundamental facets define the data type, and constraining facets place constraints on the data type. Examples of fundamental facets are rules specifying an order for the elements, a maximum or minimum allowable value, the finite or infinite nature of the data type, whether the instances of the data type are exact or approximate, and whether the data type is numeric. Constraining facets can include the limit on the length of a data type (number of characters for a string or number of bits for a binary data type), minimum and maximum lengths, enumerations, and patterns.
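To make the idea of constraining facets more concrete, here is a rough Python sketch of how a few of them might be checked against a string value. The function name `satisfies_facets` and its parameters are hypothetical, chosen for this illustration; they are not taken from any schema processor, and Python's regular expression syntax is only an approximation of the pattern syntax used in schemas.

```python
import re

def satisfies_facets(text, min_length=None, max_length=None,
                     enumeration=None, pattern=None):
    """Return True if `text` satisfies every constraining facet supplied.

    Hypothetical helper for illustration only; facets left as None are
    treated as "not specified" and are skipped.
    """
    if min_length is not None and len(text) < min_length:
        return False
    if max_length is not None and len(text) > max_length:
        return False
    if enumeration is not None and text not in enumeration:
        return False
    # The whole value must match the pattern, not just a prefix of it.
    if pattern is not None and re.fullmatch(pattern, text) is None:
        return False
    return True

# A made-up five-digit code type, restricted by a pattern facet.
print(satisfies_facets("98052", pattern=r"\d{5}"))
print(satisfies_facets("9805", pattern=r"\d{5}"))
```

Each facet narrows the set of acceptable values, so a value is accepted only when it passes every facet that has been specified.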

We can categorize the data types along several dimensions. First, data types can be atomic or aggregate. An atomic data type cannot be divided. An integer value or a date that is represented as a single character string is an atomic data type. If a date is presented as day, month, and year values, the date is an aggregate data type.

Data types can also be distinguished as primitive or generated. Primitive data types are not derived from any other data type; they are predefined. Generated data types are built from existing data types, called basetypes. Basetypes can be primitive or generated data types. Generated types, which will be discussed later in the chapter, can be either simple or complex data types.

Primitive data types include the following: string, Boolean, float, decimal, double, timeDuration, recurringDuration, binary, and uri. In addition, the timeInstant data type is derived from the recurringDuration data type. Two of these primitive data types are specific to XML schemas: timeDuration and recurringDuration. The derived timeInstant data type is also specific to XML schemas. Let's have a look at each of them here.

The timeInstant data type represents a combination of date and time values that identifies a specific instant of time. The pattern is shown here:

  CCYY-MM-DDThh:mm:ss.sss

CC represents the century, YY is the year, MM is the month, and DD is the day. The representation as a whole can be preceded by an optional leading sign to indicate a negative number. If the sign is omitted, a plus sign (+) is assumed. The letter T is the date/time separator, and hh, mm, and ss.sss represent the hour, minute, and second values. Additional digits can be used to increase the precision of fractional seconds if desired. To accommodate year values greater than 9999, digits can be added to the left of this representation.

The timeInstant representation can be immediately followed by a Z to indicate Coordinated Universal Time (UTC). Time zone information is represented by the difference between the local time and UTC. It is specified immediately following the time and consists of a plus or minus sign (+ or -) followed by hh:mm.
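The timeInstant pattern described above can be approximated with a regular expression. The following Python sketch is one possible reading of that description, not a definitive implementation: the names `TIME_INSTANT` and `parse_time_instant` are hypothetical, the fractional seconds are assumed to be optional, and no range checks are performed (a month of 13 would still match).

```python
import re

# Optional sign, CCYY (four or more digits), -MM-DD, the T separator,
# hh:mm:ss with optional fractional seconds, then an optional Z or
# a +hh:mm / -hh:mm offset from UTC.
TIME_INSTANT = re.compile(
    r"(?P<sign>[+-]?)(?P<year>\d{4,})-(?P<month>\d{2})-(?P<day>\d{2})"
    r"T(?P<hour>\d{2}):(?P<minute>\d{2}):(?P<second>\d{2}(?:\.\d+)?)"
    r"(?P<zone>Z|[+-]\d{2}:\d{2})?"
)

def parse_time_instant(text):
    """Return the named parts of a timeInstant literal, or None if it does not match."""
    match = TIME_INSTANT.fullmatch(text)
    if match is None:
        return None
    return match.groupdict()

parts = parse_time_instant("2000-05-31T13:20:00.000-05:00")
print(parts["year"], parts["second"], parts["zone"])
```

The parts are returned as strings so that the literal is not altered; a missing time zone is reported as None.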

The timeDuration data type represents some duration of time. The pattern for timeDuration is shown here:

  PyYmMdDThHmMsS

In this pattern, each lowercase letter stands for a number and each uppercase letter is a literal designator that follows it: yY is the number of years, mM is the number of months, dD is the number of days, T is the date/time separator, hH is the number of hours, mM (after the T) is the number of minutes, and sS is the number of seconds. The P at the beginning indicates that this pattern represents a time period. The number of seconds can include decimal digits to arbitrary precision. An optional preceding minus sign is allowed to indicate a negative duration. If the sign is omitted, a positive duration is assumed.
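As an illustration of the timeDuration pattern, the following Python sketch splits a literal into its components. The names `TIME_DURATION` and `parse_time_duration` are hypothetical. The sketch also assumes that individual components can be left out, which the description above does not state, and it is deliberately permissive rather than a strict validator.

```python
import re

# Optional minus sign, the P designator, then number/designator pairs.
# Assumption: each pair is optional, and the time part follows a T.
TIME_DURATION = re.compile(
    r"(?P<sign>-?)P"
    r"(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?"
)

def parse_time_duration(text):
    """Return the components of a timeDuration literal, or None if it does not match."""
    match = TIME_DURATION.fullmatch(text)
    if match is None:
        return None
    fields = match.groupdict()
    sign = fields.pop("sign")
    # Reject literals such as "P" that carry no components at all.
    if all(value is None for value in fields.values()):
        return None
    parsed = {
        name: float(value) if name == "seconds" else int(value)
        for name, value in fields.items()
        if value is not None
    }
    parsed["negative"] = sign == "-"
    return parsed

print(parse_time_duration("P1Y2M3DT10H30M12.5S"))
```

Because M appears twice in the pattern, the position relative to T decides whether a number counts months or minutes, which is why the expression handles the date and time parts separately.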

The recurringDuration data type represents a moment in time that recurs. The pattern for recurringDuration is the left-truncated representation for timeInstant. For example, if the CC century value is omitted from the timeInstant representation, that timeInstant recurs every hundred years. Similarly, if CCYY is omitted, the timeInstant recurs every year.

Every two-character unit of the representation that is omitted is indicated by a single hyphen (-). For example, to indicate 1:20 P.M. on May 31 of every year in Eastern Standard Time, which is 5 hours behind UTC, you would write the following:

  --05-31T13:20:00-05:00
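The example above can be picked apart with a small Python sketch. It recognizes only the yearly form shown here, in which CCYY is omitted and two hyphens take its place; other truncated forms are not covered. The names `YEARLY_RECURRENCE` and `parse_yearly_recurrence` are hypothetical, and the fractional seconds are assumed to be optional.

```python
import re

# Two hyphens stand in for the omitted CC and YY units, followed by
# MM-DD, the T separator, hh:mm:ss, and an optional Z or UTC offset.
YEARLY_RECURRENCE = re.compile(
    r"--(?P<month>\d{2})-(?P<day>\d{2})"
    r"T(?P<hour>\d{2}):(?P<minute>\d{2}):(?P<second>\d{2}(?:\.\d+)?)"
    r"(?P<zone>Z|[+-]\d{2}:\d{2})?"
)

def parse_yearly_recurrence(text):
    """Return the parts of a yearly recurringDuration literal, or None if it does not match."""
    match = YEARLY_RECURRENCE.fullmatch(text)
    if match is None:
        return None
    return match.groupdict()

recurrence = parse_yearly_recurrence("--05-31T13:20:00-05:00")
print(recurrence["month"], recurrence["day"], recurrence["hour"], recurrence["zone"])
```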