XML

Working with Simple Types

XSD includes several different simple data types, or simple types, that make it possible to model a wide range of data in XML documents. These types can be classified according to the kind of data they represent. Following are the major categories of simple data types supported in the XSD language, along with the names used to refer to each one in a schema:

  • String types xsd:string

  • Boolean types xsd:boolean

  • Number types xsd:integer, xsd:decimal, xsd:float, xsd:double

  • Date and time types xsd:time, xsd:timeInstant, xsd:duration, xsd:date, xsd:month, xsd:year, xsd:century, xsd:recurringDate, xsd:recurringDay

  • Custom types xsd:simpleType

These simple types are typically used to create elements and attributes in a schema document. In order to create an element based upon a simple type, you must use the xsd:element element, which has two primary attributes used to describe the element: name and type. The name attribute is used to set the element name, which is the name that appears within angle brackets (<>) when you use the element in XML code. The type attribute determines the type of the element and can be set to a simple or complex type. Following are the element examples you saw a little earlier in the tutorial that make use of the xsd:string simple type:

<xsd:element name="name" type="xsd:string"/>
<xsd:element name="title" type="xsd:string"/>
<xsd:element name="occupation" type="xsd:string"/>

Attributes are created in much the same manner as elements and even rely on the same two attributes, name and type. However, you create an attribute using the xsd:attribute element. Following are the attribute examples you saw earlier that use the xsd:date and xsd:integer simple types:

<xsd:attribute name="birthdate" type="xsd:date"/>
<xsd:attribute name="weight" type="xsd:integer"/>
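
To show where these declarations live, here is a minimal sketch of a complete schema document. The person element and its complex type are assumptions made for illustration; the namespace URI is the one defined by the final XSD Recommendation:

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="person">
    <xsd:complexType>
      <!-- Child elements, each based on a simple type -->
      <xsd:sequence>
        <xsd:element name="name" type="xsd:string"/>
        <xsd:element name="title" type="xsd:string"/>
        <xsd:element name="occupation" type="xsd:string"/>
      </xsd:sequence>
      <!-- Attribute declarations follow the content model -->
      <xsd:attribute name="birthdate" type="xsd:date"/>
      <xsd:attribute name="weight" type="xsd:integer"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>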

Now that you understand how simple types enter the picture with elements and attributes, you're ready to learn more about the types themselves.

The String Type

The string type represents a string of text and is represented in the type attribute by the xsd:string value. The string type is probably the most commonly used type in XSD. Following is an example of how to use the xsd:string value to create a string element:

<xsd:element name="name" type="xsd:string"/>

In an XML document, this element might be used like this:

<name>Milton James</name>

The Boolean Type

The Boolean type represents a true/false or yes/no value and is represented in the type attribute by the xsd:boolean value. When using a Boolean type in an XML document, you can set it to true or false, or 1 or 0, respectively. Following is an example of an attribute that is a Boolean type:

<xsd:attribute name="retired" type="xsd:boolean"/>

In an XML document, this attribute might be used like this:

<person retired="false">
  <name>Milton James</name>
</person>
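
As a small illustration of the alternate notation, the same attribute could also be written with 1 or 0; the value here is only an example:

<!-- retired="0" means the same thing as retired="false" -->
<person retired="0">
  <name>Milton James</name>
</person>

Spelled-out values such as yes or no aren't part of the type's accepted notation, so retired="no" would be flagged as invalid.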

Number Types

Number types are used in XSD to describe elements or attributes with numeric values. The following number types are available for use in schemas to represent numeric information:

  • xsd:integer Integer numbers (with no fractional part); for example, 3

  • xsd:decimal Decimal numbers (with a fractional part); for example, 3.14

  • xsd:float Single precision (32-bit) floating point numbers; for example, 6.022E23

  • xsd:double Double precision (64-bit) floating point numbers; same as float but for considerably more precise numbers

By the Way

If you'd like to exert exacting control over the sign of integer numbers, you might consider using one of these additional numeric types: xsd:positiveInteger, xsd:negativeInteger, xsd:nonPositiveInteger, or xsd:nonNegativeInteger. The latter two types are zero-inclusive, whereas the first two don't include zero.
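
As a quick sketch, these sign-constrained types are used like any other number type; the attribute names children and age are hypothetical:

<xsd:attribute name="children" type="xsd:nonNegativeInteger"/>
<xsd:attribute name="age" type="xsd:positiveInteger"/>

With these declarations, children="0" would be valid, whereas age="0" would not.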


To create an element or attribute for a numeric piece of information, you simply select the appropriate number type in the XSD. Following is an example of a couple of attributes that are number types:

<xsd:attribute name="height" type="xsd:decimal"/>
<xsd:attribute name="weight" type="xsd:integer"/>

In an XML document, these attributes might be used like this:

<person height="5.75" weight="160">
  <name>Milton James</name>
</person>

Date and Time Types

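XSD includes support for date and time types, which is very useful when it comes to modeling such information. Several of the names used in this section come from early drafts of the XSD specification; in the final W3C Recommendation, xsd:timeInstant is named xsd:dateTime, xsd:month is xsd:gYearMonth, xsd:year is xsd:gYear, xsd:recurringDate is xsd:gMonthDay, xsd:recurringDay is xsd:gDay, and xsd:century was dropped. Following are the different date and time types as they are named in this section: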

  • xsd:time A time of day; for example, 4:40 p.m.

  • xsd:timeInstant An instant in time; for example, 4:40 p.m. on August 24, 1970

  • xsd:duration A length of time; for example, 3 hours and 15 minutes

  • xsd:date A day in time; for example, August 24, 1970

  • xsd:month A month in time; for example, August, 1970

  • xsd:year A year in time; for example, 1970

  • xsd:century A century; for example, 20th century

  • xsd:recurringDate A date without regard for the year; for example, August 24

  • xsd:recurringDay A day of the month without regard for the month or year; for example, the 24th of the month

To create an element or attribute for a date or time, you must select the appropriate date or time type in the XSD. Following is an example of an attribute that is a date type:

<xsd:attribute name="birthdate" type="xsd:date"/>

This attribute is of type xsd:date, which means that it can be used in XML documents to store a day in time, such as October 28, 1969. You don't just set the birthdate attribute to October 28, 1969, however. Dates and times are considered highly formatted pieces of information, so you must enter them according to predefined formats set forth by the XSD language. The format for the xsd:date type is CCYY-MM-DD, where CC is the century (19), YY is the year (69), MM is the month (10), and DD is the day (28). The following code shows how you would specify this date in the birthdate attribute using the CCYY-MM-DD format:

<person birthdate="1969-10-28" height="5.75" weight="160">
  <name>Milton James</name>
</person>

Other date and time types use similar formats. For example, the xsd:month type uses the format CCYY-MM, xsd:year uses CCYY, and xsd:century uses the succinct format CC. The xsd:recurringDate type uses --MM-DD to format recurring dates, whereas the xsd:recurringDay type uses ---DD. Following is an example of the xsd:recurringDate type so that you can see how the dashes fit into things:

<person birthday="--10-28" height="5.75" weight="160">
  <name>Milton James</name>
</person>

In this example, an attribute named birthday is used instead of birthdate, with the idea being that a birthday is simply a day and month without a birth year (a birth date implies a specific year). Notice that two dashes appear at the beginning of the birthday attribute value; the extra dash serves as a placeholder for the intentionally missing year.
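
To round out the example, here is a sketch of how the birthday attribute might be declared. It assumes a processor that follows the final XSD Recommendation, in which the recurring date type described here is named xsd:gMonthDay:

<xsd:attribute name="birthday" type="xsd:gMonthDay"/>

With this declaration, a value such as --10-28 is accepted, whereas a full date such as 1969-10-28 is not.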

The remaining time types are xsd:duration, xsd:time, and xsd:timeInstant. The xsd:duration type uses an interesting format to represent a length of time. To specify a value of type xsd:duration, you must enter the length of time according to the format PyyYmmMddDThhHmmMssS. The P in the format indicates the period portion of the value, which consists of the year (yy), month (mm), and day (dd). The T in the format begins the optional time portion of the value and consists of hours (hh), minutes (mm), and seconds (ss). You can precede a time duration value with a minus sign (-) to indicate that the duration of time goes in the reverse direction (back in time). Following is an example of how you would use this format to code the time duration value 3 years, 4 months, 2 days, 13 hours, 27 minutes, and 11 seconds:

<worldrecord duration="P3Y4M2DT13H27M11S">
</worldrecord>
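
For reference, a declaration for the duration attribute might look like the following sketch (the declaration itself is an assumption, since only the instance is shown here):

<xsd:attribute name="duration" type="xsd:duration"/>

Parts of the format that aren't needed can be left out, provided that the P is always present and the T appears whenever a time portion is given. The values below are hypothetical examples:

<worldrecord duration="P1Y"/>      <!-- 1 year -->
<worldrecord duration="PT45M"/>    <!-- 45 minutes -->
<worldrecord duration="-P2D"/>     <!-- 2 days back in time -->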

The xsd:time type adheres to the format hh:mm:ss.sss. In addition to specifying the hours (hh), minutes (mm), and seconds (ss.sss) of the time, you may also enter a plus (+) or minus (-) sign followed by hh:mm to indicate the offset of the time from Universal Time (UTC). As an example, the U.S. Central Standard Time zone is six hours behind UTC time, so you would need to indicate that in an xsd:time value that is in Central Standard Time (CST). Following is an example of a CST time:

<meeting start="15:30:00-06:00">
</meeting>

By the Way

UTC stands for Coordinated Universal Time and, for everyday purposes, is the same as Greenwich Mean Time (GMT). UTC is set for London, England, and therefore must be adjusted for any other time zone. Other time zones are expressed by adding or subtracting time from UTC. For example, U.S. Pacific Standard Time (PST) is UTC - 8, whereas Japan is UTC + 9.


Notice in the code that the hours in the time are entered in 24-hour form, also known as "military time," meaning that there is no a.m. or p.m. involved. The time specified in this example is 3:30 p.m. CST.

The xsd:timeInstant type follows the format CCYY-MM-DDThh:mm:ss.sss and is essentially an xsd:time type with the year, month, and day tacked on. As an example, the previous xsd:time value could be coded as an xsd:timeInstant value with the following code:

<meeting start="2002-02-23T15:30:00-06:00">
</meeting>
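
A declaration for the start attribute is sketched below. It assumes a processor that follows the final XSD Recommendation, where this type is named xsd:dateTime:

<xsd:attribute name="start" type="xsd:dateTime"/>

A time that is already expressed in UTC can be marked with a trailing Z instead of a numeric offset. The following value refers to the same instant as the CST example, because 15:30 CST is 21:30 UTC:

<meeting start="2002-02-23T21:30:00Z">
</meeting>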

Custom Types

One of the neatest things about XSD is how it allows you to cook up your own custom data types. Custom data types allow you to refine simple data types to meet your own needs. For example, you can limit the range of numbers for a number type, or constrain a string type to a list of possible strings. Regardless of how you customize a type, you always begin with the xsd:simpleType element, which is used to create custom simple types. Most of the time your custom types will represent a constraint of a simple type, in which case you'll also need to use the xsd:restriction element. The restriction element supports an attribute named base that refers to the base type you are customizing. Following is the general structure of a custom simple type:

<xsd:simpleType name="onetotenType">
  <xsd:restriction base="xsd:integer">
  </xsd:restriction>
</xsd:simpleType>

This code merely sets up the type to be created; the actual restrictions on the custom type are identified using one of several different elements. To constrain the range of values a number may have, you use one of the following elements:

  • xsd:minInclusive The minimum number allowed

  • xsd:minExclusive A lower bound that is itself not allowed; for integers, one less than the minimum number allowed

  • xsd:maxInclusive The maximum number allowed

  • xsd:maxExclusive An upper bound that is itself not allowed; for integers, one greater than the maximum number allowed

These elements allow you to set lower and upper bounds on numeric values. Following is an example of how you would limit a numeric value to a range of 1 to 10:

<xsd:simpleType name="onetotenType">
  <xsd:restriction base="xsd:integer">
    <xsd:minInclusive value="1"/>
    <xsd:maxInclusive value="10"/>
  </xsd:restriction>
</xsd:simpleType>

It's important to note that this code only establishes a custom type named onetotenType; it doesn't actually create an element or attribute of that type. In order to create an element or attribute of a custom type, you must specify the type name in the type attribute of the xsd:element or xsd:attribute element:

<xsd:element name="rating" type="onetotenType"/>

Although this approach works fine, if you plan on using a custom type with only a single element or attribute, you may want to declare the type directly within the element or attribute, like this:

<xsd:element name="rating">
  <xsd:simpleType>
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="1"/>
      <xsd:maxInclusive value="10"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:element>
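
The same range can also be sketched with the exclusive elements. Because the base type is an integer, excluding 0 and 11 leaves exactly the values 1 through 10; the type name onetotenExclusiveType is made up for this illustration:

<xsd:simpleType name="onetotenExclusiveType">
  <xsd:restriction base="xsd:integer">
    <!-- Values must be greater than 0 and less than 11 -->
    <xsd:minExclusive value="0"/>
    <xsd:maxExclusive value="11"/>
  </xsd:restriction>
</xsd:simpleType>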

In addition to controlling the bounds of simple types, it is also possible to control the length of them. For example, you might want to limit the size of a string of text. To do so, you would use one of the following elements:

  • xsd:length The exact number of characters

  • xsd:minLength The minimum number of characters

  • xsd:maxLength The maximum number of characters

Because the xsd:length element specifies the exact length, you can't use it with the xsd:minLength or xsd:maxLength elements. However, you can use the xsd:minLength and xsd:maxLength elements together to set the bounds of a string's length. Following is an example of how you might control the length of a string type:

<xsd:element name="password">
  <xsd:simpleType>
    <xsd:restriction base="xsd:string">
      <xsd:minLength value="8"/>
      <xsd:maxLength value="12"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:element>

In this example, a password element is created that must have at least 8 characters but no more than 12. This shows how to control the length of strings, but it is also possible to control the number of digits in a number. More specifically, you can use the xsd:totalDigits and xsd:fractionDigits elements (named xsd:precision and xsd:scale in early drafts of XSD) to control how many digits a number may have. The xsd:totalDigits element determines how many total digits are allowed in a number, whereas xsd:fractionDigits determines how many of those digits may appear to the right of the decimal point. So, if you wanted to allow monetary values up to $9999.99 with two decimal places, you would use the following code:

<xsd:element name="balance">
  <xsd:simpleType>
    <xsd:restriction base="xsd:decimal">
      <xsd:totalDigits value="6"/>
      <xsd:fractionDigits value="2"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:element>

Keep in mind that the xsd:totalDigits and xsd:fractionDigits elements set the maximum allowable number of digits in total and to the right of the decimal point, which means that all of the following examples are valid for the balance element:

<balance>3.14</balance>
<balance>12.95</balance>
<balance>1.1</balance>
<balance>524.78</balance>
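
By contrast, the following values would be rejected, assuming the limits of six total digits and two digits after the decimal point used for the balance element:

<balance>12345.67</balance>   <!-- seven digits in total -->
<balance>3.141</balance>      <!-- three digits after the decimal point -->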

One other customization I'd like to mention at this point has to do with default and fixed values. In the event that an element or attribute isn't specified in a document, you may want to declare a default value that is assumed. You may also want to limit an element or attribute so that it can have only one possible value, which is known as a fixed value. Default and fixed values are established with the default and fixed attributes of the xsd:element and xsd:attribute elements. Following are a few examples of default and fixed elements and attributes:

<xsd:element name="balance" type="xsd:decimal" default="0.0"/>
<xsd:element name="pi" type="xsd:decimal" fixed="3.14"/>
<xsd:attribute name="expired" type="xsd:boolean" default="false"/>
<xsd:attribute name="title" type="xsd:string" fixed="Mr."/>

The balance element has a default value of 0.0, and the expired attribute has a default value of false. Strictly speaking, an attribute default is assumed when the attribute is left out of a document, whereas an element default is assumed when the element appears in a document but is empty. The pi element is fixed at the value 3.14, which means if it is used it must be set to that value. Similarly, the title attribute must be set to Mr. if it is used. Notice that none of the examples are defined as having both default and fixed values; that's because you aren't allowed to define both a default and a fixed value for any single element or attribute.
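
The following instance fragment sketches how defaults surface in a document. The membership element is hypothetical and is assumed to declare the balance element and the expired attribute shown above. Under the XSD Recommendation, an attribute default is supplied when the attribute is left out, whereas an element default is supplied when the element is present but empty:

<!-- expired is omitted, so it is treated as expired="false" -->
<membership>
  <!-- an empty balance element is treated as 0.0 -->
  <balance/>
</membership>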

In addition to customizing simple types as you've seen thus far, you can also do some other interesting things with custom types. The next few sections explore the following data types, which are considered slightly more advanced custom types:

  • Enumerated types

  • List types

  • Patterned types

Enumerated Types

Enumerated types are used to constrain the set of possible values for a simple type and can be applied to any of the simple types except the Boolean type. To create an enumerated type, you use the xsd:enumeration element to identify each of the possible values. These values are listed within an xsd:restriction element, which identifies the base type. As an example, consider an element named team that represents the name of an NHL hockey team. Following is an example of how you might code this element with the help of enumerated types:

<xsd:element name="team">
  <xsd:simpleType>
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="Nashville Predators"/>
      <xsd:enumeration value="Detroit Red Wings"/>
      <xsd:enumeration value="St. Louis Blues"/>
      <xsd:enumeration value="Chicago Blackhawks"/>
      <xsd:enumeration value="Columbus Blue Jackets"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:element>

This code obviously doesn't include every NHL team, but you get the idea. The important thing to note is that the schema won't allow an XML developer to use any value for the team element other than those listed here. So, if you were creating a fantasy hockey data service that allowed people to access hockey data on a team-by-team basis, they would only be able to choose from your predefined list of teams. Enumerated types therefore provide a very effective means of tightly defining data that is limited to a set of predefined possibilities.
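
Assuming the team element as declared above, the first of the following instances is valid, whereas the second is rejected because its content doesn't exactly match one of the enumerated values:

<team>Detroit Red Wings</team>
<team>Red Wings</team>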

List Types

Whereas enumerated types force an XML developer to use a value from a predefined set of values, list types allow an XML developer to provide multiple values for a given element. The xsd:list element is used to create list types, which are useful any time you need to allow for a list of information. As an example, you might want to create an element that stores rainfall totals for each month of the year as part of an XML-based weather application. Following is code that carries out this function:

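<xsd:element name="rainfall">
  <xsd:simpleType>
    <xsd:restriction>
      <!-- The list itself: decimal items separated by white space -->
      <xsd:simpleType>
        <xsd:list itemType="xsd:decimal"/>
      </xsd:simpleType>
      <!-- Restrict the list to exactly 12 items -->
      <xsd:length value="12"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:element>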

This code allows you to list exactly 12 decimal numbers, separated by white space. Following is an example of what the XML code might look like for the rainfall element:

<rainfall>1.25 2.0 3.0 4.25 3.75 1.5 0.25 0.75 1.25 1.75 2.0 2.25</rainfall>

If you wanted to be a little more flexible and not require exactly 12 items in the list, you could use the xsd:minLength and xsd:maxLength elements to set minimum and maximum bounds on the list. You can also create a completely unbounded list by using the xsd:list element by itself, like this:

<xsd:element name="cities">
  <xsd:simpleType>
    <xsd:list itemType="xsd:string"/>
  </xsd:simpleType>
</xsd:element>
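
An instance of the cities element might look like the following sketch; the city names are arbitrary. Because white space separates the items, a name that itself contains a space, such as St. Louis, would be counted as two separate items:

<cities>Nashville Detroit Chicago Columbus</cities>

A bounded list can be sketched along similar lines. This version assumes the final XSD Recommendation syntax, in which the list is declared with the itemType attribute and then restricted; the element name readings is hypothetical:

<xsd:element name="readings">
  <xsd:simpleType>
    <xsd:restriction>
      <xsd:simpleType>
        <xsd:list itemType="xsd:decimal"/>
      </xsd:simpleType>
      <!-- Allow anywhere from 1 to 12 items -->
      <xsd:minLength value="1"/>
      <xsd:maxLength value="12"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:element>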

Patterned Types

Patterned types are undoubtedly the trickiest of all custom types, but they are also the most powerful in many ways. Patterned types allow you to use a regular expression to establish a pattern that tightly controls the format of a simple type. A regular expression is a coded pattern using a special language that describes an arrangement of letters, numbers, and symbols. The regular expression language employed by XSD is fairly complex, so I won't attempt a complete examination of it. Instead, I'd like to focus on the basics and allow you to investigate it further on your own if you decide you'd like to become a regular expression guru. Getting back to patterned types, you create a patterned type using the xsd:pattern element.

The xsd:pattern element requires an attribute named value that contains the regular expression for the pattern. Following are the building blocks of a regular expression pattern:

  • . Any character

  • \d Any digit

  • \D Any nondigit

  • \s Any white space

  • \S Any nonwhite space

  • x? One x or none at all

  • x+ One or more x's

  • x* Any number of x's

  • (xy) Groups x and y together

  • x|y x or y

  • [xyz] One of x, y, or z

  • [x-y] Any one character in the range x to y

  • x{n} n number of x's in a row

  • x{n,m} At least n number of x's but no more than m

See, I told you regular expressions are kind of tricky. Actually, these regular expression symbols and patterns aren't too difficult to understand when you see them in context, so let's take a look at a few examples. First off, how about a phone number? A standard U.S. phone number including area code is of the form xxx-xxx-xxxx. In terms of patterned types and regular expressions, this results in the following code:

<xsd:element name="phonenum">
  <xsd:simpleType>
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d\d\d-\d\d\d-\d\d\d\d"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:element>

As you can see, the phonenum element is described by a pattern that consists of sequences of digits separated by hyphens. Although this pattern works fine, it's important to note that regular expressions are extremely flexible, often offering more than one solution to a given problem. For example, the following xsd:pattern element also works for a phone number:

<xsd:pattern value="\d{3}-\d{3}-\d{4}"/>

In this example, a phone number is described using curly braces to indicate how many digits must appear at each position in the pattern. The code \d{3} indicates that there should be exactly three digits, whereas \d{4} indicates exactly four digits.
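
Several of the building blocks can be combined in one pattern. As a further sketch, the hypothetical zipcode element below accepts a five-digit U.S. ZIP code with an optional four-digit extension, such as 37215 or 37215-1234:

<xsd:element name="zipcode">
  <xsd:simpleType>
    <xsd:restriction base="xsd:string">
      <!-- Five digits, optionally followed by a hyphen and four more digits -->
      <xsd:pattern value="\d{5}(-\d{4})?"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:element>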

Let's now consider a slightly more advanced regular expression pattern such as a pizza order. Our pizza order pattern must have the form s-c-t+t+t+, where s is the size (small, medium or large), c is the crust (thin or deep), and each t is an optional topping (sausage, pepperoni, mushroom, peppers, onions, and anchovies) in addition to cheese, which is assumed. Following is how this pizza order pattern resolves into an XSD regular expression pattern:

<xsd:element name="pizza">
  <xsd:simpleType>
    <xsd:restriction base="xsd:string">
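      <!-- \+ matches a literal plus sign; the value stays on one line because white space inside it would become part of the pattern -->
      <xsd:pattern value="(small|medium|large)-(thin|deep)-(sausage\+)?(pepperoni\+)?(mushroom\+)?(peppers\+)?(onions\+)?(anchovies\+)?"/>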
    </xsd:restriction>
  </xsd:simpleType>
</xsd:element>

Following is an example of how you might code a pizza element based upon this pattern:

<pizza>medium-deep-sausage+mushroom+</pizza>
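
Assuming the plus signs in the pattern are matched literally (written as \+ in the regular expression), a few more hypothetical orders show how strict the pattern is:

<pizza>large-thin-</pizza>                      <!-- valid: cheese only -->
<pizza>small-deep-pepperoni+onions+</pizza>     <!-- valid: toppings in pattern order -->
<pizza>small-deep-onions+pepperoni+</pizza>     <!-- invalid: toppings out of order -->

The second hyphen is required even when no toppings are listed, and toppings must appear in the same order as they do in the pattern.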

Obviously, there is a great deal more that can be done with regular expression patterns. Hopefully this is enough information to get you going in the right direction with patterned types.