Listing 18.1. A Sample XML Document Containing Vehicle Data
<?xml version="1.0"?>

<vehicles>
  <vehicle year="2004" make="Acura" model="3.2TL">
    <mileage>13495</mileage>
    <color>green</color>
    <price>33900</price>
    <options>
      <option>navigation system</option>
      <option>heated seats</option>
    </options>
  </vehicle>

  <vehicle year="2005" make="Acura" model="3.2TL">
    <mileage>07541</mileage>
    <color>white</color>
    <price>33900</price>
    <options>
      <option>spoiler</option>
      <option>ground effects</option>
    </options>
  </vehicle>

  <vehicle year="2004" make="Acura" model="3.2TL">
    <mileage>18753</mileage>
    <color>white</color>
    <price>32900</price>
    <options />
  </vehicle>
</vehicles>
Now let's take a look at some simple XQuery queries that can be used to retrieve data from that document. The syntax for XQuery is very lean, and in fact borrows heavily from a related technology called XPath; you learn a great deal more about XPath in "Addressing and Linking XML Documents." As an example, the query that retrieves all of the color elements from the document is:
for $c in //color return $c
This query returns the following:
<?xml version="1.0" encoding="UTF-8"?>
<color>green</color>
<color>white</color>
<color>white</color>
These queries are meant to be typed into an application that supports XQuery or passed to an XQuery processor. Each query's results are shown after it so that you can see what would be returned.
This query asks for all of the elements named color, wherever they appear in the document. The // operator returns matching elements at any depth below another element, which in this case means that every color element in the document is returned. You could have just as easily coded this example as:
for $c in vehicles/vehicle/color return $c
The $c in these examples is a variable that is bound to each matching element in turn. You can think of the query as a loop in which each matching element is grabbed one after the next. In this case, all you're doing is returning each result, either for further processing or for writing to an XML document.
If you're familiar with the for loop in a programming language such as BASIC, Java, or C++, the for construct in XQuery won't be entirely foreign, even though it doesn't involve setting up a counter as traditional for loops do.
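To make the loop analogy concrete, here is a rough Python sketch using the standard library's xml.etree.ElementTree module. This module understands only a small subset of XPath and is not an XQuery processor. The sketch simply mimics what for $c in //color return $c does, using the Listing 18.1 data embedded as a string.

```python
import xml.etree.ElementTree as ET

# Listing 18.1, embedded so the sketch runs on its own.
SAMPLE = """<vehicles>
  <vehicle year="2004" make="Acura" model="3.2TL">
    <mileage>13495</mileage><color>green</color><price>33900</price>
    <options><option>navigation system</option><option>heated seats</option></options>
  </vehicle>
  <vehicle year="2005" make="Acura" model="3.2TL">
    <mileage>07541</mileage><color>white</color><price>33900</price>
    <options><option>spoiler</option><option>ground effects</option></options>
  </vehicle>
  <vehicle year="2004" make="Acura" model="3.2TL">
    <mileage>18753</mileage><color>white</color><price>32900</price>
    <options />
  </vehicle>
</vehicles>"""

root = ET.fromstring(SAMPLE)  # root is the <vehicles> element

# Rough analogue of: for $c in //color return $c
# ".//color" matches color elements at any depth; the Python loop binds
# c to each match in turn, much as XQuery binds $c.
colors = []
for c in root.findall(".//color"):
    colors.append(c.text)

# The explicit path vehicles/vehicle/color becomes "vehicle/color" here,
# because root already is the vehicles element.
same_colors = [c.text for c in root.findall("vehicle/color")]

print(colors)       # ['green', 'white', 'white']
print(same_colors)  # ['green', 'white', 'white']
```

Unlike a counting loop, nothing here tracks an index. The loop just visits each matching element in document order.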
As the previous code reveals, a / separates one level of the document hierarchy from the next, much as a slash separates folders in a file path. A / at the beginning of a query string refers to the root of the document. For example, the following query wouldn't return anything, because color is not the root element of the document:
/color
All of this node addressing syntax is technically part of XPath, which makes up a considerable part of the XQuery technology. You learn a great deal more about the ins and outs of XPath in "Addressing and Linking XML Documents." As you can see, aside from a few wrinkles, requesting elements from an XML document using XQuery/XPath isn't all that different from locating files in a file system using a command shell.
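The file-path analogy can be sketched in Python. The function select_absolute below is a hypothetical helper, not part of any library. It handles only simple absolute paths made of element names (no //, wildcards, or subqueries). It checks the first step against the document's root element, which is why /color finds nothing in Listing 18.1.

```python
import xml.etree.ElementTree as ET

# Listing 18.1, embedded so the sketch runs on its own.
SAMPLE = """<vehicles>
  <vehicle year="2004" make="Acura" model="3.2TL">
    <mileage>13495</mileage><color>green</color><price>33900</price>
    <options><option>navigation system</option><option>heated seats</option></options>
  </vehicle>
  <vehicle year="2005" make="Acura" model="3.2TL">
    <mileage>07541</mileage><color>white</color><price>33900</price>
    <options><option>spoiler</option><option>ground effects</option></options>
  </vehicle>
  <vehicle year="2004" make="Acura" model="3.2TL">
    <mileage>18753</mileage><color>white</color><price>32900</price>
    <options />
  </vehicle>
</vehicles>"""

root = ET.fromstring(SAMPLE)


def select_absolute(doc_root, path):
    """Hypothetical helper: evaluate a simple absolute path such as
    "/vehicles/vehicle/color" (name steps only, no // or predicates)."""
    if not path.startswith("/") or path.startswith("//"):
        raise ValueError("expected a simple absolute path like /a/b/c")
    steps = path[1:].split("/")
    # Like a leading slash in a file path, the first step is checked
    # against the top level: the document's root element.
    if steps[0] != doc_root.tag:
        return []
    if len(steps) == 1:
        return [doc_root]
    # The remaining steps are resolved one level at a time below the root.
    return doc_root.findall("/".join(steps[1:]))


print(select_absolute(root, "/color"))  # [] because the root is vehicles
print([e.text for e in select_absolute(root, "/vehicles/vehicle/color")])
# ['green', 'white', 'white']
```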
In XQuery/XPath, expressions within square brackets ([]) are subqueries (the XPath specification calls them predicates). Those expressions are not used to retrieve elements themselves but to qualify the elements that are retrieved. For example, a query such as
//vehicle/color
retrieves color elements that are children of vehicle elements. On the other hand, this query
//vehicle[color]
retrieves vehicle elements that have a color element as a child. Subqueries are particularly useful when you use them with filters to write very specific queries.
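The difference between returning a child and filtering on a child can be shown with ElementTree, whose limited XPath subset happens to support the [child] form. This is a Python sketch on the Listing 18.1 data, not XQuery. The extra query on options is an added illustration of a subquery that actually removes an element from the results.

```python
import xml.etree.ElementTree as ET

# Listing 18.1, embedded so the sketch runs on its own.
SAMPLE = """<vehicles>
  <vehicle year="2004" make="Acura" model="3.2TL">
    <mileage>13495</mileage><color>green</color><price>33900</price>
    <options><option>navigation system</option><option>heated seats</option></options>
  </vehicle>
  <vehicle year="2005" make="Acura" model="3.2TL">
    <mileage>07541</mileage><color>white</color><price>33900</price>
    <options><option>spoiler</option><option>ground effects</option></options>
  </vehicle>
  <vehicle year="2004" make="Acura" model="3.2TL">
    <mileage>18753</mileage><color>white</color><price>32900</price>
    <options />
  </vehicle>
</vehicles>"""

root = ET.fromstring(SAMPLE)

# //vehicle/color: the color elements themselves are returned.
color_elems = root.findall(".//vehicle/color")

# //vehicle[color]: vehicle elements are returned; [color] only filters them.
vehicles_with_color = root.findall(".//vehicle[color]")

# A subquery that actually filters something out: only the first two
# options elements contain option children.
nonempty_options = root.findall(".//options[option]")

print([e.tag for e in color_elems])          # ['color', 'color', 'color']
print([e.tag for e in vehicles_with_color])  # ['vehicle', 'vehicle', 'vehicle']
print(len(nonempty_options))                 # 2
```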
Querying with Wildcards
Continuing along with the vehicle code example, let's say you want to find all of the option elements that are grandchildren of the vehicle element. To get them all from the sample document, you could just use the query vehicles/vehicle/options/option. However, let's say that you didn't know that the intervening element was options, or that some other element might sit between vehicle and option. In that case, you could use the following query:
for $o in vehicles/vehicle/*/option return $o
Following are the results of this query:
<?xml version="1.0" encoding="UTF-8"?>
<option>navigation system</option>
<option>heated seats</option>
<option>spoiler</option>
<option>ground effects</option>
The wildcard (*) matches any single element at that point in the path, whatever its name. You can also use it at the end of a query to match all the children of a particular element.
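ElementTree's path syntax also supports *. The following Python sketch, again a stand-in for an XQuery processor rather than one, shows both uses: as a one-level placeholder between vehicle and option, and at the end of a path to list every child of an element.

```python
import xml.etree.ElementTree as ET

# Listing 18.1, embedded so the sketch runs on its own.
SAMPLE = """<vehicles>
  <vehicle year="2004" make="Acura" model="3.2TL">
    <mileage>13495</mileage><color>green</color><price>33900</price>
    <options><option>navigation system</option><option>heated seats</option></options>
  </vehicle>
  <vehicle year="2005" make="Acura" model="3.2TL">
    <mileage>07541</mileage><color>white</color><price>33900</price>
    <options><option>spoiler</option><option>ground effects</option></options>
  </vehicle>
  <vehicle year="2004" make="Acura" model="3.2TL">
    <mileage>18753</mileage><color>white</color><price>32900</price>
    <options />
  </vehicle>
</vehicles>"""

root = ET.fromstring(SAMPLE)

# vehicles/vehicle/*/option: "*" stands for exactly one element of any name.
options = [o.text for o in root.findall("vehicle/*/option")]

# A trailing "*" returns every child of the matched element.
first_vehicle = root.find("vehicle")
child_tags = [e.tag for e in first_vehicle.findall("*")]

print(options)
# ['navigation system', 'heated seats', 'spoiler', 'ground effects']
print(child_tags)  # ['mileage', 'color', 'price', 'options']
```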
Using Filters to Search for Specific Information
After you've mastered the extraction of specific elements from XML files, you can move on to searching for elements that contain information you specify. Let's say you want to find higher-level elements containing a particular value in a child element. The [] operator indicates that the expression within the square brackets should be searched but that the element listed to the left of the square brackets should be returned. For example, the following query reads as "return any vehicle elements that contain a color element with a value of green":
for $v in //vehicle[color='green'] return $v
Here are the results:
<?xml version="1.0" encoding="UTF-8"?>
<vehicle year="2004" make="Acura" model="3.2TL">
  <mileage>13495</mileage>
  <color>green</color>
  <price>33900</price>
  <options>
    <option>navigation system</option>
    <option>heated seats</option>
  </options>
</vehicle>
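ElementTree supports the [child='text'] form of filter directly, so the same search can be sketched in Python. As before, this is an analogue on the Listing 18.1 data, not an XQuery implementation.

```python
import xml.etree.ElementTree as ET

# Listing 18.1, embedded so the sketch runs on its own.
SAMPLE = """<vehicles>
  <vehicle year="2004" make="Acura" model="3.2TL">
    <mileage>13495</mileage><color>green</color><price>33900</price>
    <options><option>navigation system</option><option>heated seats</option></options>
  </vehicle>
  <vehicle year="2005" make="Acura" model="3.2TL">
    <mileage>07541</mileage><color>white</color><price>33900</price>
    <options><option>spoiler</option><option>ground effects</option></options>
  </vehicle>
  <vehicle year="2004" make="Acura" model="3.2TL">
    <mileage>18753</mileage><color>white</color><price>32900</price>
    <options />
  </vehicle>
</vehicles>"""

root = ET.fromstring(SAMPLE)

# //vehicle[color='green']: search inside the brackets, return the vehicle.
green = root.findall(".//vehicle[color='green']")

# The whole vehicle element comes back, so its other data is available too.
print([(v.get("year"), v.findtext("price")) for v in green])
# [('2004', '33900')]
```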
The full vehicle element is returned because it appears to the left of the search expression enclosed in the square brackets. You can also use Boolean operators such as and and or to string multiple search expressions together. For example, to find all of the vehicles with a color of green or a price less than 34000, you would use the following query:
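for $v in //vehicle[color='green' or price<34000] return $v
The number 34000 is written without quotes so that price is compared as a number rather than as a string of characters.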
This query results in the following:
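<?xml version="1.0" encoding="UTF-8"?>
<vehicle year="2004" make="Acura" model="3.2TL">
  <mileage>13495</mileage>
  <color>green</color>
  <price>33900</price>
  <options>
    <option>navigation system</option>
    <option>heated seats</option>
  </options>
</vehicle>
<vehicle year="2005" make="Acura" model="3.2TL">
  <mileage>07541</mileage>
  <color>white</color>
  <price>33900</price>
  <options>
    <option>spoiler</option>
    <option>ground effects</option>
  </options>
</vehicle>
<vehicle year="2004" make="Acura" model="3.2TL">
  <mileage>18753</mileage>
  <color>white</color>
  <price>32900</price>
  <options/>
</vehicle>
All three vehicles are returned. The first matches on its color, and every vehicle in the sample is priced below 34000.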
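ElementTree's path syntax has no or and no < operator, so this Python sketch applies the same test with an ordinary list comprehension. It converts price to an int before comparing. The last line uses a made-up value, 9000, which does not appear in the sample. It shows why a text comparison can mislead: as strings, "9000" sorts after "34000".

```python
import xml.etree.ElementTree as ET

# Listing 18.1, embedded so the sketch runs on its own.
SAMPLE = """<vehicles>
  <vehicle year="2004" make="Acura" model="3.2TL">
    <mileage>13495</mileage><color>green</color><price>33900</price>
    <options><option>navigation system</option><option>heated seats</option></options>
  </vehicle>
  <vehicle year="2005" make="Acura" model="3.2TL">
    <mileage>07541</mileage><color>white</color><price>33900</price>
    <options><option>spoiler</option><option>ground effects</option></options>
  </vehicle>
  <vehicle year="2004" make="Acura" model="3.2TL">
    <mileage>18753</mileage><color>white</color><price>32900</price>
    <options />
  </vehicle>
</vehicles>"""

root = ET.fromstring(SAMPLE)


def price(vehicle):
    # Convert the price text to an int so the comparison is numeric.
    return int(vehicle.findtext("price"))


# Analogue of //vehicle[color='green' or price < 34000]
matches = [v for v in root.iter("vehicle")
           if v.findtext("color") == "green" or price(v) < 34000]

print([(v.get("year"), v.findtext("color"), price(v)) for v in matches])
# [('2004', 'green', 33900), ('2005', 'white', 33900), ('2004', 'white', 32900)]

# Why the conversion matters (9000 is a hypothetical price):
print("9000" < "34000", 9000 < 34000)  # False True
```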
The != operator is also available when you want to write expressions to test for inequality. Additionally, there are actually three common Boolean operators: and, or, and not. Strictly speaking, not is a function, which is why its argument goes in parentheses. For example, you can combine these operators to write complex queries, such as this:
for $v in //vehicle[not(color='blue' or color='green') and @year='2004'] return $v
This example is a little more interesting in that it looks for vehicles that aren't blue or green but that are in the model year 2004. Following are the results:
<?xml version="1.0" encoding="UTF-8"?>
<vehicle year="2004" make="Acura" model="3.2TL">
  <mileage>18753</mileage>
  <color>white</color>
  <price>32900</price>
  <options/>
</vehicle>
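The same combination of not, or, and and translates almost word for word into a Python condition. The sketch below filters the Listing 18.1 vehicles in plain Python, because ElementTree's path syntax cannot express these operators. It reads year with .get(), the counterpart of @year.

```python
import xml.etree.ElementTree as ET

# Listing 18.1, embedded so the sketch runs on its own.
SAMPLE = """<vehicles>
  <vehicle year="2004" make="Acura" model="3.2TL">
    <mileage>13495</mileage><color>green</color><price>33900</price>
    <options><option>navigation system</option><option>heated seats</option></options>
  </vehicle>
  <vehicle year="2005" make="Acura" model="3.2TL">
    <mileage>07541</mileage><color>white</color><price>33900</price>
    <options><option>spoiler</option><option>ground effects</option></options>
  </vehicle>
  <vehicle year="2004" make="Acura" model="3.2TL">
    <mileage>18753</mileage><color>white</color><price>32900</price>
    <options />
  </vehicle>
</vehicles>"""

root = ET.fromstring(SAMPLE)

# Analogue of //vehicle[not(color='blue' or color='green') and @year='2004']
matches = [v for v in root.iter("vehicle")
           if not (v.findtext("color") == "blue" or v.findtext("color") == "green")
           and v.get("year") == "2004"]

print([v.findtext("mileage") for v in matches])  # ['18753']
```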
You might be wondering about the at symbol (@) in front of year in the query. If you recall from the vehicles sample document (Listing 18.1), year is an attribute, not a child element; @ is used to reference attributes in XQuery. More on attributes in a moment.
Just to make sure you understand subqueries, what if you wanted to retrieve just the options for any white cars in the document? Here's the query:
//vehicle[color='white']/options
And here's the result:
<?xml version="1.0" encoding="UTF-8"?>
<options>
  <option>spoiler</option>
  <option>ground effects</option>
</options>
<options/>
Let's break down that query. Remember that // means "anywhere in the hierarchy." The //vehicle part indicates that you're looking for vehicle elements anywhere in the document. The [color='white'] part indicates that you're interested only in vehicle elements containing color elements with a value of white. The part you haven't yet seen is /options. It indicates that the results should be the options elements that are children of those vehicle elements, that is, the vehicles whose color element matches white.
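ElementTree accepts a further step after a bracketed filter, so the filter-then-step pattern can be sketched directly in Python on the Listing 18.1 data. The empty list in the output corresponds to the empty options element of the third vehicle.

```python
import xml.etree.ElementTree as ET

# Listing 18.1, embedded so the sketch runs on its own.
SAMPLE = """<vehicles>
  <vehicle year="2004" make="Acura" model="3.2TL">
    <mileage>13495</mileage><color>green</color><price>33900</price>
    <options><option>navigation system</option><option>heated seats</option></options>
  </vehicle>
  <vehicle year="2005" make="Acura" model="3.2TL">
    <mileage>07541</mileage><color>white</color><price>33900</price>
    <options><option>spoiler</option><option>ground effects</option></options>
  </vehicle>
  <vehicle year="2004" make="Acura" model="3.2TL">
    <mileage>18753</mileage><color>white</color><price>32900</price>
    <options />
  </vehicle>
</vehicles>"""

root = ET.fromstring(SAMPLE)

# //vehicle[color='white']/options: filter vehicles first, then step to options.
white_options = root.findall(".//vehicle[color='white']/options")

print([[o.text for o in opts.findall("option")] for opts in white_options])
# [['spoiler', 'ground effects'], []]
```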
Referencing Attributes
The next thing to look at is attributes. When you want to refer to an attribute, place an @ sign before its name. So, to find all the year attributes of vehicle elements, use the following query:
//vehicle/@year
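ElementTree's path syntax can test attributes but cannot return attribute nodes the way //vehicle/@year does. This Python sketch is therefore only an approximation: it finds the vehicle elements and reads each year value with .get().

```python
import xml.etree.ElementTree as ET

# Listing 18.1, embedded so the sketch runs on its own.
SAMPLE = """<vehicles>
  <vehicle year="2004" make="Acura" model="3.2TL">
    <mileage>13495</mileage><color>green</color><price>33900</price>
    <options><option>navigation system</option><option>heated seats</option></options>
  </vehicle>
  <vehicle year="2005" make="Acura" model="3.2TL">
    <mileage>07541</mileage><color>white</color><price>33900</price>
    <options><option>spoiler</option><option>ground effects</option></options>
  </vehicle>
  <vehicle year="2004" make="Acura" model="3.2TL">
    <mileage>18753</mileage><color>white</color><price>32900</price>
    <options />
  </vehicle>
</vehicles>"""

root = ET.fromstring(SAMPLE)

# Approximation of //vehicle/@year: collect the year attribute values.
years = [v.get("year") for v in root.findall(".//vehicle")]
print(years)  # ['2004', '2005', '2004']
```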
A slightly different query returns the vehicle elements themselves, as long as they have a year attribute:
//vehicle[@year]
This naturally leads to a query that returns all the vehicle elements that have a year attribute with a certain value, say 2005. That complete query is:
for $v in //vehicle[@year="2005"] return $v
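Both attribute filters, [@year] and [@year="2005"], belong to the subset ElementTree supports. This Python sketch runs them against the Listing 18.1 data as a rough analogue of the two XQuery queries above.

```python
import xml.etree.ElementTree as ET

# Listing 18.1, embedded so the sketch runs on its own.
SAMPLE = """<vehicles>
  <vehicle year="2004" make="Acura" model="3.2TL">
    <mileage>13495</mileage><color>green</color><price>33900</price>
    <options><option>navigation system</option><option>heated seats</option></options>
  </vehicle>
  <vehicle year="2005" make="Acura" model="3.2TL">
    <mileage>07541</mileage><color>white</color><price>33900</price>
    <options><option>spoiler</option><option>ground effects</option></options>
  </vehicle>
  <vehicle year="2004" make="Acura" model="3.2TL">
    <mileage>18753</mileage><color>white</color><price>32900</price>
    <options />
  </vehicle>
</vehicles>"""

root = ET.fromstring(SAMPLE)

# //vehicle[@year]: vehicles that have a year attribute at all.
with_year = root.findall(".//vehicle[@year]")

# //vehicle[@year="2005"]: vehicles whose year attribute equals 2005.
year_2005 = root.findall(".//vehicle[@year='2005']")

print(len(with_year))                              # 3
print([v.findtext("mileage") for v in year_2005])  # ['07541']
```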