A Quick Relational Database Primer

Before you can learn about relating XML to databases, you need to learn about databases themselves. When most people think of databases, they're thinking specifically about relational databases. All of the popular database productsMicrosoft SQL Server, Oracle, IBM DB2, MySQLuse the relational model. In turn, most web and business applications use one relational database or another for data storage.

The relational database model is all about tables. All of the data is stored in a tabular format, and relationships between tables are expressed through data shared among those tables. Tables in relational databases are just like tables in HTML. They consist of rows and columns. Each row represents a record in the database, and each column represents one field in each of the records.

A group of tables is generally referred to as a schema, which conceptually isn't all that different from an XML schema. In a schema, some or all of the tables are generally related to one another. Let's look at how those relationships work. Ordinarily, every table contains a column (or group of columns) that contains data that uniquely identifies that row in the table. In most cases, this is an ID field that simply contains a number that sets that row apart from the others in the table. This value is referred to as the primary key. In relational database design, the primary key is extremely important because it is the root of relationships between tables.

Here's a simple example. Let's say you have a table called students. The students table contains, among other bits of data, a column called id_students. The table might also include the student's name, address, and phone number. You might also have a second table, called majors. This table contains the major and minor for all of the students, under the assumption that no student has more than one major or minor.

This is what is referred to as a one-to-one relationship. Each record in the students table can have one corresponding row in the majors table. There are two other types of relationships between tablesone-to-many and many-to-many. The students table contains a column called id_students, which serves as its primary key. The majors table should contain a column that contains student IDs. This is referred to as a foreign key, because it's a reference to a primary key in another table. The foreign key is used to implement the one-to-one relationship between the records in the two tables.

In a one-to-many relationship, a record in one table can have a reference to many records in a second table, but each record in the second table can have a reference to only one record in the first table. Here's an example: Let's say I create a table called grades, which contains a column for student IDs as well as columns for class names and the grades themselves. Because a student can take multiple classes, but each grade applies to only one student, the relationship between students and grades is a one-to-many relationship. In this case, id_students in the grades table is a foreign key relating to the students table.

An example of a many-to-many relationship is the relationship between students and classes. Each student is usually enrolled in several classes, and each class usually contains multiple students. In a relational database, such a relationship is expressed using what is sometimes referred to as a joining tablea table that exists solely to express the relationship between two pieces of data. The schema contains two tables, students and classes. You already know about the students table; the classes table contains information about the classes offeredthe name of the professor, the room where the class is held, and the time at which the class is scheduled.

Before you can deal with integrating databases and XML, you need to understand both databases and XML. You've been learning about XML for a while now, so consider this a crash course in database theory.

To relate students to classes, you need a third table, called classes_students (or a similarly descriptive name). At a bare minimum, this table must include two columns, id_students and id_classes, both of which are foreign keys pointing to the students and classes tables, respectively. These two columns are used to express the many-to-many relationship. In other words, both of the other two tables have one-to-many relationships with this table. Using this table, each student can be associated with several classes, and each class can be associated with any number of students. It may also contain properties that are specific to the relationship, rather than to either a student or a class specifically. For example, a student's grade or her attendance record for the class might be stored in this table. This table structure is illustrated in Figure 19.1.

Figure 19.1. The tables in a many-to-many relationship.