Join Queries

A join query is a querying technique that matches rows from two or more tables based on a join condition in a WHERE clause and outputs only those rows that meet the condition. As part of the process of converting the winestore entity-relationship model to SQL statements, we have included the attributes required in any practical join condition.

To understand which tables can be joined in the winestore database, and how the joins are processed, it is helpful to have a copy of the ER model at hand.

Beware of the Cartesian Product

Oddly, the easiest way to introduce join queries is to discuss what not to do. Consider this query, which we might intuitively, but wrongly, use to find all the wineries in a region:

SELECT winery_name,region_name FROM winery, region;

This query produces-in part-the following results:

| winery_name                   | region_name |
| Ryan Ridge Winery             | Victoria    |
| Macdonald Creek Premium Wines | Victoria    |
| Davie's                       | Victoria    |
| Porkenberger Brook Vineyard   | Victoria    |
| Rowley Hill Vineyard          | Victoria    |

The impression here is that, for example, Ryan Ridge Winery is located in the Victoria region. This might not be the case. Why? First, you can use the techniques covered so far in this chapter to check which region Ryan Ridge Winery is located in:

SELECT region_id FROM winery
  WHERE winery_name='Ryan Ridge Winery';

The result is region_id=2.

Now query the region table to find the name of region_id=2 using:

SELECT region_name FROM region
  WHERE region_id=2;

The region_name is South Australia. So, Ryan Ridge Winery isn't in Victoria at all!

What happened in the first attempt at a join query? The technical answer is that you just evaluated a cartesian product; that is, you produced as output all the possible combinations of wineries and regions. These odd results can be seen if you add an ORDER BY clause to the original query:

SELECT winery_name, region_name FROM winery, region
  ORDER BY winery_name, region_name;

Recall that the ORDER BY clause sorts the results after the query has been evaluated; it has no effect on which rows are returned from the query. Here is the first part of the result of the query with the ORDER BY clause:

| winery_name          | region_name       |
| Anderson Creek Wines | New South Wales   |
| Anderson Creek Wines | South Australia   |
| Anderson Creek Wines | Victoria          |
| Anderson Creek Wines | Western Australia |
| Anderson Group       | New South Wales   |
| Anderson Group       | South Australia   |
| Anderson Group       | Victoria          |
| Anderson Group       | Western Australia |

The query produces all possible combinations of the four region names and 300 wineries in the sample database! In fact, the size of the output can be accurately calculated as the total number of rows in the first table multiplied by the total rows in the second table. In this case, the output is 4 x 300 = 1,200 rows.