Hi All:
OK, part 3.
Here's another term to go with the theta join. An _equijoin_ joins on
matching values and displays the join's seam -- the connecting column
from each table that participates in the join.
The following query joins cities and automobiles by individual:
SELECT Cars.name, ;
Cars.auto, ;
City.name, ;
City.city ;
FROM Cars A, ;
City C ;
WHERE Cars.name = City.name
INTO CURSOR CarCity
A_Name Auto C_Name City
--------------- ------------ --------------- ---------------
David Powell Porsche David Powell Dallas
Peter Doubt Pinto Peter Doubt Albany
Natalie Drest Ferrari Natalie Drest Madison
Warren Peese Nova Warren Peese Nova
Since there's no need to display information redundantly, we can eliminate
the display of one of the connecting columns by restating the query so that
it displays the join column only once. Such a display is called a _natural
join_. The natural join version would resemble the following:
SELECT Cars.name, ;
Cars.auto, ;
City.city ;
FROM Cars A, ;
City C ;
WHERE Cars.name = City.name ;
INTO CURSOR CarCity
Name Auto City
--------------- ------------ ---------------
David Powell Porsche Dallas
Peter Doubt Pinto Albany
Natalie Drest Ferrari Madison
Warren Peese Nova Nova
Now we have a natural join, because the column NAME appears only once in
the results. It doesn't matter which NAME column we include, but we must
qualify it in the select list with its table name. The natural join is
only one variant on the equijoin: in addition, we have all the joins
based on various kinds of inequality, and (as mentioned) the maybe-joins
that compare known values to null, or unknown, values. As you might
have experimented on your own, Fox will join NULLs rendering unreliable
results. To avoid this, include AND != NULL or ! EMPT().
I won't belabour the inequalities here. The same rules apply.
The _self-join_ is another variant on the equijoin. A self-join compares
values within a column of a single table. "Now why would anyone ever do
that?" you ask. You might wish to know how many FoxPro developers in TX
have the same zip code.
Since this query involves a join of the Develop table with itself, the
Develop table appears in two roles. You must distinguish these roles by
giving the Develop table two different aliases, i.e., dev1 and dev2, in
the FROM clause's table list. These aliases are also used to qualify the
column names in the rest of the query. Here's one version of this query:
SELECT Dev1.fname, ;
Dev1.lname, ;
Dev1.state, ;
Dev1.zip ;
FROM Develop Dev1, ;
Develop Dev2 ;
WHERE Dev1.state = "TX" ;
AND Dev1.zip = Dev2.zip ;
INTO CURSOR DevZip
Fname Lname State Zip
---------- ----------- ----- -------
David Powell TX 75075
Phil Errup TX 75075
Karn E. Assada TX 76016
Ilene Dover TX 75012
Ben Dover TX 75012
Ilene Dover TX 75012
Ben Dover TX 75012
Willie Maykit TX 75222
David Powell TX 75075
Phil Errup TX 75075
These results are a bit hard to read, and don't appear to be correct.
In order to clarify the situation, let's first attempt to remove duplicate
rows. We have four duplicates above: David, Phil, Ben and Ilene. To
eliminate duplicate rows, use the keyword DISTINCT to modify the select
list.
SELECT DISTINCT Dev1.fname, ;
Dev1.lname, ;
Dev1.state, ;
Dev1.zip ;
FROM Develop Dev1, ;
Develop Dev2 ;
WHERE Dev1.state = "TX" ;
AND Dev1.zip = Dev2.zip ;
INTO CURSOR DevZip
Fname Lname State Zip
---------- ----------- ----- -------
Ben Dover TX 75012
David Powell TX 75075
Ilene Dover TX 75012
Karn E. Assada TX 76016
Phil Errup TX 75075
Willie Maykit TX 75222
Now the results are free of duplicates, but they show all 6 developers
who live in Texas, not just those who share a zip code! What's going
on? (Note how it alphabetized the first column, too.)
OK. The self-join compares all the zip code values to themselves, so
that each TX developer's zip code automatically matches his or her own
zip code. There, all the Texans appear in the results. How do we
eliminate those developers who only match themselves? Let's add one
more condition to the WHERE clause:
SELECT DISTINCT Dev1.fname, ;
Dev1.lname, ;
Dev1.state, ;
Dev1.zip ;
FROM Develop Dev1, ;
Develop Dev2 ;
WHERE Dev1.state = "TX" ;
AND Dev1.zip = Dev2.zip ;
AND Dev1.fname + Dev1.lname != Dev2.fname + Dev2.lname ;
INTO CURSOR DevZip
Fname Lname State Zip
---------- ----------- ----- -------
Ben Dover TX 75012
David Powell TX 75075
Ilene Dover TX 75012
Phil Errup TX 75075
Now to make it even easier to read, let's add an ORDER BY clause for zip:
SELECT DISTINCT Dev1.fname, ;
Dev1.lname, ;
Dev1.state, ;
Dev1.zip ;
FROM Develop Dev1, ;
Develop Dev2 ;
WHERE Dev1.state = "TX" ;
AND Dev1.zip = Dev2.zip ;
AND Dev1.fname + Dev1.lname != Dev2.fname + Dev2.lname ;
ORDER BY 4 ;
INTO CURSOR DevZip
Fname Lname State Zip
---------- ----------- ----- -------
Ben Dover TX 75012
Ilene Dover TX 75012
David Powell TX 75075
Phil Errup TX 75075
That looks better. In the first, unrestrained example the duplicate
rows for these four developers resulted from each developer's match
with self and with the other qualifying developers. Eliminating
duplicates could be important to an analysis of the validity of the
results, in cases where you really want to know "How many?" as well
as "Who?". And this is found in the system variable, _TALLY.
Isn't this a beautiful language? I love it! Comments welcome.
David in Dallas.
--- Maximus/2 3.01
---------------
* Origin: * MacSavvy OS/2 BBS * Dallas, Texas * 972-250-4479 * (1:124/1208)
|