TIP: Click on subject to list as thread! ANSI
echo: foxpro
to: ALL
from: DAVID POWELL
date: 1997-04-07 18:54:00
subject: SQL: Joins (3 of 3)

Hi All:
 
OK, part 3.
 
Here's another term to go with the theta join.  An _equijoin_ joins on
matching values and displays the join's seam -- the connecting column
from each table that participates in the join.
 
The following query joins cities and automobiles by individual:
 
SELECT Cars.name, ;
       Cars.auto, ;
       City.name, ;
       City.city  ;
  FROM Cars A,    ;
       City C     ;
 WHERE Cars.name = City.name
  INTO CURSOR CarCity
 
A_Name          Auto         C_Name          City
--------------- ------------ --------------- ---------------
David Powell    Porsche      David Powell    Dallas
Peter Doubt     Pinto        Peter Doubt     Albany
Natalie Drest   Ferrari      Natalie Drest   Madison
Warren Peese    Nova         Warren Peese    Nova
 
Since there's no need to display information redundantly, we can eliminate
the display of one of the connecting columns by restating the query so that
it displays the join column only once.  Such a display is called a _natural
join_.  The natural join version would resemble the following:
 
   SELECT Cars.name, ;
          Cars.auto, ;
          City.city  ;
     FROM Cars A,    ;
          City C     ;
    WHERE Cars.name = City.name ;
     INTO CURSOR CarCity
 
Name            Auto         City
--------------- ------------ ---------------
David Powell    Porsche      Dallas
Peter Doubt     Pinto        Albany
Natalie Drest   Ferrari      Madison
Warren Peese    Nova         Nova
 
Now we have a natural join, because the column NAME appears only once in
the results.  It doesn't matter which NAME column we include, but we must
qualify it in the select list with its table name.  The natural join is
only one variant on the equijoin: in addition, we have all the joins
based on various kinds of inequality, and (as mentioned) the maybe-joins
that compare known values to null, or unknown, values.  As you might
have experimented on your own, Fox will join NULLs rendering unreliable
results.  To avoid this, include AND  != NULL or ! EMPT().
 
I won't belabour the inequalities here.  The same rules apply.
 
The _self-join_ is another variant on the equijoin.  A self-join compares
values within a column of a single table.  "Now why would anyone ever do
that?" you ask.  You might wish to know how many FoxPro developers in TX
have the same zip code.
 
Since this query involves a join of the Develop table with itself, the
Develop table appears in two roles.  You must distinguish these roles by
giving the Develop table two different aliases, i.e., dev1 and dev2, in
the FROM clause's table list.  These aliases are also used to qualify the
column names in the rest of the query.  Here's one version of this query:
 
   SELECT Dev1.fname, ;
          Dev1.lname, ;
          Dev1.state, ;
          Dev1.zip    ;
     FROM Develop Dev1,  ;
          Develop Dev2   ;
    WHERE Dev1.state = "TX" ;
      AND Dev1.zip   = Dev2.zip ;
     INTO CURSOR DevZip
 
Fname      Lname       State Zip
---------- ----------- ----- -------
David      Powell      TX    75075
Phil       Errup       TX    75075
Karn E.    Assada      TX    76016
Ilene      Dover       TX    75012
Ben        Dover       TX    75012
Ilene      Dover       TX    75012
Ben        Dover       TX    75012
Willie     Maykit      TX    75222
David      Powell      TX    75075
Phil       Errup       TX    75075
 
These results are a bit hard to read, and don't appear to be correct.
In order to clarify the situation, let's first attempt to remove duplicate
rows.  We have four duplicates above: David, Phil, Ben and Ilene.  To
eliminate duplicate rows, use the keyword DISTINCT to modify the select
list.
 
   SELECT DISTINCT Dev1.fname, ;
          Dev1.lname, ;
          Dev1.state, ;
          Dev1.zip    ;
     FROM Develop Dev1,  ;
          Develop Dev2   ;
    WHERE Dev1.state = "TX" ;
      AND Dev1.zip   = Dev2.zip ;
     INTO CURSOR DevZip
 
Fname      Lname       State Zip
---------- ----------- ----- -------
Ben        Dover       TX    75012
David      Powell      TX    75075
Ilene      Dover       TX    75012
Karn E.    Assada      TX    76016
Phil       Errup       TX    75075
Willie     Maykit      TX    75222
 
Now the results are free of duplicates, but they show all 6 developers
who live in Texas, not just those who share a zip code!  What's going
on?  (Note how it alphabetized the first column, too.)
 
OK.  The self-join compares all the zip code values to themselves, so
that each TX developer's zip code automatically matches his or her own
zip code.  There, all the Texans appear in the results.  How do we 
eliminate those developers who only match themselves?  Let's add one
more condition to the WHERE clause:
 
   SELECT DISTINCT Dev1.fname, ;
          Dev1.lname, ;
          Dev1.state, ;
          Dev1.zip    ;
     FROM Develop Dev1,  ;
          Develop Dev2   ;
    WHERE Dev1.state = "TX" ;
      AND Dev1.zip   = Dev2.zip ;
      AND Dev1.fname + Dev1.lname != Dev2.fname + Dev2.lname ;
     INTO CURSOR DevZip
 
Fname      Lname       State Zip
---------- ----------- ----- -------
Ben        Dover       TX    75012
David      Powell      TX    75075
Ilene      Dover       TX    75012
Phil       Errup       TX    75075
 
Now to make it even easier to read, let's add an ORDER BY clause for zip:
 
   SELECT DISTINCT Dev1.fname, ;
          Dev1.lname, ;
          Dev1.state, ;
          Dev1.zip    ;
     FROM Develop Dev1,  ;
          Develop Dev2   ;
    WHERE Dev1.state = "TX" ;
      AND Dev1.zip = Dev2.zip ;
      AND Dev1.fname + Dev1.lname != Dev2.fname + Dev2.lname ;
    ORDER BY 4 ;
     INTO CURSOR DevZip
 
Fname      Lname       State Zip
---------- ----------- ----- -------
Ben        Dover       TX    75012
Ilene      Dover       TX    75012
David      Powell      TX    75075
Phil       Errup       TX    75075
 
That looks better.  In the first, unrestrained example the duplicate
rows for these four developers resulted from each developer's match
with self and with the other qualifying developers.  Eliminating
duplicates could be important to an analysis of the validity of the
results, in cases where you really want to know "How many?" as well
as "Who?".  And this is found in the system variable, _TALLY.
 
Isn't this a beautiful language?  I love it!  Comments welcome.
 
David in Dallas.
--- Maximus/2 3.01
---------------
* Origin: * MacSavvy OS/2 BBS * Dallas, Texas * 972-250-4479 * (1:124/1208)

SOURCE: echomail via exec-pc

Email questions or comments to sysop@ipingthereforeiam.com
All parts of this website painstakingly hand-crafted in the U.S.A.!
IPTIA BBS/MUD/Terminal/Game Server List, © 2025 IPTIA Consulting™.