TIP: Click on subject to list as thread! ANSI
echo: power_bas
to: GARY PRICE
from: BRIAN MCLAUGHLIN
date: 1995-10-26 08:27:00
subject: Hashing in PowerBASIC 2/4

>>> Continued from previous message
        Hashing, on the other hand, can increment the count of words
        already in the list, or add new words to the list, without
        the overhead of sorting, sequential searches, or push-type
        insertion.  Also, remember that entry deletion is a problem with
        hashing.  Word distribution counts NEVER require entries to be
        struck, and so are well-suited to hashing systems.
        A good rule of thumb to determine which method may be best for a
        given problem is to cosider the points on this table:
T2.0    List Management System Ratings
                                      List  Type
                        SEQUENTIAL      BINARY          HASHED
                =====================================================
small list                  1              3              2
medium list                 3              1              2
large list                  3              2              1
huge list                   3              2              1
Insertion                   2              3              1
Modification                3              2              1
Deletion                    1              2              3
Browsing                    2              1              3
                     (Systems are ranked first, second, or third)
=======>8 TABLE 2.0 ENDS HERE 8<=========
        Using this table, we can see that the best method for short
        lists that require frequent deletions might be the sequential
        list.  The best for huge lists that require insertions,
        modifications, but not deletions (such as a nodelist index) is
        probably a hashed list.  A hashed list, however, will not do
        much for you if you regularly want to access the next item,
        first item in the list, or last item, such as in a list browsing
        system.  Hashed lists have no logical beginning or end, and
        for this reason, there is no such thing as a "first item" or
        "next item" in a hashed list.  Each entry is a single entity,
        retrievable only as a single entity, with no relation to any
        other entry in the hashed list.  This excludes applications
        that require browsing, as I have mentioned, but is perfect
        for symbol tables, dictionaries, and the like.
Q3.2    This is all pretty new to me.  Give me a practical review.
A3.2    Okay.  In the hashed list there is no sense of sequence in
        the classic sense of the concept.  Items are put into buckets
        based upon the type of calculation I have already discussed, and
        if the bucket is already in use, a new bucket is found according
        to a set system. Therefore, two similar items in a hashed table
        may actually have a physical distance of 500 entries between them.
        A practical example:
        We have a hash table 7 buckets big, and you want to store three
        entries in it, using hashing.  For simplicity, let's just store the
        characters A, B, and C, using their ASCII values as keys.  Their
        buckets would be:
        Item   Formula    Bucket
        =========================
          A    65 MOD 7     2
          B    66 MOD 7     3
          C    67 MOD 7     4
        No collisions have occured here, since this is a simple case.
        Now, let us add just one more item: H.  The first bucket that
        H will request is 72 MOD 2, or 2, which is being used by A.
        This is collision.  Now, we must find an empty bucket, and so,
        we apply a common method to the old bucket: we subtract an
        offset from 2.  The offset is calulated thus:
                Offset = TableSize - Bucket, or
                Offset = 7 -2
                Offset = 5
        Okay, now, whenever a collision occurs, we recalculate a position
        using this formula:
                NewPos = OldPos - Offset
                NewPos = 2 - 5
                NewPos = -3
        In cases where NewPos is less than 0, we then add the table size
        to the interim result:
                NewPos = NewPos + TableSize, or
                NewPos = -3 + 7
                NewPos = 4
        We see that this new bucket, 4, is being used by C, and so we
        have to recalculate the bucket one more time:
                NewPos = OldPos - Offset, or
                NewPos = 4 - 5
                NewPos = -1
                NewPos <0 so
                NewPos = NewPos + TableSize, or
                NewPos = -1 + 7
                NewPos = 6
        We see that 6 is an empty bucket, and therefore, our table now
        looks something like this:
        Entry   Bucket
        ==============
                  1 (empty bucket)
         A        2 (no collisions)
         B        3 (no collisions)
         C        4 (no collisions)
                  5 (empty bucket)
         H        6 (arrived at after two collisions)
                  7 (empty bucket)
        Now, remember from past explanations that searches are conducted
        by comparing each entry to the key until an empty bucket is reached.
        Therefore, to find A in the table, we calculate a bucket of
        65 MOD 7, or 2.  We look in bucket 2, and see that our key of A is
        the same as the table entry A.  We have therefore found our entry in
        one look!  Now, let's look for I.  That's a bit different, since
        it isn't in the list.  How many looks are needed to tell us that
        it isn't?  Well 73 MOD 7 is 3, and we see immediately that bucket
        3 is a B, not an I.  We recalculate the next bucket, and get:
                Offset = 4
                NewPos = (3 - 4) or -1
                Less than 0, so
                NewPos = 6
        Bucket 6 is occupied by an H, and so we calculate the next bucket:
                Offset = 4
                NewPos = (6-4) = 2
        Bucket 2 is occupied by an A, and so:
                NewPos = (2 - 4)
                NewPos = -2 + 7 = 5
        Finally, bucket 5 is empty.  Therefore, since we've arrived at
        an empty bucket BEFORE arriving at I, we can say that I is not
        in the list.  How many steps required?  Four.  Quite a bit of
        overhead on a short list of 7 entries, but consider a list of
        100,000 entries!  Four searches to find an item is fast!
>>> Continued to next message
 * SLMR 2.1a * MAXLIB For PB v1.2 - Access arrays and files in EMS/XMS!
--- WILDMAIL!/WC v4.12 
---------------
* Origin: Com-Dat BBS - Hillsboro, OR. HST DS (1:105/314.0)

SOURCE: echomail via exec-pc

Email questions or comments to sysop@ipingthereforeiam.com
All parts of this website painstakingly hand-crafted in the U.S.A.!
IPTIA BBS/MUD/Terminal/Game Server List, © 2025 IPTIA Consulting™.