>>> Continued from previous message
Hashing, on the other hand, can increment the count of words
already in the list, or add new words to the list, without
the overhead of sorting, sequential searches, or push-type
insertion. Also, remember that entry deletion is a problem with
hashing. Word distribution counts NEVER require entries to be
struck, and so are well-suited to hashing systems.
A good rule of thumb to determine which method may be best for a
given problem is to cosider the points on this table:
T2.0 List Management System Ratings
List Type
SEQUENTIAL BINARY HASHED
=====================================================
small list 1 3 2
medium list 3 1 2
large list 3 2 1
huge list 3 2 1
Insertion 2 3 1
Modification 3 2 1
Deletion 1 2 3
Browsing 2 1 3
(Systems are ranked first, second, or third)
=======>8 TABLE 2.0 ENDS HERE 8<=========
Using this table, we can see that the best method for short
lists that require frequent deletions might be the sequential
list. The best for huge lists that require insertions,
modifications, but not deletions (such as a nodelist index) is
probably a hashed list. A hashed list, however, will not do
much for you if you regularly want to access the next item,
first item in the list, or last item, such as in a list browsing
system. Hashed lists have no logical beginning or end, and
for this reason, there is no such thing as a "first item" or
"next item" in a hashed list. Each entry is a single entity,
retrievable only as a single entity, with no relation to any
other entry in the hashed list. This excludes applications
that require browsing, as I have mentioned, but is perfect
for symbol tables, dictionaries, and the like.
Q3.2 This is all pretty new to me. Give me a practical review.
A3.2 Okay. In the hashed list there is no sense of sequence in
the classic sense of the concept. Items are put into buckets
based upon the type of calculation I have already discussed, and
if the bucket is already in use, a new bucket is found according
to a set system. Therefore, two similar items in a hashed table
may actually have a physical distance of 500 entries between them.
A practical example:
We have a hash table 7 buckets big, and you want to store three
entries in it, using hashing. For simplicity, let's just store the
characters A, B, and C, using their ASCII values as keys. Their
buckets would be:
Item Formula Bucket
=========================
A 65 MOD 7 2
B 66 MOD 7 3
C 67 MOD 7 4
No collisions have occured here, since this is a simple case.
Now, let us add just one more item: H. The first bucket that
H will request is 72 MOD 2, or 2, which is being used by A.
This is collision. Now, we must find an empty bucket, and so,
we apply a common method to the old bucket: we subtract an
offset from 2. The offset is calulated thus:
Offset = TableSize - Bucket, or
Offset = 7 -2
Offset = 5
Okay, now, whenever a collision occurs, we recalculate a position
using this formula:
NewPos = OldPos - Offset
NewPos = 2 - 5
NewPos = -3
In cases where NewPos is less than 0, we then add the table size
to the interim result:
NewPos = NewPos + TableSize, or
NewPos = -3 + 7
NewPos = 4
We see that this new bucket, 4, is being used by C, and so we
have to recalculate the bucket one more time:
NewPos = OldPos - Offset, or
NewPos = 4 - 5
NewPos = -1
NewPos <0 so
NewPos = NewPos + TableSize, or
NewPos = -1 + 7
NewPos = 6
We see that 6 is an empty bucket, and therefore, our table now
looks something like this:
Entry Bucket
==============
1 (empty bucket)
A 2 (no collisions)
B 3 (no collisions)
C 4 (no collisions)
5 (empty bucket)
H 6 (arrived at after two collisions)
7 (empty bucket)
Now, remember from past explanations that searches are conducted
by comparing each entry to the key until an empty bucket is reached.
Therefore, to find A in the table, we calculate a bucket of
65 MOD 7, or 2. We look in bucket 2, and see that our key of A is
the same as the table entry A. We have therefore found our entry in
one look! Now, let's look for I. That's a bit different, since
it isn't in the list. How many looks are needed to tell us that
it isn't? Well 73 MOD 7 is 3, and we see immediately that bucket
3 is a B, not an I. We recalculate the next bucket, and get:
Offset = 4
NewPos = (3 - 4) or -1
Less than 0, so
NewPos = 6
Bucket 6 is occupied by an H, and so we calculate the next bucket:
Offset = 4
NewPos = (6-4) = 2
Bucket 2 is occupied by an A, and so:
NewPos = (2 - 4)
NewPos = -2 + 7 = 5
Finally, bucket 5 is empty. Therefore, since we've arrived at
an empty bucket BEFORE arriving at I, we can say that I is not
in the list. How many steps required? Four. Quite a bit of
overhead on a short list of 7 entries, but consider a list of
100,000 entries! Four searches to find an item is fast!
>>> Continued to next message
* SLMR 2.1a * MAXLIB For PB v1.2 - Access arrays and files in EMS/XMS!
--- WILDMAIL!/WC v4.12
---------------
* Origin: Com-Dat BBS - Hillsboro, OR. HST DS (1:105/314.0)
|