GP> Does anyone out there know and understand hashing?
Its been a while since I've had my files & databases class but still
remember some of it. Hashing is basically a way of calculating where
to store a piece of data, and then of course retrieving it. You run
a part of the data (in this case probably the record #), called a key,
thru a hash formula, which then will return a location you can store it
at. When you need to retrieve it your search parameter needs to be a
key to use in the hash formula and then go get it. This method can
result in two pieces of data having the same hash value, so you have to
be able to handle that as well. I've got my class project program
around here someplace could take a peek at it and try to get it
completely back into my head. Since I don't do a lot of databases
doesn't have much use for me. Also prefer a different approach to
searching for data.
GP> Type UserIdx
GP> UserName AS STRING * 61
GP> RecordNumber AS LONG
GP> END TYPE
GP> My total record count 281 records. However, the UserIdx file that was
GP> created with hashing stays at a constant of 4258865 bytes.
He probably just allocated a bunch of space to begin with. Alot of the
file is probably just empty space, or random junk if he didn't
initialize it (clearing all entries) to begin with. This happened with
the term program for using Prodigy long ago. Their software would
allocate this huge buffer file, but it didn't clear it first, so people
would look at it and see source code to their software, configuration
files, etc. and thought Prodigy was stealing data from their hard
discs, when in fact what is was was old data that was still on the hard
drive, cause as you know when you delete a file it doesn't clear the
data just changes the entry in the FAT table. That is one of the
dangers of hashing, as the 1st key might give a hash value of say 10
and the next one is 60000, so you have all this allocated but "empty"
space scattered about from records 1 to 60000.
GP> LONG h;
GP> for (h = OL; *s; s++)
GP> h = (h * 256L + toupper(*s)) % 65521L;
GP> return h;
Thats his hash formula it looks like, but you are missing some data,
like what *s is pointing to (at least I think thats a pointer, I'm a
bit rusty at C), what is OL and what is in toupper(). The % is MOD
function I believe, which is pretty standard for a hash formula, I know
I used it in my project code. Toupper() probably has some various
numbers in it to help in giving a good range yet lack of duplicates
when calculating record numbers based on the key. Alot of this is
coming back to me as I type in this message as I remember doing the
same thing in my project code (all in PowerBASIC btw, which worked
great!).
GP> I hope someone out there can help.. I looked everywhere in my
GP> PowerBasic v3.2 manuals and cannot seem to find anything on this
GP> "Hashing"... I hope this does not mean PowerBasic is limited when it
GP> comes to this... :(
Nope, not a limitation in PowerBasic. Its just a method of storing
data for a database type file. There are others as well, and all would
be just as easy to implement in PB as they would be in any other
language (well I could wrong on this part, since I don't do a lot of
database'ing, but from what I saw in class it looked easy enough to
do). Good luck with your code, and hopefully I made hashing a bit less
foggy for you or at least enough to get a grasp on what this guy was
doing. Eric
--- QM v1.00
---------------
* Origin: Creekside Manor (805) 484-8016 CdCom Support BBS (1:206/2512.0)
|