KK> That makes things a bit more fun.
KK> So then you are not only keeping duplicates, but keeping a
KK> record of how many collisions exist.
Er, no, I don't think so. I'm simply tabling, sorting and printing a "list"
of how many times each unique char array occurs in the source data.
Problems so far:
1) With a linked list, the source data is big enough that I incur an awful lot
of overhead searching the list to see if a given char array has been seen
yet;
2) With a hash table, I have the problem of sorting in order to rank the char
arrays in descending order of frequency. I *could* change the hash table code
slightly so as to not chain a collision off the existing table bucket, and
therefore make the collision directly addressable within the table, but I
start to get concerned about table size.
I'm starting to wonder if I can't get the best of both worlds by building a
hash table (chaining collisions) and then using the hash table to build a
linked list which I could then sort. I wouldn't have to search that linked
list at all, because I'd have taken care of the "unique char array" issue via
the hash table.
I suppose I'm going to have to start doing some timings.
--- GoldED 2.50 UNREG
---------------
* Origin: Wanna NEC? (1:163/222)