echo: public_domain
to: Rod Speed
from: Kieran Haughey
date: 1995-10-08 08:04:30
subject: Dupe Checking

On 05 Oct 95 10:42, Rod Speed wrote to Kieran Haughey:

Hi Rod,

KH>> I was just wondering, I have been fiddling with Tobruk 0.31
KH>> and started thinking about dupe checking, and I was wondering
KH>> what the best way of checking for dupes would be?

RS> Personally, I think most current dupe checking is FAR
RS> too gung ho, and that results in false dupe detection,
RS> and dropping of messages which aren't actually dupes at all.

RS> I think one crucial thing is to do a rigorous check that a particular
RS> pair really are a dupe, once the quick look flags a candidate pair,
RS> and if there is any doubt, DON'T delete the dupe, coz an occasional
RS> mistake like that which keeps a dupe is FAR better than deleting
RS> what isn't actually a dupe at all.

That's a good idea. I'm wondering whether creating a CRC of each message
would actually make it easier and quicker to compare them.
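
As a rough sketch of the CRC idea (in Python for brevity, and not anything Tobruk actually does): the function name `message_crc` and the choice of fields are assumptions made up for illustration. A matching CRC only flags a candidate pair; two different messages can share a CRC, so it does not prove a dupe on its own.

```python
import zlib

def message_crc(sender, recipient, subject, date, body):
    """Return a CRC-32 over the main header fields plus the body.

    Hypothetical helper: the field list is an assumption, not a standard.
    """
    # Join with a NUL separator so ("ab", "c") and ("a", "bc") hash differently.
    data = "\0".join([sender, recipient, subject, date, body])
    return zlib.crc32(data.encode("latin-1", errors="replace"))

first = message_crc("Kieran Haughey", "Rod Speed", "Dupe Checking",
                    "1995-10-08 08:04:30", "Hi Rod,")
again = message_crc("Kieran Haughey", "Rod Speed", "Dupe Checking",
                    "1995-10-08 08:04:30", "Hi Rod,")
other = message_crc("Kieran Haughey", "Rod Speed", "Dupe Checking",
                    "1995-10-08 08:04:30", "Hi Rod!")

print(first == again)  # identical messages give the same CRC
print(first == other)  # a one-character change gives a different CRC
```

Comparing one 32-bit number per message is much cheaper than comparing whole messages, which is where the speed-up would come from.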

KH>> The only idea I have come up with is checking the MSGID: kludge

RS> It's the best candidate, but has some real problems. It's still nothing
RS> like universally used, and there is some potential for accidental reuse
RS> of a particular MSGID too, mostly when changing message creation system.

The other problem is, how do you know that the MSGID will be unique? I mean,
another program could just as easily create the same MSGID number as, say,
Msged, so that sorta chucks that out the window...

KH>> and if it's equal to another message's MSGID: kludge,
KH>> then it will not process it..

RS> That's fine if you ALSO have a check on the other fields on a pair with
RS> the same MSGID. That doesn't need to be that complex in this particular
RS> case: if all the main header fields also match once you have a pair
RS> with the same MSGID, it's pretty safe to assume it's a real dupe.

Possibly, but as I said before, how do you know that it's unique...
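
The two-step check Rod describes could look something like the sketch below. The `Msg` record, the `is_real_dupe` name, and the MSGID value are all hypothetical and made up for illustration. A colliding MSGID alone is not enough here, which covers the case of two programs producing the same MSGID, and a message with no MSGID is never treated as a dupe.

```python
from collections import namedtuple

# Hypothetical record holding the MSGID kludge and the main header fields.
Msg = namedtuple("Msg", "msgid sender recipient subject date")

def is_real_dupe(new, old):
    """Treat a pair as a dupe only if MSGID AND all main header fields match."""
    # Missing MSGID: there is doubt, so keep the message.
    if not new.msgid or not old.msgid:
        return False
    # Quick look: different MSGIDs means not a candidate pair.
    if new.msgid != old.msgid:
        return False
    # Rigorous check: every main header field must match as well.
    return (new.sender, new.recipient, new.subject, new.date) == \
           (old.sender, old.recipient, old.subject, old.date)

stored = Msg("3:711/413.17 30779a1e", "Kieran Haughey", "Rod Speed",
             "Dupe Checking", "1995-10-08 08:04:30")
resent = stored                                   # same message arriving twice
clash = stored._replace(sender="Someone Else")    # same MSGID, different sender

print(is_real_dupe(resent, stored))  # True
print(is_real_dupe(clash, stored))   # False
```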

KH>> although this may have to be put in an array or something
KH>> like that, and memory gets to be a bit of a problem for that
KH>> when you're, say, under DOS and tossing into a base of
KH>> about what.. 20000 messages plus..

RS> True, it's got some awkwardness like that. And unfortunately dupe
RS> checking is often most useful on the deeper message bases too.
RS> The most obvious approach with the primitive DOS limitations is
RS> to optimise the approach so that a decent sized cache will help.

Well, I have noticed that on a 386DX16 with 2 MB of memory, a disk cache
speeds up Tobruk almost two and a half times.. :).. I know, how about adding
its own internal caching system  :)
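
On the memory side, one possible approach (an assumption for illustration, not something either of us has settled on) is to keep only a fixed number of the most recent message keys in memory, such as MSGIDs or CRCs, and forget the oldest when the table fills. The `DupeHistory` class and its method name are hypothetical. A hit is only a candidate dupe and would still need a full comparison before anything is dropped.

```python
from collections import deque

class DupeHistory:
    """Fixed-size table of recently seen message keys (oldest dropped first)."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.order = deque()   # keys in arrival order, oldest on the left
        self.seen = set()      # same keys, for fast membership tests

    def check_and_add(self, key):
        """Return True if key is already recorded; otherwise record it."""
        if key in self.seen:
            return True
        # Table full: forget the oldest key to stay within the memory budget.
        if len(self.order) >= self.max_entries:
            self.seen.discard(self.order.popleft())
        self.order.append(key)
        self.seen.add(key)
        return False

history = DupeHistory(max_entries=2)
print(history.check_and_add("aaaa"))  # False, first time seen
print(history.check_and_add("aaaa"))  # True, candidate dupe
print(history.check_and_add("bbbb"))  # False
print(history.check_and_add("cccc"))  # False, and "aaaa" is forgotten
```

The trade-off is that a dupe arriving after its original has been pushed out of the table goes undetected, which errs on the side of keeping messages.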
Cheers
Kieran

3:711/413.17
@EOT:

--- MsgedSQ 3.25 alpha 14
* Origin: -=> Kiza's Pointedly Pointless Point <=- (3:711/413.17)
SEEN-BY: 50/99 640/230 690/718 711/401 410 413 420 423 430 807 808 809 934
SEEN-BY: 713/888 800/1 7877/2809
@PATH: 711/413 808 809 934

SOURCE: echomail via fidonet.ozzmosis.com
