echo: public_domain
to: Kieran Haughey
from: Rod Speed
date: 1995-10-10 06:49:00
subject: Dupe Checking

RS> I think one crucial thing is to do a rigorous check that a
RS> particular pair really are a dupe, once the quick look flags
RS> a candidate pair, and if there is any doubt, DONT delete the
RS> dupe, coz an occasional mistake like that which keeps a dupe
RS> is FAR better than deleting what isnt actually a dupe at all.

KH> That's a good idea, I'm wondering if creating a CRC of the messages
KH> would actually make it easier and quicker to compare 

That gets a tad tricky since you obviously need to only do the CRC on the
part of the message that doesnt change. The main part that might change,
while the pair still qualifies as a dupe, is the PATH and SEENBYs, when
those are the only difference between a pair. OTOH its pretty simple to
exclude those from the CRC.
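
A minimal sketch of excluding those lines from the CRC, in Python. The
function name stable_crc is made up for illustration, and the sketch assumes
that SEEN-BY lines start with "SEEN-BY:" and that PATH lines start with a
SOH (0x01) byte followed by "PATH:". A real tosser may see other variants.

```python
import zlib

# Lines that legitimately change as a message travels between systems.
# Assumption: these two prefixes are the only volatile lines.
VOLATILE_PREFIXES = ("SEEN-BY:", "\x01PATH:")

def stable_crc(text: str) -> int:
    """CRC-32 of the message text with SEEN-BY and PATH lines left out."""
    kept = [
        line for line in text.splitlines()  # handles CR, LF and CRLF endings
        if not line.startswith(VOLATILE_PREFIXES)
    ]
    return zlib.crc32("\n".join(kept).encode("utf-8"))
```

Two copies of a message that differ only in their routing lines then give
the same value, while any change to the rest of the text almost certainly
gives a different one.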

KH> The only idea I have come up with is checking the MSGID: kludge

RS> Its the best candidate, but has some real problems. Its still nothing
RS> like universally used, and there is some potential for accidental reuse
RS> of a particular MSGID too, mostly when changing message creation system.

KH> Also the problem is, how do you know that the MSGID will be unique,
KH> I mean another program could just as easily create the same msgid
KH> number as say msged so that sorta chucks that out the window...

Well, the theory is that it cant, thats part of the spec of the MSGID,
that it shouldnt use the same MSGID again for 3 years. OTOH if you apply
that more thorough comparison of the candidate dupe pair if you do get
a pair with the same MSGID, that would fix that residual risk.

And say just dont dupe check messages which dont have any MSGID.
That would then be a hell of a lot better than no dupe checking at all.

KH> and if it's equal to another message's
KH> MSGID: kludge, then it will not process it..

RS> Thats fine if you ALSO have a check on the other fields on a pair with
RS> the same MSGID. That doesnt need to be that complex in this particular
RS> case, if all the main header fields also match too once you have a pair
RS> with the same MSGID, its pretty safe to assume its a real dupe.
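
The header check described above could look roughly like this in Python.
The Header type, its field names and the sample MSGID value are all
hypothetical, chosen only to illustrate the idea of confirming a MSGID match
against the main header fields.

```python
from typing import NamedTuple, Optional

class Header(NamedTuple):
    # Hypothetical field names for the main header fields of a message.
    msgid: Optional[str]
    from_name: str
    to_name: str
    subject: str
    date: str

def is_confirmed_dupe(a: Header, b: Header) -> bool:
    """True only if the MSGIDs match AND the main header fields match too."""
    if a.msgid is None or b.msgid is None:
        return False  # no MSGID, so no dupe check at all
    if a.msgid != b.msgid:
        return False
    return (a.from_name, a.to_name, a.subject, a.date) == (
        b.from_name, b.to_name, b.subject, b.date)

first = Header("3:711/934.2 0001abcd", "Rod Speed", "Kieran Haughey",
               "Dupe Checking", "1995-10-10 06:49:00")
reused = first._replace(subject="Something else entirely")
print(is_confirmed_dupe(first, first))   # same MSGID, same headers
print(is_confirmed_dupe(first, reused))  # same MSGID, different subject
```

Any doubt falls on the side of keeping the message, so a reused MSGID on an
unrelated message is not thrown away.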

KH> Possibly, but as I said before, how do you know that it's unique...

You dont need to if you only use the matching MSGID as a situation
where you THEN compare the CRC of the pair of messages with the
same MSGID to ensure that they really are the same message.
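
As a rough sketch of that two-stage check in Python: the MSGID match only
flags a candidate, and the CRC comparison decides. DupeChecker, find_msgid
and stable_crc are hypothetical names, and the sketch assumes the MSGID
kludge line starts with a SOH (0x01) byte followed by "MSGID:". Messages
with no MSGID are simply never treated as dupes.

```python
import zlib
from typing import Dict, Optional

MSGID_TAG = "\x01MSGID:"  # assumption: kludge lines start with SOH (0x01)

def find_msgid(text: str) -> Optional[str]:
    """Return the MSGID value, or None if the message has no MSGID kludge."""
    for line in text.splitlines():
        if line.startswith(MSGID_TAG):
            return line[len(MSGID_TAG):].strip()
    return None

def stable_crc(text: str) -> int:
    """CRC-32 of everything except the SEEN-BY and PATH lines."""
    kept = [line for line in text.splitlines()
            if not line.startswith(("SEEN-BY:", "\x01PATH:"))]
    return zlib.crc32("\n".join(kept).encode("utf-8"))

class DupeChecker:
    def __init__(self) -> None:
        self._seen: Dict[str, int] = {}  # MSGID -> CRC of first copy seen

    def is_dupe(self, text: str) -> bool:
        msgid = find_msgid(text)
        if msgid is None:
            return False              # no MSGID: dont dupe check it
        crc = stable_crc(text)
        if msgid not in self._seen:
            self._seen[msgid] = crc   # first sighting, remember it
            return False
        # Same MSGID: only a dupe if the stable text matches as well.
        return self._seen[msgid] == crc
```

A message that reuses a MSGID but has different text is kept, which is the
safe side of the mistake.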

KH> although this may have to be put in a array or something
KH> like that  and memory gets to be a bit
KH> of a problem for that when you're, say, under DOS and tossing
KH> into a base of about what.. 20000 messages plus..

RS> True, its got some awkwiditys like that. And unfortunately dupe
RS> checking is often most useful on the deeper message bases too.
RS> The most obvious approach with the primitive DOS limitations is
RS> to optimise the approach so that a decent sized cache will help.
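
One way to keep the in-memory table small, sketched in Python: store only a
32-bit CRC of each MSGID in a sorted array and binary-search it. That is
about 4 bytes per message, so roughly 80,000 bytes for 20000 messages. The
function names are hypothetical, and a hit only flags a candidate, since two
different MSGIDs can share a CRC. The full comparison still has to follow.

```python
import zlib
from array import array
from bisect import bisect_left

def build_index(msgids):
    """Sorted array of CRC-32 keys, one per known MSGID.

    Assumption: array type "I" is 4 bytes wide, as on common platforms.
    """
    return array("I", sorted(zlib.crc32(m.encode("utf-8")) for m in msgids))

def is_candidate(index, msgid: str) -> bool:
    """True if some known MSGID has the same CRC-32 (a candidate dupe only)."""
    key = zlib.crc32(msgid.encode("utf-8"))
    pos = bisect_left(index, key)  # binary search in the sorted keys
    return pos < len(index) and index[pos] == key

index = build_index("3:711/934.2 %08x" % n for n in range(20000))
print(len(index) * index.itemsize, "bytes for", len(index), "messages")
```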

KH> Well I have noticed that on a 386DX16 with 2 meg memory,
KH> a disk cache speeds up tobruk almost 2 and a half times.. :)..
KH> I know, how about adding it's own internal caching system  :)

Well, that sometimes makes sense when it can be more intelligent about
what to cache than a gp cache can. Thats the way a decent database does it.
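
A minimal sketch of such an internal cache in Python, assuming the tosser
looks up a stored CRC by MSGID. MsgidCache and load_from_disk are
hypothetical names. The point is only that the program knows it wants
recently used MSGID entries kept in memory, which a general disk cache
working on raw sectors cannot know.

```python
from collections import OrderedDict
from typing import Callable, Optional

class MsgidCache:
    """Small least-recently-used cache in front of a slow on-disk lookup."""

    def __init__(self, load_from_disk: Callable[[str], Optional[int]],
                 capacity: int = 4096) -> None:
        self._load = load_from_disk   # hypothetical slow lookup by MSGID
        self._capacity = capacity
        self._entries = OrderedDict() # MSGID -> stored CRC (or None)
        self.disk_reads = 0           # counts how often the disk was hit

    def crc_for(self, msgid: str) -> Optional[int]:
        if msgid in self._entries:
            self._entries.move_to_end(msgid)   # mark as recently used
            return self._entries[msgid]
        self.disk_reads += 1
        crc = self._load(msgid)
        self._entries[msgid] = crc
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # drop least recently used
        return crc
```

With a dict standing in for the disk, MsgidCache(disk.get, capacity=2)
answers repeated lookups of the same MSGID with a single read.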

--- PQWK202
* Origin: afswlw rjfilepwq (3:711/934.2)
SEEN-BY: 690/718 711/809 934

SOURCE: echomail via fidonet.ozzmosis.com
