| echo: | |
|---|---|
| to: | |
| from: | |
| date: | |
| subject: | Dupe Checking |
RS> I think one crucial thing is to do a rigorous check that a
RS> particular pair really are a dupe, once the quick look flags
RS> a candidate pair, and if there is any doubt, DON'T delete the
RS> dupe, coz an occasional mistake like that which keeps a dupe
RS> is FAR better than deleting what isn't actually a dupe at all.

KH> That's a good idea, I'm wondering if creating a CRC of the messages
KH> would actually make it easier and quicker to compare

That gets a tad tricky since you obviously need to only do the CRC
on the part of the message that doesn't change. The main part that
might change, while the pair still qualifies as a dupe, is the PATH
and SEEN-BYs, when those are the only difference between a pair.
OTOH it's pretty simple to exclude those from the CRC.

KH> The only idea I have come up with is checking the MSGID: kludge

RS> It's the best candidate, but has some real problems. It's still nothing
RS> like universally used, and there is some potential for accidental reuse
RS> of a particular MSGID too, mostly when changing message creation system.

KH> Also the problem is, how do you know that the MSGID will be unique,
KH> I mean another program could just as easily create the same msgid
KH> number as say msged so that sorta chucks that out the window...

Well, the theory is that it can't, that's part of the spec of the
MSGID, that it shouldn't use the same MSGID again for 3 years. OTOH
if you apply that more thorough comparison to the candidate pair
whenever you do get a pair with the same MSGID, that would fix that
residual risk. And, say, just don't dupe check messages which don't
have any MSGID. That would then be a hell of a lot better than no
dupe checking at all.

KH> and if it's equal to another message's
KH> MSGID: kludge, then it will not process it..

RS> That's fine if you ALSO have a check on the other fields on a pair with
RS> the same MSGID. That doesn't need to be that complex in this particular
RS> case, if all the main header fields also match once you have a pair
RS> with the same MSGID, it's pretty safe to assume it's a real dupe.

KH> Possibly, but as I said before, how do you know that it's unique...

You don't need to know, if you only treat a matching MSGID as the
trigger to THEN compare the CRC of the pair of messages with the same
MSGID, to ensure that they really are the same message.

KH> although this may have to be put in an array or something
KH> like that and memory gets to be a bit
KH> of a problem for that when you're, say, under DOS and tossing
KH> into a base of about what.. 20000 messages plus..

RS> True, it's got some awkwardness like that. And unfortunately dupe
RS> checking is often most useful on the deeper message bases too.
RS> The most obvious approach with the primitive DOS limitations is
RS> to optimise the approach so that a decent sized cache will help.

KH> Well I have noticed that on a 386DX16 with 2 meg memory,
KH> a disk cache speeds up tobruk almost 2 and a half times.. :)..
KH> I know, how about adding its own internal caching system :)

Well, that sometimes makes sense when it can be more intelligent
about what to cache than a general-purpose cache can. That's the way
a decent database does it.

--- PQWK202
 * Origin: afswlw rjfilepwq (3:711/934.2)
SEEN-BY: 690/718 711/809 934 |
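
A minimal sketch, in Python, of the two-stage check discussed in the message above: a matching MSGID only flags a candidate pair, and the pair counts as a dupe only if the main header fields and a CRC of the unchanging part of the text also match. The names `stable_crc`, `find_msgid` and `DupeChecker` are hypothetical, as is the shape of the `header` tuple. The sketch assumes kludge lines start with a ^A (0x01) byte and that PATH and SEEN-BY are the only lines that differ between copies.

```python
import zlib

# Lines that may differ between two copies of the same message.
# Assumption: only PATH and SEEN-BY need to be skipped.
VOLATILE = ("SEEN-BY:", "PATH:")


def _lines(body: str):
    """Split message text on CR, LF or CRLF line endings."""
    return body.replace("\r\n", "\n").replace("\r", "\n").split("\n")


def stable_crc(body: str) -> int:
    """CRC-32 of the message text with PATH and SEEN-BY lines left out."""
    kept = [
        line for line in _lines(body)
        if not line.lstrip("\x01").startswith(VOLATILE)
    ]
    return zlib.crc32("\n".join(kept).encode("utf-8"))


def find_msgid(body: str):
    """Return the text after the ^AMSGID: kludge, or None if there is none."""
    for line in _lines(body):
        if line.startswith("\x01MSGID:"):
            return line[len("\x01MSGID:"):].strip()
    return None


class DupeChecker:
    """Remembers messages by MSGID and confirms candidates by header + CRC."""

    def __init__(self):
        self.seen = {}  # MSGID -> set of (header fields, stable CRC)

    def is_dupe(self, header: tuple, body: str) -> bool:
        msgid = find_msgid(body)
        if msgid is None:
            return False  # no MSGID: this message is not dupe checked at all
        fingerprint = (header, stable_crc(body))
        known = self.seen.setdefault(msgid, set())
        if fingerprint in known:
            return True  # same MSGID, same headers, same stable text
        known.add(fingerprint)  # same MSGID alone is not enough: keep it
        return False
```

When in doubt this errs on the side of keeping a message: a reused MSGID on different text is passed through, and a message with no MSGID is never flagged. The in-memory dictionary stands in for whatever history store a real tosser would use, so it says nothing about the DOS memory limits raised in the thread.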
|
| SOURCE: echomail via fidonet.ozzmosis.com | |