| echo: | |
|---|---|
| to: | |
| from: | |
| date: | |
| subject: | talking to myself |
RT>> with his "gate" a while back, that I know of that is.
MK> That was one situation I was thinking of. Got under everyone's
MK> radar despite all the sysops in that echo noticing it. So
MK> much for MSGID eh? Everyone of those messages was a true dupe
MK> that not one tosser caught.
MSGID isn't the magic bullet that some seem to want to think it is... some
of your comments appear to be saying that it should/could be and that it
isn't and thus should be thrown in the bitbucket...
while i do /tend/ to agree, i also tend more to not agree... detecting
dupes in fidonet is not magic nor is it tied to one thing... for various
reasons, fidonet can't even use md5 checksums on the message body to
determine if a message is a duplicate... message headers, existing control
lines, origin and seenby and path lines can all be stripped or otherwise
modified or corrupted... the only real way to tell a dupe would be by
enforcing some sort of message body formatting and md5'ing the message body
much the same way that PGP can be used to sign a message to show if it has
been modified since sending...
ie: when the body is generated and the message saved,
md5 the body and store that in a control line that
travels with the message. i still don't recall if
there are message processors out there that alter
the message body (eg: by replacing CRLF with LF)...
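a rough sketch of that idea in python, nothing more... the kludge name "BODYMD5" is made up for illustration and is not an existing fidonet control line, and folding every line ending down to a bare CR before hashing is an assumption about how one might survive a processor that rewrites CRLF...

```python
import hashlib

def body_digest(body: bytes) -> str:
    # Treat CRLF, bare LF and bare CR as the same line ending so that a
    # processor rewriting line endings does not change the digest.
    normalised = body.replace(b"\r\n", b"\r").replace(b"\n", b"\r")
    return hashlib.md5(normalised).hexdigest()

def digest_kludge(body: bytes) -> bytes:
    # "BODYMD5" is a hypothetical kludge name used only in this sketch.
    # Control lines start with ^A (0x01) and end with CR.
    return b"\x01BODYMD5: " + body_digest(body).encode("ascii") + b"\r"

original = b"hello all,\r\nthis is a test\r\n"
rewritten = b"hello all,\nthis is a test\n"   # same text, CRLF replaced by LF

print(body_digest(original) == body_digest(rewritten))
```

this only helps if the body text itself is left alone... anything that rewraps or re-encodes the text would still change the digest...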
even then, you then have the problem of crossposted messages... is it a
dupe because it is exactly the same message in more than one area? i don't
think so...
detecting duplicates in fidonet is a tricky science, to say the least...
checking the header info and message control lines (including the origin
line) is about the only way... still this can fail due to the way some
systems have been retrofitted for fidonet messaging... wildcat, pcboard and
wwiv systems are the first three that come to mind as having shoehorned
retrofits for participation in fidonet... quite simply, their message bases
were not designed with fidonet in mind... actually, not just fidonet but
more without any sort of thought to control lines within messages...
it is long past the time when this stuff can truly be fixed and
enforced... all we can do now is to play the game and hope for the best...
that said, there are things that can be done to try to ensure that messages
generated by your software do make it past the various and sundry dupe
checking schemes out there... one of the first and easiest is to implement
MSGID and ensure that it is the first control line after the message
header... this may or may not help with very braindead dupe checking that
looks to the header only with no regard for the message body at all as that
system was developed with a myopic view of users creating messages and not
with the thought that an automated process like text file posting or
offline mail doors may post more than one message per second... most of the
software that used that braindead method of dupe checking has been tossed
or upgraded to something that does the same but also takes into
consideration the first 20+ bytes of the message body...
there is still the problem of dupe checkers that use a CRC16 or CRC32
method of storing an "ID" of a message based on the header and the first
20+ bytes of the message... they can flag messages that are not dupes at
all... this is due to the simple fact that there are a limited number of
CRC16 and CRC32 results (65,536 and about 4.3 billion respectively) and
that it is fairly trivial to find more than one dataset that generates the
same CRC16 or CRC32 value...
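to show how cheap a CRC16 collision is, here is a small python sketch... it uses binascii.crc_hqx (the CRC-CCITT polynomial), which is an assumption, since which CRC16 variant any given tosser uses is not stated here... the test strings are made up too...

```python
import binascii

def crc16(data: bytes) -> int:
    # binascii.crc_hqx computes a 16-bit CRC using the CRC-CCITT polynomial.
    return binascii.crc_hqx(data, 0)

def find_crc16_collision(prefix: bytes):
    # Try prefix+"0", prefix+"1", ... until two different inputs share a CRC.
    # With only 65,536 possible values this must stop within 65,537 tries.
    seen = {}
    n = 0
    while True:
        candidate = prefix + str(n).encode("ascii")
        value = crc16(candidate)
        if value in seen:
            return seen[value], candidate, value
        seen[value] = candidate
        n += 1

first, second, value = find_crc16_collision(b"test message #")
print(first, second, hex(value))
```

two different inputs, one "ID"... a dupe checker keyed on that value alone would toss the second one as a dupe...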
that takes us to the question of how to build a dataset of messages and
what to use as the duplicate trigger... remembering that many things are
done in binary in fidonet because of limited storage space as well as for
speed of processing, we have to ask what method would ultimately be the
best for quick processing, small storage, and generating truly unique IDs
for the local duplicate detection system?
the first thing i can think of is to record the header info and the entire
MSGID... the question is, then, how to record the header info? would one
use the actual fields or would one run the header fields thru a formula
like md5 or something else??
i can see possibly a two fold method involving recording the actual header
data as well as running it thru md5 or some such and recording the MSGID if
it exists...
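one way that two fold record might look, sketched in python... the field choice (from, to, subject, date), the DupeRecord name and the matching policy are all assumptions for illustration, not a description of any existing tosser...

```python
import hashlib
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class DupeRecord:
    header: Tuple[str, str, str, str]   # actual from, to, subject, date
    header_md5: bytes                   # 16-byte digest of those fields
    msgid: Optional[str]                # full MSGID text, None if absent

def make_record(from_name, to_name, subject, date, msgid=None):
    header = (from_name, to_name, subject, date)
    # Join with NUL so ("ab", "c") and ("a", "bc") do not hash the same.
    raw = "\0".join(header).encode("utf-8")
    return DupeRecord(header, hashlib.md5(raw).digest(), msgid)

def is_same_message(a: DupeRecord, b: DupeRecord) -> bool:
    # Hypothetical policy: the digest is a quick first test, the real
    # header fields confirm it, and when both messages carry a MSGID the
    # MSGIDs must agree as well.
    if a.header_md5 != b.header_md5 or a.header != b.header:
        return False
    if a.msgid is not None and b.msgid is not None:
        return a.msgid == b.msgid
    return True

m1 = make_record("sysop a", "sysop b", "test", "01 Jan 00  12:00:00", "1:1/1 0000abcd")
m2 = make_record("sysop a", "sysop b", "test", "01 Jan 00  12:00:00", "1:1/1 0000abce")
print(is_same_message(m1, m1), is_same_message(m1, m2))
```

with this policy two messages posted in the same second with the same header are kept apart by their MSGIDs, and a message that lost its MSGID along the way still matches on the header...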
that would likely be the utmost method but it wouldn't be the smallest data
record per message... there's also the question of speed... how much time
are you willing to spend rummaging thru a duplicate dataset looking for a
match before deciding if a message is a duplicate or not? considering your
high desire for speed, i can see small datasets (one per message area a la
squish?) to ease the search time...
interesting problem, this is... i'm already visualising multiple dupe
dataset files based on the AREA line, locally carried areas notwithstanding
due to the processing of passthru areas, or one large or even multiple
large datafiles containing AREA grouped datasets of header and MSGID
data...
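a minimal in-memory sketch of the one dataset per AREA idea, in python... the AreaDupeBase name is made up, the key is whatever hashable ID the checker settles on, and treating area tags as case-insensitive is an assumption...

```python
from collections import defaultdict

class AreaDupeBase:
    """One small set of message keys per AREA tag (hypothetical layout)."""

    def __init__(self):
        self._areas = defaultdict(set)

    def check_and_add(self, area: str, key) -> bool:
        # Return True if key was already seen in this area, else remember it.
        seen = self._areas[area.upper()]
        if key in seen:
            return True
        seen.add(key)
        return False

dupes = AreaDupeBase()
key = ("header digest goes here", "1:1/1 0000abcd")   # made-up key

print(dupes.check_and_add("AREA_ONE", key))   # first sighting in this area
print(dupes.check_and_add("AREA_ONE", key))   # same key, same area
print(dupes.check_and_add("AREA_TWO", key))   # same key, different area
```

because each area has its own set, a search only touches that area's keys, and the same message crossposted to a second area is not flagged as a dupe there... a real tosser would keep these on disk and expire old entries, which this sketch does not do...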
)\/(ark
 * Origin: (1:3634/12)
SEEN-BY: 633/267 270
@PATH: 3634/12 106/2000 633/267
| SOURCE: echomail via fidonet.ozzmosis.com | |