Hey mark!
Feb 20 16:42 05, mark lewis wrote to Maurice Kinal:
ml> MSGID isn't the magic bullet that some seem to want to think it is...
ml> some of your comments appear to be saying that it should/could be and
ml> that it isn't and thus should be thrown in the bitbucket...
Yes and no. What I am trying to say, or what I think I am saying, is that
without rhyme or reason behind it, it won't make any difference whether
MSGID is thrown into the bitbucket or not. Without any meaningful logic it
is a complete waste of bytes and of the processing it causes. With logic
it has potential as a viable accounting flag/tag/whatever. I still have
doubts about its dupechecking abilities but at least it has some
potential. Currently I have doubts about any real usefulness to man or
machine.
ml> while i do /tend/ to agree, i also tend more to not agree...
Sounds reasonable.
ml> even then, you then have the problem of crossposted messages... is it
ml> a dupe because it is exactly the same message in more than one area?
ml> i don't think so...
Nor do I. As long as it is accountable in the area it shows up in then I
can't see a problem with it, even if it shows up in other areas. However,
having said that, I'd think there may be a better way to archive
crossposted messages, where one stored copy carries every area that
message is "posted" to. A single message could fly more than one area
tag. Thus some redundancy could effectively be eliminated. No?
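
A rough sketch of the "one message, several area tags" idea, in Python.
Everything here (the ArchivedMessage and Archive names, the choice of key
fields, the area names) is made up for illustration and is not taken from
any existing tosser or message base:

```python
from dataclasses import dataclass, field


@dataclass
class ArchivedMessage:
    to: str
    sender: str
    date: str
    body: str
    # Every area this one stored copy has been seen in.
    areas: set = field(default_factory=set)


class Archive:
    """Hypothetical archive that stores a crossposted message only once."""

    def __init__(self):
        self._by_key = {}

    def store(self, area, to, sender, date, body):
        # Assumption: identical To/From/Date/body means "the same message".
        key = (to, sender, date, body)
        msg = self._by_key.get(key)
        if msg is None:
            msg = ArchivedMessage(to, sender, date, body)
            self._by_key[key] = msg
        # A crosspost only adds another area tag to the existing copy.
        msg.areas.add(area)
        return msg

    def in_area(self, area):
        """Return the messages that carry the given area tag."""
        return [m for m in self._by_key.values() if area in m.areas]

    def __len__(self):
        return len(self._by_key)


archive = Archive()
archive.store("AREA_ONE", "All", "Maurice Kinal", "20 Feb 05", "Hey mark!")
archive.store("AREA_TWO", "All", "Maurice Kinal", "20 Feb 05", "Hey mark!")
```

After the two store() calls the archive holds a single copy flying both
area tags, and it still shows up when either area is listed.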
ml> detecting duplicates in fidonet is a tricky science,
I would agree with that assessment.
ml> messaging... wildcat, pcboard and wwiv systems are the first three
ml> that come to mind as having shoehorned retrofits for participation in
ml> fidonet... quite simply, their message bases were not designed with
ml> fidonet in mind... actually, not just fidonet but more without any
ml> sort of thought to control lines within messages...
Right. Having a trimmed down archiving system where all stored messages
only contain what is absolutely needed to successfully be deemed a
"message" - say "To", "From", "Date" - and then tacking on whatever else
is required depending on the target, would greatly reduce the amount of
information any archived base or area needs to know. For instance a
dynamic cgi script could take this information and "convert" it to html
for display to the end user without affecting the archive in any
meaningful way, and that exact same archive could be employed to
construct outbound Fido compliant pkts.
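
To make that a bit more concrete, here is a minimal Python sketch of one
trimmed down record feeding two different targets. The function names, the
"Body" key and the sample values are assumptions for illustration only.
The second renderer is just a plain-text stand-in: it is NOT a real pkt
writer, since actual packets are binary and carry To/From/Date in the
packed message header rather than in the text.

```python
from html import escape

# The archive only knows the bare minimum about a message.
record = {
    "To": "mark lewis",
    "From": "Maurice Kinal",
    "Date": "20 Feb 05 16:42",
    "Body": "Is 1 < 2? Life is good.",
}


def to_html(msg):
    """Render the record for a web page; markup is added here, not stored."""
    rows = "".join(
        f"<dt>{escape(name)}</dt><dd>{escape(msg[name])}</dd>"
        for name in ("To", "From", "Date")
    )
    return f"<dl>{rows}</dl><pre>{escape(msg['Body'])}</pre>"


def to_echomail_text(msg, area, origin):
    """Plain-text stand-in for an outbound message (hypothetical layout).

    The target-specific extras (AREA line, tear line, origin line) are
    tacked on at export time instead of living in the archive.
    """
    lines = [
        f"AREA:{area}",
        f"To: {msg['To']}",
        f"From: {msg['From']}",
        f"Date: {msg['Date']}",
        "",
        msg["Body"],
        "---",
        f" * Origin: {origin}",
    ]
    return "\n".join(lines)


html_page = to_html(record)
outbound = to_echomail_text(record, "SOME_AREA", "Example Origin (0:0/0)")
```

Both renderers only read the record, so the same archive entry can be
converted as often as needed without being touched.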
ml> it is long past the time when this stuff can truely be fixed and
ml> enforced...
Probably, but that doesn't mean we can't discuss, and/or employ, any of
this "stuff" to our advantage. Chances are by doing that we may all find
ourselves complying out of choice as opposed to enforcement ... or so the
theory goes.
ml> all we can do now is to play the game and hope for the
ml> best...
That is one way.
ml> that takes us to the question of how to build a dataset of messages
ml> and what to use as the duplicate trigger...
Right.
ml> things are done in binary in fidonet because of limited storage space
ml> as well as for speed of processing, we have to ask what method would
ml> ultimately be the best for quick processing, small storage, and
ml> generating truely unique IDs for the local duplicate detection
ml> system?
That is a toughy for sure. Again I would think a standard method of
generating MSGIDs would be of great assistance to all. It isn't
foolproof (is anything?) but it would help.
ml> i can see possibly a two fold method involving recording the actual
ml> header data as well as running it thru md5 or some such and recording
ml> the MSGID if it exists...
Possibly. It sounds like it has potential.
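
A quick Python sketch of how such a two fold check might look. The choice
of header fields (including "subject"), the class name and the three
verdicts are all assumptions for the sake of the example. In particular,
treating a MSGID-only match as merely suspicious instead of as a dupe is
one possible policy, not something anyone has agreed on:

```python
import hashlib


def header_digest(to, sender, date, subject):
    """MD5 over the header fields, joined with NUL so fields can't blur."""
    joined = "\x00".join((to, sender, date, subject))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


class DupeChecker:
    """Hypothetical two fold checker: header digest plus MSGID if present."""

    def __init__(self):
        self._digests = set()
        self._msgids = set()

    def check(self, to, sender, date, subject, msgid=None):
        digest = header_digest(to, sender, date, subject)
        if digest in self._digests:
            verdict = "dupe"        # same header data seen before
        elif msgid is not None and msgid in self._msgids:
            verdict = "msgid-only"  # MSGID repeats but the headers differ
        else:
            verdict = "new"
        # Record both keys so later messages can be compared against them.
        self._digests.add(digest)
        if msgid is not None:
            self._msgids.add(msgid)
        return verdict
```

With this policy a message whose MSGID repeats but whose header data is
different gets flagged for a closer look rather than silently dropped.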
ml> speed... how much time are you willing to spend rummaging thru a
ml> duplicate dataset looking for a match before deciding if a message is
ml> a duplicate or not?
Heh, heh. It depends on how big a problem dupes really are. With not many
REAL dupes I would say zero "rummaging", but if I were Rusty and seeing
hundreds of REAL dupes then I'd really wish my uplink was doing better
quality control. But then that of course brings up the question of whether
the uplink is filtering out messages that aren't really dupes but are
instead MSGID dupes. I've seen those and have seriously wondered if the
few I do manage to see aren't representative of a far greater and unseen
problem regarding the whole MSGID situation as it stands today.
ml> considering your high desire for speed, i can see
ml> small datasets (one per message area al la squish?) to ease the
ml> search time...
Possibly. I have been pondering what I wish to do locally for myself all
the way around, not just Fido.
ml> interesting problem, this is... i'm already visualising multiple dupe
ml> dataset files based on the AREA line, locally carried areas
ml> notwithstanding due to the processing of passthru areas, or one large
ml> or even multiple large datafiles containing AREA grouped datasets of
ml> header and MSGID data...
Interesting to ponder.
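
For pondering purposes, a bare-bones Python sketch of the "one small
dataset per AREA" variant. The class name and the idea of passing in an
opaque key (a header digest, a MSGID, whatever the trigger ends up being)
are assumptions made for the example, and it keeps everything in memory
instead of in datafiles:

```python
from collections import defaultdict


class AreaDupeSets:
    """Hypothetical dupe store with one small set of keys per AREA tag."""

    def __init__(self):
        self._seen = defaultdict(set)

    def check_and_record(self, area, key):
        """Return True if key was already seen in this area, else record it."""
        seen = self._seen[area]
        if key in seen:
            return True
        seen.add(key)
        return False

    def size(self, area):
        """Number of keys recorded for one area."""
        return len(self._seen[area])
```

Since a lookup only touches the set for the message's own area, each
search stays small, and the same key turning up in a second area (a
crosspost) is not reported as a dupe.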
Life is good,
Maurice
--- Msged/LNX 6.1.2
* Origin: Coffin Point - Ladysmith, BC Canada (1:153/401.1)
SEEN-BY: 633/267 270
@PATH: 153/401 307 140/1 106/2000 633/267