| echo: | |
|---|---|
| to: | |
| from: | |
| date: | |
| subject: | FMail duplicate detection |
On Wed, 23 Apr 2014, Wilfred van Velzen wrote to mark lewis:
ml> what does FMail use for its duplicate detection, please?
ml> on this system, i see numerous false positive duplicates that have
ml> passed through the fastecho system of my uplink... i'm trying to
ml> track down why FMail would think they are duplicates when FE has
ml> not... the FE system is using its largest dupe database option...
WvV> First, I have never noticed a false positive. There were false
WvV> negatives, when the messages were too old to be in the dupe base.
what determines the age of "too old"?? simply the number of
entries in the database, the age of the post by creation date or the age of
the post by arrival time??
WvV> Regarding dupe detection by fmail, there are "clues" in the doc
WvV> file:
yes but i was asking so that "we" wouldn't have to go digging
through documentation and code that isn't forthcoming with a simple and
straight answer...
WvV> FMAIL.DUP Contains the database with signatures of
WvV> messages used by FMail to detect duplicate
WvV> messages. FMail keeps track of the last 16384
WvV> messages.
wow... understandable to a point... it brings up the question of what the
records consist of to fill 64K with only 16384 posts...
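As a rough check on that question: if the 16-bit FMAIL.DUP really is about 64K for 16384 entries (the 64K figure comes from the remark above, not from a measured file), each record works out to 4 bytes, which happens to be the size of one CRC32. A quick sketch of the arithmetic, with both numbers treated as assumptions:

```python
# Assumption: the dupe file is exactly 64 KiB and holds 16384 fixed-size records.
FILE_BYTES = 64 * 1024
RECORDS = 16384

bytes_per_record = FILE_BYTES // RECORDS
print(bytes_per_record)      # 4
print(bytes_per_record * 8)  # 32 bits, the width of a single CRC32
```

So one 32-bit signature per message would account for the file size; whether that is what the file actually holds is not confirmed here.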
WvV> FMAIL32.DUP The 32-bit version of the duplicate detection
WvV> file. It is capable of keeping track of more
WvV> duplicates than the 16-bit DOS version. (max.
WvV> 9999*1024).
this seems inconsistent with the previous statement... one says "16384
messages" whereas this one seems to say 9999 messages with 1024 bytes
(bits?) per entry... see my above statement about digging through
documentation with no simple and straightforward answers...
WvV> Ignore MSGID
WvV> Normally FMail uses the MSGID of a message (if present)
WvV> for duplicate detection purposes. In some cases, this
WvV> may cause problems when different messages are having
WvV> the same MSGID: one or more of these messages will be
WvV> marked as duplicates although they are not. If you are
WvV> frequently experiencing these problems, try setting this
WvV> switch to 'Yes'.
that would seem to defeat the purpose of MSGID... especially if FMail is
expecting the MSGID to be unique across all message areas... in fact, this
brings up one of the flaws in the MSGID portion of the relevant FTSC
standard document... there is no specification of uniqueness across all
message areas or if the uniqueness is per message area... there are several
well known packages that operate on the "per area" basis which
then causes false positives in other packages... for that matter, there are
some well known packages that maintain duplicate databases on a per area
basis instead of one attempting to cover all message bases...
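To make the scope problem concrete, here is a minimal sketch (not FMail's code) of the same two messages checked with a global MSGID key and with a per-area key. The area names and the MSGID value are made up for illustration.

```python
def find_dupes(messages, key):
    """Return the messages whose key was already seen earlier in the list."""
    seen = set()
    dupes = []
    for msg in messages:
        k = key(msg)
        if k in seen:
            dupes.append(msg)
        else:
            seen.add(k)
    return dupes

# Hypothetical traffic: one MSGID reused in two different areas.
messages = [
    {"area": "AREA_ONE", "msgid": "1:3634/12.71 0badc0de"},
    {"area": "AREA_TWO", "msgid": "1:3634/12.71 0badc0de"},
]

# Tosser A: MSGID must be unique across all areas.
global_dupes = find_dupes(messages, key=lambda m: m["msgid"])
# Tosser B: MSGID only has to be unique within an area.
per_area_dupes = find_dupes(messages, key=lambda m: (m["area"], m["msgid"]))

print(len(global_dupes), len(per_area_dupes))  # 1 0
```

The same traffic gives one "duplicate" under the global key and none under the per-area key, which is the kind of disagreement between packages described above.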
WvV> Dups recs (x1024) (32-bit mode only, start FSetupX with "/32")
is this true for all supported OSes? this shouldn't, IMHO, be necessary...
the tool should be able to detect which environment it is running in and
use the necessary means/methods/capabilities...
WvV> Number of signatures of messages that are stored on
WvV> disk.
i'm not understanding this since it was separated from the above and
looks like it is just floating...
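One possible reading, offered as an assumption rather than a confirmed fact: "Dups recs (x1024)" is a setting expressed in units of 1024 records, so "max. 9999*1024" would be a record count, not a record size. Under that reading the numbers work out as follows (the 4 bytes per signature is also an assumption, based on one CRC32 per message):

```python
# Assumption: "(x1024)" means the configured value is multiplied by 1024.
MAX_SETTING = 9999
max_records = MAX_SETTING * 1024
print(max_records)        # 10238976

# Assumption: 4 bytes per stored signature (one CRC32).
print(max_records * 4)    # 40955904 bytes on disk at the maximum

# For comparison, the 16-bit limit in the same units.
print(16384 // 1024)      # 16
```

Read this way, the two doc statements would not conflict: 16384 is simply a setting of 16 in the same units.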
WvV> So it depends on the version you are using and your settings. In
WvV> the .DUP file a crc32 of some parts of the message (depending on
WvV> your settings) is stored. If you want to know more about the
WvV> technical details of that, look in the source:
thanks... but i asked so that
1. non-coders would have a simple, straightforward answer
2. myself and others would not have to try to wade through alien code
3. everyone would benefit from an easy concise statement
[eg]
FMail takes a CRC16 and CRC32 of the binary message header plus
the first 60 bytes of the message body AS WELL AS a CRC16 and
CRC32 of the whole message body after the binary header AS WELL
AS a CRC16 and a CRC32 of the last 60 bytes of the message plus
all the SEENBY and PATH lines...
for 16bit systems, we store 16384 records of the above meaning
that only 16384 messages can be dupe checked... systems with more
messages may see duplicates.
for 32bit systems, we store 32768 records of the above meaning
that only 32768 messages can be dupe checked... systems with more
messages may see duplicates.
NOTE: FMail uses one duplicate database for ALL message areas. this
means that some messages will be detected as duplicates even if
they are in another message area. this can happen due to the method
used by some software when they post carbon copies or forwarded
copies of messages.
[/eg]
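The fixed-size table described in the example above can be sketched in a few lines. This is an illustration only, not FMail's actual file format or algorithm: it keeps one CRC32 per message (via Python's zlib.crc32) and forgets the oldest signature once the table is full. The class name and the tiny capacity are hypothetical.

```python
import zlib
from collections import deque

class DupeBase:
    """Remembers the signatures of the last `capacity` messages."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.order = deque()   # signatures, oldest first
        self.seen = set()      # same signatures, for fast lookup

    def is_dupe(self, data: bytes) -> bool:
        sig = zlib.crc32(data)
        if sig in self.seen:
            return True
        if len(self.order) == self.capacity:
            # Table is full: drop the oldest signature.
            self.seen.discard(self.order.popleft())
        self.order.append(sig)
        self.seen.add(sig)
        return False

base = DupeBase(capacity=2)        # tiny on purpose; think 16384 in practice
first = base.is_dupe(b"msg one")   # False, never seen
again = base.is_dupe(b"msg one")   # True, still in the table
base.is_dupe(b"msg two")
base.is_dupe(b"msg three")         # pushes "msg one" out of the table
late = base.is_dupe(b"msg one")    # False, too old to be remembered
```

In a table like this, "too old" depends only on how many messages have been tossed since, not on the creation date or arrival time of the post.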
one catch to the above is when the CRCs are calculated... if they are
calculated after the AREA line has been removed during the toss into the
local base, that eliminates a valuable piece of data that can prevent false
positives... especially those across message areas...
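A small sketch of why the timing matters, assuming a simplified message where the text starts with an AREA: line followed by a carriage return. The area tags and body are invented, and zlib.crc32 stands in for whatever CRC a given tosser actually uses.

```python
import zlib

body = b"same text posted in two echoes"
tags = (b"ECHO_ONE", b"ECHO_TWO")   # hypothetical area tags

# Signature taken while the AREA line is still part of the text.
with_area = [zlib.crc32(b"AREA:" + tag + b"\r" + body) for tag in tags]

# Signature taken after the AREA line has been stripped.
without_area = [zlib.crc32(body) for _ in tags]

print(with_area[0] != with_area[1])        # True: two distinct signatures
print(without_area[0] == without_area[1])  # True: second copy looks like a dupe
```

With the AREA line included, the two copies get different signatures; once it is stripped, a single shared dupe base cannot tell them apart.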
then there's the question of whether the duplicate detection has any effect
on the messages being passed on to other systems... depending on how things
are done in the process flow, it may be desirable to pass all messages on
to all other systems and let them detect what they believe are
duplicates... especially if they have a larger duplicate database
capability and only 16384 messages are handled across all areas...
with all of that said, i originally asked and hoped to get a simple and
easy to understand response so that none of the above would need to be
written and no one other than the developer would have to go digging into
the code to try to figure out what is really going on...
)\/(ark
One of the great tragedies of life is the murder of a beautiful theory by a
gang of brutal facts. --Benjamin Franklin
--- FMail/Win32 1.60
* Origin: (1:3634/12.71)
SEEN-BY: 3/0 633/0 267 280 281 402 640/384 712/0 848
@PATH: 3634/12 123/500 261/38 712/848 633/280 267
| SOURCE: echomail via fidonet.ozzmosis.com | |