echo: fmail_help
to: Wilfred van Velzen
from: mark lewis
date: 2014-04-23 21:18:56
subject: FMail duplicate detection

On Wed, 23 Apr 2014, Wilfred van Velzen wrote to mark lewis:

 ml> what does FMail use for its duplicate detection, please?

 ml> on this system, i see numerous false positive duplicates that have 
 ml> passed through the fastecho system of my uplink... i'm trying to 
 ml> track down why FMail would think they are duplicates when FE has 
 ml> not... the FE system is using its largest dupe database option...

 WvV> First, I have never noticed a false positive. There were false 
 WvV> negatives, when the messages were too old to be in the dupe base.

what determines the age of "too old"?? simply the number of
entries in the database, the age of the post by creation date or the age of
the post by arrival time??

 WvV> Regarding dupe detection by fmail, there are "clues" in the doc
 WvV> file: 

yes but i was asking so that "we" wouldn't have to go digging
through documentation and code that isn't forthcoming with a simple and
straight answer...

 WvV> FMAIL.DUP      Contains the database with signatures of messages
 WvV>                used by FMail to detect duplicate messages. FMail
 WvV>                keeps track of the last 16384 messages.

wow... understandable to a point... it brings up the question of what the
records consist of to fill 64K with only 16384 posts...

 WvV> FMAIL32.DUP    The 32-bit version of the duplicate detection file.
 WvV>                It is capable of keeping track of more duplicates
 WvV>                than the 16-bit DOS version. (max. 9999*1024).

this seems inconsistent with the previous statement... one says "16384
messages" whereas this one seems to say 9999 messages with 1024 bytes
(bits?) per entry... see my above statement about digging through
documentation with no simple and straight forward answers...
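
one possible reading that makes the two statements agree, sketched as plain arithmetic in Python. it assumes each record is a single 4-byte CRC32 (the quoted reply further down says a crc32 is stored) and that "9999*1024" means a record count in units of 1024, as the "Dups recs (x1024)" setting label suggests. both are assumptions, not confirmed facts about the file format, and the variable names are made up for illustration.

```python
# Assumption: one CRC32 signature (4 bytes) per record in the dupe file.
CRC32_BYTES = 4

# 16-bit DOS version: the docs say the last 16384 messages are tracked.
dos_records = 16384
dos_bytes = dos_records * CRC32_BYTES      # 65536 bytes, i.e. 64K

# 32-bit version: assume the setting counts records in units of 1024.
max_setting = 9999
max_records = max_setting * 1024           # 10,238,976 records
max_bytes = max_records * CRC32_BYTES      # 40,955,904 bytes

print(dos_bytes, max_records, max_bytes)
```

under those assumptions the 16-bit file is exactly 64K and the 32-bit maximum is a little over 10 million signatures, not 9999 messages.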

 WvV> Ignore MSGID

 WvV>      Normally FMail uses the MSGID of a message (if present) for
 WvV>      duplicate detection purposes. In some cases, this may cause
 WvV>      problems when different messages are having the same MSGID:
 WvV>      one or more of these messages will be marked as duplicates
 WvV>      although they are not. If you are frequently experiencing
 WvV>      these problems, try setting this switch to 'Yes'.

that would seem to defeat the purpose of MSGID... especially if FMail is
expecting the MSGID to be unique across all message areas... in fact, this
brings up one of the flaws in the MSGID portion of the relevant FTSC
standard document... there is no specification of uniqueness across all
message areas or if the uniqueness is per message area... there are several
well known packages that operate on the "per area" basis which
then causes false positives in other packages... for that matter, there are
some well known packages that maintain duplicate databases on a per area
basis instead of one attempting to cover all message bases...

 WvV> Dups recs (x1024) (32-bit mode only, start FSetupX with "/32") 

is this true for all supported OSes? this shouldn't, IMHO, be necessary...
the tool should be able to detect which environment it is running in and
use the necessary means/methods/capabilities...

 WvV>      Number of signatures of messages that are stored on disk.

i'm not understanding this since it was separated from the above and
looks like it is just floating...

 WvV> So it depends on the version you are using and your settings. In
 WvV> the .DUP file a crc32 of some parts of the message (depending on
 WvV> your settings) is stored. If you want to know more about the
 WvV> technical details of that, look in the source:

thanks... but i asked so that

1. non-coders would have a simple straight forward answer

2. myself and others would not have to try to wade through alien code

3. everyone would benefit from an easy concise statement

[eg]
    FMail takes a CRC16 and CRC32 of the binary message header plus 
    the first 60 bytes of the message body AS WELL AS a CRC16 and 
    CRC32 of the whole message body after the binary header AS WELL 
    AS a CRC16 and a CRC32 of the last 60 bytes of the message plus 
    all the SEENBY and PATH lines...

    for 16bit systems, we store 16384 records of the above meaning 
    that only 16384 messages can be dupe checked... systems with more 
    messages may see duplicates.

    for 32bit systems, we store 32768 records of the above meaning 
    that only 32768 messages can be dupe checked... systems with more 
    messages may see duplicates

    NOTE: FMail uses one duplicate database for ALL message areas. this 
    means that some messages will be detected as duplicates even if 
    they are in another message area. this can happen due to the method
    used by some software when they post carbon copies or forwarded
    copies of messages.
[/eg]
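
to make the record limit in the example text concrete, here is a minimal Python sketch of a fixed-size signature store. it is not FMail's code: the class name DupeBase, the is_dupe method and the oldest-first eviction are all assumptions for illustration. it only shows how "too old" can come down to the number of entries, so that a message older than the last N stored signatures slips through as a false negative.

```python
from collections import OrderedDict


class DupeBase:
    """Hypothetical fixed-capacity store of message signatures."""

    def __init__(self, capacity=16384):
        self.capacity = capacity
        self._seen = OrderedDict()  # insertion order doubles as age order

    def is_dupe(self, sig):
        """Return True if sig is already stored, otherwise remember it."""
        if sig in self._seen:
            return True
        self._seen[sig] = None
        if len(self._seen) > self.capacity:
            # Assumption: the oldest signature is dropped when the store is full.
            self._seen.popitem(last=False)
        return False
```

with a capacity of 2, checking signatures 1, 2, 3 and then 1 again reports the second 1 as new, because it was pushed out when 3 arrived.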

one catch to the above is when the CRCs are calculated... if they are
calculated after the AREA line has been removed during the toss into the
local base, that eliminates a valuable piece of data that can prevent false
positives... especially those across message areas...
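
a small Python sketch of that point, assuming a signature built only from the MSGID and, optionally, the area tag. the function name signature and the way the area tag is folded in are hypothetical; what FMail actually feeds into its CRC is not stated in this thread.

```python
import zlib


def signature(msgid, area=None):
    """CRC32 of the MSGID, optionally prefixed with the echo area tag.

    Hypothetical helper: area tags are upper-cased so that the same
    echo always contributes the same bytes to the checksum.
    """
    text = msgid if area is None else area.upper() + "\n" + msgid
    return zlib.crc32(text.encode("latin-1", "replace"))
```

without the area tag, two different messages that happen to carry the same MSGID in different echoes get the same signature and the second one is flagged as a dupe. with the area tag included, the inputs to the CRC differ, so the same MSGID in two echoes is very unlikely to collide.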

then there's the question of does the duplicate detection have any effect
on the messages being passed on to other systems... depending on how things
are done in the process flow, it may be desirable to pass all messages on
to all other systems and let them detect what they believe are
duplicates... especially if they have a larger duplicate database
capability and only 16384 messages are handled across all areas...


with all of that said, i originally asked and hoped to get a simple and
easy to understand response so that none of the above would need to be
written and no one other than the developer would have to go digging into
the code to try to figure out what is really going on...

)\/(ark

One of the great tragedies of life is the murder of a beautiful theory by a
gang of brutal facts. --Benjamin Franklin

--- FMail/Win32 1.60
* Origin: (1:3634/12.71)
SEEN-BY: 3/0 633/0 267 280 281 402 640/384 712/0 848
@PATH: 3634/12 123/500 261/38 712/848 633/280 267

SOURCE: echomail via fidonet.ozzmosis.com
