echo: dbridge
to: mark lewis
from: Rob Swindell
date: 2018-06-20 11:44:26
subject: Dupeloops

Re: Dupeloops
  By: mark lewis to Rob Swindell on Wed Jun 20 2018 08:08 am

 >
 >  On 2018 Jun 19 22:43:24, you wrote to me:
 >
 >  >> AFAIK, seenbys and paths are not included in most dupe detection
 >  >> schemes... other non-changing control lines are fine to be included...
 >  >> one of the problems comes when some systems sort those control lines on
 >  >> messages they are passing along... we don't see so much of that like we
 >  >> did at one time ;)
 >
 >  RS> So some metadata is included in the data that is hashed for dupe
 >  RS> detection and some is not?
 >
 > yes...
 >
 >  RS> Are you sure about that?
 >
 > yes... in fact, and i don't recall who pointed this out to me back in the
 > '90s, dbridge does exactly this in a manner of speaking... it takes the
 > whole message header plus X bytes immediately following the message header
 > and uses all of that as at least part of the checksum calculation... this
 > was pointed out to me when i was working on my posting tool and was adding
 > MSGID support to it...
 >
 > i was using a library and just letting it do its thing... some of my test
 > posts were reported as dupes when they clearly weren't... IIRC, they were
 > detected as dupes because they were posted within the same second... it
 > turned out that my MSGID was somewhere in the middle of the control lines
 > at the beginning of the message body and only my dbridge using testers were
 > seeing this... someone pointed out this thing about dbridge also using X
 > bytes from the beginning of the message body in addition to the message
 > header so i moved my posting tool's MSGID to the top of the list and no
 > more dupes were detected by those dbridge systems...
 >
 > i don't know what other systems do... there's only a very few that provide
 > this information... SBBS is one of them... when i was testing Mystic, there
 > was some discussion about dupe detection as james worked to try to figure
 > out the best method he liked... i have used fastecho here for decades but
 > i don't know what data it uses for its checksums... i do know it uses two
 > checksums, though... i know this because i was being nosy one day and
 > looking at FE's dupe database file (one for all message areas) with a hex
 > viewer and noticed that groups of bytes were repeated all throughout the
 > file... i asked about this and was told i found a bug... basically, FE has
 > two checksums that it uses for each message and both are supposed to be
 > stored in the database... what i found was that only one was being used
 > and written to both fields... toby fixed that problem right quick... i
 > just don't know what data is used to calculate them...
 >
 > back in the day, dupe detection formulas were not really shared around...
 > maybe a couple of developers talking amongst themselves would tell each
 > other what they were doing but this information was not published where
 > everyone could find it... it was more or less black majik to a point...

To complete the discussion, Synchronet (smblib) actually uses multiple methods
of body text dupe detection:

1. A "legacy" CRC-32 hash of the body text, excluding any metadata, like FTN
   control lines, and excluding any trailing white-space or control characters
2. A tuple of hashes (MD5 digest, CRC-32, and CRC-16) and length (char count)
   of the body text excluding any metadata and *all* white-space characters
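
The two body-text methods above can be sketched in Python as follows. This is a minimal sketch, not smblib's implementation: it assumes "metadata" means lines starting with ^A (kludges) or "SEEN-BY:", and it uses binascii.crc_hqx (CRC-16/XMODEM) as the CRC-16, which may not be the exact variant smblib uses. The function names are hypothetical.

```python
import binascii
import hashlib
import zlib


def strip_ftn_metadata(body: bytes) -> bytes:
    """Drop kludge lines (^A prefix) and SEEN-BY lines; keep everything else verbatim."""
    return b"".join(
        line
        for line in body.splitlines(keepends=True)
        if not line.startswith((b"\x01", b"SEEN-BY:"))
    )


# Control characters 0x00-0x1F, space (0x20) and DEL (0x7F).
_TRAILING_JUNK = bytes(range(0x21)) + b"\x7f"


def legacy_crc32(body: bytes) -> int:
    """Method 1: CRC-32 of the body minus metadata and trailing white-space/control chars."""
    return zlib.crc32(strip_ftn_metadata(body).rstrip(_TRAILING_JUNK))


def whitespace_free_fingerprint(body: bytes) -> tuple:
    """Method 2: (MD5, CRC-32, CRC-16, length) of the body minus metadata and ALL white-space."""
    # bytes.split() with no argument splits on runs of ASCII white-space.
    text = b"".join(strip_ftn_metadata(body).split())
    return (
        hashlib.md5(text).hexdigest(),
        zlib.crc32(text),
        binascii.crc_hqx(text, 0),  # assumption: CRC-16/XMODEM variant
        len(text),
    )


a = b"\x01MSGID: 1:123/456 5b2a0001\rHello,  world!\r\r"
b = b"\x01MSGID: 1:123/456 5b2a0002\rHello,  world!\r"
c = b"\x01MSGID: 1:123/456 5b2a0003\rHello, world!\r"

print(legacy_crc32(a) == legacy_crc32(b))                                # True: only trailing white-space differs
print(whitespace_free_fingerprint(a) == whitespace_free_fingerprint(c))  # True: only inner white-space differs
```

The second method is the more forgiving of the two: a gateway that re-wraps lines or collapses spaces changes the legacy CRC-32 but not the white-space-free fingerprint.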

Both body-text methods are used in addition to duplicate Internet (RFC-822)
compliant Message-ID and FTN-compliant Message-ID checks.
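
The Message-ID checks amount to remembering every ID already imported and rejecting repeats. A minimal sketch, assuming in-memory sets (a real message base would persist them) and that the FTN ID is read from the ^AMSGID: kludge line; the class and function names are hypothetical.

```python
from typing import Optional, Set


def ftn_msgid(body: bytes) -> Optional[bytes]:
    """Return the value of the ^AMSGID: kludge line, or None if there is none."""
    for line in body.splitlines():
        if line.startswith(b"\x01MSGID:"):
            return line[len(b"\x01MSGID:"):].strip()
    return None


class MessageIdDupeChecker:
    """Remembers IDs already seen, keeping RFC-822 and FTN IDs in separate sets."""

    def __init__(self) -> None:
        self.seen_rfc822: Set[str] = set()
        self.seen_ftn: Set[bytes] = set()

    def check_and_add(self, rfc822_id: Optional[str], body: bytes) -> bool:
        """Return True if either ID was seen before; otherwise remember both and return False."""
        ftn_id = ftn_msgid(body)
        if rfc822_id is not None and rfc822_id in self.seen_rfc822:
            return True
        if ftn_id is not None and ftn_id in self.seen_ftn:
            return True
        if rfc822_id is not None:
            self.seen_rfc822.add(rfc822_id)
        if ftn_id is not None:
            self.seen_ftn.add(ftn_id)
        return False


checker = MessageIdDupeChecker()
print(checker.check_and_add("<1@example.com>", b"\x01MSGID: 1:123/456 5b2a0001\rhi\r"))  # False
print(checker.check_and_add("<2@example.com>", b"\x01MSGID: 1:123/456 5b2a0001\rhi\r"))  # True: same FTN MSGID
```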

No black majik here. :-)

                                            digital man

Synchronet "Real Fact" #64:
Synchronet PCMS (introduced w/v2.0) is Programmable Command and Menu Structure.
Norco, CA WX: 77.6°F, 57.0% humidity, 8 mph ENE wind, 0.00 inches rain/24hrs
--- SBBSecho 3.05-Linux
* Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
SEEN-BY: 103/705 154/10 203/0 218/700 221/0 1 6 360 229/426 240/5832 280/464
SEEN-BY: 280/5003 292/854 320/219 423/120 633/267 280 640/384 1384 712/620 848
SEEN-BY: 770/1 2320/100
@PATH: 103/705 280/464 221/0 640/1384 384 712/848 633/267

SOURCE: echomail via fidonet.ozzmosis.com
