echo: muffin
to: All
from: Mike Tripp
date: 2003-05-31 14:32:54
subject: Dupe checking

=============================================================================
* Forwarded by Mike Tripp (1:382/61)
* Area : SCOTT_DUDLEY (SCOTT_DUDLEY)
* From : Scott Dudley, 1:249/106.1 (03 Jul 97 00:39)
* To   : John Gardeniers
* Subj : Dupe checking
=============================================================================
 JG> I know it's purely academic now as far as Squish itself is concerned but
 JG> I'd be very interested in exactly what is, and is not, compared when dupe
 JG> checking. More detailed knowledge would also provide us all with a way to
 JG> make a more considered decision regarding what setting to use.

The exact algorithm is a little complicated.  Don't say that you didn't ask
for it. :)

If the following test passes, the message is a dupe:

    if ((fCheckHeader && did.crc==dptr->crc && did.date==dptr->date) ||
        (fCheckMsgid && did.msgid_hash &&
         did.msgid_hash==dptr->msgid_hash &&
         did.msgid_serial==dptr->msgid_serial))
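
To make the two branches of that test easier to see, here is a minimal standalone sketch. The `dupe_id` struct and `is_dupe` function are hypothetical stand-ins invented for illustration, not Squish's actual `DUPEID` layout or code; only the comparison logic follows the expression above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a dupe record. */
typedef struct {
    uint32_t crc;          /* hash of the from/to/subject fields        */
    uint32_t date;         /* packed date stamp, or hash of ASCII date  */
    uint32_t msgid_hash;   /* hash of the MSGID address, 0 if no MSGID  */
    uint32_t msgid_serial; /* serial number from the MSGID kludge       */
} dupe_id;

/* Returns nonzero if `a` matches `b` under the enabled checks. */
static int is_dupe(const dupe_id *a, const dupe_id *b,
                   int check_header, int check_msgid)
{
    int header_match, msgid_match;

    assert(a != NULL && b != NULL);

    /* Header check: both the header hash and the date must agree. */
    header_match = check_header &&
                   a->crc == b->crc && a->date == b->date;

    /* MSGID check: a zero hash means "no MSGID" and never matches. */
    msgid_match = check_msgid && a->msgid_hash != 0 &&
                  a->msgid_hash == b->msgid_hash &&
                  a->msgid_serial == b->msgid_serial;

    return header_match || msgid_match;
}
```

Note that either branch alone is enough to flag a dupe, and that a message without a MSGID can only ever be caught by the header branch.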

The "crc" is a hash of selected header fields (the first two words of the
from and to names, plus the normalized subject), and did.date is the
message date. The message header CRC is computed as follows:

====

static void near GetDidHeader(DUPEID *pid, PXMSG msg)
{
  char temp[PATHLEN];
  int subjsize;
  byte *p;

  pid->crc=(dword)0xffffffffLu;

  /* CRC the first two words of the to/from fields */

  pid->crc=crc2word(msg->from, pid->crc);
  pid->crc=crc2word(msg->to, pid->crc);

  /* Figure out how much of the subject line needs to be checked */

  subjsize = (config.flag2 & FLAG2_LONGHDR) ? XMSG_SUBJ_SIZE : 23;

  (void)strncpy(temp, msg->subj, subjsize);
  temp[subjsize]=0;
  (void)strlwr(temp);

  /* Remove any "re:" prefixes */

  while (temp[0]=='r' && temp[1]=='e' && temp[2]==':' && temp[3]==' ')
    (void)memmove(temp, temp+4, strlen(temp+4)+1);

  for (p=temp; *p; p++)
    if (*p != ' ')
      pid->crc=xcrc32(*p, pid->crc);

  /* Now copy in the message's date.  If it's a valid date (year != 0),     *
   * then simply make a copy of the 4-byte stamp.  Otherwise, hash the      *
   * ASCII date.                                                            */

  if (msg->date_written.date.yr == 0)
    pid->date=crcstr((dword)0xffffffffLu, msg->__ftsc_date);
  else
  {
    char *date=msg->__ftsc_date;
    int ch;

    pid->date=*(dword *)(char *)&msg->date_written;

    /* Handle messages with a one-second granularity */

    ch=date[strlen(date)-1];

    if (ch >= '0' && ch <= '9')
      if (((ch-'0') & 1) != 0)
        pid->date=~pid->date;
  }
}

====
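
The subject handling above can be tried in isolation. The following is a sketch only: `normalize_subject` is a hypothetical helper written for this illustration, and it produces the string that would be fed to the CRC rather than computing the CRC itself. It follows the same order of steps as the code above: truncate, lowercase, strip leading "re: " prefixes, then skip spaces.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes the normalized form of `subj` to `out`.
 * At most `limit` characters of `subj` are considered, and `out` must
 * have room for `limit` characters plus the terminator.              */
static void normalize_subject(const char *subj, size_t limit,
                              char *out, size_t out_size)
{
    size_t i, n = 0;

    assert(subj != NULL && out != NULL && out_size > limit);

    /* Truncate to `limit` characters and lowercase the result */
    for (i = 0; i < limit && subj[i] != '\0'; i++)
        out[i] = (char)tolower((unsigned char)subj[i]);
    out[i] = '\0';

    /* Strip any leading "re: " prefixes */
    while (strncmp(out, "re: ", 4) == 0)
        memmove(out, out + 4, strlen(out + 4) + 1);

    /* Drop spaces in place, since they are not hashed */
    for (i = 0; out[i] != '\0'; i++)
        if (out[i] != ' ')
            out[n++] = out[i];
    out[n] = '\0';
}
```

With a limit of 23, both "Dupe checking" and "Re: RE: Dupe Checking" normalize to the same string. Because truncation happens before the prefixes are stripped, a long subject and its "Re: " reply can normalize differently once the subject exceeds the limit.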

The "msgid_hash" is the hash of the MSGID address, and msgid_serial is
the 8-digit hexadecimal serial number in the MSGID kludge.


The hash of the address is computed as follows:

====

/* Fill in the part of the message header that pertains to the              *
 * MSGID kludge.                                                            */

void GetDidMsgid(DUPEID *pid, char *ctrl)
{
  char *msgid=MsgGetCtrlToken(ctrl, msgid_str);

  /* If there was no MSGID kludge, zero out the msgid lines and do nothing */

  if (!msgid)
  {
    pid->msgid_hash=0L;
    pid->msgid_serial=0L;
    return;
  }

  MashMsgid(msgid+7, &pid->msgid_hash, &pid->msgid_serial);
  MsgFreeCtrlToken(msgid);
}


void MashMsgid(char *begin, dword *msgid_hash, dword *msgid_serial)
{
  char hash_buf[PATHLEN];
  size_t maxsize;
  char *end;

  /*  012345678                                               */
  /* ^aMSGID: 1:249/106 12345678                              */
  /*                                                          */
  /* ^aMSGID: "dudleys f106 n249 z1 fidonet org" 12345678     */

  end=begin;

  /* If we got a quote, skip over until the next quote is found */

  if (*begin=='\"')
  {
    for (end=begin+1; *end; end++)
      if (*end=='"')
        if (*++end != '"')
          break;
  }
  else
  {
    /* Else just skip until the next space */

    while (*end && *end != ' ')
      end++;
  }

  maxsize=min(PATHLEN-1, (size_t)(end-begin));

  strncpy(hash_buf, begin, maxsize);
  hash_buf[(size_t)maxsize]=0;
  *msgid_hash=SquishHash(hash_buf);

  /* Skip over the spaces */

  while (*end==' ')
    end++;

  /* Make sure that the hex ID is read in correctly */

  if (sscanf(end, "%08lx", msgid_serial) != 1)
  {
    *msgid_serial=*msgid_hash=0L;
    return;
  }
}

====
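
For anyone who wants to experiment with the MSGID parsing, here is a rough standalone sketch. `split_msgid` is a hypothetical helper, not part of Squish: it copies the address out instead of hashing it (SquishHash is not shown here), and it takes the text that follows "MSGID: " as its input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: splits the text after "MSGID: " into an address
 * string and a hex serial. Returns 1 on success, 0 if no serial parses. */
static int split_msgid(const char *body, char *addr, size_t addr_size,
                       unsigned long *serial)
{
    const char *end = body;
    size_t len;

    assert(body != NULL && addr != NULL && addr_size > 0 && serial != NULL);

    if (*body == '"') {
        /* Quoted address: stop just past the closing quote.
         * A doubled quote ("") inside the address is skipped over. */
        for (end = body + 1; *end; end++)
            if (*end == '"')
                if (*++end != '"')
                    break;
    } else {
        /* Unquoted address: runs up to the next space */
        while (*end && *end != ' ')
            end++;
    }

    len = (size_t)(end - body);
    if (len > addr_size - 1)
        len = addr_size - 1;
    memcpy(addr, body, len);
    addr[len] = '\0';

    while (*end == ' ')
        end++;

    /* Read up to eight hex digits; reject the MSGID if none are found */
    if (sscanf(end, "%8lx", serial) != 1) {
        *serial = 0;
        addr[0] = '\0';
        return 0;
    }
    return 1;
}
```

As in the code above, a MSGID whose serial cannot be read is treated as if no MSGID were present, so such a message falls back to the header check only.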

That's about it.  You can't say that I wasn't specific.  :)

-+-
 + Origin: Fowl Weather Bureaucropost (1:249/106.1)
=============================================================================

Hello All!

After digging through the archives, here is an FYI on the topic of dupe
checking. It looks like 1.10 improved to one-second granularity.
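
The one-second handling in the forwarded GetDidHeader code can be sketched on its own. This assumes the packed 4-byte stamp stores seconds in two-second units, as DOS-style stamps do, so the parity of the last digit of the ASCII date is what tells two adjacent seconds apart. `date_key` is a hypothetical name, and the empty-string guard is an addition that the forwarded code does not have.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: derives the date value used for comparison from
 * a packed stamp and the ASCII date string of the same message.       */
static uint32_t date_key(uint32_t packed_stamp, const char *ascii_date)
{
    size_t len;

    assert(ascii_date != NULL);

    len = strlen(ascii_date);
    if (len > 0) {
        char ch = ascii_date[len - 1];

        /* An odd final digit means an odd second: invert every bit so
         * the key differs from the even second sharing the same stamp */
        if (ch >= '0' && ch <= '9' && ((ch - '0') & 1) != 0)
            return (uint32_t)~packed_stamp;
    }
    return packed_stamp;
}
```

Two messages written one second apart can therefore share a packed stamp and still produce different date values.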

.\\ike

--- GoldED 2.50+
* Origin: -=( The TechnoDrome )=- Austin,TX 512-327-8598 33.6k (1:382/61)
SEEN-BY: 633/267 270
@PATH: 382/61 140/1 106/2000 633/267

SOURCE: echomail via fidonet.ozzmosis.com
