| subject: | Dupe checking |
=============================================================================
* Forwarded by Mike Tripp (1:382/61)
* Area : SCOTT_DUDLEY (SCOTT_DUDLEY)
* From : Scott Dudley, 1:249/106.1 (03 Jul 97 00:39)
* To : John Gardeniers
* Subj : Dupe checking
=============================================================================
JG> I know it's purely academic now as far as Squish itself is concerned but
JG> I'd be very interested in exactly what is, and is not, compared when dupe
JG> checking. More detailed knowledge would also provide us all with a way to
JG> make a more considered decision regarding what setting to use.

The exact algorithm is a little complicated. Don't say that you didn't ask
for it. :)

If the following test passes, the message is a dupe:

  if ((fCheckHeader && did.crc==dptr->crc &&
       did.date==dptr->date) ||
      (fCheckMsgid && did.msgid_hash &&
       did.msgid_hash==dptr->msgid_hash &&
       did.msgid_serial==dptr->msgid_serial))
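
For readers who want to experiment with the test above, here is a minimal standalone sketch of the same comparison. The `dupe_id` struct and the `is_dupe` function are hypothetical names made up for illustration; only the four fields and the two flags come from the expression above, and the real DUPEID layout is not shown in this message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record: only the four fields named in the test above. */
typedef struct {
    uint32_t crc;           /* hash of the message header */
    uint32_t date;          /* packed date stamp, or hash of the ASCII date */
    uint32_t msgid_hash;    /* hash of the MSGID address, 0 if no MSGID */
    uint32_t msgid_serial;  /* serial number from the MSGID kludge */
} dupe_id;

/* Returns true when `incoming` matches `stored` under the enabled checks. */
bool is_dupe(const dupe_id *incoming, const dupe_id *stored,
             bool check_header, bool check_msgid)
{
    /* Header check: both the header hash and the date must agree. */
    bool header_match = check_header &&
                        incoming->crc == stored->crc &&
                        incoming->date == stored->date;

    /* MSGID check: a zero hash means "no usable MSGID" and never matches. */
    bool msgid_match = check_msgid &&
                       incoming->msgid_hash != 0 &&
                       incoming->msgid_hash == stored->msgid_hash &&
                       incoming->msgid_serial == stored->msgid_serial;

    return header_match || msgid_match;
}
```

Note that with both checks enabled, a match on either one is enough to flag the message, and a message without a MSGID can only ever be caught by the header check.
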
The "crc" is the hash of the message header, and did.date is the
message date. The message header CRC is computed as follows:
====
static void near GetDidHeader(DUPEID *pid, PXMSG msg)
{
  char temp[PATHLEN];
  int subjsize;
  byte *p;

  pid->crc=(dword)0xffffffffLu;

  /* CRC the first two words of the to/from fields */
  pid->crc=crc2word(msg->from, pid->crc);
  pid->crc=crc2word(msg->to, pid->crc);

  /* Figure out how much of the subject line needs to be checked */
  subjsize = (config.flag2 & FLAG2_LONGHDR) ? XMSG_SUBJ_SIZE : 23;

  (void)strncpy(temp, msg->subj, subjsize);
  temp[subjsize]=0;
  (void)strlwr(temp);

  /* Remove any 're:" prefixes */
  while (temp[0]=='r' && temp[1]=='e' && temp[2]==':'
         && temp[3]==' ')
    (void)memmove(temp, temp+4, strlen(temp+4)+1);

  for (p=temp; *p; p++)
    if (*p != ' ')
      pid->crc=xcrc32(*p, pid->crc);

  /* Now copy in the message's date.  If it's a valid date (year != 0), *
   * then simply make a copy of the 4-byte stamp.  Otherwise, hash the  *
   * ASCII date.                                                        */
  if (msg->date_written.date.yr == 0)
    pid->date=crcstr((dword)0xffffffffLu, msg->__ftsc_date);
  else
  {
    char *date=msg->__ftsc_date;
    int ch;

    pid->date=*(dword *)(char *)&msg->date_written;

    /* Handle messages with a one-second granularity */
    ch=date[strlen(date)-1];

    if (ch >= '0' && ch <= '9')
      if (((ch-'0') & 1) != 0)
        pid->date=~pid->date;
  }
}
====
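
To see what the subject handling in GetDidHeader means in practice, the sketch below reproduces only the normalization steps (truncate, lowercase, strip leading "re: " prefixes, skip spaces) and returns the resulting text instead of feeding it to a CRC. The function name `normalize_subject` and its parameters are assumptions for illustration; `crc2word` and `xcrc32` are not shown in this message, so they are left out, and standard `tolower` stands in for the non-standard `strlwr`.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Writes the characters that would be hashed for `subj` into `out`.
 * `maxlen` plays the role of subjsize above (23 for short headers). */
void normalize_subject(const char *subj, size_t maxlen,
                       char *out, size_t outsize)
{
    char temp[128];
    size_t n = strlen(subj);
    size_t i, j = 0;
    const char *p;

    if (outsize == 0)
        return;

    /* Truncate first, exactly as the original does before stripping. */
    if (n > maxlen)
        n = maxlen;
    if (n > sizeof temp - 1)
        n = sizeof temp - 1;

    for (i = 0; i < n; i++)
        temp[i] = (char)tolower((unsigned char)subj[i]);
    temp[n] = '\0';

    /* Skip any number of leading "re: " prefixes. */
    p = temp;
    while (strncmp(p, "re: ", 4) == 0)
        p += 4;

    /* Copy everything except spaces. */
    for (; *p != '\0' && j + 1 < outsize; p++)
        if (*p != ' ')
            out[j++] = *p;
    out[j] = '\0';
}
```

Under these assumptions, "Re: RE: Dupe checking" and "dupe  checking" both reduce to "dupechecking", so a reply and the original contribute the same subject bytes to the header hash. Because truncation happens before the prefixes are stripped, a long subject with a "Re: " in front is cut at a different point than the same subject without it.
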
The "msgid_hash" is the hash of the MSGID address, and
msgid_serial is the 8-digit serial number in the MSGID kludge.
The hash of the address is computed as follows:
====
/* Fill in the part of the message header that pertains to the *
 * MSGID kludge.                                               */
void GetDidMsgid(DUPEID *pid, char *ctrl)
{
  char *msgid=MsgGetCtrlToken(ctrl, msgid_str);

  /* If there was no MSGID kludge, zero out the msgid lines and do nothing */
  if (!msgid)
  {
    pid->msgid_hash=0L;
    pid->msgid_serial=0L;
    return;
  }

  MashMsgid(msgid+7, &pid->msgid_hash, &pid->msgid_serial);
  MsgFreeCtrlToken(msgid);
}

void MashMsgid(char *begin, dword *msgid_hash, dword *msgid_serial)
{
  char hash_buf[PATHLEN];
  size_t maxsize;
  char *end;

  /*   012345678                                                */
  /* ^aMSGID: 1:249/106 12345678                                */
  /*                                                            */
  /* ^aMSGID: "dudleys f106 n249 z1 fidonet org" 12345678       */

  end=begin;

  /* If we got a quote, skip over until the next quote is found */
  if (*begin=='\"')
  {
    for (end=begin+1; *end; end++)
      if (*end=='"')
        if (*++end != '"')
          break;
  }
  else
  {
    /* Else just skip until the next space */
    while (*end && *end != ' ')
      end++;
  }

  maxsize=min(PATHLEN-1, (size_t)(end-begin));
  strncpy(hash_buf, begin, maxsize);
  hash_buf[(size_t)maxsize]=0;

  *msgid_hash=SquishHash(hash_buf);

  /* Skip over the spaces */
  while (*end==' ')
    end++;

  /* Make sure that the hex ID is read in correctly */
  if (sscanf(end, "%08lx", msgid_serial) != 1)
  {
    *msgid_serial=*msgid_hash=0L;
    return;
  }
}
====
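
As a rough illustration of how MashMsgid splits the kludge, the sketch below performs the same scan but hands back the address text and the serial number rather than a hash. `split_msgid` is a hypothetical helper written for this note; `SquishHash` is not shown in the message, so the hashing step is omitted, and the input is assumed to start just after "MSGID: ".

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Splits "address serial" into its two parts.
 * Returns 1 on success, 0 if no hex serial could be read. */
int split_msgid(const char *body, char *addr, size_t addrsize,
                unsigned long *serial)
{
    const char *end = body;
    size_t len;

    if (addrsize == 0)
        return 0;

    if (*body == '"') {
        /* Quoted address: stop after the closing quote; a doubled
         * quote ("") inside the address is skipped over. */
        for (end = body + 1; *end; end++)
            if (*end == '"')
                if (*++end != '"')
                    break;
    } else {
        /* Unquoted address: runs up to the first space. */
        while (*end && *end != ' ')
            end++;
    }

    /* Copy the address portion, truncating if the buffer is small. */
    len = (size_t)(end - body);
    if (len > addrsize - 1)
        len = addrsize - 1;
    memcpy(addr, body, len);
    addr[len] = '\0';

    /* Skip the separating spaces, then read up to 8 hex digits. */
    while (*end == ' ')
        end++;

    return sscanf(end, "%8lx", serial) == 1;
}
```

With the two sample kludges from the comment above, this yields the address `1:249/106` (or the whole quoted string, quotes included) and the serial 0x12345678. A kludge whose serial is missing or not hexadecimal is rejected, which corresponds to the original zeroing both fields so that the MSGID check can never match.
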
That's about it. You can't say that I wasn't specific. :)
-+-
+ Origin: Fowl Weather Bureaucropost (1:249/106.1)
=============================================================================
Hello All!
After digging through the archives, here's an FYI on the topic of dupe checking.
Looks like 1.10 improved to 1 sec granularity.
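
The one-second granularity corresponds to the last branch of GetDidHeader in the forwarded message. Here is a small sketch of that trick, assuming the packed 4-byte stamp is a DOS-style value that can only store even seconds. `date_key` is a made-up name, and the sketch leaves out the fallback that hashes the ASCII date when the year is zero.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* `stamp` stands in for the packed 4-byte date; `ascii_date` for the
 * text date carried in the message header. */
uint32_t date_key(uint32_t stamp, const char *ascii_date)
{
    size_t len = strlen(ascii_date);
    uint32_t key = stamp;

    if (len > 0) {
        char ch = ascii_date[len - 1];

        /* If the text date ends in an odd digit, flip every bit so the
         * key differs from the even-second message with the same stamp. */
        if (ch >= '0' && ch <= '9' && ((ch - '0') & 1) != 0)
            key = ~key;
    }
    return key;
}
```

Two messages written one second apart would share the same packed stamp under that assumption, but the odd-second one gets the inverted key, so the header check can still tell them apart.
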
.\\ike
--- GoldED 2.50+
 * Origin: -=( The TechnoDrome )=- Austin,TX 512-327-8598 33.6k (1:382/61)
SEEN-BY: 633/267 270
@PATH: 382/61 140/1 106/2000 633/267

SOURCE: echomail via fidonet.ozzmosis.com