TIP: Click on subject to list as thread! ANSI
echo: bbs_internet
to: all
from: mark lewis
date: 2002-12-17 23:52:48
subject: a method of controlling spam...

the following is from paul graham's "A Plan for Spam" web site...
what paul describes is known as Bayesian Classification and is basically a
statistical analysis of the contents of messages...

http://www.paulgraham.com/spam.html

warning! this article (and thus this message) is over 600 lines in length...

==========

August 2002

(This article describes the spam-filtering techniques used in the new
spamproof web-based mail reader we're building to exercise Arc [LINK].)

I think it's possible to stop spam, and that content-based filters are the
way to do it. The Achilles heel of the spammers is their message. They can
circumvent any other barrier you set up. They have so far, at least. But
they have to deliver their message, whatever it is. If we can write
software that recognizes their messages, there is no way they can get
around that.

_ _ _


To the recipient, spam is easily recognizable. If you hired someone to read
your mail and discard the spam, they would have little trouble doing it.
How much do we have to do, short of AI, to automate this process?

I think we will be able to solve the problem with fairly simple algorithms.
In fact, I've found that you can filter present-day spam acceptably well
using nothing more than a Bayesian combination of the spam probabilities of
individual words. Using a slightly tweaked (as described below) Bayesian
filter, we now miss less than 5 per 1000 spams, with 0 false positives.

The statistical approach is not usually the first one people try when they
write spam filters. Most hackers' first instinct is to try to write
software that recognizes individual properties of spam. You look at spams
and you think, the gall of these guys to try sending me mail that begins
"Dear Friend" or has a subject line that's all uppercase and ends
in eight exclamation points. I can filter out that stuff with about one
line of code.

And so you do, and in the beginning it works. A few simple rules will take
a big bite out of your incoming spam. Merely looking for the word
"click" will catch 79.7% of the emails in my spam corpus, with
only 1.2% false positives.

I spent about six months writing software that looked for individual spam
features before I tried the statistical approach. What I found was that
recognizing that last few percent of spams got very hard, and that as I
made the filters stricter I got more false positives.

False positives are innocent emails that get mistakenly identified as
spams. For most users, missing legitimate email is an order of magnitude
worse than receiving spam, so a filter that yields false positives is like
an acne cure that carries a risk of death to the patient.

The more spam a user gets, the less likely he'll be to notice one innocent
mail sitting in his spam folder. And strangely enough, the better your spam
filters get, the more dangerous false positives become, because when the
filters are really good, users will be more likely to ignore everything
they catch.

I don't know why I avoided trying the statistical approach for so long. I
think it was because I got addicted to trying to identify spam features
myself, as if I were playing some kind of competitive game with the
spammers. (Nonhackers don't often realize this, but most hackers are very
competitive.) When I did try statistical analysis, I found immediately that
it was much cleverer than I had been. It discovered, of course, that terms
like "virtumundo" and "teens" were good indicators of
spam. But it also discovered that "per" and "FL" and
"ff0000" are good indicators of spam. In fact, "ff0000"
(html for bright red) turns out to be as good an indicator of spam as any
pornographic term.

_ _ _


Here's a sketch of how I do statistical filtering. I start with one corpus
of spam and one of nonspam mail. At the moment each one has about 4000
messages in it. I scan the entire text, including headers and embedded html
and javascript, of each message in each corpus. I currently consider
alphanumeric characters, dashes, apostrophes, and dollar signs to be part
of tokens, and everything else to be a token separator. (There is probably
room for improvement here.) I ignore tokens that are all digits, and I also
ignore html comments, not even considering them as token separators.

I count the number of times each token (ignoring case, currently) occurs in
each corpus. At this stage I end up with two large hash tables, one for
each corpus, mapping tokens to number of occurrences.

Next I create a third hash table, this time mapping each token to the
probability that an email containing it is a spam, which I calculate as
follows [1]:

(let ((g (* 2 (or (gethash word good) 0)))
      (b (or (gethash word bad) 0)))
   (unless (< (+ g b) 5)
     (max .01
          (min .99 (float (/ (min 1 (/ b nbad))
                             (+ (min 1 (/ g ngood))
                                (min 1 (/ b nbad)))))))))

where word is the token whose probability we're calculating, good and bad
are the hash tables I created in the first step, and ngood and nbad are
number of nonspam and spam messages respectively.

I explained this as code to show a couple of important details. I want to
bias the probabilities slightly to avoid false positives, and by trial and
error I've found that a good way to do it is to double all the numbers in
good. This helps to distinguish between words that occasionally do occur in
legitimate email and words that almost never do. I only consider words that
occur more than five times in total (actually, because of the doubling,
occurring three times in nonspam mail would be enough). And then there is
the question of what probability to assign to words that occur in one
corpus but not the other. Again by trial and error I chose .01 and .99.
There may be room for tuning here, but as the corpus grows such tuning will
happen automatically anyway.

The especially observant will notice that while I consider each corpus to
be a single long stream of text for purposes of counting occurrences, I use
the number of emails in each, rather than their combined length, as the
divisor in calculating spam probabilities. This adds another slight bias to
protect against false positives.

When new mail arrives, it is scanned into tokens, and the most interesting
fifteen tokens, where interesting is measured by how far their spam
probability is from a neutral .5, are used to calculate the probability
that the mail is spam. If probs is a list of the fifteen individual
probabilities, you calculate the combined [LINK] probability thus:

(let ((prod (apply #'* probs)))
  (/ prod (+ prod (apply #'* (mapcar #'(lambda (x) (- 1 x))
                                     probs)))))

One question that arises in practice is what probability to assign to a
word you've never seen, i.e. one that doesn't occur in the hash table of
word probabilities. I've found, again by trial and error, that .4 is a good
number to use. If you've never seen a word before, it is probably fairly
innocent; spam words tend to be all too familiar.

There are examples of this algorithm being applied to actual emails in an
appendix at the end.

I treat mail as spam if the algorithm above gives it a probability of more
than .9 of being spam. But in practice it would not matter much where I put
this threshold, because few probabilities end up in the middle of the
range.

_ _ _


One great advantage of the statistical approach is that you don't have to
read so many spams. Over the past six months, I've read literally thousands
of spams, and it is really kind of demoralizing. Norbert Wiener said if you
compete with slaves you become a slave, and there is something similarly
degrading about competing with spammers. To recognize individual spam
features you have to try to get into the mind of the spammer, and frankly I
want to spend as little time inside the minds of spammers as possible.

But the real advantage of the Bayesian approach, of course, is that you
know what you're measuring. Feature-recognizing filters like SpamAssassin
assign a spam "score" to email. The Bayesian approach assigns an
actual probability. The problem with a "score" is that no one
knows what it means. The user doesn't know what it means, but worse still,
neither does the developer of the filter. How many points should an email
get for having the word "sex" in it? A probability can of course
be mistaken, but there is little ambiguity about what it means, or how
evidence should be combined to calculate it. Based on my corpus,
"sex" indicates a .97 probability of the containing email being a
spam, whereas "sexy" indicates .99 probability. And Bayes' Rule,
equally unambiguous, says that an email containing both words would, in the
(unlikely) absence of any other evidence, have a 99.97% chance of being a
spam.

Because it is measuring probabilities, the Bayesian approach considers all
the evidence in the email, both good and bad. Words that occur
disproportionately rarely in spam (like "though" or
"tonight" or "apparently") contribute as much to
decreasing the probability as bad words like "unsubscribe" and
"opt-in" do to increasing it. So an otherwise innocent email that
happens to include the word "sex" is not going to get tagged as
spam.

Ideally, of course, the probabilities should be calculated individually for
each user. I get a lot of email containing the word "Lisp", and
(so far) no spam that does. So a word like that is effectively a kind of
password for sending mail to me. In my earlier spam-filtering software, the
user could set up a list of such words and mail containing them would
automatically get past the filters. On my list I put words like
"Lisp" and also my zipcode, so that (otherwise rather
spammy-sounding) receipts from online orders would get through. I thought I
was being very clever, but I found that the Bayesian filter did the same
thing for me, and moreover discovered of a lot of words I hadn't thought
of.

When I said at the start that our filters let through less than 5 spams per
1000 with 0 false positives, I'm talking about filtering my mail based on a
corpus of my mail. But these numbers are not misleading, because that is
the approach I'm advocating: filter each user's mail based on the spam and
nonspam mail he receives. Essentially, each user should have two delete
buttons, ordinary delete and delete-as-spam. Anything deleted as spam goes
into the spam corpus, and everything else goes into the nonspam corpus.

You could start users with a seed filter, but ultimately each user should
have his own per-word probabilities based on the actual mail he receives.
This (a) makes the filters more effective, (b) lets each user decide their
own precise definition of spam, and (c) perhaps best of all makes it hard
for spammers to tune mails to get through the filters. If a lot of the
brain of the filter is in the individual databases, then merely tuning
spams to get through the seed filters won't guarantee anything about how
well they'll get through individual users' varying and much more trained
filters.

Content-based spam filtering is often combined with a whitelist, a list of
senders whose mail can be accepted with no filtering. One easy way to build
such a whitelist is to keep a list of every address the user has ever sent
mail to. If a mail reader has a delete-as-spam button then you could also
add the from address of every email the user has deleted as ordinary trash.

I'm an advocate of whitelists, but more as a way to save computation than
as a way to improve filtering. I used to think that whitelists would make
filtering easier, because you'd only have to filter email from people you'd
never heard from, and someone sending you mail for the first time is
constrained by convention in what they can say to you. Someone you already
know might send you an email talking about sex, but someone sending you
mail for the first time would not be likely to. The problem is, people can
have more than one email address, so a new from-address doesn't guarantee
that the sender is writing to you for the first time. It is not unusual for
an old friend (especially if he is a hacker) to suddenly send you an email
with a new from-address, so you can't risk false positives by filtering
mail from unknown addresses especially stringently.

In a sense, though, my filters do themselves embody a kind of whitelist
(and blacklist) because they are based on entire messages, including the
headers. So to that extent they "know" the email addresses of
trusted senders and even the routes by which mail gets from them to me. And
they know the same about spam, including the server names, mailer versions,
and protocols.

_ _ _


If I thought that I could keep up current rates of spam filtering, I would
consider this problem solved. But it doesn't mean much to be able to filter
out most present-day spam, because spam evolves. Indeed, most antispam
techniques [LINK] so far have been like pesticides that do nothing more
than create a new, resistant strain of bugs.

I'm more hopeful about Bayesian filters, because they evolve with the spam.
So as spammers start using "c0ck" instead of "cock" to
evade simple-minded spam filters based on individual words, Bayesian
filters automatically notice. Indeed, "c0ck" is far more damning
evidence than "cock", and Bayesian filters know precisely how
much more.

Still, anyone who proposes a plan for spam filtering has to be able to
answer the question: if the spammers knew exactly what you were doing, how
well could they get past you? For example, I think that if checksum-based
spam filtering becomes a serious obstacle, the spammers will just switch to
mad-lib techniques for generating message bodies.

To beat Bayesian filters, it would not be enough for spammers to make their
emails unique or to stop using individual naughty words. They'd have to
make their mails indistinguishable from your ordinary mail. And this I
think would severely constrain them. Spam is mostly sales pitches, so
unless your regular mail is all sales pitches, spams will inevitably have a
different character. And the spammers would also, of course, have to change
(and keep changing) their whole infrastructure, because otherwise the
headers would look as bad to the Bayesian filters as ever, no matter what
they did to the message body. I don't know enough about the infrastructure
that spammers use to know how hard it would be to make the headers look
innocent, but my guess is that it would be even harder than making the
message look innocent.

Assuming they could solve the problem of the headers, the spam of the
future will probably look something like this:

  Hey there. Thought you should check out the following:
  http://www.27meg.com/foo

because that is about as much sales pitch as content-based filtering will
leave the spammer room to make. (Indeed, it will be hard even to get this
past filters, because if everything else in the email is neutral, the spam
probability will hinge on the url, and it will take some effort to make
that look neutral.)

Spammers range from businesses running so-called opt-in lists who don't
even try to conceal their identities, to guys who hijack mail servers to
send out spams promoting porn sites. If we use filtering to whittle their
options down to mails like the one above, that should pretty much put the
spammers on the "legitimate" end of the spectrum out of business;
they feel obliged by various state laws to include boilerplate about why
their spam is not spam, and how to cancel your "subscription,"
and that kind of text is easy to recognize.

(I used to think it was naive to believe that stricter laws would decrease
spam. Now I think that while stricter laws may not decrease the amount of
spam that spammers send, they can certainly help filters to decrease the
amount of spam that recipients actually see.)

All along the spectrum, if you restrict the sales pitches spammers can
make, you will inevitably tend to put them out of business. That word
business is an important one to remember. The spammers are businessmen.
They send spam because it works. It works because although the response
rate is abominably low (at best 15 per million, vs 3000 per million for a
catalog mailing), the cost, to them, is practically nothing. The cost is
enormous for the recipients, about 5 man-weeks for each million recipients
who spend a second to delete the spam, but the spammer doesn't have to pay
that.

Sending spam does cost the spammer something, though. [2] So the lower we
can get the response rate-- whether by filtering, or by using filters to
force spammers to dilute their pitches-- the fewer businesses will find it
worth their while to send spam.

The reason the spammers use the kinds of sales pitches [LINK] that they do
is to increase response rates. This is possibly even more disgusting than
getting inside the mind of a spammer, but let's take a quick look inside
the mind of someone who responds to a spam. This person is either
astonishingly credulous or deeply in denial about their sexual interests.
In either case, repulsive or idiotic as the spam seems to us, it is
exciting to them. The spammers wouldn't say these things if they didn't
sound exciting. And "thought you should check out the following"
is just not going to have nearly the pull with the spam recipient as the
kinds of things that spammers say now. Result: if it can't contain exciting
sales pitches, spam becomes less effective as a marketing vehicle, and
fewer businesses want to use it.

That is the big win in the end. I started writing spam filtering software
because I didn't want have to look at the stuff anymore. But if we get good
enough at filtering out spam, it will stop working, and the spammers will
actually stop sending it.

_ _ _


Of all the approaches to fighting spam, from software to laws, I believe
Bayesian filtering will be the single most effective. But I also think that
the more different kinds of antispam efforts we undertake, the better,
because any measure that constrains spammers will tend to make filtering
easier. And even within the world of content-based filtering, I think it
will be a good thing if there are many different kinds of software being
used simultaneously. The more different filters there are, the harder it
will be for spammers to tune spams to get through them.



Appendix: Examples of Filtering

Here [LINK] is an example of a spam that arrived while I was writing this
article. The fifteen most interesting words in this spam are:

  qvp0045
  indira
  mx-05
  intimail
  $7500
  freeyankeedom
  cdo
  bluefoxmedia
  jpg
  unsecured
  platinum
  3d0
  qves
  7c5
  7c266675

The words are a mix of stuff from the headers and from the message body,
which is typical of spam. Also typical of spam is that every one of these
words has a spam probability, in my database, of .99. In fact there are
more than fifteen words with probabilities of .99, and these are just the
first fifteen seen.

Unfortunately that makes this email a boring example of the use of Bayes'
Rule. To see an interesting variety of probabilities we have to look at
this [LINK] actually quite atypical spam.

The fifteen most interesting words in this spam, with their probabilities, are:

  madam             0.99
  promotion         0.99
  republic          0.99
  shortest          0.047225013
  mandatory         0.047225013
  standardization   0.07347802
  sorry             0.08221981
  supported         0.09019077
  people's          0.09019077
  enter             0.9075001
  quality           0.8921298
  organization      0.12454646
  investment        0.8568143
  very              0.14758544
  valuable          0.82347786

This time the evidence is a mix of good and bad. A word like
"shortest" is almost as much evidence for innocence as a word
like "madam" or "promotion" is for guilt. But still the
case for guilt is stronger. If you combine these numbers according to
Bayes' Rule, the resulting probability is .9027.

"Madam" is obviously from spams beginning "Dear Sir or
Madam." They're not very common, but the word "madam" never
occurs in my legitimate email, and it's all about the ratio.

"Republic" scores high because it often shows up in Nigerian scam
emails, and also occurs once or twice in spams referring to Korea and South
Africa. You might say that it's an accident that it thus helps identify
this spam. But I've found when examining spam probabilities that there are
a lot of these accidents, and they have an uncanny tendency to push things
in the right direction rather than the wrong one. In this case, it is not
entirely a coincidence that the word "Republic" occurs in
Nigerian scam emails and this spam. There is a whole class of dubious
business propositions involving less developed countries, and these in turn
are more likely to have names that specify explicitly (because they aren't)
that they are republics.[3]

On the other hand, "enter" is a genuine miss. It occurs mostly in
unsubscribe instructions, but here is used in a completely innocent way.
Fortunately the statistical approach is fairly robust, and can tolerate
quite a lot of misses before the results start to be thrown off.

For comparison, here [LINK] is an example of that rare bird, a spam that
gets through the filters. Why? Because by sheer chance it happens to be
loaded with words that occur in my actual email:

  perl              0.01
  python            0.01
  tcl               0.01
  scripting         0.01
  morris            0.01
  graham            0.01491078
  guarantee         0.9762507
  cgi               0.9734398
  paul              0.027040077
  quite             0.030676773
  pop3              0.042199217
  various           0.06080265
  prices            0.9359873
  managed           0.06451222
  difficult         0.071706355

There are a couple pieces of good news here. First, this mail probably
wouldn't get through the filters of someone who didn't happen to specialize
in programming languages and have a good friend called Morris. For the
average user, all the top five words here would be neutral and would not
contribute to the spam probability.

Second, I think filtering based on word pairs (see below) might well catch
this one: "cost effective", "setup fee", "money
back" -- pretty incriminating stuff. And of course if they continued
to spam me (or a network I was part of), "Hostex" itself would be
recognized as a spam term.

Finally, here [LINK] is an innocent email. Its fifteen most interesting
words are as follows:

  continuation      0.01
  describe          0.01
  continuations     0.01
  example           0.033600237
  programming       0.05214485
  i'm               0.055427782
  examples          0.07972858
  color             0.9189189
  localhost         0.09883721
  hi                0.116539136
  california        0.84421706
  same              0.15981844
  spot              0.1654587
  us-ascii          0.16804294
  what              0.19212411

Most of the words here indicate the mail is an innocent one. There are two
bad smelling words, "color" (spammers love colored fonts) and
"California" (which occurs in testimonials and also in menus in
forms), but they are not enough to outweigh obviously innocent words like
"continuation" and "example".

It's interesting that "describe" rates as so thoroughly innocent.
It hasn't occurred in a single one of my 4000 spams. The data turns out to
be full of such surprises. One of the things you learn when you analyze
spam texts is how narrow a subset of the language spammers operate in. It's
that fact, together with the equally characteristic vocabulary of any
individual user's mail, that makes Bayesian filtering a good bet.



Appendix: More Ideas

One idea that I haven't tried yet is to filter based on word pairs, or even
triples, rather than individual words. This should yield a much sharper
estimate of the probability. For example, in my current database, the word
"offers" has a probability of .96. If you based the probabilities
on word pairs, you'd end up with "special offers" and
"valuable offers" having probabilities of .99 and, say,
"approach offers" (as in "this approach offers") having
a probability of .1 or less.

The reason I haven't done this is that filtering based on individual words
already works so well. But it does mean that there is room to tighten the
filters if spam gets harder to detect. (Curiously, a filter based on word
pairs would be in effect a Markov-chaining text generator running in
reverse.)

Specific spam features (e.g. not seeing the recipient's address in the to:
field) do of course have value in recognizing spam. They can be considered
in this algorithm by treating them as virtual words. I'll probably do this
in future versions, at least for a handful of the most egregious spam
indicators. Feature-recognizing spam filters are right in many details;
what they lack is an overall discipline for combining evidence.

Recognizing nonspam features may be more important than recognizing spam
features. False positives are such a worry that they demand extraordinary
measures. I will probably in future versions add a second level of testing
designed specifically to avoid false positives. If a mail triggers this
second level of filters it will be accepted even if its spam probability is
above the threshold.

I don't expect this second level of filtering to be Bayesian. It will
inevitably be not only ad hoc, but based on guesses, because the number of
false positives will not tend to be large enough to notice patterns. (It is
just as well, anyway, if a backup system doesn't rely on the same
technology as the primary system.)

Another thing I may try in the future is to focus extra attention on
specific parts of the email. For example, about 95% of current spam
includes the url of a site they want you to visit. (The remaining 5% want
you to call a phone number, reply by email or to a US mail address, or in a
few cases to buy a certain stock.) The url is in such cases practically
enough by itself to determine whether the email is spam.

Domain names differ from the rest of the text in a (non-German) email in
that they often consist of several words stuck together. Though
computationally expensive in the general case, it might be worth trying to
decompose them. If a filter has never seen the token "xxxporn"
before it will have an individual spam probability of .4, whereas
"xxx" and "porn" individually have probabilities (in my
corpus) of .9889 and .99 respectively, and a combined probability of .9998.

I expect decomposing domain names to become more important as spammers are
gradually forced to stop using incriminating words in the text of their
messages. (A url with an ip address is of course an extremely incriminating
sign, except in the mail of a few sysadmins.)

It might be a good idea to have a cooperatively maintained list of urls
promoted by spammers. We'd need a trust metric of the type studied by Raph
Levien to prevent malicious or incompetent submissions, but if we had such
a thing it would provide a boost to any filtering software. It would also
be a convenient basis for boycotts.

Another way to test dubious urls would be to send out a crawler to look at
the site before the user looked at the email mentioning it. You could use a
Bayesian filter to rate the site just as you would an email, and whatever
was found on the site could be included in calculating the probability of
the email being a spam. A url that led to a redirect would of course be
especially suspicious.

One cooperative project that I think really would be a good idea would be
to accumulate a giant corpus of spam. A large, clean corpus is the key to
making Bayesian filtering work well. Bayesian filters could actually use
the corpus as input. But such a corpus would be useful for other kinds of
filters too, because it could be used to test them.

Creating such a corpus poses some technical problems. We'd need trust
metrics to prevent malicious or incompetent submissions, of course. We'd
also need ways of erasing personal information (not just to-addresses and
ccs, but also e.g. the arguments to unsubscribe urls, which often encode
the to-address) from mails in the corpus. If anyone wants to take on this
project, it would be a good thing for the world.



Appendix: Defining Spam

I think there is a rough consensus on what spam is, but it would be useful
to have an explicit definition. We'll need to do this if we want to
establish a central corpus of spam, or even to compare spam filtering rates
meaningfully.

To start with, spam is not unsolicited commercial email. If someone in my
neighborhood heard that I was looking for an old Raleigh three-speed in
good condition, and sent me an email offering to sell me one, I'd be
delighted, and yet this email would be both commercial and unsolicited. The
defining feature of spam (in fact, its raison d'etre) is not that it is
unsolicited, but that it is automated.

It is merely incidental, too, that spam is usually commercial. If someone
started sending mass email to support some political cause, for example, it
would be just as much spam as email promoting a porn site.

I propose we define spam as unsolicited automated email. This definition
thus includes some email that many legal definitions of spam don't. Legal
definitions of spam, influenced presumably by lobbyists, tend to exclude
mail sent by companies that have an "existing relationship" with
the recipient. But buying something from a company, for example, does not
imply that you have solicited ongoing email from them. If I order something
from an online store, and they then send me a stream of spam, it's still
spam.

Companies sending spam often give you a way to "unsubscribe," or
ask you to go to their site and change your "account preferences"
if you want to stop getting spam. This is not enough to stop the mail from
being spam. Not opting out is not the same as opting in. Unless the
recipient explicitly checked a clearly labelled box (whose default was no)
asking to receive the email, then it is spam.

In some business relationships, you do implicitly solicit certain kinds of
mail. When you order online, I think you implicitly solicit a receipt, and
notification when the order ships. I don't mind when Verisign sends me mail
warning that a domain name is about to expire (at least, if they are the
actual registrar [LINK] for it). But when Verisign sends me email offering
a FREE Guide to Building My E-Commerce Web Site, that's spam.[4]



Notes:

[1] The examples in this article are translated into Common Lisp for,
believe it or not, greater accessibility. The application described here is
one that we wrote in order to test a new Lisp dialect called Arc [LINK]
that is not yet released.

[2] Currently the lowest rate seems to be about $200 to send a million
spams. That's very cheap, 1/50th of a cent per spam. But filtering out 95%
of spam, for example, would increase the spammers' cost to reach a given
audience by a factor of 20. Few can have margins big enough to absorb that.

[3] As a rule of thumb, the more qualifiers there are before the name of a
country, the more corrupt the rulers. A country called The Socialist
People's Democratic Republic of X is probably the last place in the world
you'd want to live.

[4] I've been gradually transferring all my domains from Verisign to
EasyDNS, and I am very happy with them. They're cheaper, much more
responsive, their site works, and they have never spammed me.

Thanks to Sarah Harlin for reading drafts of this; Daniel Giffin (who is
also writing the production Arc interpreter) for several good ideas about
filtering and for creating our mail infrastructure; Robert Morris, Trevor
Blackwell and Erann Gat for many discussions about spam; Raph Levien for
advice about trust metrics; and Chip Coldwell and Sam Steingold for advice
about statistics.

* Origin: (1:3634/12)
SEEN-BY: 633/267 270
@PATH: 3634/12 106/2000 1 379/1 633/267

SOURCE: echomail via fidonet.ozzmosis.com

Email questions or comments to sysop@ipingthereforeiam.com
All parts of this website painstakingly hand-crafted in the U.S.A.!
IPTIA BBS/MUD/Terminal/Game Server List, © 2025 IPTIA Consulting™.