Why the SpamCop blocking list is harmful

The information contained in this document is accurate to the best of our knowledge, as of Feb 21 2003. However, this is a complex and rapidly changing area, and there may be errors that we have missed. If you find any factual errors, or you have any comments, please email <spamcopfeedback AT fastmail DOT fm>.

Changelog

18 Aug 2004

It's been over a year since I wrote this, and many details have changed. SpamCop has been acquired by IronPort, although Julian Haight still controls the SpamCop BL. SpamCop no longer provides the detail of spam and non-spam reports and their scores, so the process is now even less transparent than when I first wrote this. Overall, the reasons I provide to avoid blocking mail on the basis of the SpamCop BL have not changed - you may well block many legitimate emails. However, using the SpamCop BL as part of a score - for instance using SpamAssassin - and allowing your users the option to filter using this score, is a very effective technique.
Scoring systems are a great approach to filtering spam - for example as used in FastMail.FM's spam filtering system. No IP databases are currently designed to provide the best possible input to such systems - so, we're creating our own! We are developing this system at the moment and will provide more information when it is available.

22 Feb 2003

Julian Haight of Spamcop has provided this response.
Mr Haight's response contained numerous inaccuracies and incorrect assumptions; these issues are clarified in this reply from Jeremy Howard.
Fix typo: The date was shown as Feb 21 2002; the correct date is Feb 21 2003. Sorry for the confusion!
Added clarification of the implication of the 'FAQ/reality mismatch' in the 'Anti-competitive behaviour, and inaccurate statements' section.

Summary

This paper is by Jeremy Howard, a director of FastMail.FM.

SpamCop BL is a system that is widely used by email providers for blocking email which may be spam. However, providers using this service are blocking up to 10,000 legitimate emails sent to their own customers, for each spam they block. SpamCop BL is a system that is open to abuse, and can be very inaccurate. SpamCop BL advises on their home page that production sites should not use the service - email providers that ignore this advice are causing problems for their own customers, while wasting the time of mail abuse departments that would be better spent fighting spammers.

About SpamCop

SpamCop started out life as a marvelous spam reporting and notification service. I have commented numerous times on how useful and effective this service is.

Using the SpamCop notification service, users of any email system can easily report any email they receive. SpamCop then analyses the email, works out the likely true sender of the email, looks in the body to find any advertised email addresses or web sites, and then emails the administrators of all the systems involved to let them know about the problem. The SpamCop user is told who the notification will be sent to, and has the opportunity to remove any incorrectly targeted reports from the list. This service ensures that the right people know about abuse of their systems, and can quickly do something about it.

SpamCop quickly became very popular, and has leveraged its popularity to make profits from a for-fee email service, and from a "blocking list" which solicits donations.

About SpamCop BL

SpamCop BL is the "SpamCop Blocking List". It is a list of IP addresses, which are the numbers that uniquely define computers connected to the Internet. This is what the SpamCop BL page has to say:

This blocking list is somewhat experimental and should not be used in a production environment where legitimate email must be delivered. It is growing more stable and is used by many large sites now. However, SpamCop is aggressive and often errs on the side of blocking mail - users should be warned and given information about how their mail is filtered.

Email providers that decide that they are not "a production environment where legitimate email must be delivered" can check against this list any time a mail server delivers them email. SpamCop BL then provides instructions on how to reject emails that come from listed servers. Email providers that follow these instructions then block all email from a listed host.
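For readers unfamiliar with how such a check is made: blocking lists of this kind are normally queried over DNS, by reversing the octets of the connecting server's IP address and looking the result up under the list's zone. The sketch below assumes that usual convention and the zone name bl.spamcop.net (which appears in the listing details quoted later in this paper). The function names are hypothetical, and treating every lookup failure as "not listed" is a simplification.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str = "bl.spamcop.net") -> str:
    """Build the DNS name used to ask a blocking list about an IPv4 address."""
    # Validate the address; raises ValueError for anything that is not IPv4.
    address = ipaddress.IPv4Address(ip)
    # 64.4.22.193 becomes 193.22.4.64.<zone>
    reversed_octets = ".".join(reversed(str(address).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str = "bl.spamcop.net") -> bool:
    """Return True if the list answers for this address (needs network access)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True   # any answer means the address is listed
    except socket.gaierror:
        return False  # no such name, or the lookup failed: treated as not listed
```

A provider that rejects mail whenever such a lookup succeeds is refusing every message relayed through that address, regardless of who wrote it.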

So, the big question is: "how does SpamCop BL decide what to list?". Read on...

The SpamCop BL listing algorithm

Unfortunately, SpamCop does not publish a current description of their algorithm anywhere, so the only real way to find out what gets listed is to analyse listings as they occur. SpamCop representatives have from time to time provided public information about the algorithm, and SpamCop also maintains a page describing it. However, that page has thus far not provided complete and accurate information, and the information provided by representatives has been inconsistent and sometimes incorrect. A further complication is that the algorithm changes frequently.

Therefore, if you are an email provider that uses SpamCop BL, you are stopping your customers from receiving emails based on criteria that you almost certainly do not fully understand.

We have attempted to reverse engineer the algorithm, and this represents our current best understanding (as of Feb 21 2003):

All SpamCop reports (see next section for a definition - a SpamCop report is not necessarily spam) connected with an IP (again, see next section - a server 'connected with' an item may not have anything to do with sending it) are counted up.
Reports received in the last 6 hours score 4. Other reports score 1.
This score is multiplied by the number of messages received by SpamTraps (see Errors section for possible problems with this count).
SpamCop attempts to track non-spam messages connected to that IP. Of course, SpamCop actually has no idea how many messages come from a server - they are sent directly from the sender to the recipient and do not go through SpamCop at all. Instead, SpamCop assigns "non-spam points" based on how many times that provider is looked up in the SpamCop BL, from certain sites.
The "spam points" are divided by "non-spam points" to calculate a score. If the score is above 0.02, and a spam has been reported in the last 2 days, the server is listed.
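The steps above can be sketched in a few lines of Python. This is only an illustration of our reverse-engineered understanding, not SpamCop's actual code: the function and constant names are hypothetical, the way SpamTrap hits turn into a multiplier is not known (so it is a plain parameter defaulting to 1), and the comparison is written as "at or above" the threshold so that a score of exactly 0.02 counts as listed, which matches what we have observed. Whether SpamCop's own comparison is strict is an open question.

```python
from typing import Iterable

RECENT_WINDOW_HOURS = 6     # reports newer than this score 4 points
RECENT_REPORT_POINTS = 4
OLD_REPORT_POINTS = 1
LISTING_THRESHOLD = 0.02
MAX_REPORT_AGE_HOURS = 48   # a report within the last 2 days is required


def would_be_listed(report_ages_hours: Iterable[float],
                    non_spam_points: float,
                    spamtrap_multiplier: float = 1.0) -> bool:
    """Apply the reverse-engineered listing rule to one IP address."""
    ages = list(report_ages_hours)
    if non_spam_points <= 0:
        # Behaviour with no "non-spam points" is not described anywhere.
        raise ValueError("non_spam_points must be positive in this sketch")
    # No report in the last 2 days means no listing.
    if not ages or min(ages) > MAX_REPORT_AGE_HOURS:
        return False
    spam_points = sum(
        RECENT_REPORT_POINTS if age <= RECENT_WINDOW_HOURS else OLD_REPORT_POINTS
        for age in ages
    )
    # Assumption: SpamTrap hits act as a simple multiplier (1.0 = no effect).
    spam_points *= spamtrap_multiplier
    return spam_points / non_spam_points >= LISTING_THRESHOLD
```

For instance, `would_be_listed([1], 200)` is true (one report an hour old against 200 non-spam points), while the same report seven hours later, `would_be_listed([7], 200)`, is false.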

SpamCop reports

A SpamCop report occurs when a SpamCop notification system user reports a message as spam, and SpamCop's analysis results in it being connected with a particular email server. At FastMail.FM we have seen the following result in SpamCop reports:

Genuine spam is correctly connected with the correct sending server
Genuine spam is received by a user, who forwards it to another email provider (e.g. due to an automatic rule they have set up) or moves it manually with their email client, and then reports it from the final location. This can result in the user's own email provider getting listed, because the server through which the message was forwarded looks like the "spam source".
A virus results in an email being sent, which is incorrectly reported as spam
A SpamCop notification user sends someone a message which the recipient bounces (e.g. they are rejecting emails from that sender automatically), and the SpamCop user reports the bounce notification.

We have seen all of these kinds of report, in roughly similar proportions. At FastMail.FM we observe on average about 30% of reports are correct. By way of example, here's a "spam" recently on file at SpamCop:

Content-Disposition: inline
Content-Transfer-Encoding: 8bit
Content-Type: text/plain
MIME-Version: 1.0
X-Mailer: MIME::Lite 1.2  (F2.71; T1.001; A1.51; B2.12; Q2.03)
Date: Sun, 1 Dec 2002 22:54:55 UT
From: "Email Administrator" <>
Reply-To: "FastMail Administrator" <webmasterATfastmail.fm>
To: x
Subject: URGENT: FastMail usage warning
Message-Id: <2002_________________308F@www.fastmail.fm>
Return-Path: bounce@fastmail.fm

A user had gone over their quota, we had told them so, and they reported that as spam! You'll also note that SpamCop munges the user's address with the false report, but not the service provider's address (I changed the '@' to an 'AT' above), allowing the service provider's address to be harvested by spammers' "robot" software.

In theory users making incorrect reports are banned from SpamCop, but in practice we've seen users getting warnings rather than bans, and SpamCop has no way to stop them just signing up with a new name (including automated multiple signups, automated fraudulent abuse reports, etc).

Analysis of the listing algorithm

The listing algorithm, combined with the SpamCop reporting system (both described above), decides which servers are listed in the SpamCop BL. An analysis of the effectiveness of this algorithm for blocking email (the most appropriate analysis to use, given its name as a "blocking list") follows, in the categories of: accuracy, collateral damage, timeliness, and completeness.

Accuracy

Based on our experience, around 30-40% of spam reports targeted at our site are accurate. 100% of them result in SpamCop listings. A single report, whether accurate or not, will result in our site being blocked for 6 hours. That is because the report gets a score of 4 during the six hours after it is received. This is then divided by the number of 'good points'. SpamCop's collection of 'good points' is extremely ineffective - our site often has only around 200, despite our users sending and receiving millions of messages every day. So, bad=4 divided by good=200 = 0.02, which is at the listing threshold. So the site is listed for 6 hours.

Based on this, it would appear that on many occasions moderately large sites will get listed despite not being a source of any spam. Empirically, we have observed this behaviour. In general, the largest sites will have fewer problems, since their 'good score' will be large enough to avoid a single report causing a problem.
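To make the size effect concrete, the short sketch below computes the break-even number of 'good points': the level at which a given number of fresh reports (under 6 hours old, 4 points each) puts a site exactly on the 0.02 threshold. It uses only the figures quoted in this paper; the function name is hypothetical, and SpamTrap hits are ignored.

```python
LISTING_THRESHOLD = 0.02   # score at which a server is listed
FRESH_REPORT_POINTS = 4    # points for a report less than 6 hours old


def break_even_good_points(fresh_reports: int) -> float:
    """Good points at which `fresh_reports` reports land exactly on the threshold."""
    if fresh_reports < 0:
        raise ValueError("fresh_reports cannot be negative")
    return fresh_reports * FRESH_REPORT_POINTS / LISTING_THRESHOLD
```

A site with around 200 good points sits right at break-even for a single fresh report, whereas a site would need more than about 1,000 good points to ride out five fresh reports at once.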

Collateral damage

From time to time, a spammer will sign up for a FastMail.FM account, and will send spam. FastMail.FM has automatic monitors that will lock the account as soon as 100 recipients (for free accounts) have been sent to, and appropriate authorities are then contacted to take action. One of these spam recipients is likely to then report this to SpamCop.

Our analysis of SpamCop's statistics and our internal systems has led us to estimate that around 2.5% of spam recipients end up reporting to SpamCop. We have also calculated that the SpamCop BL "good score" is around 0.01% of the number of messages actually sent from that server during that time.

Now, let's assume that an accurate report has been sent, and it is over 6 hours since the report was received (so it counts for 1, not 4, "bad points"). The threshold is 0.02. Now, we estimate that 97.5% of spams have not been reported, so the actual number of spams sent for each report is around 50. At the threshold of 0.02 there must be no more than 50 "good points", which (using the 0.01% statistic) represents 500,000 actual sent messages. Thus, the actual proportion of spam sent from a host at the threshold is around 50/500,000 or 0.01%.
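The arithmetic in the paragraph above can be checked with a few lines of Python. This is just a restatement of our estimates using exact fractions, with hypothetical names. One detail to note: a 2.5% reporting rate works out to 40 spams per report, where the text uses a round figure of about 50, so both are shown.

```python
from fractions import Fraction

LISTING_THRESHOLD = Fraction(2, 100)           # score at which a server is listed
GOOD_POINTS_PER_MESSAGE = Fraction(1, 10_000)  # "good score" is about 0.01% of mail sent
OLD_REPORT_POINTS = 1                          # one report, more than 6 hours old


def messages_per_spam(spams_per_report: int) -> Fraction:
    """Messages sent per spam sent, for a server sitting exactly on the threshold."""
    good_points = OLD_REPORT_POINTS / LISTING_THRESHOLD    # 50 good points
    messages_sent = good_points / GOOD_POINTS_PER_MESSAGE  # 500,000 messages
    return messages_sent / spams_per_report


print(messages_per_spam(50))  # 10000
print(messages_per_spam(40))  # 12500
```

Either way, the spam share at the threshold is on the order of one message in ten thousand.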

That means that email providers using the SpamCop BL are blocking servers where, for each spam they block, they also block 10,000 legitimate emails from reaching their own customers!

Timeliness

At FastMail.FM, a free email user is blocked as soon as the 100th email recipient is reached in an hour. So generally a spammer will be blocked within a few seconds (they try to send hundreds of thousands in a short time - anything less and it's not worth their while). A SpamCop user then logs into their email the next day, sees a spam, and reports it. SpamCop then lists FastMail.FM, despite the spammer being locked 24 hours earlier.

In this case the system fails to actually block spam sources in a manner that stops people receiving spam.

Completeness

There is a wide range of potential spam sources that SpamCop does not successfully block. One example is mail providers with many IP addresses spread over a range of different net-blocks, such as Hotmail. Currently (as of Feb 21 2003) a Hotmail server is in SpamCop's blocking list, and SpamCop provides the following listing details:

64.4.22.193 listed in bl.spamcop.net.
Rationale: Recent spam increases spam score from 3.00 to 4.00: spam report ratio (0.038) exceeds threshold (0.020)

However, each email from Hotmail comes from a different sending server. For instance, yesterday FastMail.FM received email from 2,241 different Hotmail servers. So, if someone is unlucky enough to have their email sent through 64.4.22.193 right now, it will be blocked by users of SpamCop BL. Whereas if the reason for that block was really a spammer, that spammer's next message will probably go through a different server, and will not be blocked!
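As a rough illustration of how little a single listing achieves here, the sketch below estimates the chance that any one Hotmail message passes through the listed server. It assumes, purely for illustration, that messages are spread evenly across the 2,241 servers we saw and that only one of them is listed; the names are hypothetical.

```python
HOTMAIL_SERVERS_SEEN = 2241  # distinct Hotmail servers seen by FastMail.FM in one day
LISTED_SERVERS = 1           # only 64.4.22.193 is considered here


def chance_of_hitting_listed_server(listed: int, total: int) -> float:
    """Probability that a message goes out through a listed server (even spread assumed)."""
    if total <= 0 or not 0 <= listed <= total:
        raise ValueError("need 0 <= listed <= total and total > 0")
    return listed / total


chance = chance_of_hitting_listed_server(LISTED_SERVERS, HOTMAIL_SERVERS_SEEN)
print(f"{chance:.4%}")  # 0.0446%
```

Under that assumption the spammer's next message has well under a 0.1% chance of being stopped, and the same odds decide which legitimate senders get blocked.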

Potential for abuse

FastMail.FM is often the target of vengeful spammers. That's because we waste a lot of their time and money, for instance when they spend time setting up FastMail.FM accounts and find them locked almost immediately by our automated systems. They do everything they can to cause problems as a result, although most of the time our systems automatically identify and block whatever they try. One thing we have not been able to do much about, however, is a recent trick they have used to get us listed by SpamCop.

The spammers, we have discovered, send spam to under 100 people (thus defeating our locking system), but they send every spam to known anti-spam campaigners and SpamCop users! They know that almost all of the recipients will report to SpamCop, and get us listed. We know this has happened because when we have checked the send log on our server, we find the user in question has sent only one email, and that all the recipients are people that we know to be active in the anti-spam community.

Another example of abuse is reported here by a user at http://www.emaildiscussions.com :

I am listed on the abuse and technical whois records for our domain (A small government). Every day I have to fight with the idiots over at spamcop to keep us off their various forms of mailblocking strategies. We are a legit, spam-free government network, with extremely strict anti-spam policies. Hell, we have extremely strict private-use policies. Recently, a few people were sacked for abusing the mail system.

However, SpamCop insists on running a broken system that is totally open for abuse. Every loser with a grudge against the Government (i.e. everybody that pays tax) reports a mail to spamcop as being spam and as coming from us. The reporting mechanism is smooth, fast, free and has no consequences for the reporting user. The resulting work for us on the other hand is difficult, time consuming and very, very costly. We must take every report on spam or abuse seriously. So this involves tracking down the reported user, informing the manager, setting up a spot-audit team, going over to his/her PC, checking out the contents of the HDD, writing up reports, and the lot. This keeps about 4 people busy for a full working day.

We also have to work fast - i.e. drop what you are doing and respond to the issue. We have tried working with SpamCop on this issue, but to no avail. As far as we are concerned, in this case, the medicine is worse then the disease, and SpamCop is just another word for DDOS.

SpamCop should be shut down.

Anti-competitive behaviour, and inaccurate statements

SpamCop's policies regularly differ in practice from what they document. For example, when Politech was listed by SpamCop BL, Julian of SpamCop said "Since complaints from their spam-victims don't seem to have any effect, perhaps complaints from their paying users will! If rackspace does not take action to stop this source of spam, it is quite possible that other, innocent rackspace customers will be affected again." Although SpamCop regularly states that collateral damage is not the primary reason for blocking other IPs, their comments and behaviour have repeatedly shown that they do believe in this system.

Another example of FAQ/reality mismatch: the FAQ says spam is

"unsolicited (I didn't request it), and
automated (this same email was sent to thousands of people at once)."

When a FastMail.FM user sent 30 people with publicly listed addresses information that they thought was useful to those people (so this message was somewhat solicited, and definitely not automated), we were listed. When I reported this, the SpamCop representative said that the message was regarded as spam because it was sent to more than one person, one of whom felt (by reporting) that it wasn't solicited. Whilst I can certainly empathise with a definition of Unsolicited Commercial Email which covers that type of situation, this is not the stated definition on the SpamCop BL site. Email providers that use the SpamCop BL should be aware that the basis of listings in the blocking list is not the same as the criteria made publicly available.