Bloodgate.com - spam statistics

Introduction

These pages describe my attempts to visualize the amount of spam I get. Spam really annoys me - especially because I seem to drown in it more and more. Every month I think it is bad, and the next month I get even more.
The worst part is being unable to do anything about it.

In short: it is a major pain in the lower bottom, and if I ever meet a hardcore spammer personally, I will do very nasty things to him involving a pitchfork, a far-away, very remote spot, some chains and my archive in printed form (large letters with lots of spacing).

I used to clean the archives of non-spam mail that got accidentally filtered as spam every few weeks. But lately I get so much crap that I can no longer do this, simply because of the time it would take. So there might be some non-spam in the archive, but it is surely only a very small part.

I hope these pages show how bad the spam problem has gotten lately - and this has actually been true for a few years :-/

Roll your own stats

What you need is:

The Mail::Graph module includes documentation and a sample setup to show how to generate a page like mine.
Actually, it can also do stats about ordinary mail, or support mail, or anything else mail-related. Try it! ;)

Explanations

My mail server automatically collects every spam (junk mail, not to be confused with SPAM - the food) it receives and archives it. To these archives I added all the spam from my local inbox (i.e. the ones that were not filtered) by converting it to the same format my archives use.

All spam was sent to either my pobox.com account or to any address at bloodgate.com. There is only one person at this domain, but I had the habit of issuing multiple dummy addresses as spam-traps. This was a bad decision, since it only led to me getting most spam multiple times - for instance, to my two whois addresses, my gimp developer address, and to my normal dummy address. I stopped this a long time ago, but the old addresses are still spammed into oblivion. Add to this the fact that the spammers sometimes outright guess addresses or mangle them (by dropping numbers, for instance), and you know the reason why both the number of target addresses and the spam count are so high.

The archives date back to July 1998. From February to March 2000 and in August 2001 the filter was offline and the spam sloshed directly into my local inbox. It seems, however, that I lost some of the automatic archive in these time frames - it is hard to tell.
So please don't infer anything from the unusually low spam counts in these two time frames.

I also lost the archive from 2002-06-23 to 2002-06-27 due to my clumsy file shuffling :-/
Fortunately I have a rolling backup of the last x hundred email messages and made a snapshot. Unfortunately, the messages are saved before the spam filtering, so I need to re-tag and extract them manually (well, scriptically). When this is done, the count for the last days in June will be corrected. What you see now are only the 1-2 messages per day that slip past my filter and land in my mailbox directly.

Also, my pobox account generates an unusually high volume of spam, but for some time I had it just delete any spam instead of forwarding it to my filter (and thus it didn't get archived). In short: the actual amount of spam is likely much higher for some time frames.

The target domain stat is slightly wrong, because before I refined my filter setup, the filter address at elstner.com lumped together everything I received. In reality, the spam went to my pobox.com account or my dummy address and ended up at the bloodgate.com system. It was then forwarded to the filter at elstner.com, where it got filtered and then stuffed back into my inbox or the spam archive, depending on the filter outcome. Since the graph tool cannot yet figure this out, the stats are wrong. I have already pre-processed some of my spam to add the correct X-Envelope-To: headers.
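
The pre-processing step could look roughly like the following. This is only a sketch in Python, not the script actually used for these pages. It assumes that the original recipient can be read from a "for <address>" clause in a Received header, which not every mail server writes. The function name add_envelope_to and the example addresses are made up for illustration.

```python
import email
import re

# Assumption: the recipient appears in a "for <address>" clause of a
# Received header. Many servers add it, but it is not guaranteed.
FOR_CLAUSE = re.compile(r"for\s+<([^>\s]+)>")

def add_envelope_to(raw_message):
    """Return the message text with an X-Envelope-To header added if possible."""
    msg = email.message_from_string(raw_message)
    if msg["X-Envelope-To"] is not None:
        return raw_message  # already tagged, leave it untouched
    # Received headers are listed newest first, so walk them in reverse:
    # the oldest "for" clause is likely the address the spam was sent to.
    for received in reversed(msg.get_all("Received", [])):
        match = FOR_CLAUSE.search(received)
        if match:
            msg["X-Envelope-To"] = match.group(1)
            return msg.as_string()
    return raw_message  # no recipient found, nothing to add

raw = (
    "Received: from mx.example.net by mail.example.org\n"
    "\tfor <trap@example.org>; Tue, 25 Jun 2002 10:00:00 +0200\n"
    "From: someone@example.net\n"
    "Subject: hi\n"
    "\n"
    "body\n"
)
print(add_envelope_to(raw))
```

Messages that already carry the header, or that have no usable Received clause, are returned unchanged, so running the step twice does no harm.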

Update

Today (2002-07-30) I discovered that I had a problem in my spam archiving rules: any spam that was just deleted instead of being sent back with an autogenerated error message was not archived in the normal spam archive. Instead, only the header was archived in a separate file. The idea was to thwart mail bomb attacks, but I later extended this to other stuff, like not sending myself error messages, or not replying to wrong return addresses etc. I just forgot that these spam headers exist.

I have now added them to the stats and the count went up from 6376 to 8760 items processed. Ouch! You can see an old version here and compare it to today. You will see that the trend is the same, only the absolute numbers are much worse...

Update 2003-06-14

On 2003-05-31 bloodgate.com went offline for about 14 days because the domain expired. In this time frame, only spam to elstner.com (i.e. almost nothing) hit my server. So you will see a big drop in the scores. It is not yet clear whether the offline domain has forced some spammers to remove the domain from their lists - I think they'll never clean them...

Update 2003-11-01

The large gap you see from 06 .. 08 in 2003 is due to me accidentally deleting instead of archiving any spam going to an unknown target address. Oops :-/
But as you can see, the spammers are still at it, with the occasional virus or worm wave thrown in. There is enough spam for everyone!

Update 2004-06-09

Bad news when I returned from a 14-day vacation: I had 27005 new emails, totalling 133 Mbytes in size (and using up about 350 Mbytes on my HD) :-(
Kmail took over 12 minutes just to get the list of mails from the server and over one hour to download them all, despite DSL (768 kbit/s).

Unfortunately, Kmail crashed just before finishing with an out-of-memory error, and upon redownloading the emails I discovered that it only deletes the mails after it has fetched them all (I use pipelining, which means it can download about 10-15 mails per second, depending on the size of the mail. Without pipelining it manages about 0.6 mails per second...). This left me with every mail twice - and no easy way to remove the duplicates! Ooops!
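
To put the pipelining rates into perspective, here is a small worked calculation in Python. It only uses the numbers quoted on this page (27005 mails, 10-15 mails per second with pipelining, 0.6 without) and assumes that the rate stays constant over the whole download, which a real transfer will not quite manage.

```python
MAILS = 27005  # backlog after the vacation, as quoted above

def fetch_minutes(mails, mails_per_second):
    """Minutes needed to fetch `mails` at a constant rate."""
    return mails / mails_per_second / 60

rates = [
    ("pipelining, slow end", 10),
    ("pipelining, fast end", 15),
    ("no pipelining", 0.6),
]
for label, rate in rates:
    minutes = fetch_minutes(MAILS, rate)
    print(f"{label}: {minutes:.0f} min ({minutes / 60:.1f} h)")
```

At the quoted rates this comes to roughly 30 to 45 minutes with pipelining and about 12.5 hours without it, so the hour-plus download reported above was somewhat slower than the nominal pipelining rate suggests.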

Updating KDE got me a brand-new Kmail version, which has a handy feature to delete all duplicate mails :-)

At the moment I cannot do the spam stats at all, because the stats tool uses all memory and then crashes - I need to rewrite it to not keep everything in memory :/
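
One way to avoid keeping everything in memory is to stream the archive line by line and keep only the counters. The following is a minimal sketch in Python, not the way Mail::Graph works or will work. It assumes the archive is a plain mbox file whose messages start with a "From sender weekday month day time year" line; count_per_month is a hypothetical name.

```python
import io
from collections import Counter

MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

def count_per_month(lines):
    """Count messages per (year, month) while reading one line at a time."""
    counts = Counter()
    for line in lines:
        if not line.startswith("From "):
            continue  # header or body line, nothing to remember
        parts = line.split()
        # Expected shape: From sender Tue Jun 25 10:00:00 2002
        # Separator lines in any other shape are skipped.
        if len(parts) == 7 and parts[3] in MONTHS and parts[6].isdigit():
            counts[(int(parts[6]), MONTHS[parts[3]])] += 1
    return counts

# A tiny in-memory archive stands in for the real file here.
sample = io.StringIO(
    "From a@example.com Tue Jun 25 10:00:00 2002\n"
    "Subject: x\n\nbody\n"
    "From b@example.com Mon Jul  1 09:00:00 2002\n\nbody\n"
    "From c@example.com Tue Jul  2 09:00:00 2002\n\nbody\n"
)
for (year, month), n in sorted(count_per_month(sample).items()):
    print(f"{year}-{month:02d}: {n}")
```

For a real archive, an open file handle can be passed in place of the sample, since iterating over a file also yields one line at a time. Memory use then depends on the number of months, not on the number of mails.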

Update 2004-06-19

Today I got 4783 mails for elstner.com alone - plus 320 for bloodgate.com. Probably the first day I hit over 5000/day :( Maybe not even the first time...
With about 17 Mbytes of spam today it is only a question of time until my daily spam exceeds what I can download in 24 hours over DSL. Luckily, my DSL line will be upgraded to 1 Mbit/s in January 2005...
It also occurred to me today that the spam counts towards the space limit on my pair.com account. I recently downgraded my plan to (cheaply) save $12 a month because I don't need the space and transfer. However, with 20 Mbytes of spam per day I fear for the next vacation (or just think what happens if I suddenly need to go to a hospital, or my PC crashes etc). Every day the spam accumulates, and it is possible that I would have to pay for going over the limit on my account! Sick world!
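
A rough back-of-the-envelope check of these numbers, written as a small Python sketch. It assumes the full 768 kbit/s line rate with no protocol overhead, 1 Mbyte = 1,000,000 bytes, and a 14-day absence like the last vacation; the account's actual space limit is not stated here, so it is left out.

```python
LINE_KBIT_PER_S = 768      # DSL speed quoted earlier on this page
SPAM_MBYTE_PER_DAY = 20    # daily volume feared in the text
DAYS_AWAY = 14             # assumption: same length as the last vacation

def download_seconds(mbytes, kbit_per_s):
    """Seconds to fetch `mbytes` at `kbit_per_s`, ignoring overhead."""
    return mbytes * 8 * 1000 / kbit_per_s

def line_capacity_mbyte_per_day(kbit_per_s):
    """Mbytes the line can move in 24 hours at full speed."""
    return kbit_per_s / 8 / 1000 * 86400

print(f"one day of spam: {download_seconds(SPAM_MBYTE_PER_DAY, LINE_KBIT_PER_S):.0f} s")
print(f"line capacity:   {line_capacity_mbyte_per_day(LINE_KBIT_PER_S):.0f} Mbytes/day")
print(f"piled up after {DAYS_AWAY} days: {SPAM_MBYTE_PER_DAY * DAYS_AWAY} Mbytes")
```

Under these assumptions the raw line capacity (about 8300 Mbytes a day) is still far away, while a two-week absence already leaves about 280 Mbytes sitting on the server, which is the more immediate worry for the space limit.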

Windows & ActivePerl

It also works under Windows, much to my surprise.

Head over to ActiveState.com and install their latest Perl (currently 5.6.1 build 631).
You also need some modules. If you do not have an internet connection under Windows, this is a bit tricky, but can be done. The basic problem is that installing modules under ActiveState Perl is done via ppm, and I couldn't get it to work with an offline repository.

Here are the two ways to install the necessary modules under Windows. Both involve ppm, the ActiveState Package Manager:

If you don't have an online connection under Windows, but have wine, then try the following (replace the path below with the path to your Perl installation under Windows):

cd /windows/D/perl/bin
wine perl ppm.bat

Under native Windows, just type ppm at a DOS prompt window, or choose Start->Run and type ppm[Enter].

At the ppm prompt type the following:

install GDGraph
[lots of output will appear]
install MIME::Tools
[lots of output]
install Date::Calc
[lots of output again]
quit

Now you should have all modules needed by Mail::Graph. Fetch the Mail::Graph distribution as a tar.gz file from CPAN and unpack it to some new directory. You don't need to build or install it; you can work from this directory right away.

The spam statistics along with the pictures should appear in the output directory.

If you have any questions, send me a mail.

Links

Here are some links to other pages using something other than Mail::Graph. If you want to be included, drop me a mail.

brian's mail graphs (not using Mail::Graph, but something homegrown)
http://mrtg.smux.net/mailmsgs.html (using MRTG)
Rich's spam archive w/ graphs (it's even bigger than mine)
Paul Wouters' spam archive (it shows nearly the same trend as mine)
The story of 'Nadine' [2002-04-23]
Angus' spam stats [2002-05-28]
SpamCop [2002-05-28]

Here are some links to other pages using Mail::Graph :-) If you want to be included, drop me a mail.

Chris Halverson's spam statistic page [2002-10-05]
cwie.net spam stats [2002-12-12]

Spam-related Links

Acknowledgements

I wish to thank all the spammers, who, by flooding my various mail accounts with their fraudulent, illegal, virus-laden or script-ridden crap, made this possible.
Without your constant attempts to hammer through my filter, to find some open, unaware server to abuse and to hide, I wouldn't be able to make such fantastic statistics. Thank you.

Tels
Created: 2002-04-08
Last modified: 2003-06-14