Introduction 

This article shows how to set up a Courier-based mail system to use SpamAssassin. I am using SpamAssassin 3.1.7, the latest version at the time of writing. If you are running SpamAssassin 2.x, be aware that the configuration files changed considerably in the move to 3.x, so some of these instructions may not apply.

Getting Courier to Use SpamAssassin

Ensure that the package courier-maildrop is installed, and install it if it isn't. If you have the Courier SMTP server installed, it is almost certainly present already. If the file /etc/courier/maildroprc doesn't already exist, create it and add the following lines to it:

import USER
if ($LOGNAME ne "")
{
xfilter "spamc -u $LOGNAME"
}
else
{
xfilter "spamc -u $USER"
}

Alternatively, if you want to run SpamAssassin on a machine other than the mail server, use the following code:

import USER
if ($LOGNAME ne "")
{
xfilter "spamc -d other.example.com -u $LOGNAME"
}
else
{
xfilter "spamc -d other.example.com -u $USER"
}

The if check is needed because Courier does not set its environment variables consistently: in some situations LOGNAME is set and in others it isn't, so the filter falls back to USER. Running SpamAssassin on another machine can be useful because it is very CPU intensive; the -d flag tells spamc which host to connect to, as shown above.

Finally, open the file /etc/courier/courierd and change the default delivery setting to

DEFAULTDELIVERY="| /usr/bin/maildrop"

and then restart courier-mta. This tells Courier to hand each message to maildrop, which places it in the actual folder, rather than handling that task itself.
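To make the LOGNAME/USER fallback concrete, here is a small Python sketch of the spamc command line that the maildroprc above ends up running. It is purely illustrative: maildrop does this itself, and spamc_filter_command is a hypothetical helper name, not part of Courier or SpamAssassin.

```python
from typing import Mapping, Optional


# Hypothetical helper: mirrors the if/else in maildroprc for illustration only.
def spamc_filter_command(env: Mapping[str, str], host: Optional[str] = None) -> list:
    """Return the spamc argument list maildrop would pass to xfilter."""
    # Prefer LOGNAME when it is set and non-empty, otherwise fall back to USER,
    # because Courier does not always set both.
    user = env.get("LOGNAME") or env.get("USER", "")
    cmd = ["spamc"]
    if host:
        # -d names the host running spamd; without it spamc uses the local machine.
        cmd += ["-d", host]
    cmd += ["-u", user]
    return cmd
```

For example, spamc_filter_command(os.environ, "other.example.com") gives the argument list for the remote variant.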

Setting Up SpamAssassin

Now enable SpamAssassin by flipping the ENABLED flag in /etc/default/spamassassin from 0 to 1

ENABLED=1

and then restarting SpamAssassin with /etc/init.d/spamassassin restart. By default spamd spawns five child processes, which should be enough for most small mail setups. That is all that is required to get a very basic SpamAssassin installation working with Courier. To test the system, send yourself an email and watch the mail server's system log. You should see messages like the following:

Jul  5 14:10:31 servername spamd[11527]: connection from localhost [127.0.0.1] at port 50294
Jul  5 14:10:31 servername spamd[11527]: info: setuid to username succeeded
Jul  5 14:10:31 servername spamd[11527]: processing message <200507051410.28606.username@example.com> for doozer:1000.
Jul  5 14:10:34 servername spamd[11527]: clean message (-2.8/5.0) for username:1000 in 3.1 seconds, 829 bytes.
Jul  5 14:10:34 servername spamd[11527]: result: . -2 - ALL_TRUSTED scantime=3.1,size=829, mid=<200507051410.28606.username@example.com>,autolearn=ham
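If you would rather not eyeball the log, the verdict line can be picked out with a regular expression. The Python sketch below is built from the sample above; the "identified spam" wording for spam messages is an assumption based on spamd of this era, so check it against your own log. parse_verdict, Verdict and VERDICT_RE are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for spamd's per-message verdict, e.g.
#   clean message (-2.8/5.0) for username:1000 in 3.1 seconds, 829 bytes.
# "identified spam" is assumed to be the wording spamd uses for spam.
VERDICT_RE = re.compile(
    r"(?P<verdict>clean message|identified spam) "
    r"\((?P<score>-?\d+(?:\.\d+)?)/(?P<required>-?\d+(?:\.\d+)?)\) "
    r"for (?P<user>[^:\s]+):(?P<uid>\d+) in (?P<secs>\d+(?:\.\d+)?) seconds"
)


class Verdict(NamedTuple):
    is_spam: bool
    score: float
    required: float
    user: str
    seconds: float


def parse_verdict(line: str) -> Optional[Verdict]:
    """Return the verdict from one log line, or None if it isn't a verdict line."""
    m = VERDICT_RE.search(line)
    if m is None:
        return None
    return Verdict(
        is_spam=m.group("verdict") == "identified spam",
        score=float(m.group("score")),
        required=float(m.group("required")),
        user=m.group("user"),
        seconds=float(m.group("secs")),
    )
```

You could feed it the lines of your mail log (typically /var/log/mail.log on Debian) and print the ones where is_spam is true.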

Also check the headers of the delivered mail. There should be new headers that look like this

X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on
	servername.example.com
X-Spam-Level: 
X-Spam-Status: No, score=-2.8 required=5.0 tests=ALL_TRUSTED
	autolearn=ham version=3.0.4
X-UID: 64046
X-Length: 1048
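The same information is available from the delivered message itself. This sketch, using only Python's standard email module, reads the X-Spam-Status header and collapses the folded continuation line before matching. spam_status and STATUS_RE are hypothetical names, and the pattern assumes the header layout shown above.

```python
import re
from email import message_from_string
from typing import Optional, Tuple

# Hypothetical pattern for the start of an X-Spam-Status header as shown above.
STATUS_RE = re.compile(r"^(Yes|No),\s*score=(-?\d+(?:\.\d+)?)\s+required=(-?\d+(?:\.\d+)?)")


def spam_status(raw_message: str) -> Optional[Tuple[bool, float, float]]:
    """Return (is_spam, score, required) from X-Spam-Status, or None if absent."""
    msg = message_from_string(raw_message)
    header = msg.get("X-Spam-Status")
    if header is None:
        return None
    # SpamAssassin folds long headers onto indented continuation lines;
    # collapse all whitespace runs to single spaces before matching.
    text = " ".join(str(header).split())
    m = STATUS_RE.match(text)
    if m is None:
        return None
    return m.group(1) == "Yes", float(m.group(2)), float(m.group(3))
```

Pass it the contents of a delivered message file from your Maildir to see how that message was scored.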

Running SpamAssassin on a Remote Machine

If you are running SpamAssassin (spamd) on another machine, add the -i (or --listen-ip=ipaddr) flag to the OPTIONS line in /etc/default/spamassassin on that machine; this makes spamd listen on the given IP address, which should be the spamd machine's own address. You also need to tell spamd which IP addresses are allowed to contact it, using the -A (or --allowed-ips=..,..) flag; this list must include the Courier mail server. For example, with www.xxx.yyy.zzz as the address of the spamd machine and aaa.bbb.ccc.ddd as the address of the Courier server:

OPTIONS="--listen-ip www.xxx.yyy.zzz --allowed-ips aaa.bbb.ccc.ddd --create-prefs --max-children 5 --helper-home-dir"
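Before pointing spamc at the remote machine, it can help to confirm from the Courier host that spamd is listening. A minimal Python sketch, assuming spamd's default port of 783 (adjust it if you pass --port); spamd_reachable is a hypothetical name. A successful connection only shows that spamd is listening on that address: as I understand it, spamd checks --allowed-ips after accepting the connection, so a rejected client shows up in spamd's log rather than here.

```python
import socket

SPAMD_PORT = 783  # spamd's default TCP port; change it if you run spamd with --port


# Hypothetical helper: checks only that a TCP connection can be opened.
def spamd_reachable(host: str, port: int = SPAMD_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, DNS failures and timeouts.
        return False
```

Run on the Courier server, spamd_reachable("www.xxx.yyy.zzz") should return True once spamd has been restarted with the new options.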

Making SpamAssassin More Useful

At this point the system works, but it's not quite to my liking. I have chosen to turn autolearn off: although it is pretty accurate even when the system is "young", it can go wrong, and it did the first time I used it. It's worth installing pyzor because it's free and offers another check. With hash busters appearing in most spam its usefulness is questionable, but in practice it seems to cope pretty well. The default pyzor settings are good, so there isn't really any point in changing them. You can also install a similar system called DCC (dcc-client), which is likewise used by default if it is present.

By default the Bayes token files are stored in /var/mail with names such as /spamassassin_toks. I prefer to keep all the Bayes files in their own directory, /var/mail/.spamassassin, with the prefix bayes_. This is done with the setting bayes_path /var/mail/.spamassassin/bayes; to activate it, simply create the .spamassassin directory.

After reading the huge man page (Mail::SpamAssassin::Conf), I set up my SA configuration file (/etc/spamassassin/local.cf) as shown below. This is a fairly good setup for training a system, and so far it hasn't misclassified a single mail. You will need to install pyzor to use this configuration, plus razor if you change use_razor2 to 1. You will also need to enable the TextCat plugin in the /etc/spamassassin/v310.pre configuration file, since the ok_languages setting depends on it.

# This is the right place to customize your installation of SpamAssassin.
#
# See 'perldoc Mail::SpamAssassin::Conf' for details of what can be
# tweaked.
#
###########################################################################
#Replace the * that is normally used in the X-Spam-Level header with another
#character that doesn't have to be escaped in a maildrop filter.
#In this case I am using the character x. Although this is in the
#manual it is a bit cryptic.
add_header all Level _STARS(x)_
#The score required before an email is marked as spam. The
#default is 5.0 which is quite aggressive. A value of around 8.0
#is generally better
required_score 5.0
#Set SA to rewrite (or add) the subject header so that it is
#clearly marked as spam
rewrite_header Subject *** SPAM ***
#This tells SA what to do with the body of messages that it considers
#to be spam. 0 leaves the body alone and only adds headers, 1 wraps the
#original in a new report message as an attachment and 2 does the same
#but attaches it as plain text, which mangles the hell out of it. If
#everyone on the mail server uses a half-way decent mail client this
#can be set to 0
report_safe 0
##################
# Bayes Setup
##################
#Tell SpamAssassin where to store the bayes token files.
bayes_path /var/mail/.spamassassin/bayes
bayes_file_mode 0777
#Tweak the scores for Bayes once the system is trained up.
#score BAYES_95  6.00
#score BAYES_99  8.00
#Auto learning. The first option switches it on (1) or off (0), the
#second sets the score threshold for learning from non-spam and the
#third the threshold for learning from spam. The spam learning
#default is 12.0 but it is worth setting this a little lower on a new
#system to help it learn.
bayes_auto_learn 0
bayes_auto_learn_threshold_nonspam 0.0
bayes_auto_learn_threshold_spam 10.0
# Enable or disable network checks
skip_rbl_checks         0
use_razor2              0
use_pyzor               1
# Mail written in these languages (language codes, detected by the
# TextCat plugin) will not be marked as possibly being spam in a
# foreign language.
# - english
ok_languages            en
# Mail using the character sets of these locales will not be marked
# as possibly being spam in a foreign language.
ok_locales              en
#Set this to 0 to stop automatic expiry of Bayes tokens. WARNING: if you
#turn this option off you must have a cron job that expires old tokens
#using "sa-learn --force-expire". Turn it off if opportunistic expiry
#is taking too long.
bayes_auto_expire 1
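The add_header line at the top of this file fills X-Spam-Level with one x per whole point of score, which is why the header in the clean example earlier is blank. The Python sketch below approximates that _STARS(x)_ expansion: it truncates the score to an integer, returns nothing for scores below 1, and caps the run at 50 characters. The cap is my understanding of SpamAssassin's behaviour rather than something stated in this article, and stars is a hypothetical name.

```python
# Hypothetical helper approximating SpamAssassin's _STARS(x)_ template tag.
def stars(score: float, char: str = "x", limit: int = 50) -> str:
    """One `char` per whole point of score; empty for scores below 1."""
    count = min(int(score), limit)  # int() truncates toward zero: 8.9 -> 8, -2.8 -> -2
    return char * max(count, 0)
```

So a message scoring 8.9 gets eight x's, which is exactly what the maildrop rule in the next section looks for.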

Using SpamAssassin with Maildrop

Using the spam headers with maildrop is quite easy. The settings shown above will mark any message that scores 5.0 or more as spam and rewrite its subject with *** SPAM ***. The rule below then looks for at least eight x's in the X-Spam-Level header (a score of 8 or more) and files the message in the spam folder. You could use the X-Spam-Flag header instead, but it doesn't appear at all in messages that aren't spam and doesn't allow such fine tuning. With this system, mail scoring between 5 and 8 arrives in the inbox with a rewritten subject, while anything scoring 8 or more is moved to a separate folder. If we wanted, we could also filter lower-scoring mail, say four points or more, to an assumed-spam folder. Note that the order of the rules in the maildrop configuration file matters: put this rule first if you want to remove spam aggressively (it will remove spam from mailing lists as well), or last if you want to make sure it can't accidentally remove legitimate mail that other rules would otherwise have filtered. I leave it at the top because SpamAssassin almost never marks a legitimate mail as spam.

if (/^X-Spam-Level:.*xxxxxxxx.*/:h )
{
exception {
to $DEFAULT/.spam/
}
}
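The maildrop rule boils down to a regular-expression search over the message headers, which is what the :h flag asks for. The Python sketch below approximates it so you can try a saved message against the pattern. goes_to_spam_folder, headers_of and SPAM_LEVEL_RE are hypothetical names, and the header/body split assumes LF line endings as stored in a Maildir.

```python
import re

# The same test as the maildrop rule: a run of eight x's in X-Spam-Level.
# re.MULTILINE lets ^ match at the start of every header line.
SPAM_LEVEL_RE = re.compile(r"^X-Spam-Level:.*xxxxxxxx", re.MULTILINE)


# Hypothetical helpers for illustration; maildrop performs this check itself.
def headers_of(raw: str) -> str:
    """Return the header block: everything before the first blank line."""
    return raw.split("\n\n", 1)[0]


def goes_to_spam_folder(raw: str) -> bool:
    """True if the maildrop rule above would file this message in .spam."""
    return SPAM_LEVEL_RE.search(headers_of(raw)) is not None
```

Because only the headers are searched, a message that merely quotes an X-Spam-Level line in its body is not caught.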

Killing Spam with Black Listing

Courier can kill spam on its own using DNS blacklists. For a long time this was my only approach to spam handling, as it caught the vast majority of spam. Over the last year, though, its effectiveness seems to have dropped considerably, which is why I installed SpamAssassin. If you want to turn on blacklisting in Courier, look for the BLACKLISTS setting in /etc/courier/esmtpd; there is a comprehensive comment there explaining how to use it. My setting was

BLACKLISTS="-block=sbl-xbl.spamhaus.org,BLOCK -block=list.dsbl.org,BLOCK"

but since installing SpamAssassin I have switched it off, as letting that mail through gives SpamAssassin more spam to learn from.
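For the curious, a DNS blacklist check such as -block=sbl-xbl.spamhaus.org works by reversing the octets of the connecting IPv4 address, appending the blacklist zone and looking up the resulting name: an answer (conventionally a 127.0.0.x address) means the address is listed, and no such name means it isn't. A Python sketch of that convention; dnsbl_query_name and is_listed are hypothetical names, and is_listed treats any lookup failure as "not listed".

```python
import ipaddress
import socket


# Hypothetical helpers illustrating the DNSBL lookup convention.
def dnsbl_query_name(ip: str, zone: str) -> str:
    """Reverse the octets of an IPv4 address and append the blacklist zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")  # also validates the address
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """True if the blacklist zone returns an address record for this IP."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:
        # NXDOMAIN (not listed) and lookup failures both end up here.
        return False
```

Courier performs this lookup itself for each incoming connection; the sketch only shows what the -block setting asks DNS.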

Training SpamAssassin Manually

If you want the best possible results from SpamAssassin, you will have to do at least some manual training. The auto-learn feature works pretty well, but without additional input from you it will keep misclassifying some mail. It is important to teach SpamAssassin with both ham (non-spam) and spam, as teaching it with spam alone can give it a badly skewed view of the world.

To teach SpamAssassin, use the sa-learn command with either the --spam or the --ham flag, depending on what you are teaching it. You can pass it anything from a single email to a group of folders. The simplest way to teach it is to set up a junk folder (as described above) and have maildrop put most of the mail that looks like spam in it. Initially, set the score threshold for the junk folder quite high, so that you have to move quite a bit of spam into it by hand; this helps stop false positives ending up in the junk folder early in the training. Every so often, run the command

sa-learn --spam ~/Maildir/.spam/cur/

to train SpamAssassin with those emails (they can be deleted once the training is over). As I said, it is important to train it on ham as well. There is a gotcha here, though: you shouldn't train it on the 10,000 old emails you have lying around, or it will come to expect legitimate email to have sent dates far in the past. One option is to turn off your mail rules for a while to build up a corpus of ham in your inbox and then train with that. Alternatively, you could move only misclassified mail to a separate folder and train on that; this gives good results but takes longer. If you do nothing else, at least train SpamAssassin on its false positives and false negatives.
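One way to act on the advice about old mail is to pick out only recently sent messages from a folder before passing them to sa-learn --ham. The Python sketch below parses just the headers of each file in a Maildir folder's cur/ and new/ directories and keeps those whose Date header falls within a chosen window. recent_messages is a hypothetical name, and the 30-day window in the usage note is an arbitrary choice, not a SpamAssassin setting.

```python
import os
import time
from datetime import timezone
from email.parser import BytesHeaderParser
from email.utils import parsedate_to_datetime


# Hypothetical helper: choose recently sent messages from one Maildir folder.
def recent_messages(folder: str, max_age_days: float) -> list:
    """Return paths in folder/cur and folder/new whose Date header is recent."""
    cutoff = time.time() - max_age_days * 86400
    parser = BytesHeaderParser()
    selected = []
    for sub in ("cur", "new"):
        subdir = os.path.join(folder, sub)
        if not os.path.isdir(subdir):
            continue
        for name in sorted(os.listdir(subdir)):
            path = os.path.join(subdir, name)
            with open(path, "rb") as fh:
                date = parser.parse(fh).get("Date")  # parses the headers only
            if date is None:
                continue
            try:
                sent = parsedate_to_datetime(str(date))
            except (TypeError, ValueError):
                continue  # unparsable Date header: skip rather than guess
            if sent.tzinfo is None:
                sent = sent.replace(tzinfo=timezone.utc)  # "-0000" means UTC
            if sent.timestamp() >= cutoff:
                selected.append(path)
    return selected
```

For example, paths = recent_messages(os.path.expanduser("~/Maildir"), 30) followed, if paths is non-empty, by subprocess.run(["sa-learn", "--ham", *paths], check=True) would train on roughly the last month of your inbox.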

Finally, don't worry about accidentally training SpamAssassin on the same message more than once. It only learns from a given message once in each direction, and if you trained a message the wrong way, relearning it with the opposite flag fixes the mistake.

If you are looking for a more automated approach to training SpamAssassin, I suggest you try the link. This approach lets any number of clients train the system simply and quickly.

Bayes Not Expiring

In a rare spare five minutes I had a quick look at my server to check that everything was generally running as it should (as opposed to responding to things failing). Imagine my surprise when I noticed that my .spamassassin folder was 2.5GB! A quick look inside revealed the problem: there were literally hundreds of files named "bayes_toks.expireXXXXX", where XXXXX was a PID. A quick search led to this page explaining why bayes_toks files are sometimes not deleted. I ran "sa-learn -D --force-expire" and, as I suspected, it took a long time (about 8 minutes). Note that you have to run this as the user whose Bayes database you want to expire.

The solution is to put "bayes_auto_expire 0" in local.cf and then set up a daily cron job that runs something like "sa-learn -u user --force-expire". I'm sure there are nicer ways to do this, but it seems to work. Any bayes_toks.expireXXXXX files left lying around are safe to delete as long as nothing is accessing them at the time (typically when spamd is stopped).
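If you do need to clean up, the leftover files are easy to pick out by name. The Python sketch below lists, and optionally deletes, files named like bayes_toks.expire followed by digits in the Bayes directory. stale_expire_files, remove_stale and EXPIRE_RE are hypothetical names, and, as noted above, remove_stale should only be run while nothing is using the Bayes files.

```python
import glob
import os
import re

# Leftover expiry files look like bayes_toks.expire12345 (the suffix is a PID).
EXPIRE_RE = re.compile(r"_toks\.expire\d+$")


# Hypothetical helpers for cleaning up after interrupted expiry runs.
def stale_expire_files(bayes_dir: str) -> list:
    """Return sorted paths of *_toks.expireNNNN files in bayes_dir."""
    candidates = glob.glob(os.path.join(bayes_dir, "*_toks.expire*"))
    return sorted(p for p in candidates if EXPIRE_RE.search(p))


def remove_stale(bayes_dir: str) -> int:
    """Delete the stale files and return how many were removed.

    Only run this while spamd (and any sa-learn) is stopped.
    """
    files = stale_expire_files(bayes_dir)
    for path in files:
        os.remove(path)
    return len(files)
```

With the bayes_path used in this article, that would be remove_stale("/var/mail/.spamassassin") after stopping SpamAssassin.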

Alternatives

Alternatively, you could use MailScanner rather than SpamAssassin. Strictly speaking that isn't quite accurate: MailScanner still uses SpamAssassin, but it can also integrate numerous virus checkers.
