Teach Spamassassin and Amavis that a message is spam

Spamassassin is a powerful antispam tool. However, it consumes a lot of processing power, so it is a good idea to install Amavis as well. Amavis is a lightweight Perl script that pre-scans emails and rejects many of them based on rules that you set up in the Amavis configuration file.

NOTE: This page won’t attempt to teach you how to install and configure Spamassassin or Amavis. Other tutorials exist online. This tutorial is here to give you tips that you may not find elsewhere.

Spamassassin uses Bayesian filters (think of this as a form of artificial intelligence) that can learn what sort of emails are spam (bad) and what sort are ham (good). The key to this is a tool called sa-learn, which you run against mailbox files that contain either only ham or only spam emails. This teaches Spamassassin which emails you consider spam. Spamassassin stores this information in several files, kept in a hidden directory (.spamassassin) for each mail user.

To teach Spamassassin about spam, you pass the --spam parameter to sa-learn. For ham, the parameter is --ham.
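To make the two flags concrete, here is a minimal sketch of a shell helper that picks --spam or --ham from a keyword. The function name learn_mbox and the SA_LEARN override are hypothetical and not part of Spamassassin; only --mbox, --spam and --ham are real sa-learn options. Setting SA_LEARN=echo simply prints the arguments, which is handy for a dry run.

```shell
# Hypothetical helper: train from an mbox file as either spam or ham.
# SA_LEARN defaults to the real sa-learn binary; set SA_LEARN=echo for a dry run.
learn_mbox() {
    local kind="$1" mbox="$2"
    case "$kind" in
        spam|ham) ;;
        *) echo "usage: learn_mbox spam|ham MBOX" >&2; return 2 ;;
    esac
    # --mbox tells sa-learn that the input is an mbox file, not a single message.
    "${SA_LEARN:-sa-learn}" --mbox "--$kind" "$mbox"
}
```

Calling learn_mbox with spam or ham as the first argument and an mbox path as the second runs the matching sa-learn command. Any other keyword is rejected before sa-learn is started.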

In the examples below we will assume that Spamassassin is running under the user account spamd and that a mailbox file (in the mbox format common with IMAP servers) that contains only sample spam emails is called Junk and is in the /tmp directory.

Tip 1: When Spamassassin is called by Amavis, it uses the .spamassassin directory in the Amavis working directory (usually /var/spool/amavis). Therefore, when you are teaching a Spamassassin that is called by Amavis, you need to use the --dbpath parameter. E.g.:

sa-learn --dbpath /var/spool/amavis/.spamassassin --mbox --spam -u spamd /tmp/Junk

sa-learn will look at the emails and will teach Spamassassin that they are spam. However, Spamassassin needs to be told to reload its Bayesian knowledge files in order to pick up this new-found knowledge.
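Before reloading anything, it can be worth confirming that the run really changed the database. sa-learn has a --dump magic option that prints the database counters, including the number of spam (nspam) and ham (nham) messages learned. The sketch below pulls out those two numbers. The function name bayes_counts is hypothetical, and the parsing assumes the usual layout in which the value is the third column of each "non-token data" line.

```shell
# Summarise the spam/ham counters from "sa-learn --dump magic" output.
# Reads the dump on standard input and prints one line: nspam=N nham=M
bayes_counts() {
    awk '/non-token data: nspam/ { spam = $3 }
         /non-token data: nham/  { ham = $3 }
         END { printf "nspam=%d nham=%d\n", spam, ham }'
}

# Usage, with the database path from Tip 1:
#   sa-learn --dbpath /var/spool/amavis/.spamassassin --dump magic | bayes_counts
```

Running it before and after a training run should show nspam increase by the number of new messages learned.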

Tip 2: After running sa-learn, send a kill -HUP to the spamd parent process to force a reload of the Bayesian knowledge base. E.g.:

kill -HUP `cat /var/run/spamd.pid`
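If the pid file is missing or empty, the command above fails with a confusing kill error. A slightly more defensive version might look like the sketch below. The function name reload_spamd is hypothetical, and the default pid file path is simply the one used in this tutorial; yours may differ.

```shell
# Hypothetical helper: send SIGHUP to the process named in a pid file,
# but only after checking that the file is readable and holds a number.
reload_spamd() {
    local pidfile="${1:-/var/run/spamd.pid}" pid
    if [ ! -r "$pidfile" ]; then
        echo "reload_spamd: cannot read $pidfile" >&2
        return 1
    fi
    pid=$(cat "$pidfile")
    case "$pid" in
        ''|*[!0-9]*) echo "reload_spamd: no valid pid in $pidfile" >&2; return 1 ;;
    esac
    kill -HUP "$pid"
}
```

Called with no arguments it uses /var/run/spamd.pid; pass another path as the first argument if spamd writes its pid file elsewhere.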

On a very active system, spam flies in and quickly fills the Junk file. A large file can slow down sa-learn processing dramatically, so it is a good idea to clear it down. A common way in Linux to truncate a file is to issue a command such as:

> /tmp/Junk

However, for some IMAP servers, this can produce some nasty lockups in client email software when the mail user tries to add spam emails to the folder.

Tip 3: Clear down the Junk file(s) in an IMAP-friendly way. This means moving the file somewhere else for processing and recreating the user file rather than truncating it. Note that we mv and recreate the file before running sa-learn. This ensures that the IMAP “folder” disappears for only a fraction of a second, rather than staying missing until a potentially very long sa-learn run has finished:

mv /home/username/mail/Junk /tmp/Junk

touch /home/username/mail/Junk

chown username /home/username/mail/Junk

chmod 700 /home/username/mail/Junk

sa-learn --dbpath /var/spool/amavis/.spamassassin --mbox --spam -u spamd /tmp/Junk
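The move-and-recreate steps can also be wrapped in a small function so that they always run in the same order. This is only a sketch: the name rotate_mbox and its arguments are hypothetical, and it sets mode 600 on the new file (my assumption, since a mailbox file does not need the execute bit) rather than the 700 used above.

```shell
# Hypothetical helper: move a mailbox aside and recreate it empty straight away,
# so that the IMAP folder is missing only for an instant.
# Arguments: SOURCE_MBOX WORK_COPY [OWNER]
rotate_mbox() {
    local src="$1" work="$2" owner="${3:-}"
    if [ ! -f "$src" ]; then
        echo "rotate_mbox: no mailbox at $src" >&2
        return 1
    fi
    mv -- "$src" "$work" || return 1
    touch -- "$src" || return 1
    chmod 600 "$src" || return 1
    # chown normally needs root, so it is skipped when no owner is given.
    if [ -n "$owner" ]; then
        chown "$owner" "$src"
    fi
}
```

With the paths from this tip, rotate_mbox /home/username/mail/Junk /tmp/Junk username would be followed by the sa-learn command run against /tmp/Junk.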

Spammers use automated tools to harvest email addresses. Publishing an email address online is a magnet for spam. This can be to your advantage if you want Spamassassin to learn about new spam messages before they arrive at your legitimate email addresses. The trick is to make spammers send spam to honeypot email addresses first:

Tip 4: Create honeypot email addresses that route all email received at those addresses into a spam email file. This can then be used to teach Spamassassin about new forms of spam before the spammers send to your legitimate email addresses. Seed the spam email addresses on the Internet. Put them into web pages where email address harvesting software will find them but ensure that humans will not send legitimate email to them by putting up suitable messages around the email addresses.

Of course, you want Spamassassin to learn about spam automatically. This means that you will want sa-learn to run periodically.

Tip 5: Create a cron job to run sa-learn periodically, letting it learn what is spam from the honeypot email addresses as well as the Junk folders maintained by your email users. To do this, you need a suitable cron script. Below is a template for you to use. You will need to adjust the paths to the executables and files applicable on your system. In the example below, we have called the file where the emails from the honeypots are stored honeypot which we store in /var/spool/mail.

We have assumed that users move the spam that they receive into a file (an IMAP folder) on the server called Junk. In the example we show two techniques for processing this Junk user file. For username we truncate the file in an IMAP-friendly manner by moving it and recreating the user file before sa-learn processes the moved file. For usernameX we don’t truncate the file. This means that the file will continue to grow in size until it’s truncated by some other means. sa-learn ignores spam emails that it has already learned about, so it is safe not to truncate a file provided that it doesn’t grow to the point where sa-learn takes a long time to process it. If in doubt, truncate.
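Separately from the template, one rough way to judge whether an untruncated file has grown too large is to count the messages in it. In the mbox format each message starts with a line beginning "From " (with a trailing space), so counting those lines gives an approximate message count. The function names and the idea of a fixed message limit are my own illustration, not something sa-learn provides.

```shell
# Approximate number of messages in an mbox file:
# count the lines that start with "From " (the mbox message separator).
mbox_count() {
    # grep -c exits non-zero when the count is 0, so mask that exit status.
    grep -c '^From ' "$1" || true
}

# Succeeds (exit 0) when the mbox holds more than LIMIT messages.
# Arguments: MBOX LIMIT
mbox_too_big() {
    [ "$(mbox_count "$1")" -gt "$2" ]
}
```

A cron script could then truncate a user's Junk file only when, say, mbox_too_big /home/usernameX/mail/Junk 5000 succeeds; the figure 5000 is arbitrary and should be tuned to how long sa-learn takes on your system.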

Also in the example below, we show how sa-learn can simply take a list of filenames on the command line which is handy if you have more than one file building up a store of spam emails:

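#!/bin/bash

# Move the user's Junk file aside and recreate it at once (IMAP-friendly truncate)
/bin/mv /home/username/mail/Junk /tmp/Junk
/bin/touch /home/username/mail/Junk
/bin/chown username /home/username/mail/Junk
/bin/chmod 700 /home/username/mail/Junk

# Learn from the moved file, the untruncated usernameX file and the honeypot file
/usr/bin/sa-learn --dbpath /var/spool/amavis/.spamassassin --mbox --spam -u spamd /tmp/Junk /home/usernameX/mail/Junk /var/spool/mail/honeypot >/tmp/sa-learn.log 2>&1

# Truncate the honeypot file
> /var/spool/mail/honeypot

# Remove the processed copy of the Junk file
/bin/rm -f /tmp/Junk

# Tell spamd to reload its Bayesian knowledge base
/bin/kill -HUP `/bin/cat /var/run/spamd.pid`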
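As a possible refinement of the template, the list of files can be filtered first so that sa-learn is only given files that exist and are not empty (for example, when a user has no Junk folder yet). The function name existing_mboxes is hypothetical, and the usage shown assumes that none of the paths contain whitespace.

```shell
# Print, one per line, each argument that names a non-empty regular file.
existing_mboxes() {
    local f
    for f in "$@"; do
        if [ -f "$f" ] && [ -s "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Usage sketch (relies on word splitting, so paths must not contain spaces):
#   files=$(existing_mboxes /tmp/Junk /home/usernameX/mail/Junk /var/spool/mail/honeypot)
#   if [ -n "$files" ]; then
#       /usr/bin/sa-learn --dbpath /var/spool/amavis/.spamassassin --mbox --spam -u spamd $files
#   fi
```

When every file is missing or empty, the sa-learn run is skipped altogether.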