SpamAssassin - automating sa-learn with IMAP folders

Among the useful things we have found for our clients is a method for building a learning spam filter with SpamAssassin and a mail server that supports IMAP folders, such as Dovecot. Adding SpamAssassin with a standard configuration to a mail server's incoming mail can dramatically reduce the amount of spam users receive, but it will not catch all of it.

The reason for the lack of complete filtering is clear. Spammers play a cat-and-mouse game with spam filters, always attempting to modify messages in such a way as to avoid filtering. As filters change, spammers experiment until they find ways through, and they change their tactics as each new technique is detected. 

Because spam keeps changing in this way, Emergent Path recommends that clients who maintain their own mail servers enable the Bayesian filtering engine in SpamAssassin and automate its training with the sa-learn script.
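
Enabling the Bayesian engine is a matter of SpamAssassin configuration. The lines below are a minimal sketch for local.cf (its location varies by distribution). use_bayes is already on by default in current releases, so setting it mainly documents the intent. The bayes_path value is an assumed example of one site-wide database, chosen so that the scanner and the sa-learn job read and write the same data instead of separate per-user ~/.spamassassin/bayes files.

```
# local.cf (commonly /etc/spamassassin/local.cf or /etc/mail/spamassassin/local.cf)
# Enable the Bayesian classifier (already the default; stated explicitly here)
use_bayes 1

# Assumed example path: one site-wide database shared by the scanner and sa-learn
bayes_path /var/lib/spamassassin/bayes/bayes
# File mode for the database files, so a shared group can read and write them
bayes_file_mode 0770
```

By default SpamAssassin ignores Bayes results until at least 200 spam and 200 ham messages have been learned (bayes_min_spam_num and bayes_min_ham_num); sa-learn --dump magic reports the current nspam and nham counts.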

sa-learn is a command-line program that you pass messages to, along with arguments saying whether they are ham (legitimate mail) or spam (unwanted junk mail), so that SpamAssassin's Bayesian filter can learn from them. Because it is a command-line program, it is easy to automate with cron on Unix/Linux systems. We recommend a scheduled job on the mail server, typically daily (the right interval depends on mail volume and the number of mail servers involved), that runs sa-learn over the messages users have classified as spam or ham to train SpamAssassin.
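
A crontab entry for such a job might look like the following sketch. The 02:30 schedule, the script path /usr/local/bin/sa-learn-folders.sh and the log file are hypothetical. The entry would be installed with crontab -e for an account that can read and delete messages in every user's Maildir and write to the Bayes database SpamAssassin uses.

```
# min hour day-of-month month day-of-week  command
30    2    *            *     *            /usr/local/bin/sa-learn-folders.sh >> /var/log/sa-learn-folders.log 2>&1
```

Appending the output to a log file makes failed runs visible without depending on cron's mail delivery.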

A sample script might look like the one below. It is a simple example, not a finished production script:

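#!/bin/bash
# Stop at the first failed command, so nothing is deleted if sa-learn reports an error
set -e

# Record spam and ham in the Bayes journal (sync deferred until both passes finish)
sa-learn --showdots --no-sync --spam /var/mail/domains/*/*/Maildir/.MakeSpam/cur/
sa-learn --showdots --no-sync --ham /var/mail/domains/*/*/Maildir/.MakeHam/cur/
# Merge the journal into the Bayes database
sa-learn --sync

# Delete the learned messages (-f: no error when a folder is already empty)
rm -f /var/mail/domains/*/*/Maildir/.MakeSpam/cur/*
rm -f /var/mail/domains/*/*/Maildir/.MakeHam/cur/*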

In this example, each user who wants to report spam creates an IMAP folder called MakeSpam at the top level of their account. (The example assumes the typical layout /var/mail/domains/<domain_name>/<account>/Maildir/ for the root of each user's mail folders; in Maildir++ format the MakeSpam folder appears on disk as .MakeSpam.) When a spam message gets through filtering to the inbox, the user drags it into MakeSpam and leaves it there. When the script above runs (via cron on the server), SpamAssassin learns those messages as spam and the script then deletes them. Over time, this training improves SpamAssassin's hit rate on spam.
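
One weakness of the script above is timing: a message a user drops into MakeSpam after the sa-learn call but before the rm is deleted without ever being learned. One way to close that gap, sketched below under the same folder layout, is to move the messages present at the start of the run into a staging directory, learn that directory, and delete only what was learned. MAIL_ROOT, learn_folder and main are hypothetical names, and this is a sketch rather than a tested production script.

```shell
#!/bin/bash
# Sketch only: a race-safe variant of the training script, not a tested
# production script. MAIL_ROOT, learn_folder and main are hypothetical names;
# the layout is /var/mail/domains/<domain>/<account>/Maildir/ as in the article.
set -euo pipefail
shopt -s nullglob   # a folder with no messages expands to nothing, not a literal '*'

MAIL_ROOT="${MAIL_ROOT:-/var/mail/domains}"

# learn_folder KIND FOLDER
#   KIND is "spam" or "ham"; FOLDER is the on-disk Maildir++ name, e.g. .MakeSpam
learn_folder() {
  local kind="$1" folder="$2" stage msg n=0
  # Staging directory beside the domains: the leading dot keeps it out of the
  # */*/Maildir glob, and staying on the same filesystem usually makes mv a rename
  stage="$(mktemp -d "$MAIL_ROOT/.sa-learn-$kind.XXXXXX")"

  # Take only the messages present now; anything dropped into the folder after
  # this loop stays where it is and is learned on the next run
  for msg in "$MAIL_ROOT"/*/*/Maildir/"$folder"/cur/*; do
    n=$((n + 1))
    mv -- "$msg" "$stage/$n"   # numbered names avoid collisions between accounts
  done

  if [ "$n" -eq 0 ]; then
    rmdir -- "$stage"
    return 0
  fi

  # Learn into the journal only; main syncs once after both folders
  if ! sa-learn --no-sync "--$kind" "$stage"; then
    echo "sa-learn failed; $n message(s) kept in $stage" >&2
    return 1
  fi
  rm -rf -- "$stage"
}

main() {
  if [ ! -d "$MAIL_ROOT" ]; then
    echo "$MAIL_ROOT not found; nothing to learn" >&2
    return 0
  fi
  learn_folder spam .MakeSpam
  learn_folder ham .MakeHam
  sa-learn --sync   # merge the journal into the Bayes database once
}

main
```

If sa-learn fails, the staged messages are left in the .sa-learn-* directory instead of being deleted, so they can be inspected or learned by hand.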

How a user marks a message as ham after SpamAssassin has classified it as spam depends on your configuration. If your setup moves spam to a Junk or Spam folder and SpamAssassin only adds headers to the message, the user can simply move the message to the MakeHam folder; when the script runs, SpamAssassin learns it as ham (legitimate mail) and takes that into account in future. If SpamAssassin is configured to create a new report message with the original attached, the user may need to extract the original message from the attachment and place it in the MakeHam folder.
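
In SpamAssassin, the choice between these two behaviours is controlled by the report_safe option. A sketch of the local.cf setting for the header-only case, where users can move a false positive straight into MakeHam:

```
# local.cf
# report_safe 0: only add X-Spam-* headers; the message body is left unchanged
# report_safe 1 (default): wrap the original in a new report message as a message/rfc822 attachment
# report_safe 2: like 1, but attach the original as text/plain
report_safe 0
```

Moving tagged messages into a Junk or Spam folder is normally done by the delivery agent (for example a Sieve or procmail rule that matches the X-Spam-Flag header), not by SpamAssassin itself.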

Automated systems like this one take time to develop and can be tedious and error-prone to get right and to keep working. We always recommend starting small with minimal functionality, proving that functionality over time, and adding to it later.

BlogCFC was created by Raymond Camden. This blog is running version 5.8.001.