SpamAssassin and NetQMail

Spam in my inbox has slowly increased to an inconvenient amount. I’m using SpamAssassin, and some spam emails even had a negative spam score, meaning they got a bonus that let them through even when other spam criteria were met.

Most of these emails had the BAYES_00 rule, so I did some research. The bayesian filter in SpamAssassin (that’s the one that dynamically learns what kind of emails you consider spam) reports its verdict through a range of rules, among them BAYES_00 (meaning the bayesian filter thinks the email is good), BAYES_50 (meaning the bayesian filter thinks there’s roughly a 50% probability of the email being spam) and BAYES_99 (of course meaning the email is spam with a probability of 99% or more).
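To see which of these rules fired on a given message, the rule name can be pulled out of the message headers. The following is a minimal sketch, assuming the message carries SpamAssassin's usual X-Spam-Status header with a tests= list; the function name bayes_rule is made up for this example.

```shell
#!/bin/sh

# bayes_rule FILE  (hypothetical helper name)
# Prints the first BAYES_nn rule found in the X-Spam-Status header of the
# given message file, or nothing if the header or rule is missing.
bayes_rule() {
  awk '
    /^$/ { exit }                                   # blank line: end of headers
    /^X-Spam-Status:/ { inhdr = 1; hdr = $0; next } # start of the header
    inhdr && /^[ \t]/ { hdr = hdr $0; next }        # folded continuation line
    inhdr { inhdr = 0 }                             # some other header follows
    END { if (match(hdr, /BAYES_[0-9]+/)) print substr(hdr, RSTART, RLENGTH) }
  ' "$1"
}
```

Run over the messages in a spam folder, this shows quickly whether the filter is mostly handing out BAYES_00 to mail it should have caught.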

It looked like my bayesian filter had somehow learned to see spam emails as good emails. This surprised me, because I’ve set up two special folders in my IMAP inbox named ‘ham’ and ‘spam’ into which I move any emails wrongly classified by SpamAssassin. An hourly cron task then picks up any emails moved into those folders and trains my bayesian filter with them. So I assumed the darn thing would be well trained for the kind of spam I’m receiving.

What actually happened was that the cron task ran under cron’s user account, while SpamAssassin scanned my emails under a separate user account I had set up specifically for simscan, the utility I use to scan my emails at SMTP upload time. (Most email servers accept all emails and scan for spam afterwards; mine rejects spam while it is still being uploaded, which makes my email account appear permanently unavailable or disabled to any spammer.) Since SpamAssassin by default keeps a separate bayesian database for each user account, in that account’s ~/.spamassassin directory, the two accounts never shared any training data.

So, to make a long story short, all my efforts to train SpamAssassin’s bayesian filter for nearly two years have been for naught because I had trained cron’s bayesian database instead of the one used by simscan to scan incoming emails. Unattended, the bayesian filter slowly auto-trained itself to regard spam emails as good emails.
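A quick way to notice this kind of mix-up is to compare how many messages each database has actually learned. The sketch below summarizes the output of sa-learn --dump magic. It assumes the usual layout of that output, with the count in the third column and a "non-token data: nspam" or "non-token data: nham" label at the end of the line; the function name bayes_counts is made up for this example.

```shell
#!/bin/sh

# bayes_counts  (hypothetical helper name)
# Reads `sa-learn --dump magic` output on stdin and prints how many spam
# and ham messages the database has learned.
bayes_counts() {
  awk '
    /non-token data: nspam$/ { nspam = $3 }
    /non-token data: nham$/  { nham = $3 }
    END { printf "nspam=%d nham=%d\n", nspam, nham }
  '
}

# Example, using the database path that simscan reads from:
#   sa-learn --dump magic --dbpath /var/spool/simscan/.spamassassin | bayes_counts
```

If the counts for the database used at scan time stay flat while the ham and spam folders keep getting emptied, the training is going somewhere else.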

I cleared my bayesian filter’s database and I’m now hoping that this time, I will be able to properly train it. Here’s the shell script I’m now using:

#!/bin/sh

# ------------------------------------------------------------------
# Train SpamAssassin
#
# Feeds the emails users moved into their SpamAssassin.Spam and
# SpamAssassin.Ham IMAP folders to the bayesian database used by
# simscan, then moves the learned emails into the trash folder.
#
for homedirectory in /home/*
do
  if [ -e "$homedirectory/.maildir" ]; then
    chmod +t "$homedirectory/.maildir"

    # Emails the user marked as spam
    if [ -e "$homedirectory/.maildir/.SpamAssassin.Spam" ]; then
      # Only call sa-learn if the folder actually contains emails
      if ls "$homedirectory"/.maildir/.SpamAssassin.Spam/cur/* >/dev/null 2>&1; then
        sa-learn \
          --username=simscan \
          --dbpath /var/spool/simscan/.spamassassin \
          --spam "$homedirectory"/.maildir/.SpamAssassin.Spam/cur/*
        mv "$homedirectory"/.maildir/.SpamAssassin.Spam/cur/* \
           "$homedirectory/.maildir/.Trash/cur/"
      fi
    fi

    # Emails the user marked as good (ham)
    if [ -e "$homedirectory/.maildir/.SpamAssassin.Ham" ]; then
      if ls "$homedirectory"/.maildir/.SpamAssassin.Ham/cur/* >/dev/null 2>&1; then
        sa-learn \
          --username=simscan \
          --dbpath /var/spool/simscan/.spamassassin \
          --ham "$homedirectory"/.maildir/.SpamAssassin.Ham/cur/*
        mv "$homedirectory"/.maildir/.SpamAssassin.Ham/cur/* \
           "$homedirectory/.maildir/.Trash/cur/"
      fi
    fi

    chmod -t "$homedirectory/.maildir"
  fi
done
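The spam and ham branches of the script differ only in the folder name and the sa-learn switch, so they could be folded into one function. This is just a sketch of that idea, not the script I'm running: learn_folder and the SA_LEARN variable are names invented here, and SA_LEARN exists only so the function can be tried out without touching a real database.

```shell
#!/bin/sh

# learn_folder MAILDIR FOLDER MODE  (hypothetical helper)
#   MAILDIR  a user's maildir, e.g. /home/someuser/.maildir
#   FOLDER   .SpamAssassin.Spam or .SpamAssassin.Ham
#   MODE     spam or ham, passed to sa-learn as --spam or --ham
learn_folder() {
  maildir=$1
  folder=$2
  mode=$3
  src="$maildir/$folder/cur"

  # Nothing to do if the user never created the folder
  [ -d "$src" ] || return 0

  # Expand the folder contents; an empty folder leaves the pattern
  # unexpanded, which does not name an existing file
  set -- "$src"/*
  [ -e "$1" ] || return 0

  # SA_LEARN is an assumed override for testing; it defaults to sa-learn
  "${SA_LEARN:-sa-learn}" \
    --username=simscan \
    --dbpath /var/spool/simscan/.spamassassin \
    "--$mode" "$@" || return 1

  # Only trash the emails once they have been learned
  mv "$@" "$maildir/.Trash/cur/"
}

# Inside the loop over /home/* this would be called as:
#   learn_folder "$homedirectory/.maildir" .SpamAssassin.Spam spam
#   learn_folder "$homedirectory/.maildir" .SpamAssassin.Ham ham
```

One difference from the script above: if sa-learn fails, the emails stay where they are instead of being moved to the trash, so they get another chance on the next run.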

4 Responses to “SpamAssassin and NetQMail”

  1. Nikol Says:

    Hello,

    I was wondering how I can integrate this into my system (netqmail, vpopmail, SpamAssassin, clamav, qmail-scanner). The script is OK, but how will qmail-scanner know when to start this script, and how?

  2. cygon Says:

    The script has nothing to do with qmail-scanner or anything else related to qmail. I also don’t think qmail-scanner is intended for filtering (“scanning”) emails; afaik it’s just a minor part of the qmail system you shouldn’t care about.

    My script is intended to be run regularly by cron (e.g. once a day, or once per hour) and will feed SpamAssassin’s bayesian filter with the emails users put in their SpamAssassin/Ham and SpamAssassin/Spam folders via IMAP.

    If you’re interested in SMTP-time rejection of spam, google for “simscan”. That’s what I’ve integrated into netqmail to invoke SpamAssassin before qmail closes the SMTP connection (i.e. the original spammer will get the rejection, no matter whether he’s faked the return address).

  3. Kami Says:

    You can put spamdyke in front of everything and eliminate 98% of spam. The benefits are rDNS checks, blacklists and faster RBL checks, all at the SMTP level.
    Best of all, you don’t need to modify your installation: it only takes one extra line in the qmail smtp.run file.

  4. Kami Says:

    I forgot: qmail-scanner is an interface for checking qmail messages with SpamAssassin and ClamAV. It also has some rules of its own, called the perl scanner, that can restrict certain words and addresses.
    ClamAV used with the Sanesecurity databases is more efficient than an untrained SpamAssassin.
