Spam in my inbox has slowly increased to an inconvenient level. I’m using SpamAssassin, and some spam emails even had a negative spam score, meaning they got a bonus that helped them through even when other spam criteria were met.
Most of these emails had triggered the BAYES_00 rule, so I did some research. SpamAssassin’s Bayesian filter (the one that dynamically learns which kinds of emails you consider spam) reports its verdict through a series of rules, each covering a band of spam probability. The most telling ones are BAYES_00 (the Bayesian filter thinks the email is good), BAYES_50 (it thinks there’s about a 50% probability of the email being spam) and BAYES_99 (of course meaning the email is spam with a 99% probability). Rules such as BAYES_05, BAYES_20 and BAYES_80 cover the bands in between.
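To make the banding concrete, here is a small sketch in shell that maps a Bayes spam probability to the rule it would trigger. It assumes the band boundaries of the stock SpamAssassin rule set (BAYES_00 up to 1%, BAYES_05 up to 5%, BAYES_20 up to 20%, and so on up to BAYES_99 above 99%). The helper name bayes_rule is hypothetical, the handling of exact boundary values may differ slightly from SpamAssassin’s own check, and newer releases additionally fire BAYES_999 above 99.9%, which this sketch ignores.

```shell
# Hypothetical helper: print the BAYES_xx rule a given spam probability
# (0.0 to 1.0) would fall into, assuming the stock rule set's bands.
# Boundary handling is an approximation of SpamAssassin's own check.
bayes_rule() {
    awk -v p="$1" 'BEGIN {
        p = p + 0   # force a numeric comparison
        if      (p <= 0.01) r = "BAYES_00"
        else if (p <= 0.05) r = "BAYES_05"
        else if (p <= 0.20) r = "BAYES_20"
        else if (p <= 0.40) r = "BAYES_40"
        else if (p <= 0.60) r = "BAYES_50"
        else if (p <= 0.80) r = "BAYES_60"
        else if (p <= 0.95) r = "BAYES_80"
        else if (p <= 0.99) r = "BAYES_95"
        else                r = "BAYES_99"
        print r
    }'
}
```

For example, bayes_rule 0.005 prints BAYES_00 and bayes_rule 0.5 prints BAYES_50.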
It looked like my Bayesian filter had somehow learned to see spam emails as good emails. This surprised me, because I’ve set up two special folders in my IMAP inbox, named ‘ham’ and ‘spam’, into which I move any emails wrongly classified by SpamAssassin. An hourly cron task then picks up any emails moved into those folders and trains my Bayesian filter with them. So I assumed the darn thing would be well trained for the kind of spam I’m receiving.
What actually happened was that the cron task ran under cron’s own user account, while SpamAssassin scanned my emails under another account I had set up specifically for simscan, the utility I use to scan my emails at SMTP time. Normal email servers accept all emails and then scan them for spam; mine rejects spam while it is still being uploaded, which makes my email account appear permanently unavailable or disabled to any spammer. Because sa-learn, unless told otherwise, works on the Bayes database in the home directory of the user running it, each of the two accounts had its own database.
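One way to see this kind of mix-up is to work out which database each account trains by default. The sketch below uses a hypothetical helper, default_bayes_path, and assumes SpamAssassin’s default bayes_path of ~/.spamassassin/bayes (with the DBM backend, the actual files are bayes_toks and bayes_seen next to that prefix). It looks up home directories in /etc/passwd only, so accounts coming from LDAP or other NSS sources are not found.

```shell
# Hypothetical helper: print the Bayes database prefix sa-learn would use
# for a local user when no --dbpath is given, assuming the default
# bayes_path of ~/.spamassassin/bayes. Only /etc/passwd is consulted.
default_bayes_path() {
    home=$(awk -F: -v u="$1" '$1 == u { print $6; exit }' /etc/passwd)
    if [ -z "$home" ]; then
        echo "unknown user: $1" >&2
        return 1
    fi
    echo "$home/.spamassassin/bayes"
}
```

Running default_bayes_path simscan and then the same for the account cron uses would typically print two different paths, one database per account.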
So, to make a long story short, all my efforts to train SpamAssassin’s Bayesian filter for nearly two years have been for naught, because I had trained cron’s Bayesian database instead of the one simscan uses to scan incoming emails. Left unattended, simscan’s Bayesian filter slowly auto-trained itself to regard spam emails as good emails.
I cleared my Bayesian filter’s database, and I’m now hoping that this time I will be able to train it properly. Here’s the shell script I’m now using:
#!/bin/sh
# ------------------------------------------------------------------
# Train SpamAssassin
#
# Feeds the messages found in each user's SpamAssassin.Spam and
# SpamAssassin.Ham maildir folders to the Bayes database simscan uses,
# then moves them to that user's Trash folder.
#
for homedirectory in /home/*
do
    maildir="$homedirectory/.maildir"
    if [ -d "$maildir" ]; then
        # Set the sticky bit on the maildir while training;
        # it is removed again at the end of this block.
        chmod +t "$maildir"
        if [ -d "$maildir/.SpamAssassin.Spam" ]; then
            # Only call sa-learn if the folder contains messages.
            if ls "$maildir/.SpamAssassin.Spam/cur/"* >/dev/null 2>&1; then
                # Learn into simscan's database (not the database of the
                # user running this script); move the messages to the
                # Trash only if sa-learn succeeded.
                sa-learn \
                    --username=simscan \
                    --dbpath /var/spool/simscan/.spamassassin \
                    --spam "$maildir/.SpamAssassin.Spam/cur/"* &&
                mv "$maildir/.SpamAssassin.Spam/cur/"* \
                    "$maildir/.Trash/cur/"
            fi
        fi
        if [ -d "$maildir/.SpamAssassin.Ham" ]; then
            if ls "$maildir/.SpamAssassin.Ham/cur/"* >/dev/null 2>&1; then
                sa-learn \
                    --username=simscan \
                    --dbpath /var/spool/simscan/.spamassassin \
                    --ham "$maildir/.SpamAssassin.Ham/cur/"* &&
                mv "$maildir/.SpamAssassin.Ham/cur/"* \
                    "$maildir/.Trash/cur/"
            fi
        fi
        chmod -t "$maildir"
    fi
done
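After clearing the database, it is worth checking that the training actually lands where simscan looks. By default, SpamAssassin does not use the Bayes classifier until at least 200 spam and 200 ham messages have been learned (the bayes_min_spam_num and bayes_min_ham_num settings). The sketch below parses the output of sa-learn --dump magic, assuming its usual layout where the count sits in the third column of the “non-token data: nspam” and “non-token data: nham” lines. The function name bayes_ready is hypothetical, and the threshold should be adjusted if your configuration overrides those settings.

```shell
# Hypothetical helper: read `sa-learn --dump magic` output on stdin,
# print the learned spam/ham counts, and exit 0 only if both reach the
# threshold (default 200, assumed to match bayes_min_spam_num and
# bayes_min_ham_num).
bayes_ready() {
    awk -v min="${1:-200}" '
        /non-token data: nspam/ { nspam = $3 }
        /non-token data: nham/  { nham  = $3 }
        END {
            printf "spam learned: %d, ham learned: %d\n", nspam, nham
            # Exit status 0 only when both counts reach the threshold.
            exit !(nspam + 0 >= min + 0 && nham + 0 >= min + 0)
        }'
}
```

For example, sa-learn --dbpath /var/spool/simscan/.spamassassin --dump magic | bayes_ready prints both counts for simscan’s database and exits with status 0 once both thresholds are met.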