Migrating spamassassin to version 4.0

June 24, 2023 by Roberto Puzzanghera 4 comments

Install spamassassin v. 4

SA v.4 DMARC plugin requires Mail::DMARC::PurePerl, while DecodeShortURLs requires DBD::SQLite (or DBD::MariaDB or DBD::mysql), so it's better to install them before the upgrade:

perl -MCPAN -e shell
cpan> force notest install Mail::DMARC::PurePerl DBD::SQLite
cpan> quit
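
Before going further it may be worth confirming that both modules actually load. This is a minimal sketch, assuming perl is on the PATH; check_perl_module is a helper name made up for this example and is not part of CPAN or SpamAssassin:

```shell
#!/bin/sh
# check_perl_module is a hypothetical helper (not part of CPAN or SpamAssassin).
# It succeeds when perl can load the named module.
check_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Modules installed in the step above
for mod in Mail::DMARC::PurePerl DBD::SQLite; do
    if check_perl_module "$mod"; then
        echo "ok       $mod"
    else
        echo "MISSING  $mod"
    fi
done
```

If you chose DBD::MariaDB or DBD::mysql instead of DBD::SQLite, put that name in the list.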

Stop qmail and spamd, upgrade spamassassin, run sa-update, then restart the services:

qmailctl stop
spamdctl stop

perl -MCPAN -e shell
cpan> force notest install Mail::SpamAssassin Mail::SpamAssassin::Plugin::Razor2
cpan> quit

sa-update

spamdctl start
qmailctl start
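
To double check that the upgrade took effect, something like the following could be used. It is only a sketch: sa_major_version is a hypothetical helper, and it assumes that the first line of `spamassassin --version` has the form "SpamAssassin version X.Y.Z".

```shell
#!/bin/sh
# sa_major_version is a hypothetical helper: it reads the output of
# `spamassassin --version` on stdin and prints the major version number.
# Assumption: the first line looks like "SpamAssassin version 4.0.0".
sa_major_version() {
    sed -n 's/^SpamAssassin version \([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

if command -v spamassassin >/dev/null 2>&1; then
    major=$(spamassassin --version | sa_major_version)
    if [ "${major:-0}" -ge 4 ]; then
        echo "SpamAssassin $major.x is installed"
    else
        echo "still on an older release (major version: ${major:-unknown})"
    fi
    # --lint parses the whole configuration and reports syntax errors
    spamassassin --lint && echo "configuration lints cleanly"
fi
```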

Load the new plugins

Enable all the plugins that are commented out in /etc/mail/spamassassin/v400.pre by uncommenting their loadplugin lines. Then add your configuration to your local.cf.
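
If you prefer not to edit the file by hand, the following sketch shows one possible way to do it. uncomment_loadplugins is a hypothetical helper, and the output path /tmp/v400.pre.new is just an example. It writes the result to a separate file so you can review the diff before replacing anything.

```shell
#!/bin/sh
# uncomment_loadplugins is a hypothetical helper. It prints the given file
# with the leading "#" removed from every commented-out loadplugin line and
# leaves all other lines (including ordinary comments) untouched.
uncomment_loadplugins() {
    sed 's/^#[[:space:]]*\(loadplugin[[:space:]]\)/\1/' "$1"
}

pre=/etc/mail/spamassassin/v400.pre
if [ -f "$pre" ]; then
    # Show which plugins are currently disabled
    grep -n '^#[[:space:]]*loadplugin' "$pre" || true
    # Write the result to a new file for review instead of editing in place
    uncomment_loadplugins "$pre" > /tmp/v400.pre.new
    diff "$pre" /tmp/v400.pre.new || true
fi
```

Once the diff looks right, copy the new file over the original and run `spamassassin --lint`.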


When enabled, the ExtractText plugin converts attachments (including images, by means of OCR) into plain text so that SpamAssassin can apply its rules to that text. So if we receive doc/pdf/image attachments with spammy text in them, SpamAssassin will now be able to mark the email as spam.

To do that, some external programs must be installed on the server. The configuration lines added to local.cf tell the plugin which program to run for each attachment type.

Install the required external programs. Debian users can run:

apt-get install antiword
apt-get install docx2txt
apt-get install unrtf
apt-get install odt2txt
apt-get install tesseract-ocr
apt-get install poppler-utils
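
On other distributions the package names may differ, so it can help to check for the binaries themselves. A minimal sketch follows; missing_tools is a hypothetical helper, and the binary names are the ones used in the ExtractText configuration in this article:

```shell
#!/bin/sh
# missing_tools is a hypothetical helper: it prints every argument that is
# not found as a command on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Binary names used by the ExtractText configuration
# (pdftotext comes from poppler-utils, tesseract from tesseract-ocr)
missing=$(missing_tools pdftotext docx2txt antiword unrtf odt2txt tesseract)
if [ -n "$missing" ]; then
    echo "not installed:" $missing
else
    echo "all extraction tools found"
fi
```

Note that the configuration calls the programs by absolute path under /usr/bin, so a binary found elsewhere on the PATH still needs its path adjusted in local.cf.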

Add the following lines to the local.cf file:

ifplugin Mail::SpamAssassin::Plugin::ExtractText

extracttext_external pdftotext /usr/bin/pdftotext -nopgbrk -layout -enc UTF-8 {} -
extracttext_use pdftotext .pdf application/pdf

# http://docx2txt.sourceforge.net
extracttext_external docx2txt /usr/bin/docx2txt {} -
extracttext_use docx2txt .docx application/docx

extracttext_external antiword /usr/bin/antiword -t -w 0 -m UTF-8.txt {}
extracttext_use antiword .doc application/(?:vnd\.?)?ms-?word.*

extracttext_external unrtf /usr/bin/unrtf --nopict {}
extracttext_use unrtf .doc .rtf application/rtf text/rtf

extracttext_external odt2txt /usr/bin/odt2txt --encoding=UTF-8 {}
extracttext_use odt2txt .odt .ott application/.*?opendocument.*text
extracttext_use odt2txt .sdw .stw application/(?:x-)?soffice application/(?:x-)?starwriter

extracttext_external tesseract {OMP_THREAD_LIMIT=1} /usr/bin/tesseract -c page_separator= {} -
extracttext_use tesseract .jpg .png .bmp .tif .tiff image/(?:jpeg|png|x-ms-bmp|tiff)

add_header all ExtractText-Flags _EXTRACTTEXTFLAGS_

#header PDF_NO_TEXT X-ExtractText-Flags =~ /\bNoText\b/
#describe PDF_NO_TEXT PDF without text
#score PDF_NO_TEXT 0.2

#header DOC_NO_TEXT X-ExtractText-Flags =~ /\bNoText\b/
#describe DOC_NO_TEXT Document without text
#score DOC_NO_TEXT 0.2

#header EXTRACTTEXT exists:X-ExtractText-Flags
#describe EXTRACTTEXT Email processed by extracttext plugin
#score EXTRACTTEXT 0.001

endif


Three rules are commented out above. You can safely leave them that way or enable them for debugging. The EXTRACTTEXT rule only serves as proof that the plugin is active. PDF_NO_TEXT and DOC_NO_TEXT are hit when an attached document contains no text. When these two rules are hit you will get a header like this:

X-Spam-ExtractText-Flags: NoText
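
To see the flags for a given message, you could scan a saved copy and pull out that header. This is a sketch only: extracttext_flags is a hypothetical helper and sample.eml stands for any message file you have at hand.

```shell
#!/bin/sh
# extracttext_flags is a hypothetical helper: it reads a message on stdin
# and prints the value of the X-Spam-ExtractText-Flags header, if present.
# It stops at the first empty line, where the headers end, so that matching
# text in the body is ignored.
extracttext_flags() {
    sed -n -e '/^$/q' -e 's/^X-Spam-ExtractText-Flags:[[:space:]]*//p'
}

# Usage (sample.eml is a placeholder for a message saved to disk):
#   spamassassin < sample.eml | extracttext_flags
```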

Update Roundcube's sauserprefs plugin

Update your johndoh/roundcube-sauserprefs plugin by downloading it from GitHub, as version 1.20.1 is not yet available as a Composer-installable plugin.

cd /var/www/roundcube/htdocs/plugins
wget https://github.com/johndoh/roundcube-sauserprefs/archive/refs/tags/1.20.1.tar.gz
tar xzf 1.20.1.tar.gz
mv roundcube-sauserprefs-1.20.1/ sauserprefs
cd sauserprefs
mv config.inc.php.dist config.inc.php

Set up the MySQL login, then set this flag to true for spamassassin v.4 compatibility:

$config['sauserprefs_sav4'] = true;

All rules, functions, command line options and modules that contain "whitelist" or "blacklist" have been renamed to use the more racially neutral terms "welcomelist" and "blocklist". So we have to update our sauserprefs DB records. Use this PHP scriptlet from the command line to do the job, after adjusting your MySQL login:

cat > sauserprefs_blocklist_welcomelist.php << __EOF__
#!/usr/bin/env php
<?php
/*
 * Finds and replaces deprecated strings blacklist_from/whitelist_from in spamassassin.userpref
 * db table to blocklist_from/welcomelist_from respectively.
 */

\$host = "localhost";
\$database = "spamassassin";
\$user = "spamassassin";
\$password = "xxxxxxxxxxxxxxxxxx";

\$link = mysqli_connect(\$host, \$user, \$password, \$database) or die("Unable to connect to MySQL");
\$query ="UPDATE userpref SET preference = REPLACE(REPLACE(preference, 'whitelist_from', 'welcomelist_from'), 'blacklist_from', 'blocklist_from')";
mysqli_query(\$link, \$query) or die(mysqli_error(\$link));
print "job done\n";
__EOF__

chmod +x sauserprefs_blocklist_welcomelist.php
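
Since the UPDATE rewrites the rows in place, it may be prudent to dump the table before running the script. The following is a sketch under the assumption that the database and user are both named "spamassassin", as in the script above; backup_userpref and the backup file name are made up for this example.

```shell
#!/bin/sh
# backup_userpref is a hypothetical helper. It dumps the userpref table to a
# dated SQL file in the current directory and prints the file name.
# Assumption: database and user are both called "spamassassin".
backup_userpref() {
    out="userpref-backup-$(date +%Y%m%d).sql"
    # -p without a value makes mysqldump prompt for the password
    mysqldump -u spamassassin -p spamassassin userpref > "$out" && echo "$out"
}

# Usage:
#   backup_userpref
#   ./sauserprefs_blocklist_welcomelist.php
# To restore the old records later:
#   mysql -u spamassassin -p spamassassin < userpref-backup-YYYYMMDD.sql
```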


Anyone try this on RHEL or related? (Almalinux, etc.)


The subject is pretty much the question.

The EPEL repos are still at 3.4, so I'm wondering if I should go ahead with these instructions or wait for the repos to update. 4.0 has been out for a year already...

Separately, if I pull the trigger, is there a way to revert back?




Anyone try this on RHEL or related? (Almalinux, etc.)

I've never reverted. If you want to revert, I suggest using the manual installation to overwrite the current installation, as the paths used by the RHEL package may be different.

Of course a test server would be best in such cases.


Two small typos in the SpamAssassin 4.0.0 migration tutorial

mv roundcube-sauserprefs-1.20.1/ sauserprefs
mv config.inc.php.dist config.inc.php

should read:

mv roundcube-sauserprefs-1.20.1/ sauserprefs
cd sauserprefs
mv config.inc.php.dist config.inc.php


score        PDF_NO_TEXT  0.001zzz

Should read:

score        PDF_NO_TEXT  0.001



Two small typos in the SpamAssassin 4.0.0 migration tutorial

Thank you. Corrected

