Rspamd: Fast, free and open-source spam filtering system

phoenix
Ambassador
Posts: 27272
Joined: Fri Sep 12, 2014 9:56 pm
Location: Liverpool, England

Re: Rspamd: A replacement for Spamassassin

Post by phoenix »

iceruam wrote:I for one still use zimbra internally, but all of our mail come in forwarded from gmail first...zimbra spam filtering has been so terrible I needed to find an alternative
Why don't you try Rspamd? If you see no improvement (I'd be surprised), you could always revert the changes I describe in my first post (or in the newly created wiki article: https://wiki.zimbra.com/wiki/Rspamd); it's fairly trivial to install and remove. There's always the obvious proviso of making backups of config files and trying it on a test server first.
Regards

Bill

Rspamd: A high performance spamassassin replacement

Per ardua ad astra
JDunphy
Outstanding Member
Posts: 889
Joined: Fri Sep 12, 2014 11:18 pm
Location: Victoria, BC
ZCS/ZD Version: 9.0.0_P39 NETWORK Edition

Re: Rspamd: A replacement for Spamassassin

Post by JDunphy »

iceruam wrote:I for one still use zimbra internally, but all of our mail come in forwarded from gmail first...zimbra spam filtering has been so terrible I needed to find an alternative
Are you saying that you have Gmail do the spam filtering and then forward your email to Zimbra? If so, have you thought about adding some extra addresses to trusted_networks in SA's sauser.cf to cover those Gmail IPs? I assume the same would apply to rspamd, as you need to tell the spam software to do its DNS lookups on the next-hop IP address.

It hasn't been a good year for SA. Until March I basically did nothing, but now my sauser.cf file is 534 lines, with us overriding many of the scores and adding loads of custom rules and a few custom plugins. The big problem is that there hasn't been an SA score update since May because their build process is broken, apart from a manual one in June. Worse, it appears from their mailing lists that more emphasis is being put on meta body rules and blacklists, which do not work well against the type of spam we are seeing given the advances being made in delivery. As a result, we are maintaining many of the core SA libraries ourselves, with changes to handle HTML context better. Modern spam knows how to obfuscate its way past many of the SA rules as HTML becomes more prevalent, and standard practice is to test against a ruleset + scores (which no longer change). Given how dangerous spam can be when used to deliver malware, more needs to be done, and I would like to see Zimbra become more proactive here.

I have a ticket open on this if you search https://bugzilla.zimbra.com/buglist.cgi ... lution=--- Myself, I think Zimbra has given up, and perhaps rspamd is the solution. For us, SA is working really, really well, but reworking the HTML parser has been a full-time job, so I haven't had much time to get too deep into rspamd. When I do, I have a ton of things to test given the advancement of spam and the obfuscation mentioned above. One reason SA is so slow is the number of DNS lookups for both whitelists and blacklists. Run it in debug mode against a mail message sometime to see what I mean. I think by default you are probably looking at 10-12 lookups for all the blacklists and whitelists included with Zimbra. That is a lot of latency, and I had no idea how much SA uses whitelists to help in its scoring. I really hate whitelists myself and barely tolerate blacklists, because we focus a ton on the HTML structure of each email, with some help from geo lookups for the delivery country. A few low-scoring blacklists are included in our spam scoring model, but they are used more as a trigger to investigate deeper with some extra meta rule checks.

I need to get a milter on our front-end MXs to fork email to both rspamd and SA on some Zimbra instances and then compare. I have rspamd set up per Bill's instructions, which were very easy to follow, but here it is October and I still haven't got it done. Arhhhh.

Re: Rspamd: A replacement for Spamassassin & Postscreen

Post by JDunphy »

I have incoming emails going in parallel to both SA and rspamd so I can observe them. It is a small subset of a few email accounts going to the rspamd Zimbra instance. This gives me a way to observe how Zimbra with SA and Zimbra with rspamd each handle the same message. A few comments, less than 24 hours into this:

* Less CPU for rspamd but more context switches (approx. 20%, perhaps more)
* A lot more ESTABLISHED connections to local port 53. I saw over 900 after installing on Aug 1 and just leaving the software running with no incoming email, but I have since updated the software and it has been stable for the last 24 hours at just under 100.
* A Zimbra rspamd instance has approx. 2x the number of ESTABLISHED system connections vs the default Zimbra SA
* Overall combined memory footprint of the zimbra+rspamd instance is slightly larger, but I'm not willing to commit to this yet :-)
* Note: I haven't tuned this at all and just used Bill's howto from this thread
* The GUI is fairly impressive, as is the ability to customize the software
* Feels like the default spam rules are very fast (ridiculously fast) but not very sophisticated in comparison to our SA environment

As far as spam detection goes, it is not better yet, nor close, but that comparison is unfair given that I have no idea what is going on and I have extensively tuned SA for our spam mix. I also haven't been able to figure out how to make it look at the next hop, which is critically important given how I am using it on this test system. In general, it would be better to run this on my front-end MXs instead of on the Zimbra instances, as I think it really wants to be on the front-end MTA to do the rejects and bounce mail back to the spammer, etc. I was very happy to see the ASN module, as I have something similar on my front-end MXs that adds headers for SA to use downstream. Also, if your users are sensitive to missed email because of a DMARC reject policy (caused by inexperienced senders, mailing lists, etc.), then you need to change this behavior, as the sender will get bounces back saying that their email is junk. This problem exists with SA too, in a slightly different way, so I mention it as a reminder for those who have changed this behavior with SA by lowering that rule's score.

It feels like this could be a replacement for SA, but it's evolving fast (really fast!!!), and some of the ways that dynamic maps are implemented have me wondering about security attacks, given that we have seen forged DNS resolver queries in some data centers. I am probably wrong on this, but there are a lot of ways to modify the software on the fly, and hackers have a way of finding their way into these input streams and causing unexpected behavior. I like it, but I have not even 24 hours of actual use yet, so I have more questions than answers at this point. I reserve the right to remove this statement in the future if/when I am wrong... hahahahaha :-)

BTW, I would recommend that anyone wanting to evaluate this update local.d/metrics.conf and raise the reject score so you can see the messages and observe the rspamd spam headers. Because the configuration we are testing in this thread appears to want to be on the same machine as Zimbra, if you have front ends passing email on, be on the lookout for bounce-backs in your outgoing mail queues. Again, a higher REJECT score for actions can prevent this while you gain operational experience with the software. I am so new at this that I don't really understand what is going on, but it feels fast and flexible, and it is a welcome new spam detection engine IMHO. Here is what rspamd says for SA users migrating: https://www.rspamd.com/doc/tutorials/migrate_sa.html ... There has been a lot of focus on performance, so detection should eventually catch up as more systems deploy this and send in their improvements.
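
A minimal sketch of that change, assuming the Rspamd 1.6-era layout in which action thresholds sit in an `actions` block of local.d/metrics.conf (worth checking against your installed version). The value 150 is only an illustrative number chosen to be far above any realistic message score, not a recommendation:

```ucl
# local.d/metrics.conf
# Assumption: thresholds live in an "actions" block (Rspamd 1.6-era layout).
actions {
  # Illustrative value only: high enough that nothing gets rejected
  # while you are still evaluating the software.
  reject = 150;
}
```

After restarting rspamd, `rspamadm configdump` should show whether the override was picked up.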

If you are looking for a no-configuration option with SA or rspamd... I don't know. There are some sophisticated techniques out there, and a few blacklists with overly simple body checks don't work well in all environments. A significant part of the problem we are facing is that the professional senders (e.g. Mailchimp) are on most of our whitelists and also provide their customers with test tools to verify their payloads against spam detection software. The best hedge is to customize your rules for your spam so that statistical methods like Bayes, together with these rules, offer more help with this type of problem. It would be interesting if rspamd and its experimental neural network module https://www.rspamd.com/doc/modules/fann.html could allow us to decentralize some of the analysis, but frankly, I have no idea what the possibilities are with it at this point in time.

Jim
vstakhov
Posts: 7
Joined: Sat Sep 09, 2017 12:40 pm

Re: Rspamd: A replacement for Spamassassin & Postscreen

Post by vstakhov »

1. More context switches mean that Rspamd spends more time waiting for results than processing them. It is not an issue on any modern 64-bit platform, e.g. Linux.
2. Rspamd uses a pool of sockets to query DNS for better performance. It does not use TCP sockets at all for DNS, so it is quite strange that you observe ESTABLISHED connections there...
3. If you have a highly customised SA setup, you can just load its rules using the https://rspamd.com/doc/modules/spamassassin.html module. Some manual adjustment might be needed in this case. However, Rspamd is clever enough to optimise regexps using Hyperscan. By default, all Ubuntu/Debian packages on rspamd.com are built with Hyperscan support. CentOS/Fedora packages are unfortunately not optimised due to stupidities in their build toolchains. That's possible to fix; I just don't have enough time for that.
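
For point 3, a minimal sketch of pointing the module at an existing SA rules file. It assumes the `ruleset` option described on the module page linked above; the path is hypothetical and would need to be your own rules file:

```ucl
# local.d/spamassassin.conf
# Hypothetical path: replace with the location of your own SA rules file.
ruleset = "/path/to/sauser.cf";
```

Rules that rely on custom SA plugins or eval functions are probably where the manual adjustment would be needed.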

Dynamic maps can be served via HTTPS (that's how the default maps are served); Rspamd does a full certificate check on map load. It is also possible to sign maps using the `ed25519` signature algorithm and check them against a trusted public key. What security attacks are you talking about in this case?

I plan to release 1.6.5 likely today or tomorrow with various bugfixes especially on the tokenization side. It should improve the overall accuracy for both statistical methods and lists checks. The next 1.7 release is planned to come with more sophisticated machine learning techniques (via torch.ch), better reputation plugins (e.g. DKIM reputation) and other major improvements. I would also appreciate your testing experience to improve Rspamd in future. I know that the current documentation is sometimes outdated or not complete - we are working hard to improve it eventually.

Re: Rspamd: A replacement for Spamassassin & Postscreen

Post by JDunphy »

I agree with all that. If I said tcp port 53, I was mistaken. I observed the following:

Code: Select all

udp        0      0 127.0.0.1:53751             127.0.0.1:53                ESTABLISHED 
udp        0      0 127.0.0.1:49712             127.0.0.1:53                ESTABLISHED 
udp        0      0 127.0.0.1:33421             127.0.0.1:53                ESTABLISHED 
...
udp        0      0 127.0.0.1:33005             127.0.0.1:53                ESTABLISHED 
udp        0      0 127.0.0.1:33010             127.0.0.1:53                ESTABLISHED 
udp        0      0 127.0.0.1:53534             127.0.0.1:53                ESTABLISHED 

When I ran lsof on a CentOS 6 host, I got this:

Code: Select all

# lsof -i:53
rspamd  29165 _rspamd    8u  IPv4 233773006      0t0  UDP localhost:44834->localhost:domain 
rspamd  29165 _rspamd   11u  IPv4 233773007      0t0  UDP localhost:47328->localhost:domain 
rspamd  29165 _rspamd   13u  IPv4 233773008      0t0  UDP localhost:47024->localhost:domain 
...
rspamd  29165 _rspamd   25u  IPv4 233773019      0t0  UDP localhost:47125->localhost:domain 
rspamd  29165 _rspamd   26u  IPv4 233773020      0t0  UDP localhost:60120->localhost:domain 

I will work on this again today to increase my understanding of rspamd. Our SA environment will not work with the rspamd SA Lua module, which appeared limited in capability when I looked at it yesterday. In fact, I thought I read that it didn't handle HTML.pm yet. I could be mistaken, as I haven't spent a lot of time with rspamd yet. One reason we were excited to learn about rspamd was that we have done a lot of work on the old SA HTML.pm, HTMLEval.pm, MIMEEval.pm, etc. plugins and are at a point where we wonder if it can be enhanced to properly handle modern HTML semantics. There have been a lot of improvements and additions since HTML.pm was first written. In addition, we have quite a few of our own SA plugins that the rspamd SA module wouldn't understand. My thinking was to use the existing rspamd framework rather than trying to make the SA module handle them if we went that way.

Re: Rspamd: A replacement for Spamassassin & Postscreen

Post by vstakhov »

The DNS socket pool is fine. It is a tradeoff between security (source port randomization) and performance (not opening a socket on each request). This pool is slowly rotated over time. Here are the default settings:

Code: Select all

% rspamadm configdump options.dns
*** Section options.dns ***
timeout = 1.0;
sockets = 16;
retransmits = 5;
servers = "127.0.0.1";

*** End of section options.dns ***
So it is 16 sockets per DNS server. You can modify it in `local.d/options.inc` if you want:

Code: Select all

dns {
  sockets = 4;
}

WRT SA rules: Rspamd is mostly designed to improve on the SA regexp rules corpus. The HTML/MIME eval rules were too bad to port to Rspamd...

Re: Rspamd: A replacement for Spamassassin & Postscreen

Post by phoenix »

Just a quick note for anyone that's been using my zmtrainsa script (in my original post). I've finally modified it and removed extraneous bits of it relating to SA and also given it the same functionality as the original in being able to train from a user account Junk or Inbox for spam/ham. Any comments or improvements on the script are welcome, just add a comment here. :)
Regards

Bill

MisterM75
Advanced member
Posts: 77
Joined: Sat Aug 05, 2017 7:10 am

Re: Rspamd: A replacement for Spamassassin & Postscreen

Post by MisterM75 »

Hello

Do we have to leave amavis running, or can we disable it completely?
Because it duplicates what Rspamd does ...

So double the memory use ... and so a slow server ...

Yours truly
Mz

Re: Rspamd: A replacement for Spamassassin & Postscreen

Post by phoenix »

It's used for the anti-virus system (ClamAV) on ZCS, and I've left it like that to keep the install simple for people. There is no anti-virus scanning in rspamd, so you would have to have another ClamAV instance (or a paid-for product) on a server somewhere on your LAN. I don't see it causing any great overhead on my system.
Regards

Bill


Re: Rspamd: A replacement for Spamassassin & Postscreen

Post by MisterM75 »

Hello

Hum ... I think you forgot to go further with the documentation of Rspamd ...

https://rspamd.com/doc/modules/antivirus.html
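
Going by that page, a minimal sketch of having Rspamd query an existing clamd. The address 127.0.0.1:3310 is an assumption (a clamd listening on local TCP port 3310); point `servers` at wherever your clamd actually listens:

```ucl
# local.d/antivirus.conf
clamav {
  type = "clamav";
  # Symbol added to a message when a virus is found.
  symbol = "CLAM_VIRUS";
  # Assumed clamd address; a UNIX socket path can be given here instead.
  servers = "127.0.0.1:3310";
}
```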

Mz :) ;)