How do I configure spam autolearn and properly use zmtrainsa so it trains for all users then deletes spam after?

Post by **BobOki** » Wed Jul 08, 2015 11:45 am

Latest 8.6 with patch 3 installed. I have gone through wiki.zimbra.com/wiki/SpamAssassin_Customizations already and installed pyzor and razor as it describes. I have done tons of searches and tried everything I have come across so far to get my mail to show autolearn=spam, but it just keeps showing no. I can manually run zmtrainsa on a single account, and it says it is doing something, but I see zero activity in the logs, including spamtrain.log.
In /opt/zimbra/data/spamassassin/localrules I have local.cf, salocal.cf, sakam.cf, and sauser.cf. All of them have bayes_auto_learn 1 in them; otherwise they are pretty much the default settings. sauser.cf is basically the one from the SpamAssassin_Customizations doc listed above for razor/pyzor.
So what am I missing here to get spam learning working (it catches some stuff but does NO learning), to verify razor/pyzor are working, and to set up some form of nightly spam learning from the junk folders that then deletes the mail from the junk folder for all users? This is pretty much the last hurdle that is killing this deployment.

Post by **BobOki** » Fri Jul 10, 2015 8:59 am

Would really appreciate someone spending the time to assist me with this, or to make a How-To, or even point me to documents/urls that explain what I need to do to make this work.

Post by **BobOki** » Tue Jul 14, 2015 8:23 am

Someone help a brother out please! I have no idea why zero learning is going on even when I run zmtrainsa. It says it learns, but it does not seem to be learning anything, and no log entries at all show up in spamtrain.log.

Post by **BobOki** » Thu Jul 16, 2015 7:26 am

Am I posting in the wrong room or something? This was a default install and these pieces do not work, so I cannot be the only one. I will cross-post in a different forum I guess. Once again, I appreciate any help I can get on this!

Post by **iamauser** » Thu Jul 16, 2015 11:21 pm

Give the admin guide a read-over again.

https://www.zimbra.com/docs/ne/8.6.0/ad ... tings.html

I think you're confused about how the spam training works.

If there are messages in user spam folders, they're already tagged as spam, hence there would be nothing to train -> nothing in your logs.

zmtrainsa does not delete messages from user spam folders; it deletes them from the spam/ham accounts. The spam/ham accounts only get messages that users manually mark as spam or not spam. I think you need a minimum of like 200 uncaught spam/ham messages to get anything useful out of the training, so if you've got a lot of samples, you might want to train manually. But otherwise, it sounds like everything is working fine to me?
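For the "train every user's Junk folder, then empty it" part of the original question, here is a minimal sketch, not a tested recipe. It assumes it runs as the zimbra user, that zmtrainsa returns non-zero when training fails, and that you really do want Junk emptied afterwards (users lose the chance to rescue false positives). The function names and the example addresses are made up for illustration. It defaults to a dry run that only prints the commands.

```shell
#!/bin/bash
# Sketch only: train on each account's Junk folder, then empty that folder.
# DRY_RUN defaults to 1, so commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        # </dev/null keeps the command from eating the account list on stdin
        "$@" </dev/null
    fi
}

# Reads one account address per line on stdin.
train_and_empty() {
    local account
    while IFS= read -r account; do
        [ -n "$account" ] || continue
        # Only empty the folder if the training step reported success
        # (assumption: zmtrainsa exits non-zero on failure).
        if run zmtrainsa "$account" spam Junk; then
            run zmmailbox -z -m "$account" emptyFolder /Junk
        fi
    done
}

# Dry-run example with hypothetical addresses:
printf '%s\n' alice@example.com bob@example.com | train_and_empty

# A real run would be fed from the account list, e.g.:
#   zmprov -l gaa | DRY_RUN=0 train_and_empty
```

`zmprov -l gaa` also lists system accounts (the spam/ham training accounts, galsync and so on), so you would probably want to filter those out before piping the list in. Zimbra can also age out Junk on its own via the zimbraMailSpamLifetime setting, which may be a gentler option than emptying the folder every night.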

Post by **BobOki** » Fri Jul 17, 2015 7:29 am

I might be very confused. No message at all, not even the ones that go directly to the junk folder as spam, is tagged as spam in the headers. When I do a manual train on that folder (which has over 1000 spam messages in it) it says it does something, but I see nothing anywhere in the logs (the spam logs are 0 bytes), and it does not seem like any training takes place, as the same messages continue to make it through the filters.

"zimbra@linuxmail:~/bin$ zmtrainsa XXXX spam Junk

20150717082357 Starting spamassassin spam training for XXXXX using folder Junk

[] INFO: Total messages processed: 1371

Learned tokens from 222 message(s) (1340 message(s) examined)

20150717082455 Finished spamassassin spam training for XXXXX using folder Junk"

Now the spamtrain.log is still completely blank. If I flag an email as spam, nothing shows up there either. Is there anywhere this is logged, so I can see that flagging spam in the mailboxes IS in fact doing something, and the same for the manual training? I really appreciate your feedback btw!
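One way to check whether training changes anything without relying on spamtrain.log is to compare the Bayes database counters before and after a zmtrainsa run. `sa-learn --dump magic` prints those counters. The sketch below only parses that output; the sample lines are made-up numbers for illustration, and the sa-learn path and database path in the comments are assumptions that need checking against your install.

```shell
#!/bin/bash
# Pull a counter (nspam, nham or ntokens) out of `sa-learn --dump magic`
# output supplied on stdin.
bayes_count() {
    awk -v key="$1" '$0 ~ ("non-token data: " key "$") { print $3 }'
}

# Illustrative sample of the dump format (numbers are invented):
sample='0.000          0          3          0  non-token data: bayes db version
0.000          0       1593          0  non-token data: nspam
0.000          0        412          0  non-token data: nham
0.000          0     171930          0  non-token data: ntokens'

echo "spam messages learned: $(printf '%s\n' "$sample" | bayes_count nspam)"
echo "ham messages learned:  $(printf '%s\n' "$sample" | bayes_count nham)"

# On a live server the input would come from sa-learn itself, something like
# (both paths are assumptions, verify them first):
#   /opt/zimbra/libexec/sa-learn --dbpath /opt/zimbra/data/amavisd/.spamassassin \
#       --dump magic | bayes_count nspam
```

If nspam goes up after a manual zmtrainsa run, the training is landing in the database even though the log stays empty.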

Post by **BobOki** » Fri Jul 17, 2015 7:43 am

Additionally mail is tagged like this below:

X-Spam-Status: No, score=6.225 tagged_above=-10 required=6.6

tests=[BAYES_99=4, BAYES_999=0.2, FUZZY_AMBIEN=0.552,

HTML_MESSAGE=0.001, KAM_OTHER_BAD_TLD=0.75, MIME_HTML_ONLY=0.723,

SPF_PASS=-0.001, T_REMOTE_IMAGE=0.01, T_RP_MATCHES_RCVD=-0.01]

autolearn=no autolearn_force=no

According to the headers, no autolearn is running.

Post by **BobOki** » Mon Jul 20, 2015 7:42 am

It also seems that anything I put in Preferences > Trusted Addresses does not register at all; mail from those addresses and domains gets tagged as spam regardless.

Post by **iamauser** » Mon Jul 20, 2015 8:36 pm

Let me see if I can help you out here. I think the first thing that will probably make things clearer is the meaning of the autolearn field in the header. Just because it says "no" does not mean that autolearn is not running. Take a look at the SpamAssassin site for a very informative explanation of this header info.

https://wiki.apache.org/spamassassin/Au ... NotWorking

So, if you look at this explanation of the autolearn header, I think it explains why you're not seeing as much in the learning as you expect - you're probably telling spamassassin to learn messages it already learned, so it's just ignoring them (nothing new to learn, as it were).
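To make the autolearn=no part concrete: as far as I understand the SpamAssassin autolearn documentation, a message is only auto-learned as spam when its score reaches bayes_auto_learn_threshold_spam (12.0 by default), with at least 3 points each from header and body tests, and the BAYES_* rules themselves are left out of that calculation. The sketch below sums the non-BAYES tests from the sample header posted earlier in the thread. It is only an approximation, since SpamAssassin uses a separate non-Bayes score set for this decision, and the function name is made up for illustration.

```shell
#!/bin/bash
# Sum the scores of every non-BAYES test in an X-Spam-Status tests=[...] list
# supplied on stdin (comma separated, possibly folded over several lines).
non_bayes_score() {
    tr -d ' \n' | tr ',' '\n' | awk -F= '
        NF == 2 && $1 !~ /^BAYES_/ { total += $2 }
        END { printf "%.3f\n", total }'
}

# The tests list from the sample header in this thread:
tests='BAYES_99=4, BAYES_999=0.2, FUZZY_AMBIEN=0.552,
HTML_MESSAGE=0.001, KAM_OTHER_BAD_TLD=0.75, MIME_HTML_ONLY=0.723,
SPF_PASS=-0.001, T_REMOTE_IMAGE=0.01, T_RP_MATCHES_RCVD=-0.01'

printf '%s\n' "$tests" | non_bayes_score   # prints 2.025
```

Roughly 2 points without Bayes is nowhere near 12, so autolearn=no is the expected result for a message like that one, whether or not bayes_auto_learn is switched on.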

From your manual zmtrainsa, I think you can see it's actually only learning stuff from less than a fifth of the message volume you're feeding it. So, the autolearn is working, but maybe you're just unlucky with spambots flooding you with junk that isn't getting picked up as spam easily.

As for a clearer "why" spam is getting through: based on that sample header you posted, it looks like the spam is not scoring high enough to get tagged as spam (score=6.225 ... required=6.6).

So, it might be beneficial to fiddle with the thresholds for tagging and tossing mail as spam. I want to stress that your mileage will vary with this, so before you decrease the kill and tag percentages, have a look at what they are set to now. Basically, the lower the numbers, the more aggressively potential spam will be tagged.

zmprov gacf | grep -e zimbraSpamTagPercent -e zimbraSpamKillPercent

If you decide to adjust zimbraSpamTagPercent & zimbraSpamKillPercent, don't forget to run this command:

zmamavisdctl restart
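A small sketch of how the percentages relate to the scores in the headers, assuming Zimbra takes them as a percentage of a 20-point scale. That assumption fits the sample header in this thread, where required=6.6 would correspond to a tag percent of 33. The helper names are made up for illustration.

```shell
#!/bin/bash
# Convert between Zimbra spam percentages and SpamAssassin scores,
# assuming the percentage is taken of a 20-point scale.
percent_to_score() { awk -v p="$1" 'BEGIN { printf "%.1f\n", p * 20 / 100 }'; }
score_to_percent() { awk -v s="$1" 'BEGIN { printf "%.1f\n", s * 100 / 20 }'; }

percent_to_score 33    # prints 6.6, the "required" value in the sample header
score_to_percent 5.5   # prints 27.5

# Applying a new value would then look something like this
# (28 is just an example, about 5.6 points if only whole numbers are accepted):
#   zmprov mcf zimbraSpamTagPercent 28
#   zmamavisdctl restart
```

Changing the tag percent only moves the point at which mail gets marked as spam; the kill percent is the point at which mail is dropped outright, so it is worth lowering that one much more cautiously.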

With a lower threshold, wait for a couple of days, see what the spam traffic is like then.

I get the same thing as you for the spamtrain.log... Maybe for default settings it's just logging errors into there, so if everything is working fine, nothing gets pumped into there.

You'll have to pardon my ignorance here, I couldn't really find any useful info on this logfile.

(Maybe the SpamAssassin site has some better info.)

I also dunno why stuff in trusted addresses is getting tagged as spam... from what I understand trusted addresses should skip the spam check, so that's weird.

I hope this points you in a good direction.

p.s. for fun, I like to use this guy's site of archived spam every now and again to manually train: http://untroubled.org/spam/

Again, your mileage will vary. ;p

Post by **BobOki** » Tue Jul 21, 2015 11:51 am

That was genuinely extremely helpful, and I greatly thank you for responding! I think I will try to train the spam filter with that huge spam archive you threw in there, then POSSIBLY lower my score to 5.5. I don't think I get many false positives, save the one mail that even ignores my exceptions list.

I really like that you even included the restart, very thorough, helpful, and again appreciated!