Spam check for images?

pup_seba
Outstanding Member
Posts: 687
Joined: Sat Sep 13, 2014 2:43 am
Location: Tarragona - Spain

Spam check for images?

Post by pup_seba »

Hi,

I see that some phishing attempts and similar scams (like the "we hacked your account" ones) are being sent with no text in the body, only an image (.png) containing the text, the bitcoin wallet info and a QR code (a nice addition, since there is no hyperlink in the image to click on).

I've been googling without success for a way to check the text inside images. I found some SpamAssassin plugins, but only in forum posts from 2013 or so. How do you guys handle this kind of spam/phishing, where your users receive only an image and no text (or at least no relevant text)?

Thanks,
DualBoot
Elite member
Posts: 1326
Joined: Mon Apr 18, 2016 8:18 pm
Location: France - Earth
ZCS/ZD Version: ZCS FLOSS - 8.8.15 Multi servers

Re: Spam check for images?

Post by DualBoot »

Hello,

Extracting text from an image is resource-intensive, which is why I think no one has implemented it in Amavis spam detection yet.
The best protection is to educate users and to assign a stricter score to image-only filter detections.
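
A minimal sketch of the stricter-score idea, assuming the stock HTML_IMAGE_ONLY_* rules exist in your SpamAssassin version and that you keep overrides in a local rules file. The score values are illustrative assumptions, not tested recommendations:

```
# Assumption: illustrative scores only; tune them against your own mail mix.
# Raise the weight of the stock "images with very little text" tests.
score HTML_IMAGE_ONLY_04   2.0
score HTML_IMAGE_ONLY_08   1.5
```

Legitimate newsletters can also be image-heavy, so it is probably worth watching for false positives after a change like this.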

Regards,
JDunphy
Outstanding Member
Posts: 896
Joined: Fri Sep 12, 2014 11:18 pm
Location: Victoria, BC
ZCS/ZD Version: 9.0.0_P39 NETWORK Edition

Re: Spam check for images?

Post by JDunphy »

They can be difficult for sure. I wrote an image.pm module, but it works by observing tracking and structure, not the image itself. Stock SA has had quite a few attempts at this problem, and those could be combined into targeted meta rules for your spam mix with better success. There is certainly no shortage of ideas out there for this problem. In recent years we have focused more on HTML structure, tracking and obfuscation to target this.

Check out 20_html_tests.cf, which has the following tests that match on the amount of text accompanying images and on the ratio of text to image area.

Code: Select all

# HTML_IMAGE_ONLY - not much raw HTML with images (absolute)
body HTML_IMAGE_ONLY_04         eval:html_image_only('0000','0400')
body HTML_IMAGE_ONLY_08         eval:html_image_only('0400','0800')
body HTML_IMAGE_ONLY_12         eval:html_image_only('0800','1200')
body HTML_IMAGE_ONLY_16         eval:html_image_only('1200','1600')
body HTML_IMAGE_ONLY_20         eval:html_image_only('1600','2000')
body HTML_IMAGE_ONLY_24         eval:html_image_only('2000','2400')
body HTML_IMAGE_ONLY_28         eval:html_image_only('2400','2800')
body HTML_IMAGE_ONLY_32         eval:html_image_only('2800','3200')
describe HTML_IMAGE_ONLY_04     HTML: images with 0-400 bytes of words
describe HTML_IMAGE_ONLY_08     HTML: images with 400-800 bytes of words
describe HTML_IMAGE_ONLY_12     HTML: images with 800-1200 bytes of words
describe HTML_IMAGE_ONLY_16     HTML: images with 1200-1600 bytes of words
describe HTML_IMAGE_ONLY_20     HTML: images with 1600-2000 bytes of words
describe HTML_IMAGE_ONLY_24     HTML: images with 2000-2400 bytes of words
describe HTML_IMAGE_ONLY_28     HTML: images with 2400-2800 bytes of words
describe HTML_IMAGE_ONLY_32     HTML: images with 2800-3200 bytes of words

# HTML_IMAGE_RATIO - more image area than text (ratio)
body HTML_IMAGE_RATIO_02        eval:html_image_ratio('0.000','0.002')
body HTML_IMAGE_RATIO_04        eval:html_image_ratio('0.002','0.004')
body HTML_IMAGE_RATIO_06        eval:html_image_ratio('0.004','0.006')
body HTML_IMAGE_RATIO_08        eval:html_image_ratio('0.006','0.008')
describe HTML_IMAGE_RATIO_02    HTML has a low ratio of text to image area
describe HTML_IMAGE_RATIO_04    HTML has a low ratio of text to image area
describe HTML_IMAGE_RATIO_06    HTML has a low ratio of text to image area
describe HTML_IMAGE_RATIO_08    HTML has a low ratio of text to image area

...
...

Some other rules inside HTMLEval.pm that you might use...

Code: Select all

  # the important bit!
  $self->register_eval_rule("html_tag_balance");
  $self->register_eval_rule("html_image_only");
  $self->register_eval_rule("html_image_ratio");
  $self->register_eval_rule("html_charset_faraway");
  $self->register_eval_rule("html_tag_exists");
  $self->register_eval_rule("html_test");
  $self->register_eval_rule("html_eval");
  $self->register_eval_rule("html_text_match");
  $self->register_eval_rule("html_text_match_count");
  $self->register_eval_rule("html_body_text_match_count");
  $self->register_eval_rule("html_title_subject_ratio");
  $self->register_eval_rule("html_text_not_match");
  $self->register_eval_rule("html_range");
  $self->register_eval_rule("check_iframe_src");

I see that we added a meta rule in our salocal.cf ... the arguments to html_image_only are the min and max amount of text (bytes of words, as in the rule descriptions above), so the rule below matches HTML with images and up to 600 bytes of words.

Code: Select all

# serious phishing attempts
body __HTML_IMAGE_ONLY_LOW         eval:html_image_only('0000','0600')
meta    J_IMAGE_PHISH   (J_DANGEROUS_ATTACH && __HTML_IMAGE_ONLY_LOW)
score   J_IMAGE_PHISH   2.5
describe J_IMAGE_PHISH  using an image (and not more) in HTML to disguise a phishing attack. Has a dangerous attachment

You might be able to create some custom meta rules via ImageInfo. Check /opt/zimbra/common/lib/perl5/Mail/SpamAssassin/Plugin/ImageInfo.pm ... Many of these tests would yield false positives on their own, which is probably why they are not mainline rules anymore, but with the correct meta statements they might be useful. There are probably a few other rules in that directory you might try to use.

Code: Select all

# Usage:
#  image_count()
#
#     body RULENAME  eval:image_count(<type>,<min>,[max])
#        type: 'all','gif','png', or 'jpeg'
#        min: required, message contains at least this
#             many images
#        max: optional, if specified, message must not
#             contain more than this number of images
#
#  image_count() examples
#
#     body ONE_IMAGE  eval:image_count('all',1,1)
#     body ONE_OR_MORE_IMAGES  eval:image_count('all',1)
#     body ONE_PNG eval:image_count('png',1,1)
#     body TWO_GIFS eval:image_count('gif',2,2)
#     body MANY_JPEGS eval:image_count('jpeg',5)
#
#  pixel_coverage()
#
#     body RULENAME  eval:pixel_coverage(<type>,<min>,[max])
#        type: 'all','gif','png', or 'jpeg'
#        min: required, message contains at least this
#             much pixel area
#        max: optional, if specified, message must not
#             contain more than this much pixel area
#
#   pixel_coverage() examples
#
#     body LARGE_IMAGE_AREA  eval:pixel_coverage('all',150000)  # catches any images that are 150k square pixels or larger
#     body SMALL_GIF_AREA  eval:pixel_coverage('gif',1,40000)   # catches only gifs that are 1 to 40k square pixels
#
#  image_name_regex()
#
#     body RULENAME  eval:image_name_regex(<regex>)
#        regex: full quoted regexp, see examples below
#
#  image_name_regex() examples
#
#     body CG_DOUBLEDOT_GIF  eval:image_name_regex('/^\w{2,9}\.\.gif$/i') # catches double dot gifs  abcd..gif
#
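
As a rough sketch of how the ImageInfo tests above could be combined with html_image_only in a meta rule: this assumes the ImageInfo plugin is loaded in your install, and every LOCAL_* name and the score are hypothetical placeholders, not rules from any shipped ruleset.

```
# Assumption: Mail::SpamAssassin::Plugin::ImageInfo is loaded.
# All rule names and the score below are hypothetical examples.
body     __LOCAL_ONE_PNG           eval:image_count('png',1,1)
body     __LOCAL_IMG_ONLY_LOW      eval:html_image_only('0000','0600')
meta     LOCAL_SINGLE_PNG_NO_TEXT  (__LOCAL_ONE_PNG && __LOCAL_IMG_ONLY_LOW)
score    LOCAL_SINGLE_PNG_NO_TEXT  1.5
describe LOCAL_SINGLE_PNG_NO_TEXT  Single PNG image with little or no HTML text
```

Note that html_image_only is an HTML test, so a message that carries the PNG as a plain attachment with no HTML part would likely need a different sub-rule. Running spamassassin --lint after editing the local rules file should catch syntax mistakes.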
DavidMerrill
Advanced member
Posts: 126
Joined: Thu Jul 30, 2015 2:44 pm
Location: Portland, ME
ZCS/ZD Version: 8.8.15 P19

Re: Spam check for images?

Post by DavidMerrill »

I imagine there's some third party that has developed (or is developing?) a tool that leverages something like Amazon machine learning?

- https://aws.amazon.com/blogs/machine-le ... sagemaker/

The use case must be classic: hand off the image to the image classifier and, if it's tainted, do something with the sender's email.

The trick would be to tie it into Zimbra/SA.
___________________________________
David Merrill - Zimbra Practice Lead
OTELCO Zimbra Hosting, Licensing and Professional Services
Zeta Alliance
pup_seba
Outstanding Member
Posts: 687
Joined: Sat Sep 13, 2014 2:43 am
Location: Tarragona - Spain

Re: Spam check for images?

Post by pup_seba »

Every day I learn that I know less and less :D My goodness guys, I'll check all this info and see if I can apply it to some platforms.

I also liked the OCR plugins for SA... I will try to take a look at those too.

Thank you!!!