Spam check for images?

Posted: Mon Apr 15, 2019 11:45 am
by pup_seba
Hi,

I see that some phishing attempts and similar scams (like the "we hacked your account" ones) are being sent with no text in the body, just an image (.png) containing the text, the bitcoin wallet info and a QR code (a nice addition, since there is no hyperlink in the image to click on).

I've been googling without success for a way to check the text inside images. I found some SpamAssassin plugins, but only in forum posts from 2013 or so. How do you guys handle this kind of spam/phishing, where the message sent to your users contains no text (or at least no relevant text) but only an image?

Thanks,

Re: Spam check for images?

Posted: Mon Apr 15, 2019 2:59 pm
by DualBoot
Hello,

Analyzing the text inside an image is resource-intensive, which is why I think no one has implemented it in Amavis spam detection yet.
The best protection is to educate users and to assign a stricter score to the image-only detection rules.

Regards,

Re: Spam check for images?

Posted: Mon Apr 15, 2019 3:50 pm
by JDunphy
They can be difficult for sure. I wrote an image.pm module but it works by observing tracking and structure and not the image itself. By default, SA has had quite a few attempts at this problem that could be used for some targeted meta rules for your spam mix with better success. There is certainly no shortage of ideas out there for this problem. We have focused on the html structure, tracking and obfuscation more in recent years to target this.

Check out 20_html_tests.cf, which has the following tests that match on the amount of text alongside images and on the ratio of text to image area.

Code:

# HTML_IMAGE_ONLY - not much raw HTML with images (absolute)
body HTML_IMAGE_ONLY_04         eval:html_image_only('0000','0400')
body HTML_IMAGE_ONLY_08         eval:html_image_only('0400','0800')
body HTML_IMAGE_ONLY_12         eval:html_image_only('0800','1200')
body HTML_IMAGE_ONLY_16         eval:html_image_only('1200','1600')
body HTML_IMAGE_ONLY_20         eval:html_image_only('1600','2000')
body HTML_IMAGE_ONLY_24         eval:html_image_only('2000','2400')
body HTML_IMAGE_ONLY_28         eval:html_image_only('2400','2800')
body HTML_IMAGE_ONLY_32         eval:html_image_only('2800','3200')
describe HTML_IMAGE_ONLY_04     HTML: images with 0-400 bytes of words
describe HTML_IMAGE_ONLY_08     HTML: images with 400-800 bytes of words
describe HTML_IMAGE_ONLY_12     HTML: images with 800-1200 bytes of words
describe HTML_IMAGE_ONLY_16     HTML: images with 1200-1600 bytes of words
describe HTML_IMAGE_ONLY_20     HTML: images with 1600-2000 bytes of words
describe HTML_IMAGE_ONLY_24     HTML: images with 2000-2400 bytes of words
describe HTML_IMAGE_ONLY_28     HTML: images with 2400-2800 bytes of words
describe HTML_IMAGE_ONLY_32     HTML: images with 2800-3200 bytes of words

# HTML_IMAGE_RATIO - more image area than text (ratio)
body HTML_IMAGE_RATIO_02        eval:html_image_ratio('0.000','0.002')
body HTML_IMAGE_RATIO_04        eval:html_image_ratio('0.002','0.004')
body HTML_IMAGE_RATIO_06        eval:html_image_ratio('0.004','0.006')
body HTML_IMAGE_RATIO_08        eval:html_image_ratio('0.006','0.008')
describe HTML_IMAGE_RATIO_02    HTML has a low ratio of text to image area
describe HTML_IMAGE_RATIO_04    HTML has a low ratio of text to image area
describe HTML_IMAGE_RATIO_06    HTML has a low ratio of text to image area
describe HTML_IMAGE_RATIO_08    HTML has a low ratio of text to image area

...
...
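
A minimal sketch of how the stock rules above could be weighted more strictly in a local config file, as suggested earlier in the thread. The rule names come from the excerpt above; the score values are illustrative assumptions and would need tuning against your own mail flow.

```conf
# Local score overrides for the stock image-only rules.
# The numbers are assumptions for illustration, not recommended values.
score HTML_IMAGE_ONLY_04   1.5
score HTML_IMAGE_ONLY_08   1.0
score HTML_IMAGE_RATIO_02  1.0
```

Running `spamassassin --lint` after editing should catch syntax mistakes before the change goes live.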

Some other rules inside HTMLEval.pm that you might use...

Code:

  # the important bit!
  $self->register_eval_rule("html_tag_balance");
  $self->register_eval_rule("html_image_only");
  $self->register_eval_rule("html_image_ratio");
  $self->register_eval_rule("html_charset_faraway");
  $self->register_eval_rule("html_tag_exists");
  $self->register_eval_rule("html_test");
  $self->register_eval_rule("html_eval");
  $self->register_eval_rule("html_text_match");
  $self->register_eval_rule("html_text_match_count");
  $self->register_eval_rule("html_body_text_match_count");
  $self->register_eval_rule("html_title_subject_ratio");
  $self->register_eval_rule("html_text_not_match");
  $self->register_eval_rule("html_range");
  $self->register_eval_rule("check_iframe_src");
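
As a hedged illustration of how one of the eval rules registered above might be used: the sketch below flags HTML mail that contains an image but no link, which is the pattern described in the first post. It assumes html_tag_exists takes a tag name as its only argument; the LOCAL_ rule names and the score are hypothetical.

```conf
# Sub-rules (the leading __ keeps them from scoring on their own)
body  __LOCAL_HAS_IMG    eval:html_tag_exists('img')
body  __LOCAL_HAS_LINK   eval:html_tag_exists('a')

# Image present but nothing to click on; score is an illustrative guess
meta      LOCAL_IMG_NO_LINK  (__LOCAL_HAS_IMG && !__LOCAL_HAS_LINK)
score     LOCAL_IMG_NO_LINK  0.5
describe  LOCAL_IMG_NO_LINK  HTML mail with an image but no link
```

On its own this would likely hit plenty of legitimate mail, so it is probably only useful as one ingredient in a larger meta rule.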

I see that we added a meta rule in our salocal.cf ... the arguments to html_image_only are the minimum and maximum bytes of words, so the rule below matches messages with images and up to 600 bytes of words.

Code:

# serious phishing attempts
body __HTML_IMAGE_ONLY_LOW         eval:html_image_only('0000','0600')
meta    J_IMAGE_PHISH   (J_DANGEROUS_ATTACH && __HTML_IMAGE_ONLY_LOW)
score   J_IMAGE_PHISH   2.5
describe J_IMAGE_PHISH  using an image (and not more) in HTML to disguise a phishing attack. Has a dangerous attachment

You might be able to create some custom meta rules via ImageInfo. Check /opt/zimbra/common/lib/perl5/Mail/SpamAssassin/Plugin/ImageInfo.pm ... Many of these would yield false positives on their own, which is probably why they are not mainline rules anymore... but with the correct meta statements they might be useful. There are probably a few other plugins in that directory you might try to use.

Code:

# Usage:
#  image_count()
#
#     body RULENAME  eval:image_count(<type>,<min>,[max])
#        type: 'all','gif','png', or 'jpeg'
#        min: required, message contains at least this
#             many images
#        max: optional, if specified, message must not
#             contain more than this number of images
#
#  image_count() examples
#
#     body ONE_IMAGE  eval:image_count('all',1,1)
#     body ONE_OR_MORE_IMAGES  eval:image_count('all',1)
#     body ONE_PNG eval:image_count('png',1,1)
#     body TWO_GIFS eval:image_count('gif',2,2)
#     body MANY_JPEGS eval:image_count('jpeg',5)
#
#  pixel_coverage()
#
#     body RULENAME  eval:pixel_coverage(<type>,<min>,[max])
#        type: 'all','gif','png', or 'jpeg'
#        min: required, message contains at least this
#             much pixel area
#        max: optional, if specified, message must not
#             contain more than this much pixel area
#
#   pixel_coverage() examples
#
#     body LARGE_IMAGE_AREA  eval:pixel_coverage('all',150000)  # catches any images that are 150k square pixels or larger
#     body SMALL_GIF_AREA  eval:pixel_coverage('gif',1,40000)   # catches only gifs that are 1 to 40k square pixels
#
#  image_name_regex()
#
#     body RULENAME  eval:image_name_regex(<regex>)
#        regex: full quoted regexp, see examples below
#
#  image_name_regex() examples
#
#     body CG_DOUBLEDOT_GIF  eval:image_name_regex('/^\w{2,9}\.\.gif$/i') # catches double dot gifs  abcd..gif
#
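
For the image-only extortion mails described in the first post, the ImageInfo usage above could be combined with html_image_only along these lines. This is a sketch under assumptions: the LOCAL_ rule names, the 600-byte threshold and the score are hypothetical, and html_image_only would only be expected to fire when the image is referenced from an HTML part.

```conf
ifplugin Mail::SpamAssassin::Plugin::ImageInfo

# Exactly one PNG in the message (signature taken from the usage notes above)
body  __LOCAL_ONE_PNG       eval:image_count('png',1,1)

# HTML with images and very little text, same eval as the stock rules
body  __LOCAL_IMG_LOW_TEXT  eval:html_image_only('0000','0600')

# Both together; the score is an illustrative assumption
meta      LOCAL_PNG_ONLY_MAIL  (__LOCAL_ONE_PNG && __LOCAL_IMG_LOW_TEXT)
score     LOCAL_PNG_ONLY_MAIL  1.5
describe  LOCAL_PNG_ONLY_MAIL  Single PNG and almost no text

endif
```

The ifplugin wrapper keeps the config valid on systems where ImageInfo is not loaded.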

Re: Spam check for images?

Posted: Mon Apr 15, 2019 6:05 pm
by DavidMerrill
I imagine some third party has developed (or is developing?) a tool that leverages something like Amazon machine learning?

- https://aws.amazon.com/blogs/machine-le ... sagemaker/

The use case must be classic: hand off the image to the image classifier and, if it's tainted, do something with the sender's email.

The trick would be to tie it into Zimbra/SA.

Re: Spam check for images?

Posted: Tue Apr 16, 2019 12:08 am
by DavidMerrill

Re: Spam check for images?

Posted: Wed Apr 17, 2019 7:19 am
by pup_seba
Every day I learn that I know less and less :D My goodness guys, I'll check all this info and see if I can apply it to some platforms.

I also liked the OCR plugins for SA... I will try to take a look at those too.

Thank you!!!