This discussion has been locked.

Filtering non-English character emails

Hi All,

Recently we have been inundated with emails containing Russian and Chinese characters. Is there any way to filter these emails to quarantine via UTM 9 anti-spam?

Thanks,



  • Interesting, Andrew.  Hmmm, singled out for harassment by the Chinese and Russian militaries?  You must be working for the good guys!

    All I can think of is to open some of those Chinese emails and add Chinese characters a-line-at-a-time to the 'Expression Filter'.  Same with the few upper- and lower-case Cyrillic characters that aren't identical to any in the Latin or Greek alphabet.

    Please let us know if that worked for you.

    Cheers - Bob

     
    Sophos UTM Community Moderator
    Sophos Certified Architect - UTM
    Sophos Certified Engineer - XG
    Gold Solution Partner since 2005
    MediaSoft, Inc. USA
  • Haha, yes, the good guys. Great suggestion. What syntax would I use, though, to quarantine any email that contains the following characters: ж л и 日 的

     

    Thanks

  • You should be able to do this with a regex.

    I don't normally post non-Sophos links, but I found this resource which explains Unicode and Regex and the syntax differences between different Regex libraries.

    https://www.regular-expressions.info/refflavors.html

    I do not know which regex library is used on UTM. Mr. Alfson or Mr. Jaydeep, can you answer that?

    Recently I tried this regex in one of my three spam filters. I think this particular syntax works in most regex libraries that support Unicode properties.

    [\p{S}\p{C}\P{Latin}]

    That is supposed to say:

    • Any symbol (heart, smiley, etc.)
    • Any control character.
    • Any non-Latin script

    We actually disabled the rule because it was matching too much.   In retrospect, I think the issue was that I was checking both Body and Subject, and the body has line feeds, so the control character match is not appropriate for the Body check.
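To illustrate why the control-character and symbol parts over-match, here is a minimal Python sketch that looks up Unicode general categories with the standard `unicodedata` module. It only approximates what `\p{C}` and `\p{S}` mean; the helper name `has_control_or_symbol` is made up for this example, and nothing here reflects how UTM itself evaluates expressions.

```python
import unicodedata


def has_control_or_symbol(text: str) -> bool:
    """Return True if any character is in a general category starting with
    'C' (control, format, ...) or 'S' (symbol), roughly what \\p{C} and
    \\p{S} cover in engines that support Unicode properties."""
    return any(unicodedata.category(ch)[0] in ("C", "S") for ch in text)


# A line feed is a control character (category Cc), so any multi-line
# body matches the control-character part of the class.
print(unicodedata.category("\n"))   # Cc
# Everyday ASCII symbols are in the symbol categories too.
print(unicodedata.category("$"))    # Sc
print(unicodedata.category("+"))    # Sm

print(has_control_or_symbol("Hello\nworld"))  # True
print(has_control_or_symbol("Hello world"))   # False
```

So a rule with the control-character part would likely need to be limited to the Subject, and even the symbol part can hit ordinary text such as prices.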

    For your purposes, [\P{Latin}] should catch Russian and Chinese text (and Korean and Katakana, etc.)
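One caveat with a negated-Latin class: in Unicode, spaces, digits, and most punctuation belong to the Common script rather than Latin, so an engine that supports script properties would match them as well. A narrower alternative is to list the ranges you actually want to catch. The sketch below is only an illustration in Python, whose built-in `re` module does not support `\p{...}`, so it uses explicit ranges instead. The two ranges (the Cyrillic block U+0400 to U+04FF and CJK Unified Ideographs U+4E00 to U+9FFF) and the function name are assumptions chosen to cover the five characters asked about above; whether the UTM expression filter accepts this syntax is untested.

```python
import re

# Assumed ranges for illustration only:
#   U+0400-U+04FF  Cyrillic block
#   U+4E00-U+9FFF  CJK Unified Ideographs
CYRILLIC_OR_CJK = re.compile(r"[\u0400-\u04FF\u4E00-\u9FFF]")


def contains_cyrillic_or_cjk(text: str) -> bool:
    """Return True if the text has at least one character in either range."""
    return CYRILLIC_OR_CJK.search(text) is not None


# The five characters from the question all fall inside the ranges.
for ch in "жли日的":
    print(ch, contains_cyrillic_or_cjk(ch))

# Plain English text, digits, and accented Latin letters do not match.
print(contains_cyrillic_or_cjk("Hello, world 123"))
print(contains_cyrillic_or_cjk("café"))
```

These ranges deliberately leave out Korean Hangul, Japanese kana, and the CJK extension blocks; add more ranges if you need them.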