

RegEx URL: Exactly what implementation of regular expressions is used on Sophos XG (SFOS15)?

Is it Posix, Extended RegExp, Perl, ECMAscript or other?
I have had a hard time finding the correct syntax for HTTP bypass rules. It is not clear from the documentation...

It would also be very nice to have a RegEx tester built in, to check that your syntax actually matches what you want - and does not by mistake match every URL!
(Is there anywhere in the logs to check this?)

- Martin

EDIT:
And what is the sane explanation for RegEx bypass rules not being usable for HTTPS scanning?!?
This does not make any sense... 



  • Hi,

    You can use the Quick_Regex tool to create a regular expression. Meanwhile, I have forwarded this query to the Dev Team.

    We shall update you once we receive any information.

    Thanks 

    Sachin Gurung
    Team Lead | Sophos Technical Support
    Knowledge Base  |  @SophosSupport  |  Video tutorials
    Remember to like a post.  If a post (on a question thread) solves your question use the 'This helped me' link.

  • Hi Sachin

    Thank you, I already know about the Quick Regex tool.
    It is nice if you only want to block/allow certain domain names and don't know anything about RegExp.

    My problem is that I DO know regexp extensively, but it is a bit hit and miss when trying to do something.
    For example, it took me a while to realize that http:// is not part of the URL being evaluated, so ^http.*\.com fails to match anything, when you would expect it to match all non-HTTPS sites with a .com TLD.
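    A minimal Python sketch of the effect described above, assuming (as observed here on v15) that the proxy matches against the URL with the http:// scheme already removed. strip_scheme is a hypothetical stand-in for that step, not an SFOS function, and Python's re is used in place of PCRE; for this simple pattern they behave the same.

```python
import re


def strip_scheme(url: str) -> str:
    """Hypothetical model of the string the proxy evaluates: the URL minus 'scheme://'."""
    return url.split("://", 1)[-1]


url = "http://www.example.com/index.html"
pattern = r"^http.*\.com"

# Against the full URL the pattern matches as you would expect...
on_full_url = re.search(pattern, url) is not None
# ...but against the scheme-less string it does not, because that string
# starts with the host name ("www."), not with "http".
on_stripped_url = re.search(pattern, strip_scheme(url)) is not None

print(on_full_url, on_stripped_url)  # True False
```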

    Another example, to bypass scanning of all MP3/MP4 podcasts and files:
    ^[[:graph:]]+\.mp[34](\?.*)?$   <-- this one works
    ^https?\:\/\/[[:graph:]]+\.mp[34](\?.*)?$  <-- this does not work (even though it should)
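    A minimal Python sketch, again assuming the evaluated string has no scheme, suggests the second pattern is not wrong as a regex: it matches the full URL but not the scheme-less one. Python's re has no POSIX classes, so [[:graph:]] is rewritten as the equivalent ASCII range [\x21-\x7e]; the sample URL is made up.

```python
import re

# Python's re has no POSIX bracket classes, so [[:graph:]] (printable,
# non-space ASCII, 0x21-0x7E) is spelled as an explicit range here.
GRAPH = r"[\x21-\x7e]"

# The two patterns from the post above, otherwise unchanged.
works = re.compile("^" + GRAPH + r"+\.mp[34](\?.*)?$")
with_scheme = re.compile(r"^https?\:\/\/" + GRAPH + r"+\.mp[34](\?.*)?$")

stripped = "podcasts.example.org/episode42.mp3?source=rss"  # scheme already removed
full = "https://" + stripped

for name, rx in (("works", works), ("with_scheme", with_scheme)):
    print(name, bool(rx.search(stripped)), bool(rx.search(full)))
# works True True
# with_scheme False True
```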

    So even the Quick Regex tool generates incorrect URL regexes for use with Sophos XG!

    Please provide the correct documentation ASAP!

  • Hi,

    We support the standard RegEx format, so there is no specific format constraint here. Can you test the same thing in v16 and let us know whether the problem persists?

    Thanks

    Sachin Gurung
    Team Lead | Sophos Technical Support

  • Dear Sachin

    I don't think you fully understand what you are answering...

    There are MANY different regular expression standards, of which these three are probably the closest to a de facto standard:
    - PCRE (Perl Compatible Regular Expressions)
    - IEEE Posix BRE (Standard Regular Expression)
    - IEEE Posix ERE (Extended Regular Expression) 
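    As a small illustration of why the flavor matters, here is a sketch using Python's re module, which is a Perl-style engine but not PCRE. PCRE and POSIX ERE both read [[:graph:]] as "one printable, non-space character", so ^[[:graph:]]+$ matches example.com there. Python has no POSIX classes and parses the same text as a set of literal characters followed by a literal "]+":

```python
import re
import warnings

pattern = r"^[[:graph:]]+$"

with warnings.catch_warnings():
    # Python flags "[[" as a possible nested set; silence that for the demo.
    warnings.simplefilter("ignore", FutureWarning)
    compiled = re.compile(pattern)

# In Python this means: one of the characters '[', ':', 'g', 'r', 'a', 'p', 'h',
# followed by one or more literal ']' characters.
print(compiled.match("example.com"))     # None
print(compiled.match("g]") is not None)  # True
```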

    I looked a bit in the UTM manual, but it was unclear whether it applies here, as it contained the same incorrect statement about starting a RegEx with ^http:.
    So I ask you again: can you please help me get the correct information / documentation / syntax for the Regular Expression implementation used in Sophos XG SFOS?

    Unfortunately I do not have SFOS16, as I am running this on our production HA XG310s...
    But if the implementation has changed from v15 to v16, I hope, for God's sake, that you or someone else has documented this in the changelogs and documentation!!!

    - Martin

  • Hi,

    The information is provided by the Customer Engineering team. I will try to engage with the documentation team to get the required information published but that will be for v16. 

    Thanks 

    Sachin Gurung
    Team Lead | Sophos Technical Support

    There are two places where regex is used for web:

    - In input validation in the UI
    - In the proxy itself

    The latter is definitely PCRE. I'm not sure about the input validation, since it doesn't evaluate the RE, but I think it should accept all PCRE.

    Whether http is matched or not is not a RegEx implementation/standards issue.

    There have been several changes in this area in v16. I know that in v16 you can include http in the match string. I don't know about v15, but I didn't think we changed that behavior.

  • Hi Martin,

    Here's an update:

    We check RegExes at multiple locations. The RegEx should be Perl- and Java-compatible, the maximum number of URLs in an exception list should be < 128, and the length of a URL should be < 100.

    HTTP Proxy:

    The proxy compiles the RegExes entered in the UI using pcre_compile, i.e. the PCRE (Perl-Compatible Regular Expressions) library.

    API:

    URL RegExes can’t start with ^https:// or ^http://


    RegExes are not automatically anchored and must be anchored explicitly if desired. For example, ^microsoft\.com/ will match http://microsoft.com/ but not http://www.microsoft.com/. If the anchor is missing, as in microsoft\.com/, then both http://microsoft.com and http://www.microsoft.com will match.
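    A minimal Python sketch of the anchoring rule, assuming the string being matched is the scheme-less host/path (consistent with the examples above). Python's re.search, like a default PCRE match, tries every start position, so only a leading ^ anchors it. Note that the unanchored form also matches notmicrosoft.com/, which is rarely what you want; the sample strings are made up.

```python
import re

# Hypothetical scheme-less strings, as in the examples above.
candidates = ["microsoft.com/", "www.microsoft.com/", "notmicrosoft.com/"]

anchored = r"^microsoft\.com/"
unanchored = r"microsoft\.com/"

# (anchored matches?, unanchored matches?) for each candidate.
results = {
    s: (re.search(anchored, s) is not None, re.search(unanchored, s) is not None)
    for s in candidates
}
# "microsoft.com/"      -> (True, True)
# "www.microsoft.com/"  -> (False, True)
# "notmicrosoft.com/"   -> (False, True)
```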

    The max length of a URL RegEx is 100; this is restricted by the DB schema.

    URL RegExes are validated by the Perl compiler.
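    The API-side rules above can be sketched roughly as follows. This is a hypothetical Python illustration, not SFOS code: the constant and function names are invented, re.compile stands in for the Perl/PCRE compile check, 100 is taken as the maximum length per the DB-schema note, and only the literal prefixes ^http:// and ^https:// are rejected.

```python
import re

# Limits as quoted in this thread; the constant names are hypothetical.
MAX_REGEX_LEN = 100              # "The max length of URLRegEx is 100"
MAX_REGEXES_PER_EXCEPTION = 128  # total must be < 128
FORBIDDEN_PREFIXES = ("^http://", "^https://")


def validate_url_regexes(patterns: list[str]) -> list[str]:
    """Return human-readable errors for a list of URL regexes (empty list = OK)."""
    errors = []
    if len(patterns) >= MAX_REGEXES_PER_EXCEPTION:
        errors.append(
            f"{len(patterns)} URL regexes; must be fewer than {MAX_REGEXES_PER_EXCEPTION}"
        )
    for p in patterns:
        if len(p) > MAX_REGEX_LEN:
            errors.append(f"{p[:20]!r}...: longer than {MAX_REGEX_LEN} characters")
        if p.startswith(FORBIDDEN_PREFIXES):
            errors.append(f"{p!r}: must not start with ^http:// or ^https://")
        try:
            re.compile(p)  # stand-in for the Perl/PCRE compile check
        except re.error as exc:
            errors.append(f"{p!r}: does not compile ({exc})")
    return errors
```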

    UI:

    URL RegExes can’t start with ^https:// or ^http:// (there's a bug though, see NC-11547).

    The max length of a URL RegEx is 100.

    We use the JavaScript RegExp library to validate the syntax of the RegExes:
    1. Check # of groups (e.g. if \2 is used, there must be at least 2 groups)
    2. Check [] content (e.g. [] should not be allowed because it's empty)
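    These two explicit checks make sense because plain JavaScript accepts both cases: /[]/ is a valid, never-matching empty class, and in non-Unicode mode a \2 without a second group is read as a legacy octal escape rather than rejected. Below is a rough Python sketch of such checks. ui_style_checks is a hypothetical name, and the scanner is deliberately simplistic (single-digit backreferences only; named groups are not counted).

```python
def ui_style_checks(pattern: str) -> list[str]:
    """Return a list of problems: empty [] classes and backreferences to missing groups."""
    problems = []
    groups = 0        # capturing groups seen: '(' not followed by '?'
    max_backref = 0   # highest \1..\9 seen outside a character class
    in_class = False
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\":
            nxt = pattern[i + 1] if i + 1 < len(pattern) else ""
            if not in_class and nxt and nxt in "123456789":
                max_backref = max(max_backref, int(nxt))
            i += 2  # skip the escaped character
            continue
        if in_class:
            if c == "]":
                in_class = False
        elif pattern.startswith("[]", i):
            problems.append(f"empty character class at position {i}")
            i += 2
            continue
        elif c == "[":
            in_class = True
        elif c == "(" and not pattern.startswith("(?", i):
            groups += 1
        i += 1
    if max_backref > groups:
        problems.append(f"\\{max_backref} used but only {groups} group(s) defined")
    return problems
```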

    The total number of URL RegExes in an exception must be < 128.

    Hope that helps :)

    Sachin Gurung
    Team Lead | Sophos Technical Support

  • "RegExes are not automatically anchored and must be if desired (example: ^microsoft\.com/ "

    We just deployed the XG, coming from the UTM, and not anchoring the regexes made the appliance unusable: the processor would max out under any kind of load.

    With the current design, anchoring should probably be required.

    Our support person is supposed to be writing a KBA about it shortly.

  • Please be aware of this KB

    https://community.sophos.com/kb/en-us/127270

    Summary: In URL Groups and in Categories there is no RegEx; the KB describes what substring matching is done.

    In XG Web \ Exceptions we do not automatically anchor on the left side. This gives more flexibility to admins. Yes, that includes the flexibility to be inefficient. We will not change this because it would affect existing customers.

    In both XG and in the UTM (and I would argue any computer system anywhere) - when you create a new object you should copy the existing out of box objects as much as possible.

    In the XG, one of the OOB (out-of-box) exceptions is:

    ^([A-Za-z0-9.-]*\.)?apple\.com\.?/

    So if you want to create a new exception, match that style.
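    A small Python sketch of "matching that style" for another domain, assuming as before that the evaluated string is scheme-less. oob_style_pattern is a hypothetical helper; re.escape takes care of the dots in the domain. The leading ^ keeps the match anchored, while the optional ([A-Za-z0-9.-]*\.)? group still admits subdomains, and look-alikes such as notmicrosoft.com are rejected.

```python
import re


def oob_style_pattern(domain: str) -> str:
    """Hypothetical helper: build an exception regex in the style of the OOB apple.com entry."""
    return r"^([A-Za-z0-9.-]*\.)?" + re.escape(domain) + r"\.?/"


pattern = oob_style_pattern("microsoft.com")
# pattern == r"^([A-Za-z0-9.-]*\.)?microsoft\.com\.?/"

checks = {
    s: re.search(pattern, s) is not None
    for s in [
        "microsoft.com/",              # bare domain
        "www.microsoft.com/",          # subdomain
        "microsoft.com./x",            # trailing-dot FQDN
        "notmicrosoft.com/",           # look-alike, should not match
        "microsoft.com.example.net/",  # different domain, should not match
    ]
}
```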

    The KB doesn't mention the performance hit of not using the ^.

    Coming from the UTM, we copied our existing regexes over but removed the ^https:// since it wasn't allowed, not realizing that dropping the ^ would have a huge performance hit. I started reviewing everything to find what was maxing out the processor, and this thread in particular said that not using the ^ would allow subdomains, which is what we were trying to do anyway.

    I see your point about not requiring it, but I would caution people to leave it out only very sparingly.