This discussion has been locked.

reg ex question

This is an interesting one, but no doubt someone will explain that I'm just not understanding regexes  :)

We are using the default regex format in the "Block these websites" section of our filter actions, e.g. ^https?://([A-Za-z0-9.-]*\.)?youtube\.com/

However, we've noticed that if clients browse to the URL youtube.com. they can get past this, unless we change the regex to ^https?://([A-Za-z0-9.-]*\.)?youtube\.com (i.e. remove the / at the end).

Can anyone explain what we did wrong?

And yeah I now have several dozen filter actions to work through  :(



  • I'm by no means an expert, but here are my thoughts.

    From what I've read, UTM is using Perl regex:

    community.sophos.com/.../117316

    I found a free online tester here:

    http://retester.herokuapp.com/

     

    If I try your regex ^https?://([A-Za-z0-9.-]*\.)?youtube\.com/ on the online tester, it looks like it requires that trailing "/" in the match (e.g. https://www.youtube.com/).

    If I visit the YouTube homepage by entering www.youtube.com, there is no trailing "/" on the URL at that point, which would explain why the regex does not match the homepage and users can get that far.
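    For anyone who wants to reproduce this without the online tester, here is a minimal sketch in Python. It assumes Python's re module behaves like the Perl-style regex in UTM for these simple constructs, which I have not verified on a UTM itself. The name is_blocked and the sample URLs are made up for illustration; the last sample (a dot after .com) is my own addition in case the original post meant a hostname with a trailing dot.

```python
import re

# The pattern from the original post, with the trailing "/" kept.
BLOCK_WITH_SLASH = re.compile(r"^https?://([A-Za-z0-9.-]*\.)?youtube\.com/")


def is_blocked(url: str) -> bool:
    """Return True if the pattern matches, anchored at the start of the URL."""
    return BLOCK_WITH_SLASH.search(url) is not None


samples = [
    "https://www.youtube.com/",             # slash directly after .com
    "https://www.youtube.com/watch?v=abc",  # slash followed by a path
    "https://www.youtube.com",              # no slash at all
    "https://www.youtube.com./",            # dot between .com and the slash
]

for url in samples:
    print(f"{url:40} blocked={is_blocked(url)}")
```

    The first two samples match and the last two do not, because the pattern insists that "/" is the very next character after "youtube.com".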

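    I note that removing the "/" to make your regex "^https?://([A-Za-z0-9.-]*\.)?youtube\.com" allows it to match http or https://www.youtube.com. Because the pattern has no "$" at the end, it only has to match the start of the URL, so it still matches http or https://www.youtube.com/ and anything that follows it.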

    The "?" in regex means zero or one of whatever comes immediately before it (a single character or a bracketed group) is allowed in the match.

    If I do something like ^https?://([A-Za-z0-9.-]*\.)?youtube\.com/? on the tester, it matches both http or https://www.youtube.com/ and http or https://www.youtube.com

    You of course also have "*" in regex, which means zero or more of the preceding character or group, so ^https?://([A-Za-z0-9.-]*\.)?youtube\.com/* also appears to match whether the URL has the "/" or not. Since there would never be more than one "/" at that point, I'd use "?".
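    To compare the endings side by side, here is a rough Python sketch. Again this assumes Python's re module is close enough to the UTM regex engine for these quantifiers. The helper matched_text and the variant labels are hypothetical names of mine, not anything from UTM.

```python
import re

# Common prefix from the thread; only the ending differs between variants.
PREFIX = r"^https?://([A-Za-z0-9.-]*\.)?youtube\.com"

VARIANTS = {
    "trailing /": PREFIX + r"/",
    "no slash": PREFIX,
    "trailing /?": PREFIX + r"/?",
    "trailing /*": PREFIX + r"/*",
}

URLS = [
    "https://www.youtube.com",
    "https://www.youtube.com/",
    "https://www.youtube.com/watch?v=abc",
]


def matched_text(pattern: str, url: str):
    """Return the part of the URL the pattern consumed, or None if no match."""
    m = re.search(pattern, url)
    return m.group(0) if m else None


for name, pattern in VARIANTS.items():
    for url in URLS:
        print(f"{name:12} {url:40} -> {matched_text(pattern, url)!r}")
```

    Since none of the patterns ends with "$", the "no slash", "/?" and "/*" variants all match the same three URLs. The only difference is whether the slash is included in the matched text, so for blocking purposes they should behave the same.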

    Your regex "should" match any YouTube video URL that has something after the trailing "/" at the end of .com. For the homepage itself, though, the regex is expecting the "/" as part of the pattern, and since the homepage URL ends at .com, it's not matching.
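    One thing to watch if you drop the "/": with nothing after "youtube\.com", the pattern also matches hostnames that merely start with that text. Below is a sketch of a tighter ending, written in Python. This is only my suggestion, not the Sophos default, and I have not tested it on a UTM. The pattern TIGHTER, the helper is_blocked and the example.org hostnames are all made up for illustration.

```python
import re

# Pattern with the slash simply removed, as discussed in the thread.
NO_SLASH = re.compile(r"^https?://([A-Za-z0-9.-]*\.)?youtube\.com")

# Hypothetical tighter ending: optional trailing dot, optional port,
# then either a "/" or the end of the URL.
TIGHTER = re.compile(
    r"^https?://([A-Za-z0-9.-]*\.)?youtube\.com\.?(:[0-9]+)?(/|$)"
)


def is_blocked(pattern: re.Pattern, url: str) -> bool:
    """Return True if the compiled pattern matches the URL."""
    return pattern.search(url) is not None


samples = [
    "https://www.youtube.com",
    "https://www.youtube.com/",
    "https://www.youtube.com./",
    "https://www.youtube.com:443/",
    "https://www.youtube.com.example.org/",  # unrelated host, made up
    "https://youtube.community.example/",    # unrelated host, made up
]

for url in samples:
    print(
        f"{url:40} no_slash={is_blocked(NO_SLASH, url)} "
        f"tighter={is_blocked(TIGHTER, url)}"
    )
```

    In this sketch the pattern without the slash blocks all six samples, including the two unrelated hosts, while the tighter one blocks only the first four.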

     

    I may get corrected, but I believe that's what is happening here. It's been a few years since I've really used regex extensively, so I may be a little rusty on my assumptions :)

     

    Thanks

     

    Sheldon