Feature request - DNS Web Filtering

Given how prevalent HTTPS is these days, DNS filtering on the Guest, IoT, Reporting and Mobile WLANs is an absolute requirement.

For the Sophos XGs to be able to function like a DietPi without the need to run a separate HW/VM would be a godsend for us. We are increasingly finding that the only piece of on-site equipment customers tend to have is a router + fiber/wireless and 4G failover. It gives customers peace of mind that contractors and users can perform their duties on mobiles whilst not doing unproductive/illegal work, as well as providing some web filtering for black-box IoT devices.

These things aren't perfect, of course; e.g. we sometimes have to let the web connectivity checkers from Android and Apple through, otherwise the device disconnects from the Wi-Fi.

Use Cases

1) Messaging - A common aggregation point (canteen/SOC/emergency assembly point) or site-wide Wi-Fi where users should only be able to access MS Teams, SfB or Slack to report issues, hazards, productivity problems, etc. Many devices here are BYOD, and in a lot of my customer use cases a contractor is on site for two months. It's not worth their time and money to send me out to install and remove HTTPS certs on devices. In many cases we've found App Control is not granular enough or sometimes doesn't function cross-platform.

2) IoT lockdown - IoT devices that should only be able to go to $manufacturer.com to check for updates and upload telemetry data. You cannot deploy HTTPS scanning certificates on a web-connected sprinkler or connected-cow system. FQDN host groups and wildcard support have made this much less of a pain point than before, but DNS filtering would further lock down what an IoT device can and cannot do on the network.

3) Real-time alerting on day-zero/suspicious/prohibited DNS activity - ATP somewhat does this, but being able to get real-time log/email alerts when a bad actor queries a categorized DNS domain (i.e. a C&C/hacking/fraud/proxy/porn site) would allow immediate discovery, investigation and removal of bad actors on the network.

 

Features it will need to reach feature parity with some competitor solutions:

1) Live query logging via the F/W console to debug issues. We've had cases where a provider asks us to whitelist only *incompetentprovider.com, only to find out they also use cdn.provider.com.

2) Ability via daily/weekly reporting in XG and iView to report on top 20 domains accessed, top 20 domains denied, risky DNS domains accessed, etc.

3) Full blacklist ability, i.e. only allow domain1.com and deny all other DNS queries.

4) Regex support for BOTH whitelisting and blacklisting, e.g. client[1-99]\.google\.com

  • Wow, nice wish list. I always run Pi-hole at home because of some of the IoT issues that you mentioned; thanks for expanding on it and making a business case. The Sophos filtering database is actually way smarter than simple allow/block Pi-hole, but the implementation hasn't evolved with the times. I don't know why we can't do regex or even simple wildcards easily where needed.

    New features are probably closed for this release but your post may bring this up for future releases. 

  • Current implementation: HTTPS with no scanning does categorization using the FQDN from the SNI. It can block via a webpage (and certificate) or by dropping the connection. We do not do categorization/blocking on DNS requests. The practical difference between the current solution and DNS is that we currently can log/block all FTP/HTTP/HTTPS connections to sites in any given category; a DNS solution, however, would apply to other connections/ports as well. The problem with DNS categorization and filtering is that it is easy to bypass. A device with malware can use a local hosts file. A user deliberately bypassing filtering can point the device at Google's DNS servers.
    ATP does work on DNS (in addition to the web proxy), but it is designed only to catch call-home domains (infected devices connecting to known command-and-control sites).

    If your concern is HTTPS, I fail to understand what DNS filtering gets you that the current HTTPS proxy (with scanning off) does not. Can you explain?

    1) What about the current solution prevents you from doing this? Set up a URL Group or Custom Category for your whitelisted messaging FQDNs, and create a web policy that allows the whitelist and blocks everything else. Create a WiFi SSID for "Guest WiFi (Locked Down)". Create a firewall rule for that network that uses that web policy, with no HTTPS decryption. Anyone who connects can access your allowed things no problem. If they try to access anything not allowed they'll get a block page (HTTP) or a certificate warning (HTTPS)/dropped connection (as per config); see the rough sketch after point 3 below.

    2) Agreed. Again, I'm not sure what about the current solution prevents you from doing this, though it may be painful. AFAIK IoT is on our radar.

    3) I cannot speak on the reporting/alerting side. The data is there in the web proxy reporting.
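
    Coming back to 1), for illustration only, that allow/block decision amounts to something like the rough sketch below; the FQDNs and function name are made-up examples, not XG internals:

        ALLOWED = {"teams.microsoft.com", "slack.com", "lync.com"}  # hypothetical URL Group contents

        def web_policy_decision(host: str) -> str:
            # Allow an exact match or any subdomain of a whitelisted FQDN.
            if any(host == d or host.endswith("." + d) for d in ALLOWED):
                return "allow"
            # Everything else: block page for HTTP, certificate warning or
            # dropped connection for HTTPS, depending on configuration.
            return "block"

        print(web_policy_decision("teams.microsoft.com"))  # allow
        print(web_policy_decision("example.org"))          # block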

    Wishlist:

    1) I know I can, but not via command line. This is more about preferred way of working.

    2) I cannot speak on the reporting/alerting side.

    3) You can do this via the web proxy, not DNS. Again, if the concern is HTTP/HTTPS, why do you need DNS? And how would this deal with anyone who chooses to use another DNS provider?

    4) Agreed, somewhat. You can whitelist via exceptions using RegEx, though this applies to all policies. Whitelisting/blacklisting via URL groups or custom categories has some wildcarding but is not RegEx. There is already a legacy problem of several ways of doing things, and we'd rather not introduce a new, additional way. Coming up with a solution that works for what we want, while ensuring that pre-existing configurations still behave the same after upgrade, is difficult.

    Whitelist/Blacklist KB community.sophos.com/.../127270

  • Thanks Michael, you've made some good points.

     

     

    Just a question now that TLS 1.3 is effectively upon us.

    Can you still filter BYOD devices via SNI without decryption?

  • Michael, as always a well-thought-out response to someone's problem instead of the usual answers. I can comment on a couple of things, as I use Pi-hole in addition to regular web filtering in my setup. I previously used external DNS providers that offer similar services. DNS blocking augments the current setup; it does not replace the functionality. Nobody said take away my web filtering and give me DNS filtering.

    1. You can easily block clients from accessing outside DNS servers by denying external DNS on the firewall. Actually, I always redirect all my clients to my firewall for DNS resolution no matter what they specify, so DNS bypass by bots and clients doesn't work (a rough sketch of the idea follows after this list). Hard-coded IPs will get past this approach, but my firewall hopefully catches that using ATP or even regular categorization.

    2. No certs are needed; you simply point your XG to the DNS server that is doing the DNS filtering. There is very minimal impact on RAM/processor usage, as the traffic is not decrypted and is actually denied at the DNS request.  https://community.sophos.com/products/xg-firewall/sfos-eap/sfos-v18-early-access-program/f/feedback-and-issues/115989/unable-to-use-dpi-with-a-system-with-4-gb-of-ram---bug 

    3. I don't understand the hesitation in adding expanded functionality to use regex or wildcards. If need be, don't add the additional functionality on upgrades. Holding back a user who wants additional flexibility because legacy users won't be able to upgrade is not really a smart choice.
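
    Coming back to point 1, the redirect idea boils down to something like the sketch below; on the firewall it is just a NAT rule rather than code, and the addresses here are made up:

        FIREWALL_DNS = "10.0.0.1"   # hypothetical address of the firewall's own resolver

        def handle_outbound(proto: str, dst_ip: str, dst_port: int):
            # Any DNS query not aimed at the firewall gets rewritten (DNAT) so it
            # lands on the firewall's resolver, whatever the client configured.
            if proto in ("udp", "tcp") and dst_port == 53 and dst_ip != FIREWALL_DNS:
                return ("redirect", FIREWALL_DNS, 53)
            return ("accept", dst_ip, dst_port)

        print(handle_outbound("udp", "8.8.8.8", 53))          # ('redirect', '10.0.0.1', 53)
        print(handle_outbound("tcp", "93.184.216.34", 443))   # ('accept', '93.184.216.34', 443)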

    Thanks again

    Regards

    Bill

    2) For allowed requests you are adding delay because you now have DNS doing duplicate work. Both DNS and Web will be doing categorization lookups (caching means it is not too bad). But the configuration in XG is going to be interesting. You either need to separately configure what DNS categories to allow/block, or have DNS read and use the web policy. That means you are putting how you respond to DNS requests in the firewall rule.

    3) Hesitating when there are multiple possible solutions, or when it can cause confusion for the user, makes sense. We could extend the current URL groups. We could extend the current custom categories. We could implement a 'websites' tab similar to the UTM. We could implement a blacklist/whitelist widget similar to the UTM. We could do something new and different, tied exactly to specific use cases. Then afterwards, when we instruct admins on how to whitelist, do we explain two methods, three, or four? My point is that it is not as simple as "just do it". Not a major factor, but applying regex to every request is more CPU intensive than the existing plaintext matches.
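
    To put a rough number on that last point, a trivial and unscientific comparison (the URL and pattern are my own examples, not anything shipped in the product):

        import re
        import timeit

        url = "cdn.example.com/assets/app.js?cache=xyz123"
        pattern = re.compile(r"^([A-Za-z0-9.-]*\.)?example\.com/.*xyz")

        plain_time = timeit.timeit(lambda: "xyz" in url, number=100_000)
        regex_time = timeit.timeit(lambda: bool(pattern.search(url)), number=100_000)

        # Both checks match this URL; the regex just costs more per lookup.
        print(f"plaintext: {plain_time:.3f}s   regex: {regex_time:.3f}s")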

  • I just had a quick lesson in TLS 1.2 and 1.3 Client hellos.  :)

    In short - SNI is still sent plaintext in TLS 1.3. There is an extension that allows for encrypted SNI, but no one is using it (yet). The handshake for TLS 1.3 looks like the handshake for TLS 1.2, which makes it backwards compatible. But a TLS 1.3 handshake with encrypted SNI looks, to a TLS 1.2 server, like a handshake with no SNI at all, something that many TLS 1.2 servers won't like. Proper encrypted SNI also requires you to use DNS-over-TLS.

    Effectively, if a client uses encrypted SNI, that client is TLS 1.3 only.

    When using the new DPI mode, TLS 1.3 is fully supported.  Even if you are not doing decryption you still have the SNI and can still block on it.  If configured for block pages (default) then it lets the connection through to the server, inserting itself into the encryption (using the XG CA), to redirect the client to a page on XG:8090 containing the real block page. 

    When using the proxy mode, TLS 1.3 is not supported. Even if you are not doing decryption you still have the SNI and can still block on it. If configured for block pages (default) it will pretend it is the server and that it only supports TLS 1.2. It uses the XG CA to display the block page.
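
    To make the "you still have the SNI without decryption" point concrete, here is a rough sketch of pulling the server name out of a captured plaintext ClientHello. This only illustrates the wire format; it is not how the XG proxy or DPI engine is implemented:

        import struct

        def sni_from_client_hello(record: bytes):
            """Return the server_name from a plaintext TLS ClientHello, or None."""
            # TLS record header: type(1) + version(2) + length(2); 0x16 = handshake,
            # and the first handshake byte 0x01 = ClientHello.
            if len(record) < 43 or record[0] != 0x16 or record[5] != 0x01:
                return None
            pos = 5 + 4                                               # record + handshake headers
            pos += 2 + 32                                             # client_version + random
            pos += 1 + record[pos]                                    # session_id
            pos += 2 + struct.unpack("!H", record[pos:pos + 2])[0]    # cipher_suites
            pos += 1 + record[pos]                                    # compression_methods
            ext_end = pos + 2 + struct.unpack("!H", record[pos:pos + 2])[0]
            pos += 2
            while pos + 4 <= ext_end:
                ext_type, ext_len = struct.unpack("!HH", record[pos:pos + 4])
                pos += 4
                if ext_type == 0:   # extension 0 = server_name, sent in the clear
                    name_len = struct.unpack("!H", record[pos + 3:pos + 5])[0]
                    return record[pos + 5:pos + 5 + name_len].decode("ascii", "replace")
                pos += ext_len
            return None

    Because that one field is readable, a filter can categorize and block the connection without ever decrypting the payload.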

     

    tl;dr answer: SNI is unencrypted and both DPI and proxy mode work fine. It will remain like this until clients and servers decide backwards compatibility no longer matters (consider the lag time between the IPv6 standard and non-test websites that only support IPv6).

     

    https://blog.cloudflare.com/encrypted-sni/

  • Thanks Michael for at least considering this. With the new DPI mode, from what I understand, it can also intercept non-standard ports (i.e. not just 80/443), I take it? That's one of the other issues I've faced, in that a lot of these devices use non-standard ports like 8443, 8180 or 8880 to connect to some web API socket for data streaming.

    DNS filtering isn't an "I win" button, but if you layer it up with port ranges, ATP, and address lists/web categories it works pretty well. Effectively, the "dumber", more "unmanaged" or "blacker" the box, the more we tend to need it.

    Especially now that we can force redirects to the XG's DNS server (page 52 of the EAP1 PDF), it makes a lot of sense to put those pieces together and be able to filter certain clients' DNS views.

    Granted, this all just feels like temporary wins/control in this arms race until the browser/app developers decide to head off to fully encrypted DNS and SNI.

  • Michael Dunn said:

    2) For allowed requests you are adding delay because you now have DNS doing duplicate work. Both DNS and Web will be doing categorization lookups (caching means it is not too bad). But the configuration in XG is going to be interesting. You either need to separately configure what DNS categories to allow/block, or have DNS read and use the web policy. That means you are putting how you respond to DNS requests in the firewall rule.

    Come on Michael, delay from duplicate DNS queries? DNS is probably one of the most efficient protocols. I was envisioning it like this:

    Send the request to the DNS server; if the record exists on the DNS server, the request is denied. OR

    The record doesn't exist, so do what we do currently.

    Michael Dunn said:

    3) Hesitating when there are multiple possible solutions, or when it can cause confusion for the user, makes sense. We could extend the current URL groups. We could extend the current custom categories. We could implement a 'websites' tab similar to the UTM. We could implement a blacklist/whitelist widget similar to the UTM. We could do something new and different, tied exactly to specific use cases. Then afterwards, when we instruct admins on how to whitelist, do we explain two methods, three, or four? My point is that it is not as simple as "just do it". Not a major factor, but applying regex to every request is more CPU intensive than the existing plaintext matches.

    The point of having regex/wildcards is that it makes it very easy to quickly block/allow something... a big hammer, if you will. I am not arguing about the CPU cycles here; it seems so trivial to use wildcard blocking with a web proxy, yet in XG we have to jump through hoops simply to say block anything *xyz.*

    Again, my original point remains: we are stopping evolution because some old admins will be lost when new functionality is introduced. Old admins are always lost; that's why they are still asking for a UTM clone ;-)

    This is an interesting discussion and we could go back and forth, but it seems the OP (Tha IT) is satisfied with the new DPI capabilities, so I will stop here.

    Regards

  • Billybob said:

    Michael Dunn

    2) For allowed requests you are adding delay because you now have DNS doing duplicate work. Both DNS and Web will be doing categorization lookups (caching means it is not too bad). But the configuration in XG is going to be interesting. You either need to separately configure what DNS categories to allow/block, or have DNS read and use the web policy. That means you are putting how you respond to DNS requests in the firewall rule.

    Come on Michael, delay from duplicate DNS queries? DNS is probably one of the most efficient protocols. I was envisioning it like this:

    Send the request to the DNS server; if the record exists on the DNS server, the request is denied. OR

    The record doesn't exist, so do what we do currently.

      

    Maybe I misunderstood the feature request. My understanding was that they wanted to block access to websites by DNS. So, for example, they want to block access to Adult websites; they configure it so that DNS requests for Adult websites result in an NXDOMAIN.

    A user does a DNS query for playboy. The XG DNS server asks nsxld for the category of playboy; nsxld does not know (not in cache) so it asks the cloud SXL server, which replies. nsxld tells the DNS server that it is Adult, and the DNS server replies NXDOMAIN to the client.

    Then another client connects, but they have a different policy that does not block Adult. Client 2 does a DNS request, and the DNS server asks nsxld for the category. This time it is in the cache, so nsxld replies quickly saying it is Adult. DNS looks at the policy; Adult is not blocked, so it gives the real IP. Then the client makes an HTTP request. This goes to the web proxy, which needs to know the category, so it asks nsxld, which has it in cache and replies quickly. And so on.

    My point is that the DNS server needs to understand policy (whether its own configuration or the web configuration), it needs to do categorization, and it needs to give different results to different users. On the first request there is added delay because it needed to ask the cloud servers what the category is. Overall this isn't horrible, and DNS is used to delays when an answer is not in the local cache (this is why DNS requests can take 200 ms or under 1 ms).
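
    Sketched as code, that flow looks roughly like this; nsxld and the SXL cloud lookup are stood in for by dummy functions, and the names, categories and addresses are illustrative only:

        category_cache = {}   # shared FQDN -> category cache (the nsxld cache above)
        BLOCKED = {"client1": {"Adult"}, "client2": set()}   # per-client blocked categories

        def cloud_category(fqdn: str) -> str:
            # Stand-in for the cloud SXL lookup; a cache miss pays this round trip.
            return "Adult" if "playboy" in fqdn else "General"

        def category(fqdn: str) -> str:
            if fqdn not in category_cache:
                category_cache[fqdn] = cloud_category(fqdn)
            return category_cache[fqdn]

        def resolve(client: str, fqdn: str) -> str:
            if category(fqdn) in BLOCKED.get(client, set()):
                return "NXDOMAIN"     # blocked for this client only
            # Stand-in for the real answer; the client's later HTTP request then hits
            # the web proxy, which asks for the same (now cached) category again.
            return "203.0.113.10"

        print(resolve("client1", "www.playboy.com"))   # NXDOMAIN
        print(resolve("client2", "www.playboy.com"))   # real IP, category from cache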

    Now, if the DNS server is implementing a full policy (like web does) that includes things like basing policy on time periods, the logged-in user, etc., that is getting bigger.

     

    If by DNS filtering you just want to blacklist specific FQDNs in an admin-supplied list, where the blacklist applies to every DNS lookup, that is simpler. But if you want DNS to implement policy, that means writing a whole new system that mostly duplicates functionality provided by the existing web proxy.

  • Billybob said:

    The point of having regex/wildcards is that it makes it very easy to quickly block/allow something... a big hammer, if you will. I am not arguing about the CPU cycles here; it seems so trivial to use wildcard blocking with a web proxy, yet in XG we have to jump through hoops simply to say block anything *xyz.*

    I absolutely agree. An admin needs a method to quickly block/allow something. But I don't think the current system is jumping through hoops. Once the blacklist is initially set up, I can add blocking of any URL with "xyz" in it in less than half a minute. No, I cannot block the RegEx "abc.*xyz", but how often does an admin need to? And actually, an admin writing RegEx should be writing RegEx like "^([A-Za-z0-9.-]*\.)?abc.*.?/.*xyz" or whatever it is they want. Writing correct RegEx to allow/block is slower than writing plaintext in a URL Group.

    In the UTM (I think), a long time ago we had a list of out-of-the-box exceptions written as fairly simple RegEx. When we did performance analysis we found there was definite time spent just running those default RegEx. We rewrote them (to look like the above) and saw a measurable throughput gain. I've seen a customer box where they complained about speed problems, and it turned out to be because they had horribly written RegEx. Worse, we've had lots of customers who complain that "you guys didn't catch such-and-such", and when we look into it, it's because a badly written exception/whitelist applied when it should not have.

    RegEx is a double-edged sword. An admin might want to whitelist google.com, so they write the RegEx "google\.com". And now a user goes to evilsite.com/?secret_bypass=google.com and is allowed through. I know you want a powerful sword called RegEx, and as an expert admin you might be able to use it for great deeds. But as a battlefield surgeon, I hesitate to hand every Joe Schmoe soldier a sword, especially when they have existing weapons that they already know, that work almost as well and are safer.
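
    A concrete version of that bypass, with the fix; the patterns here are examples, not anything shipped in the product:

        import re

        url = "https://evilsite.com/?secret_bypass=google.com"
        host = "evilsite.com"   # the hostname actually being visited

        # Unanchored exception matched against the whole URL: the evil URL sails through.
        naive = re.compile(r"google\.com")
        print(bool(naive.search(url)))                   # True  -> wrongly whitelisted

        # Anchored to the host only: google.com and its subdomains, nothing else.
        strict = re.compile(r"^([A-Za-z0-9-]+\.)*google\.com$")
        print(bool(strict.search(host)))                 # False -> not whitelisted
        print(bool(strict.search("mail.google.com")))    # True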