Sophos XG Firewall v18: Troubleshooting problems with the DPI engine

Note: Information is posted as-is and the content should be referenced at your own risk.


Here are some guidelines for getting things working with DPI Mode. This is written just after v18.0 GA and may not be as applicable in the future. In v18 EAP a number of IoT devices were reported as not working. Several fixes were put in for GA, and so far the people who were having trouble in EAP are reporting success in GA. Due to the nature of upgrades, some things may stop working in v18 because they were not previously scanned and now are. This post focuses only on errors when using the DPI Engine and may not apply to general failures or misconfiguration.

You may also want to read these:
https://community.sophos.com/products/xg-firewall/f/recommended-reads/115976/sophos-xg-firewall-v18-xstream---the-new-dpi-engine-for-web-proxy-explained
https://community.sophos.com/kb/en-us/132997
 

General settings:

IPS - Everything should work with IPS (Intrusion Prevention System) on or off. However, IPS also uses Snort to do detections. If something is not working you may want to try turning IPS off to see if it makes a difference. If something is only working when IPS is off, please let us know.

ATP - Everything should work with ATP (Advanced Threat Protection) globally on or off. If something is not working you may want to try turning ATP off to see if it makes a difference. ATP is used to detect active malware in your network making connections to external systems. Because the new DPI mode does port-agnostic HTTP detection, we now enforce ATP in a port-agnostic way. That in turn means we enforce strict HTTP specification compliance on non-standard ports, which causes problems when apps use HTTP-like connections that do not conform to the spec. When a company writes both the client and the server and runs them on non-standard ports (anything other than 80/443), it sometimes does things that go against the spec. If something is only working when ATP is off, please let us know.
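To illustrate what "HTTP-like but not conforming to the spec" can look like, here is a minimal Python sketch that checks a request line against a simplified form of the HTTP/1.1 grammar (method, single space, target, single space, version, CRLF). This is not how the XG validates traffic; the function name and the sample inputs are made up for illustration only.

```python
import re

# Simplified HTTP/1.1 request line: method SP request-target SP HTTP-version.
# The method uses the "token" character set from RFC 7230.
_TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
_REQUEST_LINE = re.compile(rf"({_TOKEN}) (\S+) HTTP/(\d)\.(\d)")


def looks_like_valid_request_line(line: bytes) -> bool:
    """Return True if the line resembles a well-formed HTTP/1.x request line."""
    try:
        text = line.decode("ascii")
    except UnicodeDecodeError:
        return False
    # The request line must be terminated by CRLF.
    if not text.endswith("\r\n"):
        return False
    return _REQUEST_LINE.fullmatch(text[:-2]) is not None


# A normal browser-style request line passes.
print(looks_like_valid_request_line(b"GET /index.html HTTP/1.1\r\n"))
# A hypothetical device protocol that only resembles HTTP fails.
print(looks_like_valid_request_line(b"HELLO  device42 HTTP/1.1\r\n"))
```

A device that sends something like the second line on a non-standard port may have worked before v18 simply because nothing was looking at that traffic.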

Web > General settings > Block unrecognized SSL protocols. This should be at its default value of unchecked.

Console command line (not the SSH shell): the output of 'show http_proxy' should have relay_invalid_http_traffic at its default value of off.


IoT devices:

One of the v18 improvements is our ability to scan traffic from IoT (Internet of Things) devices, especially those that may be using non-standard ports. However IoT devices may also be more fragile when being scanned.  IoT devices often connect to specific servers and do not always conform to the specifications or work like normal browsers going to websites.

If you have an IoT device that does not work, my recommendation is to first have it working with no filtering/scanning/decryption. Once this is working, administrators can then make changes that improve security around these devices.

Firewall Rule - The IoT device should hit a rule that has no web policy and no malware scanning. When you open the rule the Web Filtering section should be unexpanded. The Other security features should be set to None.

SSL/TLS inspection rules - The IoT device traffic should hit a rule with the action Don't Decrypt and the profile Maximum Compatibility, or it should have no matching rule. If you already have TLS decryption rules for other traffic, you can create a Don't Decrypt rule above them that uses your device as the source, similar to your firewall rule.
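The reason the Don't Decrypt rule has to sit above the others can be shown with a small Python sketch of top-down, first-match rule evaluation. The rule names, addresses, and the `TlsRule` class are hypothetical and only model the source-address part of a rule; the real rules match on more criteria.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Optional


@dataclass
class TlsRule:
    name: str
    source: str   # source network in CIDR notation (hypothetical field)
    action: str   # "decrypt" or "dont_decrypt"


def first_matching_rule(rules: List[TlsRule], src_ip: str) -> Optional[TlsRule]:
    """Walk the rules from top to bottom and return the first one that matches."""
    addr = ip_address(src_ip)
    for rule in rules:
        if addr in ip_network(rule.source):
            return rule
    return None  # no matching rule


# The device-specific rule is listed first so it wins over the broad LAN rule.
rules = [
    TlsRule("IoT camera - no decrypt", "192.168.10.50/32", "dont_decrypt"),
    TlsRule("LAN - decrypt", "192.168.10.0/24", "decrypt"),
]

print(first_matching_rule(rules, "192.168.10.50").action)
print(first_matching_rule(rules, "192.168.10.7").action)
```

If the two rules were swapped, the camera at 192.168.10.50 would match the broad LAN rule first and its traffic would be decrypted.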

If you have an IoT device that is not working with this configuration please raise a support ticket or forum post. Include the device name, whether it was working in 17.5, your configuration, and error messages on the device and in the Log Viewer, Web filter and SSL/TLS Inspection.


Phone Apps not working:

If an application is not working on a phone, you likely do not want to turn off all scanning for all of the phone.
HTTPS decryption can cause problems on phones. You can install the CA (Certificate Authority) but depending on the phone/OS/application it may or may not respect the CA.
As Sophos hears from customers about applications that do not work, we will be making improvements with every release.

If an application uses port 80/443, the "Use proxy instead of DPI engine" setting will control whether the traditional web proxy from 17.5 is used or the DPI engine. Problems that occur in both modes should not be reported as a DPI problem.

SSL connections on other ports are always handled by the DPI engine, even if they are not HTTPS (we don't know if they are HTTPS unless we decrypt). Applications that make SSL connections on other ports to specific servers may not adhere to the specification that the XG enforces. Even when the traffic is not being decrypted the DPI engine still needs to look at it, and that can cause issues.

Make sure that the traffic is not being decrypted with TLS Inspection rules or a Web Exception.

If you have an application that works in Proxy mode and fails only in DPI mode, and you are sure it is not being decrypted, please raise a support ticket or forum post. Include the phone OS, application name, whether it was working in 17.5, your configuration, and error messages in the app and in the Log Viewer, Web filter and SSL/TLS Inspection.


Websites not working:

If a website is not working and you are doing HTTPS decryption, the first thing to try is turning off decryption for that site.

Websites typically use port 80/443, and the "Use proxy instead of DPI engine" setting will control whether the traditional web proxy from 17.5 is used or the DPI engine. Problems that occur in both modes should not be reported as a DPI problem.

Make sure that the traffic is not being decrypted with TLS Inspection rules or a Web Exception.

If you have a website that works in Proxy mode and fails only in DPI mode, and you are sure it is not being decrypted, please raise a support ticket or forum post. Include the OS, browser, whether it was working in 17.5, your configuration, and error messages in the browser and in the Log Viewer, Web filter and SSL/TLS Inspection.


Decryption - Web Exceptions and TLS exclusions:

WebAdmin > Log Viewer. Look at the Web filter and SSL/TLS Inspection modules. Note any errors that appear. Some entries have an Exclude link that will allow you to add domains to the Local TLS exclusion list.

WebAdmin > Control Center. Click on the SSL/TLS connections widget. You should see a bunch of statistics including a list of errors in the last 7 days. Click on Fix Errors.
In the pop-up you will see a list of websites, users, and errors. You can select Exclude from decryption to add domains to the Local TLS exclusion list.

WebAdmin > Policy Tester. Put in the details and run the test.  From the results, which include whether the traffic should be decrypted, you can link to the rules that caused that decision and modify them.

You can also exclude by manually editing the Local TLS exclusion list (supports plaintext domain names) or Web Exceptions (supports RegEx and applies to proxy mode as well).
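The difference between the two kinds of entry can be sketched in Python. This is only an illustration: the function names are made up, and treating a plaintext domain as also covering its subdomains is an assumption here, not confirmed XG behavior.

```python
import re
from typing import Iterable


def in_tls_exclusion_list(fqdn: str, excluded_domains: Iterable[str]) -> bool:
    """Plain domain-name comparison, no pattern syntax.

    ASSUMPTION: an entry also covers its subdomains.
    """
    host = fqdn.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in excluded_domains)


def matches_web_exception(url: str, patterns: Iterable[str]) -> bool:
    """Regular-expression comparison against the URL."""
    return any(re.search(p, url) for p in patterns)


# Hypothetical entries for a device vendor's cloud service.
tls_exclusions = ["iot-vendor.example"]
web_exceptions = [r"^([A-Za-z0-9.-]*\.)?iot-vendor\.example/"]

print(in_tls_exclusion_list("api.iot-vendor.example", tls_exclusions))
print(matches_web_exception("api.iot-vendor.example/v1/status", web_exceptions))
```

Note that in a regular expression the dots in the domain have to be escaped, while a plaintext entry is written exactly as the domain appears.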


If the traffic is port 80/443 and matches a firewall rule that has "Use proxy instead of DPI engine" then the traditional web proxy is used and the SSL/TLS Inspection rules are not applied. If "Use proxy instead of DPI engine" is not checked, or if the traffic is on any other port, the DPI engine and the SSL/TLS Inspection rules are used.
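The selection logic in the paragraph above can be written down as a small Python sketch. The function and field names are hypothetical; only the decision itself comes from the description.

```python
from typing import NamedTuple


class InspectionPath(NamedTuple):
    engine: str                # "web proxy" or "DPI engine"
    tls_rules_applied: bool    # whether SSL/TLS Inspection rules are used


def select_inspection_path(dst_port: int, use_proxy_instead_of_dpi: bool) -> InspectionPath:
    """Pick the engine based on destination port and the firewall rule setting."""
    if dst_port in (80, 443) and use_proxy_instead_of_dpi:
        # Traditional web proxy: SSL/TLS Inspection rules are not applied.
        return InspectionPath("web proxy", False)
    # Setting unchecked, or any other port: DPI engine with SSL/TLS Inspection rules.
    return InspectionPath("DPI engine", True)


print(select_inspection_path(443, True))
print(select_inspection_path(443, False))
print(select_inspection_path(8443, True))
```

The third call shows the case that tends to surprise people: on a port other than 80/443 the checkbox makes no difference and the DPI engine is used.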

Consider a scenario where DPI sees a new TCP connection on a port with a TLS ClientHello. At this point we have no idea if it is HTTPS, we only know it is TLS.

The ClientHello in the TLS connection has an optional plaintext field called SNI (Server Name Indication) which should be filled in with the FQDN that the client is connecting to. I am guessing, but let's say SNI is present in 99.9% of HTTPS traffic and >90% of non-HTTPS TLS traffic. That FQDN is sent to the categorizer. Now DPI knows the source IP, destination IP, FQDN, user, and category. It uses those things to determine which TLS Inspection rule is matched, including the Local TLS exclusion list. Then it uses those things to determine which exceptions apply, including any HTTPS decryption exceptions. So when using the DPI engine, both TLS exclusion rules and web exceptions apply, and they apply to all TLS connections (not just HTTPS).
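To show that the SNI really is readable without decrypting anything, here is a minimal Python sketch that pulls the host name out of a raw ClientHello. It is not the DPI engine's code; the function name is made up, and it assumes the whole ClientHello arrives in a single TLS record.

```python
import struct
from typing import Optional


def extract_sni(record: bytes) -> Optional[str]:
    """Return the SNI host name from a raw TLS ClientHello record, or None."""
    try:
        if record[0] != 0x16:              # record type 22 = handshake
            return None
        pos = 5                            # skip record header (type, version, length)
        if record[pos] != 0x01:            # handshake type 1 = ClientHello
            return None
        pos += 4                           # skip handshake header (type, 3-byte length)
        pos += 2 + 32                      # skip client version and random
        pos += 1 + record[pos]             # skip session id
        (cipher_len,) = struct.unpack_from("!H", record, pos)
        pos += 2 + cipher_len              # skip cipher suites
        pos += 1 + record[pos]             # skip compression methods
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = min(pos + ext_total, len(record))
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0x0000:         # server_name extension
                # Layout: list length (2), name type (1), name length (2), name
                name_type = record[pos + 2]
                (name_len,) = struct.unpack_from("!H", record, pos + 3)
                name = record[pos + 5:pos + 5 + name_len]
                if name_type != 0 or len(name) != name_len:
                    return None
                return name.decode("ascii")
            pos += ext_len                 # not SNI, move to the next extension
        return None                        # ClientHello without SNI
    except (IndexError, struct.error, UnicodeDecodeError):
        return None                        # truncated or not a ClientHello
```

When this returns None (no SNI, or not TLS at all) there is no FQDN to categorize, which leaves only the source and destination addresses and the user to match rules against.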

  • Thank you for posting this nice summary.  I noticed that my 'show http_proxy' had the 'relay_invalid_http_traffic' set to 'on'.  I've never heard of this setting and I certainly never set it to 'on' if the default setting is 'off'.  I set it to 'off' to see if it makes any noticeable differences.  Is there a list somewhere of what the default settings should be?

  • In reply to Casual_User:

    There is no documented list of default settings, and the defaults sometimes change.  If you really want, install a new box and you can compare the defaults there.

    To my recollection the default for relay_invalid_http_traffic was always off.  There is no way to turn it on in WebAdmin, only using console.  Of course it is also in Backup/Restore and in the XML API.  This controls the behavior that occurs when we detect/expect HTTP but it does not conform to the specification (which means we cannot scan it).

    Thanks for letting me know...  I will update the text in the original post.