

XG v18 SSL/TLS inspection interfering with Veeam Cloud Provider Replication

Hey everyone,

We're using Veeam to replicate backups from remote sites to our main site. Since deploying v18, replication has been failing. After working with Veeam Support, the only fix was to completely disable SSL/TLS inspection on the firewall at the main site. I'm not sure why inspection causes the failure, but at this point I can't turn it on without breaking backup replication. How can this be resolved? We're not even decrypting, and I don't think there's a way to turn off inspection for specific connections.

The issue seems to exist only from one site that is also running XG v18. The other ones on v17.5.9 are fine, even with inspection turned on at the replication target site - very strange.

How to troubleshoot and fix this?

Anyone else having this issue?

Thanks!



  • We upgraded to build 339 if that's what you mean, but I don't see an option to disable SSL/TLS inspection for a firewall rule. There's a global on/off toggle for inspection, and turning it off solves our problem, but that's beside the point, because it disables the entire feature that's being promoted in v18. Making an SSL/TLS rule with "don't decrypt" is not applicable here, because the WAN zone can't be selected as the source, and we're not decrypting anything anyway.

    To me it looks like a severe bug that's interfering with traffic that shouldn't be inspected. It's being escalated to GES now after troubleshooting and testing for 4h straight.

  • Hi,

    my apologies, you are correct. I need to go back and review the thread on the subject to find out why I misunderstood the intent of that switch.

    Ian

    XG115W - v20.0.2 MR-2 - Home

    XG on VM 8 - v21 GA

    If a post solves your question please use the 'Verify Answer' button.

  • No worries. I'll post an update once I hear more from the escalations team.

  • I have followed up with the response from Michael Dunn in the thread where the feature is questioned as well.

    A bit disappointing, because that was an issue I raised during EAP: not being able to create a simple firewall rule, as was possible in v17.5.8.

    Ian


  • The DPI engine is always looking at and inspecting traffic, even if you don't want it to. It is a very poor implementation IMHO.

    So, you can disable the DPI completely and continue to use the Proxy, or wait until Sophos gets a clue and realizes they need to make a change.

    Many issues have come up because of their decisions. Hopefully they change their awful approach to DPI.

    What most don't realize is that XG cannot meet stated performance metrics because of how Sophos chose to implement DPI. I questioned the performance of XG because of the change but did not get a response. The XG cannot meet the performance metrics they quote in the current v18 config. As an example, Sophos states 16,000 Mbps for the XG210. It cannot come close to that on v18 because the DPI engine is looking at all traffic.

    If this is production, I would downgrade back to v17.5 if I were you.

  • MichaelBolton said:

    The DPI engine is always looking at and inspecting traffic, even if you don't want it to. It is a very poor implementation IMHO.

    Actually the "DPI engine" has always looked at all traffic since XG's very beginning.

    What is new in XG v18 is that the DPI engine is doing more than it used to.

     - detecting HTTP traffic on any port

     - detecting TLS on any port

        - enforcing TLS policy on any port

        - potentially decrypting TLS

     - enforcing Web policy on HTTP and HTTPS on any port (not just 80/443)
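
For readers wondering what "detecting HTTP/TLS on any port" can look like in principle, here is a minimal Python sketch that classifies a connection by its first payload bytes instead of its port number. It is only an illustration under stated assumptions: the function names are hypothetical, the checks are a simple heuristic based on the TLS record header and common HTTP method names, and it is not how the XG's DPI engine is actually implemented.

```python
# Hypothetical illustration only; not Sophos code.

# Common HTTP request methods, each followed by the space that precedes the path.
HTTP_METHODS = (b"GET ", b"POST ", b"PUT ", b"HEAD ",
                b"DELETE ", b"OPTIONS ", b"PATCH ", b"CONNECT ")


def looks_like_tls_client_hello(first_bytes: bytes) -> bool:
    """Heuristic check for a TLS ClientHello at the start of a stream."""
    if len(first_bytes) < 6:
        return False
    content_type = first_bytes[0]    # 0x16 = handshake record
    major = first_bytes[1]           # 0x03 for SSL 3.0 and all TLS versions
    minor = first_bytes[2]           # 0x00..0x04 in the record header
    handshake_type = first_bytes[5]  # byte after the 5-byte record header; 0x01 = ClientHello
    return (content_type == 0x16 and major == 0x03
            and minor <= 0x04 and handshake_type == 0x01)


def looks_like_http_request(first_bytes: bytes) -> bool:
    """Heuristic check for a plaintext HTTP request line."""
    return first_bytes.startswith(HTTP_METHODS)


def classify(first_bytes: bytes) -> str:
    """Classify a connection by payload, regardless of destination port."""
    if looks_like_tls_client_hello(first_bytes):
        return "tls"
    if looks_like_http_request(first_bytes):
        return "http"
    return "other"
```

The point of the sketch is that every new connection has to be looked at before it can be classified, which is where the extra per-connection cost discussed in this thread comes from.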

    When the XG is detecting HTTP and TLS on any port and enforcing TLS policy on any port, it is slower.  When it is offloading connections (fastpath), decrypting TLS and enforcing using the DPI engine, it is significantly faster.  The speed improvements more than make up for the areas where speed is reduced.  In effect we are doing more (which slows things down) but we are doing it faster (which speeds things up), resulting in a net faster speed.

     

    MichaelBolton said:

    So, you can disable the DPI completely and continue to use the Proxy, or wait until Sophos gets a clue and realizes they need to make a change.

    Clue received and GA Soft Release 2 contains a switch that disables TLS enforcement unless the firewall rule contains a web policy.  Please see this post:

    https://community.sophos.com/products/xg-firewall/b/blog/posts/xg-firewall-v18-ga_2d00_build339-is-now-available

    One of the side effects of this is that customers who upgrade (and therefore TLS inspection is Off) or who turn it Off should get similar performance between 17.5 and 18.0.

    Customers who configure their boxes to take advantage of v18.0 DPI mode and fastpath should see much greater performance.

    MichaelBolton said:

    Many issues have come up because of their decisions. Hopefully they change their awful approach to DPI.

    Many issues are fixed in GA, GA SR2, and will be fixed in MR1.  Our approach is to fix issues in DPI, not bypass DPI.

    MichaelBolton said:

    What most don't realize is that XG cannot meet stated performance metrics because of how Sophos chose to implement DPI. I questioned the performance of XG because of the change but did not get a response. The XG cannot meet the performance metrics they quote in the current v18 config. As an example, Sophos states 16,000 Mbps for the XG210. It cannot come close to that on v18 because the DPI engine is looking at all traffic.

    I agree that performance metrics are hard to state because lab test environments don't match real-world conditions.  In my personal opinion, published metrics (of any vendor/product) are most useful for comparing two models, not for predicting real-world usage.  I would stand behind whatever the Sophos published metrics are for comparing 17.5 and 18.0 for correctly configured systems.  In other words, if Sophos says model xyz does 1,000 foobar in 17.5 and 1,200 foobar in 18.0, I believe that v18.0 will be 20% faster in foobar.

    Whether you believe that the foobar numbers are achievable or not based on your understanding of DPI is immaterial.  Internal and independent testing is proving otherwise.

    MichaelBolton said:

    If this is production, I would downgrade back to v17.5 if I were you.

     
    I would restate.  If you love to play with the latest and greatest of any product, install the latest of any product.  If you need safety and stability in your products, delay before any major upgrades.  That advice goes for any product of any vendor.
     
  • Thanks for the additional insight. I'm in the same boat and think that good improvements have been made and I'm hoping for all these bugs to get ironed out.

    I'll report back after working further with GES and hopefully can determine the root cause of this issue.

  • Bjoern Freiherr said:

    Thanks for the additional insight. I'm in the same boat and think that good improvements have been made and I'm hoping for all these bugs to get ironed out.

    I'll report back after working further with GES and hopefully can determine the root cause of this issue.

    And thank you.  A lot of the issues reported in EAP and GA involve applications or systems that we don't have in house.  Or, more confusingly, we sometimes do have access but cannot reproduce what customers experience.  We rely on helpful customers to work with Devs (in EAP) or Support (after release) to reproduce the problems with the detailed logs that we need to pinpoint the problem.  As far as I know, Veeam issues were reported in EAP but we never got the detailed information we needed.

    I am part of the QA team, and we work hard to find all the bugs before release.  But there are tens of thousands of devices and clients, and millions of servers.  It is impossible for us to test everything, which is why we ran such a large EAP.  You think that the internet runs on standards but the number of bugs we are fixing that only occur in one browser...  :)  We got a lot of things fixed before GA, but with GA comes new reports.  But we need more than just "it doesn't work", we need customers who work with us to show their configuration and their logs so that we get high quality actionable bug reports.  I want to give a big thanks to the customers who are doing that.

    When we do upgrades we try hard not to change configuration, so customers have the same experience.  Therefore, for existing 17.5 rules for Web traffic, we default to using the same Web proxy and not the DPI engine, meaning customers do not see the speed improvements.  They need to explicitly update their configuration to do so.  But at the same time, the DPI engine was doing more analysis of all traffic (as Michael Bolton correctly points out), which brings a speed decrease.  In the first cut of GA that meant some customers had worse performance.  We rectified that in GA SR2 with the TLS toggle, which disables many of the new DPI features except when needed, making v18 more like v17.5 in both features and performance for customers who upgrade.  The net result is that some of the "XStream" architecture and performance won't be seen on an upgrade until customers change their configuration to take advantage of it.  The toggle may also help bypass the DPI as a workaround to some of the bugs.  As time goes on, we should have fewer customers using this "turn off the new way" toggle.

  • I appreciate you responding. I would like to start by saying, please don't feel like I was directing any of this towards you. I am just stating my opinions based on what other Sophos staff members were saying and my many, many dealings with support staff. I know you are part of the QA team, but there is no way you can look at all issues, so please don't feel like any of this is aimed at you or your team.  A lot of partners feel the same way but can't post it. We are not a partner. We are a paying customer, with lots of customers that we do business with that use Sophos, so I'm going to post the real-world truth, since I am not worried, as others are, about whether Sophos has an issue with comments that a partner puts here. That being said, I want to respond to some of your responses.

    Michael Dunn said:

     

     

    MichaelBolton said:

    The DPI engine is always looking at and inspecting traffic, even if you don't want it to. It is a very poor implementation IMHO.

     

     

    Actually the "DPI engine" has always looked at all traffic since XG's very beginning.

    What is new in XG v18 is that the DPI engine is doing more than it used to.

     - detecting HTTP traffic on any port

     - detecting TLS on any port

        - enforcing TLS policy on any port

        - potentially decrypting TLS

     - enforcing Web policy on HTTP and HTTPS on any port (not just 80/443)

    When the XG is detecting HTTP and TLS on any port and enforcing TLS policy on any port, it is slower.  When it is offloading connections (fastpath), decrypting TLS and enforcing using the DPI engine, it is significantly faster.  The speed improvements more than make up for the areas where speed is reduced.  In effect we are doing more (which slows things down) but we are doing it faster (which speeds things up), resulting in a net faster speed.

     

     

     ------Yes, I understand this. My point was that v18 is inspecting more. Inter-VLAN traffic doesn't always need to be inspected. As you say, fastpath is engaged after some inspection. How about allowing a rule with no inspection at all? I don't need Snort trying to look at encrypted SMB traffic between servers in different VLANs.

     

     
    MichaelBolton said:

    So, you can disable the DPI completely and continue to use the Proxy, or wait until Sophos gets a clue and realizes they need to make a change.

     

     

    Clue received and GA Soft Release 2 contains a switch that disables TLS enforcement unless the firewall rule contains a web policy.  Please see this post:

    https://community.sophos.com/products/xg-firewall/b/blog/posts/xg-firewall-v18-ga_2d00_build339-is-now-available

    One of the side effects of this is that customers who upgrade (and therefore TLS inspection is Off) or who turn it Off should get similar performance between 17.5 and 18.0.

    Customers who configure their boxes to take advantage of v18.0 DPI mode and fastpath should see much greater performance.

     

     

    -------Thank you! This is how it should have been rolled out. Some staff members were basically stating that Sophos didn't want to do this, but I am guessing the issues outweighed the plan.

     

     
    MichaelBolton

    Many issues have come up because of their decisions. Hopefully they change their awful approach to DPI.

     

     

    Many issues are fixed in GA, GA SR2, and will be fixed in MR1.  Our approach is to fix issues in DPI, not bypass DPI.

     

     
    MichaelBolton said:

    What most don't realize is that XG cannot meet stated performance metrics because of how Sophos chose to implement DPI. I questioned the performance of XG because of the change but did not get a response. The XG cannot meet the performance metrics they quote in the current v18 config. As an example, Sophos states 16,000 Mbps for the XG210. It cannot come close to that on v18 because the DPI engine is looking at all traffic.

     

     

    I agree that performance metrics are hard to state because lab test environments don't match real-world conditions.  In my personal opinion, published metrics (of any vendor/product) are most useful for comparing two models, not for predicting real-world usage.  I would stand behind whatever the Sophos published metrics are for comparing 17.5 and 18.0 for correctly configured systems.  In other words, if Sophos says model xyz does 1,000 foobar in 17.5 and 1,200 foobar in 18.0, I believe that v18.0 will be 20% faster in foobar.

    Whether you believe that the foobar numbers are achievable or not based on your understanding of DPI is immaterial.  Internal and independent testing is proving otherwise.

     

    --------So I'm testing using 2 servers plugged into a single SG230 with 10G interfaces. v17.5 is much faster than v18 for inter-VLAN traffic. v18 cannot saturate the links, even though it is rated to do so. This issue goes back much further, though. Back in 2018, I questioned performance based on a 3rd-party test and did not get a response. Here is the post: https://community.sophos.com/products/xg-firewall/f/firewall-and-policies/104195/high-amount-of-evasions----are-they-fixed. This isn't just me measuring, this is NSS: "The device is rated by NSS at 5,844 Mbps, which is lower than the vendor claimed performance; Sophos rates this device at 11,800 Mbps." So performance has been an ongoing issue and hasn't improved, and DPI just adds more performance drawbacks until fastpath offloads the connection. I'll point out that other vendors performed as claimed.
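
To put the figures quoted in this thread side by side, here is a small Python sketch of the arithmetic. The function names are made up for illustration; the inputs are only the numbers already quoted above (the NSS rating versus the vendor claim, and the hypothetical 1,000 vs. 1,200 "foobar" comparison).

```python
def percent_of_claim(measured_mbps: float, claimed_mbps: float) -> float:
    """Measured throughput as a percentage of the vendor-claimed throughput."""
    return 100.0 * measured_mbps / claimed_mbps


def relative_change(old: float, new: float) -> float:
    """Percentage change going from an old figure to a new one."""
    return 100.0 * (new - old) / old


# NSS-rated vs. vendor-claimed throughput, as quoted from the NSS report.
print(f"{percent_of_claim(5844, 11800):.1f}% of claimed throughput")  # 49.5%

# The hypothetical 17.5 vs. 18.0 comparison from the quoted reply.
print(f"{relative_change(1000, 1200):.0f}% faster")  # 20%
```

In other words, the NSS-measured figure is roughly half of the claimed one, while the "20% faster" statement is a relative comparison between two releases and says nothing about whether either absolute number is reachable.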

     

     
    MichaelBolton

    If this is production, I would downgrade back to v17.5 if I were you.

     

     

     
    I would restate.  If you love to play with the latest and greatest of any product, install the latest of any product.  If you need safety and stability in your products, delay before any major upgrades.  That advice goes for any product of any vendor.
     
     
    -------Completely agree. v18 shows fantastic potential, but it isn't there yet, and that is true with any vendor, as you stated. Each customer must weigh their choice. In this case, most customers that use Veeam are worried about uptime and maintaining DR environments, which is why I said I would downgrade.