

XG v18 SSL/TLS inspection interfering with Veeam Cloud Provider Replication

Hey everyone,

we're using Veeam to replicate backups from remote sites to our main site. Since deploying v18, replication has been failing. After working with Veeam Support, the solution was to completely disable SSL/TLS inspection on the firewall at the main site. I'm not sure why inspection causes issues, but at this point I can't turn it on because the backup replication will fail. How can this be resolved? We're not even decrypting, and I don't think there's a way to turn off inspection for specific connections.

The issue seems to exist only from one site that is also running XG v18. The other ones on v17.5.9 are fine, even with inspection turned on at the replication target site - very strange.

How to troubleshoot and fix this?

Anyone else having this issue?

Thanks!



This thread was automatically locked due to age.
  • Hi,

    try turning on the web proxy and then create exceptions for the site.

    What restrictions do you have in your firewall rule for the Veeam application or traffic?

    Ian

    XG115W - v20.0.2 MR-2 - Home

    XG on VM 8 - v21 GA

    If a post solves your question please use the 'Verify Answer' button.

  • Hmm, yeah I guess, but I think my ultimate goal would be to use the new DPI engine (and not have it break things ;) ). The firewall rule just looks at the port number, not the application.

  • Perfect! Please let us know.

    As I said, DPI should not be responsible for this traffic unless it is a bug.

    Regards

  • Hey guys, I opened a case: #9744383

    We've just been testing this with multiple different combinations of settings, and nothing but disabling SSL/TLS inspection has worked so far. Even when the DNAT firewall rule is set to use the Proxy rather than the DPI engine, it still doesn't work.

    We upgraded to the latest release of v18 also.

  • Hi,

    please upgrade to v18 SR2, which will enable you to disable SSL/TLS inspection on specific rules and hopefully overcome your issue.

    Ian


  • We upgraded to build 339 if that's what you mean, but I don't see an option to disable SSL/TLS inspection for a firewall rule. There's a global on/off toggle for the inspection, which solves our problem, but that's not the point, because I'd be disabling the entire feature that's being promoted in v18. Making an SSL/TLS rule with "don't decrypt" is not applicable here because the WAN zone can't be selected as the source, and we're not even decrypting anything either.

    To me it looks like a severe bug that's interfering with traffic that shouldn't be inspected. It's being escalated to GES now after troubleshooting and testing for 4h straight.

  • Hi,

    my apologies, you are correct. I need to go back and review the thread on the subject to find out why I misunderstood the intent of that switch.

    Ian


  • No worries. I'll post an update once I hear more from the escalations team.

  • I have followed up with the response from Michael Dunn in the thread where the feature is questioned as well.

    A bit disappointing, because that was an issue I raised during EAP: not being able to create a simple firewall rule, as was the case in v17.5.8.

    Ian


  • The DPI engine is always looking at and inspecting traffic, even if you don't want it to. It is a very poor implementation IMHO.

    So, you can disable the DPI completely and continue to use the Proxy, or wait until Sophos gets a clue and realizes they need to make a change.

    Many issues have come up because of their decisions. Hopefully they change their awful approach to DPI.

    What most don't realize is that XG cannot meet stated performance metrics because of how Sophos chose to implement DPI. I questioned the performance of XG because of the change but did not get a response. The XG cannot meet the performance metrics they quote in the current v18 config. As an example, Sophos states 16,000 Mbps for the XG210. It cannot come close to that on v18 because the DPI engine is looking at all traffic.

    If this is production, I would downgrade back to v17.5 if I were you.

  • MichaelBolton said:

    The DPI engine is always looking at and inspecting traffic, even if you don't want it to. It is a very poor implementation IMHO.

    Actually the "DPI engine" has always looked at all traffic since XG's very beginning.

    What is new in XG v18 is that the DPI engine is doing more than it used to.

     - detecting HTTP traffic on any port

     - detecting TLS on any port

        - enforcing TLS policy on any port

        - potentially decrypting TLS

     - enforcing Web policy on HTTP and HTTPS on any port (not just 80/443)

    When the XG is detecting HTTP and TLS on any port and enforcing TLS policy on any port, it is slower.  When it is offloading connections (fastpath), decrypting TLS, and enforcing using the DPI engine, it is significantly faster.  The speed improvements that are made more than make up for the areas where speed is reduced.  In effect, we are doing more (which slows things down) but we are doing it faster (which speeds things up), resulting in a net faster speed.

     

    MichaelBolton said:

    So, you can disable the DPI completely and continue to use the Proxy, or wait until Sophos gets a clue and realizes they need to make a change.

    Clue received and GA Soft Release 2 contains a switch that disables TLS enforcement unless the firewall rule contains a web policy.  Please see this post:

    https://community.sophos.com/products/xg-firewall/b/blog/posts/xg-firewall-v18-ga_2d00_build339-is-now-available

    One of the side effects of this is that customers who upgrade (and therefore TLS inspection is Off) or who turn it Off should get similar performance between 17.5 and 18.0.

    Customers who configure their boxes to take advantage of v18.0 DPI mode and fastpath should see much greater performance.

    MichaelBolton said:

    Many issues have come up because of their decisions. Hopefully they change their awful approach to DPI.

    Many issues are fixed in GA, GA SR2, and will be fixed in MR1.  Our approach is to fix issues in DPI, not bypass DPI.

    MichaelBolton said:

    What most don't realize is that XG cannot meet stated performance metrics because of how Sophos chose to implement DPI. I questioned the performance of XG because of the change but did not get a response. The XG cannot meet the performance metrics they quote in the current v18 config. As an example, Sophos states 16,000 Mbps for the XG210. It cannot come close to that on v18 because the DPI engine is looking at all traffic.

    I agree that performance metrics are hard to state because lab test environments don't match real world conditions.  In reality, in my personal opinion, published metrics (of any vendor/product) are most useful for comparing two models, not for predicting real world usage.  I would stand behind whatever the Sophos published metrics are for comparing 17.5 and 18.0 for correctly configured systems.  In other words, if Sophos says model xyz does 1,000 foobar in 17.5 and 1,200 foobar in 18.0, I believe that v18.0 will be 20% faster in foobar.
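
As a small worked example of the comparison described above, the sketch below computes the relative change between two published figures. The function name `relative_speedup` is hypothetical, and the 1,000 and 1,200 "foobar" values are the placeholder numbers from the post, not real Sophos metrics.

```python
def relative_speedup(old: float, new: float) -> float:
    """Return the change from old to new as a percentage of old."""
    if old <= 0:
        raise ValueError("old must be a positive number")
    # Multiply before dividing so round inputs give an exact result.
    return (new - old) * 100.0 / old


if __name__ == "__main__":
    # Placeholder figures from the post: 1,000 foobar on 17.5, 1,200 on 18.0.
    print(f"{relative_speedup(1000, 1200):.0f}% faster")
```

The same calculation applies to any pair of datasheet numbers, as long as both were measured under the same test conditions.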

    Whether you believe that the foobar numbers are achievable or not based on your understanding of DPI is immaterial.  Internal and independent testing is proving otherwise.

    MichaelBolton said:

    If this is production, I would downgrade back to v17.5 if I were you.

     
    I would restate.  If you love to play with the latest and greatest of any product, install the latest of any product.  If you need safety and stability in your products, delay before any major upgrades.  That advice goes for any product of any vendor.
     
  • Thanks for the additional insight. I'm in the same boat and think that good improvements have been made and I'm hoping for all these bugs to get ironed out.

    I'll report back after working further with GES and hopefully can determine the root cause of this issue.

  • Bjoern Freiherr said:

    Thanks for the additional insight. I'm in the same boat and think that good improvements have been made and I'm hoping for all these bugs to get ironed out.

    I'll report back after working further with GES and hopefully can determine the root cause of this issue.

    And thank you.  A lot of the issues that are reported in EAP and GA involve applications or systems that we don't have in house.  Or, sometimes more confusingly, we do have access but we cannot reproduce what customers experience.  We rely on helpful customers to work with Devs (in EAP) or Support (after release) to reproduce the problems with the detailed logs that we need to pinpoint the problem.  As far as I know, Veeam issues were reported in EAP but we never got the detailed information we needed.

    I am part of the QA team, and we work hard to find all the bugs before release.  But there are tens of thousands of devices and clients, and millions of servers.  It is impossible for us to test everything, which is why we ran such a large EAP.  You think that the internet runs on standards but the number of bugs we are fixing that only occur in one browser...  :)  We got a lot of things fixed before GA, but with GA comes new reports.  But we need more than just "it doesn't work", we need customers who work with us to show their configuration and their logs so that we get high quality actionable bug reports.  I want to give a big thanks to the customers who are doing that.

    When we do upgrades we try hard not to change configuration, so customers have the same experience.  Therefore, for existing 17.5 rules for Web traffic, we default to using the same Web proxy and not the DPI engine, meaning customers do not see the speed improvements.  They need to explicitly update their configuration to do so.  But at the same time, the DPI engine was doing more analysis of all traffic (as Michael Bolton correctly points out), which brings a speed decrease.  In the first cut of GA that meant some customers had worse performance.  We rectified that in GA SR2 with the TLS toggle, which disables many of the new DPI features except when needed, making v18 more like v17.5 in both features and performance for customers who upgrade.  The net result is that some of the "XStream" architecture and performance won't be seen on an upgrade until customers change their configuration to take advantage of it.  The toggle may also help bypass the DPI as a workaround to some of the bugs.  As time goes on, we should have fewer customers using this "turn off the new way" toggle.
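
The toggle behaviour described in this thread (global SSL/TLS inspection off disables TLS enforcement unless the firewall rule contains a web policy) can be sketched as a small decision function. This is only a simplified model built from the wording of the posts, not Sophos's implementation; the names `FirewallRule`, `has_web_policy`, and `tls_enforcement_applies` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FirewallRule:
    """Hypothetical, minimal stand-in for a firewall rule."""
    name: str
    has_web_policy: bool


def tls_enforcement_applies(global_inspection_on: bool, rule: FirewallRule) -> bool:
    """Model of the GA SR2 toggle as described in the thread (assumption)."""
    if global_inspection_on:
        # With the global toggle on, TLS is detected and policy enforced on all traffic.
        return True
    # With the toggle off, enforcement remains only where a web policy is attached.
    return rule.has_web_policy


if __name__ == "__main__":
    veeam_dnat = FirewallRule(name="veeam-dnat", has_web_policy=False)
    print(tls_enforcement_applies(True, veeam_dnat))   # toggle on
    print(tls_enforcement_applies(False, veeam_dnat))  # toggle off
```

Under this model, a port-based DNAT rule without a web policy, like the Veeam rule discussed here, is only left alone when the global toggle is off, which matches the workaround reported above.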