Dropped Connections during Pattern Updates

Since installing multiple XG Firewalls in a multi-site environment, we have been plagued with "random" outages that last between 30-90 seconds.

I have finally correlated this with Pattern updates for either ATP, AV or IPS.  During the time of the definition updates all connectivity to the XG firewall is lost.  This actually brings down our Wide Area network and causes VoIP phones to restart looking for the phone server.

I have an open support ticket with Sophos but I'm awaiting their response.

I have changed the updates to happen less frequently (Daily), however when there are updates it still brings down the connection (albeit less often now).

Is there a way to still have automatic updates turned on but do them on a time schedule?  I find it utterly ridiculous that the system cannot do pattern updates without bringing down the entire network.

If this is "expected" behavior what have others done as workarounds?  I cannot have 30-90 seconds of downtime every other day for pattern updates. 



Added TAGs
[edited by: emmosophos at 9:10 PM (GMT -7) on 28 Jun 2021]
  • Interesting. I have heard no more negative feedback since this switch was disabled. The point is to find out whether the issue disappears with fastpath disabled, because we can investigate it in more depth once we know which module is causing it.

    And if you suffer from this issue and there is a viable workaround, why not use it?

    __________________________________________________________________________________________________________________

  • Can you tell me why you have to get intel from your users when Sophos can just test this themselves? This has been a serious problem for at least 9 months and I would expect Sophos to be doing everything they can to resolve it themselves.

    This is what Sophos say themselves about Fastpath - "FastPath packet optimization dramatically improves firewall throughput performance by automatically putting trusted and secure packets on the fast path". So why would I want to cripple my XG performance by disabling it?

    I already have a workaround that I have posted here in the forums. If you set updates to every 24 hours and then reboot the XG outside work hours, the updates take place 24 hours after the reboot (and every 24 hours after that). At least you can then avoid the updates happening during working hours and dropping all your internet connections/VOIP sessions when they happen. It's a bit of a fudge because if you have to restart the XG any time during the day, you have to remember to restart it again out of work hours or the updates keep taking place during the day. It also means you can't get updates ASAP but only once every twenty four hours. What would be much better is if Sophos fixed this.
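
    The timing of this workaround can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming (as described above) that updates run exactly one interval after the reboot and every interval after that. The function names and the 08:00-18:00 working hours are hypothetical, not anything taken from the XG itself.

```python
from datetime import datetime, time, timedelta

def next_update_times(reboot_at, interval_hours=24, count=3):
    """Expected update times, assuming updates fire one interval after reboot."""
    step = timedelta(hours=interval_hours)
    return [reboot_at + step * (i + 1) for i in range(count)]

def in_work_hours(moment, start=time(8, 0), end=time(18, 0)):
    """True if the moment falls inside the (assumed) working hours."""
    return start <= moment.time() < end

# A reboot at 22:00 keeps every daily update at 22:00.
evening_reboot = datetime(2021, 6, 28, 22, 0)
# A daytime reboot at 11:00 moves every daily update to 11:00.
daytime_reboot = datetime(2021, 6, 29, 11, 0)

for label, reboot in (("evening", evening_reboot), ("daytime", daytime_reboot)):
    for t in next_update_times(reboot):
        status = "during work hours" if in_work_hours(t) else "outside work hours"
        print(label, t.isoformat(), status)
```

    The second case shows why a daytime restart has to be followed by another restart out of hours: the update time simply follows the last reboot.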

  • You miss the point. This issue does not seem to affect all customers; only a portion of them are affected, so the affected installation base seems to be smaller. The question is whether the virtual fast path is causing this issue and, if so, on which appliances, in which situations, and why. DEV is still looking into this issue and is trying to A. replicate it and B. find the reason for it in the first place.

    While DEV is working on a solution, a revamp of the ATP/IPS pattern update process is also currently under development.

    In UTM there was an "easy workaround" for this:

    Restart policy: Select the policy for connection handling when an IPS engine restart is required, for example when the engine is updated.

    Drop (default): All incoming and outgoing connections will be dropped during engine restart.
    Bypass: All incoming and outgoing connections will bypass IPS scanning while the engine is restarting.

    The point is: customers moved to "bypass" despite the security concerns, which is actually bad practice. You could easily say "Why not implement a bypass option in SFOS?". But from a security perspective, there are other approaches to begin with. I would neither implement nor enable such a feature in SFOS.

    As you can see, this issue was not there before a particular release. Somehow the virtual fast path seems to have an issue with the reload of the engine and drops the session in certain edge cases, which still needs to be validated.

    __________________________________________________________________________________________________________________

  • This is crazy. It's totally unacceptable for a firewall to drop all internet traffic for 30 seconds every day whilst it updates its patterns. Especially now when people are so much more reliant on video conferencing and online meetings. The fact that this issue has been flagged for this long and not treated as a serious problem that requires a critical fix is a real concern.

  • Again, this is not normal. It does not seem to affect everybody, so it needs to be investigated in more depth. Knowing whether this workaround resolves the issue for affected customers is a good indicator.

    __________________________________________________________________________________________________________________

  • Is it possible that this is more noticeable on appliances with smaller CPUs, where installing the patterns takes longer because of the slower CPU speed?

  • I am only "missing the point" because I have been consistently told by Sophos that dropping connections during pattern updates was "expected behaviour". This was the reason given for closing my case after long discussions between senior support staff and the dev team. To quote the L2 Senior Escalations Engineer I was working with, "Having escalated the case internally, the behaviour you have been seeing is expected and there is no current workaround other than what you are currently using."

    I spent many, many hours on this case and now you are telling me, 8 months later, that what the dev team said at the time just isn't true. Are you surprised that I'm pretty angry about this? Are you surprised that, once again, my confidence in Sophos products has been undermined when your own dev team don't seem to understand their own product?

  • We tested this issue on 3 devices, all of which consistently exhibited the problem. Two 100 series devices dropped the traffic for about two minutes; a 400 series device dropped it for about four seconds.

  • So it still exists even after disabling fastpath. I will forward this feedback to the ID.

    __________________________________________________________________________________________________________________

  • No...this was in response to LHerzog and was based on our previous testing. I have not yet tested again with fastpath disabled.

  • It would be nice to know whether this actually addresses your issue or not. Sophos is going to address this in V19.0 anyway, as the interaction with engine restarts will be addressed there.

    __________________________________________________________________________________________________________________