Dropped Connections during Pattern Updates

Since installing multiple XG Firewalls in a multi-site environment, we have been plagued with "random" outages that last between 30-90 seconds.

I have finally correlated this with Pattern updates for either ATP, AV or IPS.  During the time of the definition updates all connectivity to the XG firewall is lost.  This actually brings down our Wide Area network and causes VoIP phones to restart looking for the phone server.
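For anyone wanting to check the same correlation on their own box, here is a minimal sketch of the idea, not the method actually used above. It assumes you have already pulled two lists of timestamps out of your logs by hand: when pattern updates started and when outages began. The function name, the 90-second window and the sample timestamps are all made up for illustration.

```python
from datetime import datetime, timedelta


def outages_near_updates(outages, updates, window=timedelta(seconds=90)):
    """Pair each outage with the first update that started at most `window` before it."""
    matches = []
    for outage in outages:
        for update in updates:
            # Count the outage only if it began at or after the update start
            # and no later than `window` afterwards.
            if timedelta(0) <= outage - update <= window:
                matches.append((outage, update))
                break
    return matches


# Hypothetical timestamps, for illustration only (not taken from real logs).
updates = [
    datetime(2021, 6, 1, 10, 15, 0),
    datetime(2021, 6, 2, 10, 15, 0),
]
outages = [
    datetime(2021, 6, 1, 10, 15, 20),
    datetime(2021, 6, 1, 14, 0, 0),
    datetime(2021, 6, 2, 10, 15, 5),
]

matched = outages_near_updates(outages, updates)
print(f"{len(matched)} of {len(outages)} outages began within 90 s of a pattern update")
```

If most outages land inside the window, that points at the pattern updates; outages that do not match need a different explanation.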

I have an open support ticket with Sophos but I'm awaiting their response.

I have changed the updates to happen less frequently (Daily), however when there are updates it still brings down the connection (albeit less often now).

Is there a way to still have automatic updates turned on but do them on a time schedule?  I find it utterly ridiculous that the system cannot do pattern updates without bringing down the entire network.

If this is "expected" behavior what have others done as workarounds?  I cannot have 30-90 seconds of downtime every other day for pattern updates. 



Added TAGs
[edited by: emmosophos at 9:10 PM (GMT -7) on 28 Jun 2021]
  • Any feedback? 

    __________________________________________________________________________________________________________________

  • LuCar Toni said:

    Any feedback? 

    No.

    https://community.sophos.com/sophos-xg-firewall/f/discussions/123652/internet-traffic-stops-every-time-xg-has-an-ips-or-atp-update

    https://community.sophos.com/sophos-xg-firewall/f/discussions/122951/connection-drops-during-av-pattern-updates

    I have spent days on this issue. Have had it escalated beyond tier two technical support to senior management and eight months later this is still an issue with no solution or even a satisfactory workaround (scheduled updates).

    Why would I waste more time on this? I don't believe there is any issue reproducing this problem so rather than asking your users to do Sophos's job, isn't it about time that Sophos's development team got their finger out, tried your suggestion themselves and fixed the problem?

  • Interesting. I have heard no more negative feedback since this switch was disabled. The point is to find out whether the issue disappears when FastPath is disabled, because we can investigate the issue in more depth once we know which module is causing it.

    And if you suffer from this issue and there is a viable workaround, why not use it?

    __________________________________________________________________________________________________________________

  • Can you tell me why you have to get intel from your users when Sophos can just test this themselves? This has been a serious problem for at least 9 months and I would expect Sophos to be doing everything they can to resolve it themselves.

    This is what Sophos say themselves about Fastpath - "FastPath packet optimization dramatically improves firewall throughput performance by automatically putting trusted and secure packets on the fast path". So why would I want to cripple my XG performance by disabling it?

    I already have a workaround that I have posted here in the forums. If you set updates to every 24 hours and then reboot the XG outside work hours, the updates take place 24 hours after the reboot (and every 24 hours after that). At least you can then avoid the updates happening during working hours and dropping all your internet connections/VOIP sessions when they happen. It's a bit of a fudge because if you have to restart the XG any time during the day, you have to remember to restart it again out of work hours or the updates keep taking place during the day. It also means you can't get updates ASAP but only once every twenty four hours. What would be much better is if Sophos fixed this.
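The reboot-anchored workaround described above can be sketched as a quick calculation. This assumes, purely on the strength of the observation in this thread and not of any documented Sophos behaviour, that with a 24-hour interval the update check fires at multiples of 24 hours after the last reboot. The function names and the 09:00 to 17:00 working hours are hypothetical.

```python
from datetime import datetime, timedelta


def next_updates(last_reboot, count=3, interval=timedelta(hours=24)):
    """List the next `count` update times, assuming they are anchored to the last reboot."""
    return [last_reboot + interval * n for n in range(1, count + 1)]


def falls_in_work_hours(moment, start_hour=9, end_hour=17):
    """Return True if `moment` falls inside the assumed working hours (time of day only)."""
    return start_hour <= moment.hour < end_hour


# Hypothetical reboot times, for illustration only.
night_reboot = datetime(2021, 6, 28, 2, 0)
daytime_reboot = datetime(2021, 6, 28, 11, 30)

for label, reboot in (("night reboot", night_reboot), ("daytime reboot", daytime_reboot)):
    risky = [t for t in next_updates(reboot) if falls_in_work_hours(t)]
    print(f"{label}: {len(risky)} of the next 3 updates fall in working hours")
```

This also shows why an unplanned daytime restart undoes the workaround: every later update inherits the new time of day until the box is rebooted again out of hours.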

  • I have not had the issue again since running: system firewall-acceleration disable

    I too am concerned I'm missing out on some performance gains by disabling this, but right now it is worth it to me.  I'm hopeful a real fix comes soon.

  • You miss the point. This issue does not seem to affect all customers; only a portion of customers are affected, so the affected installation base seems to be small. The question is not just whether the virtual FastPath is causing this issue but, if so, on which appliances, in which situations, and why. DEV is still looking into this issue and is trying to A. replicate it and B. find the reason for it in the first place.

    While DEV is working on a solution, a revamp of the ATP/IPS pattern update process is also under development.

    In UTM there was an "easy workaround" for this:

    Restart policy: Select the policy for connection handling when an IPS engine restart is required, for example when the engine is updated.

    Drop (default): All incoming and outgoing connections will be dropped during engine restart.
    Bypass: All incoming and outgoing connections will bypass IPS scanning while the engine is restarting.

    The point is: customers moved to "bypass" despite the security concerns, which is actually bad practice. You could easily say "Why not implement a bypass option in SFOS?". But from a security perspective, there are other approaches to start with. I would neither implement nor enable such a feature in SFOS.

    As you can currently see, this issue was not there before a release. Somehow the virtual FastPath seems to have an issue with the reload of the engine and drops the session in certain edge cases, which still needs to be validated.

    __________________________________________________________________________________________________________________

  • This is crazy. It's totally unacceptable for a firewall to drop all internet traffic for 30 seconds every day whilst it updates its patterns. Especially now when people are so much more reliant on video conferencing and online meetings. The fact that this issue has been flagged for this long and not treated as a serious problem that requires a critical fix is a real concern.

  • Again, this is not normal. It does not seem to affect everybody, so it needs to be investigated in more depth. And feedback on whether the workaround resolves the issue for affected customers is a good indicator.

    __________________________________________________________________________________________________________________

  • Is it possible that this is more noticeable on small CPU appliances? So that installing the patterns requires more time with slow CPU speeds?

  • I am only "missing the point" because I have been consistently told by Sophos that dropping connections during pattern updates was "expected behaviour". This was the reason given for closing my case after long discussions between senior support staff and the dev team. To quote the L2 Senior Escalations Engineer I was working with, "Having escalated the case internally, the behaviour you have been seeing is expected and there is no current workaround other than what you are currently using."

    I spent many, many hours on this case and now you are telling me, 8 months later, that what the dev team said at the time just isn't true. Are you surprised that I'm pretty angry about this? Are you surprised that, once again, my confidence in Sophos products has been undermined when your own dev team don't seem to understand their own product?