This discussion has been locked.

Dropped Connections during Pattern Updates

Since installing multiple XG Firewalls in a multi-site environment, we have been plagued with "random" outages that last between 30-90 seconds.

I have finally correlated this with Pattern updates for either ATP, AV or IPS.  During the time of the definition updates all connectivity to the XG firewall is lost.  This actually brings down our Wide Area network and causes VoIP phones to restart looking for the phone server.

I have an open support ticket with Sophos but I'm awaiting their response.

I have changed the updates to happen less frequently (Daily), however when there are updates it still brings down the connection (albeit less often now).

Is there a way to still have automatic updates turned on but do them on a time schedule?  I find it utterly ridiculous that the system cannot do pattern updates without bringing down the entire network.

If this is "expected" behavior what have others done as workarounds?  I cannot have 30-90 seconds of downtime every other day for pattern updates. 



This thread was automatically locked due to age.
  • Thanks Bill.  I agree and have seen this article as well.

    But there is currently no fix and no workaround other than to turn off automatic pattern updates?  How can we have a firewall device that drops all connections during pattern updates?  How can I recommend this to enterprise customers?  How do I get more visibility for this?  I've also seen the Sophos Idea to give more control over scheduling these updates, which I have upvoted, but frankly, I don't want to lose connection, EVER.

    I'm awaiting Sophos support to get back to me on my questions above as well, but I just can't fathom how this is acceptable on any level.

    I feel like now I am forced to choose between consistent connectivity by turning off automatic pattern updates and security.

  • Can you try to disable virtual fastpath? 

    console> system firewall-acceleration disable

    __________________________________________________________________________________________________________________

  • I just got off the phone with support and they suggested the same.  I am told this has Bug ID: NC-70896

    I will test this and report back.

  • I have applied this to each of the firewalls experiencing the issue (8 of them).  I'm hopeful this resolves it.

  • Any feedback? 

    __________________________________________________________________________________________________________________

  • LuCar Toni said:

    Any feedback? 

    No.

    https://community.sophos.com/sophos-xg-firewall/f/discussions/123652/internet-traffic-stops-every-time-xg-has-an-ips-or-atp-update

    https://community.sophos.com/sophos-xg-firewall/f/discussions/122951/connection-drops-during-av-pattern-updates

    I have spent days on this issue. Have had it escalated beyond tier two technical support to senior management and eight months later this is still an issue with no solution or even a satisfactory workaround (scheduled updates).

    Why would I waste more time on this? I don't believe there is any issue reproducing this problem so rather than asking your users to do Sophos's job, isn't it about time that Sophos's development team got their finger out, tried your suggestion themselves and fixed the problem?

    Interesting. I have heard no more negative feedback after this switch was disabled. This is about gathering intel on whether the issue disappears when fastpath is disabled, because we can investigate the issue in more depth once we know which module is causing it.

    And if you suffer from this issue and there is a viable workaround, why not use it?

    __________________________________________________________________________________________________________________

  • Can you tell me why you have to get intel from your users when Sophos can just test this themselves? This has been a serious problem for at least 9 months and I would expect Sophos to be doing everything they can to resolve it themselves.

    This is what Sophos say themselves about Fastpath - "FastPath packet optimization dramatically improves firewall throughput performance by automatically putting trusted and secure packets on the fast path". So why would I want to cripple my XG performance by disabling it?

    I already have a workaround that I have posted here in the forums. If you set updates to every 24 hours and then reboot the XG outside work hours, the updates take place 24 hours after the reboot (and every 24 hours after that). At least you can then avoid the updates happening during working hours and dropping all your internet connections/VOIP sessions when they happen. It's a bit of a fudge because if you have to restart the XG any time during the day, you have to remember to restart it again out of work hours or the updates keep taking place during the day. It also means you can't get updates ASAP but only once every twenty four hours. What would be much better is if Sophos fixed this.
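
    The timing behind this reboot workaround can be sketched in a few lines. This is only an illustration of the arithmetic, assuming (as observed above, not confirmed by Sophos) that the update timer restarts at boot and fires every 24 hours. The function names, the example date and the 08:00-18:00 working day are all hypothetical.

```python
from datetime import datetime, timedelta

def next_update_times(last_reboot, interval_hours=24, count=3):
    """Return the next `count` expected update times after a reboot.

    Assumption: the update timer starts at boot and repeats every
    `interval_hours`, as described in the workaround above.
    """
    step = timedelta(hours=interval_hours)
    return [last_reboot + step * (i + 1) for i in range(count)]

def in_working_hours(moment, start_hour=8, end_hour=18):
    """True if `moment` falls inside the (assumed) working day."""
    return start_hour <= moment.hour < end_hour

# Hypothetical example: firewall rebooted at 02:30 at night.
reboot = datetime(2021, 3, 1, 2, 30)
schedule = next_update_times(reboot)
for t in schedule:
    label = "working hours" if in_working_hours(t) else "out of hours"
    print(t.isoformat(), label)
```

    A reboot at 13:00 would shift every following update to 13:00 as well, which is why an unplanned daytime restart has to be followed by another restart out of hours.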

  • I have not had the issue again since running: system firewall-acceleration disable

    I too am concerned I'm missing out on some performance gains by disabling this, but right now it is worth it to me.  I'm hopeful a real fix comes soon.

    You miss the point. This issue does not seem to affect all customers; only a portion of the installation base appears to be affected. The open question is whether the virtual fastpath is causing this issue and, if so, on which appliances, in which situations, and why. DEV is still looking into this and is trying to A. replicate the issue and B. find the root cause in the first place.

    While DEV is working on a fix, a revamp of the ATP/IPS pattern update process is also under development.

    In UTM there was an "easy workaround" for this:

    Restart policy: Select the policy for connection handling when an IPS engine restart is required, for example when the engine is updated.

    Drop (default): All incoming and outgoing connections will be dropped during engine restart.
    Bypass: All incoming and outgoing connections will bypass IPS scanning while the engine is restarting.

    The point is: customers moved to "Bypass" despite the security concerns, which is actually bad practice. You could easily say "Why not implement a bypass option in SFOS?". But from a security perspective, there are other approaches to begin with. I would neither implement nor enable such a feature in SFOS.

    As you can see, this issue did not exist before a certain release. Somehow the virtual fastpath seems to have an issue with the engine reload and drops the session in certain edge cases, which still needs to be validated.

    __________________________________________________________________________________________________________________

  • This is crazy. It's totally unacceptable for a firewall to drop all internet traffic for 30 seconds every day whilst it updates its patterns. Especially now when people are so much more reliant on video conferencing and online meetings. The fact that this issue has been flagged for this long and not treated as a serious problem that requires a critical fix is a real concern.

  • Again, this is not normal. It does not seem to affect everybody, so it needs to be investigated in more depth. Feedback on whether the workaround resolves the issue for affected customers is a good indicator.

    __________________________________________________________________________________________________________________

  • Is it possible that this is more noticeable on small CPU appliances? So that installing the patterns requires more time with slow CPU speeds?

  • We tested this issue on 3 devices, all of which consistently exhibited the problem. Two 100 series devices dropped the traffic for about two minutes, a 400 series device dropped it for about four seconds.

  • So it still exists even after disabling fastpath. I will forward this feedback to the ID.

    __________________________________________________________________________________________________________________

  • No... this was in response to LHerzog and was based on our previous testing. I have not yet tested again with fastpath disabled.

  • It would be nice to know whether this actually addresses your issue or not. Sophos is going to address this in V19.0 anyway, as the interaction with engine restarts will be addressed.

    __________________________________________________________________________________________________________________

  • Yes, I am seeing this behaviour on firewalls that have fastpath disabled (which seems to be the default for XGs that are in a cluster).

    It would make more sense if this were the other way around, surely? If fastpath routes 'trusted' traffic directly without IPS checking it, it shouldn't be affected by the IPS service restarting? Whereas if fastpath is disabled and traffic cannot be checked because IPS is restarting, then the traffic would be dropped?

  • Virtual FastPath is a component that uses Snort as well. Therefore, when Snort applies an update, it could drop the session as well, but certainly not in each and every case.

    VFP is enabled by default on all appliances (and HA), but it was disabled before V18.0 MR4. It will not be enabled by an upgrade; instead, you can change your config and enable it yourself.

    __________________________________________________________________________________________________________________

  • Some interesting facts are coming up here. Is there any reason for the default-disabled VFP setting in MR4? Is this only for fresh installations on MR4? What about migrations from MR4 to MR5? We went from 17.5 MR12 through 18 MR1, then MR4, then MR5, and we re-imaged our appliances when going to MR4 and then imported the config.

    VFP was enabled when I checked it recently, but it has now been disabled because support asked for that in connection with another issue, and disabling it did not fix that issue.

    Can you provide some steps for how you measured the duration of the connection loss?

    I'd like to review this with our XG430s HA.

    I know we lost traffic for some seconds when disabling VFP.
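
    One possible way to measure the duration of such a drop is to probe a host behind the firewall once per second and record the gaps. The sketch below is not the method used by the posters above. The function names are made up for illustration, and it only measures whether new TCP connections succeed, not whether existing sessions are reset.

```python
import socket
import time

def probe(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def outages(samples):
    """Turn a list of (timestamp, ok) samples into (start, end) outage spans.

    An outage runs from the first failed probe to the next successful one.
    """
    spans, start = [], None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts                    # first failure opens a span
        elif ok and start is not None:
            spans.append((start, ts))     # first success closes it
            start = None
    if start is not None:                 # still down at the last sample
        spans.append((start, samples[-1][0]))
    return spans

def monitor(host, port, duration_s, interval_s=1.0):
    """Probe host:port every interval_s seconds for duration_s seconds."""
    samples = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        samples.append((time.time(), probe(host, port)))
        time.sleep(interval_s)
    return samples
```

    Running something like outages(monitor("192.0.2.10", 443, 3600)) across a pattern update window (the address is a placeholder) would give a list of start and end timestamps. The resolution is limited by the probe interval plus the timeout, so a four-second drop needs a short interval to show up.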

  • Sophos does not enable most settings after a firmware upgrade, to avoid causing issues in the network. V18.0 MR4 enabled the VFP option on HAs. Customers coming from an older version had this disabled and can enable it if they want. This option will likely be enabled in a future release.

    A new installation without backup/restore will have VFP enabled by default.

    __________________________________________________________________________________________________________________