This discussion has been locked.

Dropped Connections during Pattern Updates

Since installing multiple XG Firewalls in a multi-site environment, we have been plagued with "random" outages that last between 30-90 seconds.

I have finally correlated this with Pattern updates for either ATP, AV or IPS.  During the time of the definition updates all connectivity to the XG firewall is lost.  This actually brings down our Wide Area network and causes VoIP phones to restart looking for the phone server.

I have an open support ticket with Sophos but I'm awaiting their response.

I have changed the updates to happen less frequently (Daily), however when there are updates it still brings down the connection (albeit less often now).

Is there a way to still have automatic updates turned on but do them on a time schedule?  I find it utterly ridiculous that the system cannot do pattern updates without bringing down the entire network.

If this is "expected" behavior what have others done as workarounds?  I cannot have 30-90 seconds of downtime every other day for pattern updates. 



This thread was automatically locked due to age.
  • Thanks Bill.  I agree and have seen this article as well.

    But is there currently no fix and no workaround other than turning off automatic pattern updates?  How can we have a firewall device that drops all connections during pattern updates?  How can I recommend this to enterprise customers?  How do I get more visibility for this?  I've also seen the Sophos Idea to give more control over scheduling these updates, which I have upvoted, but frankly, I don't want to lose connection, EVER.

    I'm awaiting Sophos support to get back to me on my questions above as well, but I just can't fathom how this is acceptable on any level.

    I feel like now I am forced to choose between consistent connectivity by turning off automatic pattern updates and security.

  • This is crazy. It's totally unacceptable for a firewall to drop all internet traffic for 30 seconds every day whilst it updates its patterns. Especially now when people are so much more reliant on video conferencing and online meetings. The fact that this issue has been flagged for this long and not treated as a serious problem that requires a critical fix is a real concern.

  • Again, this is not normal. It does not seem to affect everybody, so it needs to be investigated in more depth. Feedback on whether this workaround resolves the issue for affected customers would be a good indicator.

    __________________________________________________________________________________________________________________

  • Is it possible that this is more noticeable on appliances with smaller CPUs, where installing the patterns takes longer?

  • I am only "missing the point" because I have been consistently told by Sophos that dropping connections during pattern updates was "expected behaviour". This was the reason given for closing my case after long discussions between senior support staff and the dev team. To quote the L2 Senior Escalations Engineer I was working with, "Having escalated the case internally, the behaviour you have been seeing is expected and there is no current workaround other than what you are currently using."

    I spent many, many hours on this case and now you are telling me, 8 months later, that what the dev team said at the time just isn't true. Are you surprised that I'm pretty angry about this? Are you surprised that, once again, my confidence in Sophos products has been undermined when your own dev team don't seem to understand their own product?

  • We tested this issue on 3 devices, all of which consistently exhibited the problem. Two 100 series devices dropped the traffic for about two minutes; a 400 series device dropped it for about four seconds.

  • So the issue still exists even after disabling fastpath. I will forward this feedback to the bug ID.

    __________________________________________________________________________________________________________________

  • No...this was in response to LHerzog and was based on our previous testing. I have not yet tested again with fastpath disabled.

  • It would be nice to know whether this actually addresses your issue or not. Sophos is going to address this in V19.0 anyway, as the interaction with the restart of the engines will be reworked.

    __________________________________________________________________________________________________________________

  • Yes, I am seeing this behaviour on firewalls that have fastpath disabled (which seems to be the default for XGs that are in a cluster).

    Surely it would make more sense the other way around? If fastpath routes 'trusted' traffic directly without IPS checking it, it shouldn't be affected by the IPS service restarting. Whereas if fastpath is disabled and traffic cannot be checked because IPS is restarting, then the traffic would be dropped?

  • Virtual Fastpath is a component that uses Snort as well. Therefore, if Snort receives an update, it could drop the session as well, but certainly not in each and every case.

    VFP is enabled by default on all appliances (and on HA), but it was disabled before V18.0 MR4. An upgrade will not enable it; instead, you can change your configuration to enable it.

    __________________________________________________________________________________________________________________

  • Some interesting facts are coming up here. Is there any reason for the default disabled VFP setting in MR4? Is this only for fresh installations on MR4? What about migrations via MR4 to MR5? We went from 17.5 MR12 via 18 MR1 -> MR4 -> MR5, re-imaging our appliances when going to MR4 and then importing the config.

    VFP was enabled when I checked it recently, but it has now been disabled at the request of support for a separate issue, although disabling it did not fix that issue.

    Can you provide some steps on how you measured the time of connection loss?

    I'd like to review this with our XG430s HA.

    I know we lost traffic for some seconds when disabling VFP.

  • Sophos does not enable most settings after a firmware upgrade, to avoid causing issues within the network. V18.0 MR4 enabled the VFP option on HA. Customers coming from an older version had this disabled and can enable it if they want. This option will likely be enabled in a future release.

    A new installation without backup/restore will have VFP enabled by default.

    __________________________________________________________________________________________________________________

  • Is there a command to show if it is enabled (rather than enable/disable it)?

  • console> system firewall-acceleration show
    Firewall Acceleration is Enabled in Configuration.

    __________________________________________________________________________________________________________________

  • Just from seeing the issue a few times, we'd typically notice that an MS Teams call would stop responding, then I'd try a web browser and see a generic 'page cannot be displayed'. Give it around 20 seconds and then it works again as expected. But normally the delay is long enough that you'll get dropped from your Teams call and need to dial back in again. Super frustrating.

    I may try the workaround and reboot our firewall in the early hours, hoping that the pattern updates will then take place 24 hours after that (i.e. out of hours).

  • We use a program called PingPlotter. We run pings to several external addresses (to avoid false positives) and to the XG IP as well. It maintains logs of all the connections, and we can check those to see that, at the time of an update, the ping to the XG is fine but all the other pings are blocked.

    You can use PingPlotter free but if you want to run it as a service, you can either run a 14 day trial or buy it.
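The length of each drop can be read off a probe log like this by collapsing consecutive failed probes into one window. Below is a minimal Python sketch of that step. It assumes a hypothetical log that has already been parsed into time-ordered (timestamp, reachable) pairs; this is not PingPlotter's actual export format, and the names `Outage` and `find_outages` are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple


@dataclass
class Outage:
    start: datetime  # timestamp of the first failed probe
    end: datetime    # timestamp of the first successful probe afterwards

    @property
    def duration(self) -> timedelta:
        return self.end - self.start


def find_outages(samples: Iterable[Tuple[datetime, bool]]) -> List[Outage]:
    """Collapse runs of failed probes into outage windows.

    `samples` must be ordered by time. An outage that is still open when
    the log ends is closed at the last timestamp seen.
    """
    outages: List[Outage] = []
    start: Optional[datetime] = None
    last: Optional[datetime] = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts                      # a new run of failures begins
        elif ok and start is not None:
            outages.append(Outage(start, ts))
            start = None                    # connectivity is back
        last = ts
    if start is not None and last is not None:
        outages.append(Outage(start, last))
    return outages
```

The measured duration is only as precise as the probe interval, so a 5-second interval can under- or over-state a drop by a few seconds. Comparing the window start times with the pattern update times in the firewall log is what ties the drops to the updates.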

  • PS: Keep in mind that ping is not a TCP/UDP connection. There is another bug ID related to pings in virtual fastpath, as ICMP seems to behave differently. Because ICMP has no real notion of a session, the session cannot be retained, so if a ping packet is lost, it is lost. TCP/UDP can work with retransmission and thereby pick up the same session. So a lost ping does not necessarily mean a lost session within the network.

    __________________________________________________________________________________________________________________
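Since ICMP is handled differently, one way to cross-check a ping-based measurement is to probe with TCP instead. The following is a rough Python sketch, not a Sophos tool: `tcp_probe` and `monitor` are hypothetical helper names, and the target host and port are whatever external service you choose. Note the limitation: each probe opens a new connection, so it tests whether new sessions can be established through the firewall, not whether an existing session survives the update.

```python
import socket
import time
from datetime import datetime
from typing import List, Tuple


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP handshake with host:port completes in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused, unreachable and timed-out connects
        return False


def monitor(host: str, port: int, interval: float = 1.0,
            rounds: int = 60, timeout: float = 2.0) -> List[Tuple[datetime, bool]]:
    """Probe host:port `rounds` times and return (timestamp, reachable) pairs."""
    results: List[Tuple[datetime, bool]] = []
    for _ in range(rounds):
        results.append((datetime.now(), tcp_probe(host, port, timeout)))
        time.sleep(interval)
    return results
```

For example, `monitor("example.com", 443, interval=1.0, rounds=3600)` run across an expected update window yields a list that can be scanned for runs of failed probes. Probing several external targets, as described above for pings, helps rule out a fault at a single destination.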

  • Thanks for your replies. It seems hard enough to even create a valid test scenario...

  • I appreciate that ping isn't the same as TCP/UDP, but it was a useful tool to get some insight into why users were complaining of lost internet connections. Should have known that Sophos would have a separate bug for ICMP in virtual fastpath!

    The one thing I can confirm, on the two sites I have just tested it, pings don't stop when there is an update and Firewall Acceleration is disabled.

  • From a network perspective, ping is a poor tool for troubleshooting anything beyond "is a connection even possible?". Looking at ping (ICMP) is like looking at a street with jammed traffic: ICMP can be the motorcycle that weaves through the traffic and still reaches the destination, while your "real" traffic (the cars) cannot. In some cases it simply does not reflect the real world. I have seen a lot of administrators struggle with this, especially in the move towards cloud (SD-Networks) or SD-WAN. You ping, and the ping reaches the destination, but not at the same speed as your VoIP. This leads you to no conclusion, because there could be multiple issues at the same time (wrong rule, wrong traffic selector, wrong traffic classification, etc.). Ping (ICMP) sometimes shortcuts everything and uses different routes. Traceroute and other tools do the same. I cannot count how often I have had to discuss customers' traceroute outputs and explain that what they show is not an issue. But it is an easy tool to use and it gives you something.

    To recap:

    NC-69286: ICMP times out when Firewall Acceleration is enabled

    NC-70896: Internet traffic stops every time XG has an IPS or ATP update

    Those are the two affected bug IDs. Both seem to be related to Firewall Acceleration, which needs to be checked.

    __________________________________________________________________________________________________________________