This discussion has been locked.

Dropped Connections during Pattern Updates

Since installing multiple XG Firewalls in a multi-site environment, we have been plagued with "random" outages that last between 30-90 seconds.

I have finally correlated this with Pattern updates for either ATP, AV or IPS. During the time of the definition updates all connectivity to the XG firewall is lost. This actually brings down our Wide Area network and causes VoIP phones to restart looking for the phone server.

I have an open support ticket with Sophos but I'm awaiting their response.

I have changed the updates to happen less frequently (Daily), however when there are updates it still brings down the connection (albeit less often now).

Is there a way to still have automatic updates turned on but do them on a time schedule? I find it utterly ridiculous that the system cannot do pattern updates without bringing down the entire network.

If this is "expected" behavior what have others done as workarounds? I cannot have 30-90 seconds of downtime every other day for pattern updates.

This thread was automatically locked due to age.

Top Replies

JasP over 3 years ago in reply to LuCar Toni +4

Can you tell me why you have to get intel from your users when Sophos can just test this themselves? This has been a serious problem for at least 9 months and I would expect Sophos to be doing everything…

0 Bill Roland over 3 years ago

https://community.sophos.com/sophos-xg-firewall/f/discussions/123652/internet-traffic-stops-every-time-xg-has-an-ips-or-atp-update I think this is probably related.

0 Ryan McMillan over 3 years ago in reply to Bill Roland

Thanks Bill. I agree and have seen this article as well.

But there is currently no fix and no workaround other than to turn off automatic pattern updates? How can we have a firewall device that drops all connections during pattern updates? How can I recommend this to enterprise customers? How do I get more visibility for this? I've also seen the Sophos Idea to give more control over scheduling these updates, which I have upvoted, but frankly, I don't want to lose connection, EVER.

I'm awaiting Sophos support to get back to me on my questions above as well, but I just can't fathom how this is acceptable on any level.

I feel like now I am forced to choose between consistent connectivity by turning off automatic pattern updates and security.

0 LuCar Toni over 3 years ago in reply to JasP

It would be nice to know whether this actually addresses your issue or not. Sophos is going to fix this in V19.0 anyway, as the interaction with the restart of the engines will be addressed.

0 Rockfish over 3 years ago in reply to LuCar Toni

Yes, I am seeing this behaviour on firewalls that have fastpath disabled (which seems to be the default for XG's that are in a cluster).

Surely it would make more sense if this was the other way around? If fastpath routes 'trusted' traffic directly without IPS checking it, it shouldn't be affected by the IPS service restarting. Whereas if fastpath is disabled and traffic cannot be checked because IPS is restarting, then the traffic would be dropped?

0 LuCar Toni over 3 years ago in reply to Rockfish

Virtual FastPath (VFP) is a component that uses Snort as well. Therefore, if Snort applies an update, it could drop the session as well, but certainly not in each and every case.

VFP is enabled by default on all appliances (including HA), but it was disabled before V18.0 MR4. It will not get enabled by an upgrade; instead, you can change your config and enable it yourself.

0 LHerzog over 3 years ago in reply to LuCar Toni

Some interesting facts are coming up here. Any reason for the default disabled VFP setting in MR4? Is this only for fresh installations on MR4? What about migrations via MR4 to MR5? We went from 17.5 MR12 via 18 MR1 -> MR4 -> MR5, and we re-imaged our appliances when going to MR4, then imported the config.

VFP was enabled when I checked it recently, but it has now been disabled because support asked for that while looking into some kind of issue. Disabling it did not fix the issue.

Rockfish, can you provide some steps on how you measured the duration of the connection loss?

I'd like to review this with our XG430s HA.

I know we lost traffic for some seconds when disabling VFP.

0 LuCar Toni over 3 years ago in reply to LHerzog

Sophos does not enable most settings after a firmware upgrade, to avoid issues within the network after the update. V18.0 MR4 enabled the VFP option on HA. Customers coming from an older version had this disabled and can enable it if they want. This option will likely be enabled in a future release.

A new installation without backup/restore will have VFP enabled per default.

0 JasP over 3 years ago in reply to LuCar Toni

Is there a command to show if it is enabled (rather than enable/disable it)?

0 LuCar Toni over 3 years ago in reply to JasP

console> system firewall-acceleration show
Firewall Acceleration is Enabled in Configuration.
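
If you collect this console output from several firewalls (for example over SSH), a small helper can turn it into a yes/no answer. The following is only a rough sketch: the function name is made up, and the "Disabled" wording is an assumption, since the thread only shows the "Enabled" line.

```python
def acceleration_enabled(console_output):
    """Interpret the output of `system firewall-acceleration show`.

    Returns True or False, or None if the text is not recognised.
    """
    text = console_output.lower()
    if "firewall acceleration is enabled" in text:
        return True
    # Assumed wording: the thread does not show the disabled output.
    if "firewall acceleration is disabled" in text:
        return False
    return None


print(acceleration_enabled("Firewall Acceleration is Enabled in Configuration."))
```

Returning None for unrecognised text keeps the helper from silently reporting "disabled" if the real wording differs from the assumption.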

0 Rockfish over 3 years ago in reply to LHerzog

Just from seeing the issue a few times, we'd typically notice that an MS Teams call would stop responding, then I'd try a web browser and see a generic 'page cannot be displayed'. Give it around 20 seconds and then it works again as expected. But normally the delay is long enough that you'll get dropped from your Teams call and need to dial back in again. Super frustrating.

I may try the workaround by JasP and reboot our firewall in the early hours, hoping that the pattern updates will then take place again 24 hours after that (i.e. out of hours).

0 JasP over 3 years ago in reply to LHerzog

We use a program called PingPlotter. We run pings to several external addresses (to avoid false positives) and to the XG IP as well. It maintains logs of all the connections, and we can check those to see that, at the time of an update, the ping to the XG is fine but all the other pings are blocked.

You can use PingPlotter free but if you want to run it as a service, you can either run a 14 day trial or buy it.
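
The same check can be roughly sketched in Python once you have the ping results in a log. This is not how PingPlotter works internally. The function name, the data layout and the addresses (taken from the documentation ranges) are all made up for illustration. It reports the time windows in which the XG still answered but several external targets did not.

```python
from datetime import datetime, timedelta


def find_outages(samples, gateway, min_failed=2):
    """Return (start, end) windows where the gateway answered but at least
    `min_failed` external targets did not.

    `samples` is a time-sorted list of (timestamp, {target: reachable}) tuples.
    """
    windows = []
    start = last = None
    for ts, results in samples:
        failed = [t for t, ok in results.items() if t != gateway and not ok]
        in_outage = results.get(gateway, False) and len(failed) >= min_failed
        if in_outage:
            if start is None:
                start = ts
            last = ts
        elif start is not None:
            windows.append((start, last))
            start = None
    if start is not None:  # log ended while still in an outage
        windows.append((start, last))
    return windows


# Hypothetical log: one sample every 5 seconds, externals fail in samples 2 and 3.
t0 = datetime(2021, 1, 1, 3, 0, 0)
xg = "192.0.2.1"
log = [
    (t0 + timedelta(seconds=5 * i),
     {xg: True, "198.51.100.1": i not in (2, 3), "203.0.113.1": i not in (2, 3)})
    for i in range(6)
]
for begin, end in find_outages(log, xg):
    print(begin.time(), "to", end.time())
```

Requiring more than one failed external target is the same idea as pinging several addresses to avoid false positives. If the XG itself stops answering, the sample is not counted, since that points to a local problem rather than the update behaviour described here.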

0 LuCar Toni over 3 years ago in reply to JasP

PS: Keep in mind that ping is not a TCP/UDP connection. There is another bug ID related to pings in virtual fastpath, as ICMP seems to behave differently. As there is no real indication of a session in ICMP, the session cannot be retained. Therefore, if the ping packet is lost, it's lost. TCP/UDP can work with retransmission and thereby pick up the same session. So a lost ping does not have to result in a lost session within the network.
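
To probe with TCP instead of ICMP, something like the following minimal sketch could be used. The function name is made up, and the host and port in the usage line are placeholders. Note that each call opens a new connection, so it shows whether new TCP sessions can be set up through the firewall at that moment, not whether an existing long-lived session survives an update.

```python
import socket


def tcp_probe(host, port, timeout=2.0):
    """Return True if a TCP handshake with host:port completes within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and unreachable hosts
        return False


# Placeholder target: replace with a server you are allowed to probe.
# print(tcp_probe("example.com", 443))
```

Run in a loop with timestamps, this gives a log comparable to a ping log, but for the kind of traffic users actually generate.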

0 LHerzog over 3 years ago in reply to LuCar Toni

Thanks for your replies. It seems hard enough to even create a valid test scenario...

0 JasP over 3 years ago in reply to LuCar Toni

I appreciate that ping isn't the same as TCP/UDP, but it was a useful tool to get some insight into why users were complaining of lost internet connections. Should have known that Sophos would have a separate bug for ICMP in virtual fastpath!

The one thing I can confirm: on the two sites I have just tested, pings don't stop when there is an update and Firewall Acceleration is disabled.

0 LuCar Toni over 3 years ago in reply to JasP

From a network perspective, ping is always a bad tool for troubleshooting anything beyond "is a connection even possible?". Looking at ping (ICMP) is like looking at a street with jammed traffic: using ICMP could mean you are on a motorcycle, weaving through the traffic and still reaching the destination, while your "real" traffic (the cars) cannot do this. In some cases it simply does not reflect the real world. I have seen a lot of administrators struggle with this, especially in the move towards cloud (SD-Networks) or SD-WAN. You ping, and the ping reaches the destination, but not at the same speed as your VoIP. And this leads you to: nothing. No conclusion, because there could be multiple issues at the same time (wrong rule, wrong traffic selector, wrong traffic classification, etc.). Ping (ICMP) sometimes shortcuts everything and uses different routes. Traceroute and other tools do the same. I cannot remember how often I have had to discuss customers' traceroute outputs and explain that what they show is not an issue. But it's an easy tool to use and it gives you something.

To recap:

NC-69286: ICMP times out when Firewall Acceleration is enabled

NC-70896: Internet traffic stops every time XG has an IPS or ATP update

Those are the two affected bug IDs. It seems to be related to Firewall Acceleration and needs to be checked.

0 ken9000 over 3 years ago in reply to LuCar Toni

Any updates to these issues? We're still getting 100+ users unable to access the internet for several minutes a day due to 100% CPU usage.

0 LuCar Toni over 3 years ago in reply to ken9000

Did you disable the Firewall acceleration?

0 ken9000 over 3 years ago in reply to LuCar Toni

Does it require a reboot after disabling? What are the potential performance impacts of disabling it? Why did firewall acceleration suddenly break things?

0 LuCar Toni over 3 years ago in reply to ken9000

I answered most of those questions in this thread above.

0 ken9000 over 3 years ago in reply to LuCar Toni

I don't see an answer to the question of whether or not a reboot is officially required after running the disable command.

0 LuCar Toni over 3 years ago in reply to ken9000

Reboot is not required, but connections will be dropped by entering this command.

0 ken9000 over 3 years ago in reply to LuCar Toni

Ok, thank you.