Since installing multiple XG Firewalls in a multi-site environment, we have been plagued with "random" outages lasting 30 to 90 seconds.
I have finally correlated these outages with pattern updates for ATP, AV, or IPS. While the definitions are updating, all connectivity to the XG firewall is lost. This brings down our wide area network and causes our VoIP phones to restart while they search for the phone server.
I have an open support ticket with Sophos and am awaiting their response.
I have changed the update frequency to daily; however, each update still brings down the connection (albeit less often now).
Is there a way to keep automatic updates turned on but run them on a schedule? I find it utterly ridiculous that the system cannot do pattern updates without bringing down the entire network.
If this is "expected" behavior, what have others done as workarounds? I cannot have 30 to 90 seconds of downtime every other day for pattern updates.
Can you tell me why you have to get intel from your users when Sophos can just test this themselves? This has been a serious problem for at least 9 months, and I would expect Sophos to be doing everything…
I have not had the issue again since running: system firewall-acceleration disable
I too am concerned that I'm missing out on some performance gains by disabling this, but for now it is worth it to me. I'm hopeful a real fix comes soon.
You're missing the point. This issue does not seem to affect all customers, only a portion of them, so the affected installation base appears to be smaller. The open question is whether the virtual fast path is causing this issue and, if so, on which appliances, in which situations, and why. DEV is still looking into this issue and is trying to (A) replicate it and (B) find its root cause.
While DEV works on a solution, a revamp of the ATP/IPS pattern update process is also currently under development.
In UTM there was an "easy workaround" for this:
Restart policy: Select the policy for connection handling when an IPS engine restart is required, for example when the engine is updated.
Drop (default): All incoming and outgoing connections will be dropped during engine restart.
Bypass: All incoming and outgoing connections will bypass IPS scanning while the engine is restarting.
The point is: customers moved to "Bypass" despite the security concerns, which is actually bad practice. You could easily ask, "Why not implement a bypass option in SFOS?" But from a security perspective, there are other approaches to begin with. I would neither implement nor enable such a feature in SFOS.
As you can see, this issue did not exist before a certain release. The virtual fast path seems to have an issue with reloading the engine, dropping sessions in certain edge cases; this still needs to be validated.
This is crazy. It's totally unacceptable for a firewall to drop all internet traffic for 30 seconds every day whilst it updates its patterns. Especially now when people are so much more reliant on video conferencing and online meetings. The fact that this issue has been flagged for this long and not treated as a serious problem that requires a critical fix is a real concern.
Again, this is not normal behavior. It does not seem to affect everybody, so it needs to be investigated in more depth. Feedback that affected customers have resolved the issue with this workaround is a good indicator.
Is it possible that this is more noticeable on appliances with smaller CPUs, because installing the patterns takes longer on slower CPUs?
I am only "missing the point" because I have been consistently told by Sophos that dropping connections during pattern updates was "expected behaviour". This was the reason given for closing my case after long discussions between senior support staff and the dev team. To quote the L2 Senior Escalations Engineer I was working with, "Having escalated the case internally, the behaviour you have been seeing is expected and there is no current workaround other than what you are currently using."
I spent many, many hours on this case and now you are telling me, 8 months later, that what the dev team said at the time just isn't true. Are you surprised that I'm pretty angry about this? Are you surprised that, once again, my confidence in Sophos products has been undermined when your own dev team don't seem to understand their own product?
We tested this issue on 3 devices, all of which consistently exhibited the problem. Two 100 series devices dropped traffic for about two minutes, while a 400 series device dropped it for about four seconds.
So the issue still exists even after disabling the fastpath. I will forward this feedback to the ID.
No, this was in response to LHerzog and was based on our previous testing. I have not yet tested again with fastpath disabled.
It would be nice to know whether this actually addresses your issue. Sophos is going to address this in V19.0 anyway, as that release reworks the interaction with engine restarts.