What to do in failsafe mode

I just had a very bad experience updating an XGS126 pair from 19.0MR1 to 19.0MR3 and then to 20.0GA in active-passive HA.

Node A Primary
Node B Aux

The update to 19.0MR3 seemed to go fine: Node B updated, restarted and became Primary, then Node A updated and became Aux. The HA status in WebAdmin was correct, all green.

Then I clicked restart to update to 20.0GA, and the primary became unavailable immediately.

So the Aux became Primary and showed the previous primary as Faulty.

On the console I noticed that the Aux was in failsafe mode due to “Unable to apply NAT Rules”. But the fact that this node was in failsafe mode had not been visible in WebAdmin beforehand.

Is it correct that there’s no sign in WebAdmin or in the HA status when the Aux node has booted in failsafe mode?

I tried to fix the NAT rules (a problem with the WAN interface port in the default SNAT rule) and rebooted the failsafe node.
Surprisingly, the previously faulty node took over immediately, not in failsafe mode, correctly on 19.0MR3 and without NAT problems. But this node was no longer aware of any HA config: HA was simply disabled and it was running standalone.

Then the failsafe node came back after its reboot and killed the network, as both nodes had the Primary IPs assigned…

I called the customer and that node was manually shut down on-site.
Then I successfully continued upgrading the standalone node to 20.0GA.

But… that node was not the license master. And, of course, Subscription Transfer between both firewalls was not possible in Central Dashboard… It took Sophos Support around an hour to transfer the licenses manually.

Now I will probably re-image the second node on-site and rebuild HA.

So back to failsafe mode:
  - Where do I see that an appliance has booted in failsafe mode (other than on the console)?
  - What should I do when a node is in failsafe mode, especially within an HA cluster?
  - There is not much documentation on failsafe mode, except on how to get some details about the reason.



  • Hi,

    For the current problem described above, there’s no need to investigate further. One appliance was upgraded successfully, and I will re-image the other and rebuild HA on-site.

    But for future situations in which I might face failsafe mode, I’d like to get more information/documentation about failsafe mode in general:

    - How to monitor whether an appliance is in failsafe mode
    - What the possible reasons for failsafe mode are, and how to fix them and leave failsafe mode
    - …

    Do I need to open a case for such general information?
    Or wouldn’t it be useful to publish it at least here, so everyone else can find these details via search later?

  • Overall, failsafe is rarely seen nowadays.

    You see failsafe in cases like hardware failure or edge cases in the config.

    As failsafe is something Sophos wants to eliminate entirely, it is important to reproduce those cases and fix them.

    Failsafe came up more often in the past than it does nowadays. You don’t see many customers with failsafe issues because Sophos has addressed those situations so that they no longer come up.

    So in general, if you come across a failsafe situation, Sophos would like to have the backup and a description of how you reproduced this state, so it can be addressed for the future.
