

What to do in failsafe mode

I just had a very bad experience updating an XGS 126 active-passive HA cluster from 19.0 MR1 to 19.0 MR3 and then to 20.0 GA.

Node A: Primary
Node B: Aux

The update to 19.0 MR3 seemed to go fine: Node B updated, restarted and became Primary, then Node A updated and became Aux. The HA status in WebAdmin was correct, all green.

Then I clicked restart to update to 20.0 GA, and the primary became unavailable immediately.

So the Aux became Primary and showed the previous primary as Faulty.

On the console I noticed that the Aux was in failsafe mode due to “Unable to apply NAT Rules”. However, the fact that this node was in failsafe mode had not been visible in WebAdmin beforehand.

Is it correct that there’s no indication in WebAdmin or in the HA status when the Aux node has booted into failsafe mode?

I tried to fix the NAT rules (a problem with the WAN interface port in the default SNAT rule) and rebooted the failsafe node.
Surprisingly, the previously faulty node took over immediately, without being in failsafe mode, correctly on 19.0 MR3 and without NAT problems. But this node was no longer aware of any HA configuration: HA was simply disabled and it was running standalone.

Then the failsafe node came back after its reboot and took down the network, as both nodes had the primary IPs assigned…

I called the customer and had that node shut down manually on-site.
Then I successfully continued upgrading the standalone node to 20.0 GA.

But… that node was not the license master. And, of course, transferring the subscriptions between the two firewalls was not possible in Central Dashboard… It took Sophos Support around an hour to transfer the licenses manually.

Now I will probably re-image the second node on-site and rebuild HA.

So, back to failsafe mode:
  - Where do I see when an appliance has booted into failsafe mode (other than on the console)?
  - What should I do when an appliance is in failsafe mode, especially within an HA cluster?
  - There is not much documentation on failsafe mode, apart from how to get some details on the reason.



  • Is there no one around with any information on failsafe mode?
    The only information I could find is https://support.sophos.com/support/s/article/KB-000036376?language=en_US
    But that does not really help when you are in such a situation, most likely in the evening or at the weekend within a limited maintenance window.

    I have also gone through the Sophos Architect and Technician course material, and it has no details on failsafe mode either.

    Having more documentation on this would be more than appreciated.

    Also: how can I notice that an appliance is in failsafe mode? In the example above, the Aux node in the HA cluster was in failsafe mode without any warning in the HA status, WebAdmin or anywhere else, which makes the cluster more or less useless.

  • Hi  ,

    Thanks for reaching out to Sophos Community and we regret to hear you faced an issue. 

    Could you please open a support case and share the case ID with us once you have it?

    Many thanks for your time and patience and thank you for choosing Sophos. 

    Regards,

    Raphael Alganes
    Community Support Engineer | Sophos Technical Support

  • Hi  ,

    For the problem described above, there’s no need to investigate further. One appliance has been upgraded successfully, and I will re-image the other one and rebuild HA on-site.

    But for future situations in which I might face failsafe mode, I’d like to get more information and documentation about failsafe mode in general:

    - How to monitor whether an appliance is in failsafe mode
    - What can cause failsafe mode, and how to fix it and leave it
    - …

    Do I need to open a case for such general information?
    Or wouldn’t it be useful to publish it here, at least, so that everyone else can find these details via search later?

  • Overall, failsafe mode is rarely seen nowadays.

    You see failsafe mode in cases such as hardware failure or configuration edge cases.

    As failsafe is something Sophos wants to eliminate entirely, it is important to reproduce those cases and fix them.

    Failsafe was an issue that came up more often in the past than it does nowadays. That is why you don’t see many customers with failsafe issues: Sophos has addressed those situations so that they no longer come up.

    So in general, if you come across a failsafe situation, Sophos would like to have the backup and a description of how you reproduced this state, in order to address it for the future.
