I just had a very bad experience updating an XGS126 pair from 19.0MR1 to 19.0MR3 and then to 20.0GA in active-passive HA.
Node A Primary
Node B Aux
The update to 19.0MR3 seemed to go fine: Node B updated, restarted and became Primary, then Node A updated and became Aux. HA status in WebAdmin was correct, all green.
Then I clicked restart to update to 20.0GA and the Primary became unavailable immediately.
So the Aux became Primary and showed the previous Primary as Faulty.
On the console I noticed the Aux was in failsafe mode due to “Unable to apply NAT Rules”. But the fact that this node was in failsafe mode had not been visible in WebAdmin before.
Is it correct that there’s no sign in WebAdmin or in the HA status when the Aux node has booted in failsafe mode?
I tried to fix the NAT rules (a problem with the WAN interface port in the default SNAT rule) and rebooted the failsafe node.
Surprisingly, the previously faulty node took over immediately, not in failsafe mode, correctly on 19.0MR3 and without NAT problems. But this node was no longer aware of any HA config: HA was disabled and it was running standalone.
Then the failsafe node came back after its reboot and killed the network, as both nodes had the Primary IPs assigned…
Called the customer and manually shut down that node on-site.
Then successfully continued upgrading the standalone node to 20.0GA.
But… that node was not the license master. And, of course, a subscription transfer between the two firewalls was not possible in the Central Dashboard… It took Sophos Support around an hour to transfer the licenses manually.
Next step is probably to re-image the second node on-site and rebuild HA.
So back to failsafe mode:
- Where can I see that an appliance has booted in failsafe mode (other than on the console)?
- What should I do when a node is in failsafe mode, especially within an HA cluster?
- There is not much documentation on failsafe mode, except on how to get some details about the reason.