18.5 MR2 to MR3 Upgrade broke HA cluster and node not updating

We updated three XGS136 HA clusters from 18.5 MR2 to MR3. Two upgraded without problems; one failed.

The failed cluster sent us emails saying that the upgrade had failed, but the primary node did upgrade to MR3.

Then we received emails saying that HA had been disabled:

You are receiving this auto-generated message from Sophos Notification System to inform that HA is Disabled.

Time:15:36:55

Appliance Key    | Model Number | Firmware Version          | HA Status
X13108Kxxxx41    | XGS136       | SFOS 18.5.2 MR-2-Build380 | Disable
X13108MMxxxxDA6  | XGS136       | SFOS 18.5.2 MR-2-Build380 | Disable

When I connect to LAN1 on the second node, it still shows MR2. The system log is empty.

It showed MR3 as available and ready for install. Clicked install. "Applying firmware" in a green box appeared and the WebAdmin became unresponsive.

It did not upgrade and is still at MR2.

Now that the machine is no longer in the cluster, it has no WAN gateway, but it states that the firmware is in the second slot.

After a reboot it shows this popup without any text on the buttons. I've seen this once before somewhere.

It looks messed up.

Any idea how to resolve this? Should I reimage with MR3?



  • I don't know why it failed, but on HA clusters I always click SYNC in the GUI first.

    The HA status has a habit of showing as healthy even when it's not. When you hit SYNC, it usually reboots the active node, switches to the AUX, then syncs and comes back to a healthy state after about 10 minutes.

    I do this twice, so both nodes have been rebooted, and after 20 minutes I know the SYNC state is healthy for real. Only then would I do the firmware upgrade.

    Prior to that, I always take a backup and check "df -h" to make sure both nodes have free space.

    Not saying that will help you, just some ideas.
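
The "df -h" check above can be scripted if you paste each node's output into a small helper. This is only a rough sketch in Python, not a Sophos tool: the function name, the 80% threshold, and the sample output are all made up for illustration, and it assumes the usual six-column "df" layout (Filesystem, Size, Used, Avail, Use%, Mounted on).

```python
def low_space_filesystems(df_output, max_used_percent=80):
    """Return (mount_point, used_percent) for filesystems above the threshold.

    Assumes the common six-column `df` layout. Lines that do not fit
    (for example, wrapped long device names) are skipped, not guessed at.
    The 80% default is an arbitrary illustrative threshold.
    """
    flagged = []
    for line in df_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6 or not fields[4].endswith("%"):
            continue
        used = int(fields[4].rstrip("%"))
        if used > max_used_percent:
            # Re-join in case the mount point itself contains spaces.
            flagged.append((" ".join(fields[5:]), used))
    return flagged


# Made-up sample output, not taken from a real appliance.
SAMPLE = """Filesystem Size Used Avail Use% Mounted on
/dev/root 1.0G 300M 700M 30% /
/dev/var 10G 9.2G 800M 92% /var
"""

print(low_space_filesystems(SAMPLE))
```

Run it once per node with that node's own output; an empty list means nothing is above the threshold you chose.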


  • Today I'm fixing this issue. I re-imaged the node that left HA and am now trying to re-establish the HA setup on both machines. This is what I found on the machine that updated to MR3 while the cluster was still working,

    and Port 11 is not the HA port. How can this happen?

    It explains why the two machines lost communication with each other after the upgrade, but this must not happen.

    It did not happen on the two other XGS136 clusters.
