HA config - After trying to upgrade to the latest version I can no longer reach the web admin page

Sophos UTM PC Config:

The PCs (HP SFF 6200 pro i3 (2120) 3.3 GHz) that I have been using for Sophos UTM for years each have a total of 6 Ethernet ports.

eth0: (Dell/Intel U3867 single port PCIe) is the HA Cluster connection between the two Sophos HA systems.

eth1, 2, 3, & 4: (Dell/Intel OYT674 Intel Pro/1000 VT QUAD port GB PCIe) are the internal/in-house connections and are used for the web admin page. They have link aggregation enabled.

eth5: (Onboard Gb NIC; I believe this is also an Intel chipset) is the external WAN connection.

Issue:

Last night I took the option to apply the latest update (I believe I was at 9.719 before and was upgrading to 9.720). 

While upgrading I noticed my internet was no longer working which is abnormal with the HA setup.

After waiting about an hour I shut down both systems and tried to bring up just the master.  

After the system booted I noticed there were not any lights on the Intel Ethernet quad port card.

I then brought up the standby (slave) system.  It showed 2 of the 4 lights on the quad-port Intel Ethernet card lit.

I tried shutting down both again and then bringing up just the slave to see if it would act as the master and the ports would come on.  That did not work.

I can log in from the console with root on both systems.  I noticed from the prompt they both have the same name (which I believe is normal) and one system's prompt starts with <M> and the other starts with <S>.

Running ip a on the one whose prompt shows <M> shows a total of 8 interfaces: 1: (loopback), 2: eth0, 3: eth1, 4: eth2, 5: eth3, 6: eth4, 7: eth5, and 8: (ifb0).  eth1, 2, 3, & 4 (the quad-port Ethernet card) all show state DOWN.  None of the lights on the back of the card are lit.

Running ip a on the one whose prompt shows <S> lists 5 interfaces: 1: (loopback), 2: eth0, 3: eth1, 4: eth2, and 5: (ifb0).  eth1 and eth2, which are 2 of the 4 ports on the quad-port card, show state DOWN.  The 2 middle ports of this quad-port card have their lights lit.

I do not know why the system with <M> does not have any of the 4 ports lit.

I do not know why the system with <S> has 2 of the 4 ports lit.  I thought that if it was in <S> mode, they would all be off.  I also do not understand why 2 of the 4 ports on this system are not listed by ip a.

Does anyone have any ideas on how to fix this?  I'm trying to avoid having to do a total reload of Sophos.



Edited after learning that ip a did show the quad-port Ethernet card (so the drivers must be loading).
[edited by: Damon Dawson at 4:45 PM (GMT -7) on 16 Oct 2024]
  • I noticed while booting and pressing F2 at the Sophos splash screen there was a step that read "::Restore backup" and its status then read "Skipping".  I'm hoping there is an option to press some sort of key combo to cause Sophos to restore the last backup.   Does anyone know if that is possible or if there is some other way to restore the previous firmware/config via the console (since the web admin page isn't even reachable)?

  • Since no one seems to be responding to this with any suggestions, and I cannot find anything that would let me restore from the console the backup that it should have taken automatically, I think I don't have any other choice but to reload and reconfigure manually. :(

  • Are you able to ping the WebAdmin interface IP? (Of course, this may not be possible if you did not allow pinging the UTM in your config.)

    I'm sure you have to reassign the hardware NICs to the UTM interfaces. What does ifconfig -a show?

    Maybe you have to reassign an IP first with ifconfig, and after that you can reassign the hardware NICs in your config here:

    cc or confd-client.plx [ENTER]

    OBJS [ENTER]

    interface [ENTER]

    ethernet [ENTER]

    regards

  •   THANK YOU for trying to help.  Running "ip a" on the nodes showed the Master's ports, used with link aggregation (eth1 thru eth4), as all DOWN and their lights were off.  The Slave node only showed 2 of the 4 ports.  So it was REALLY messed up and looked as if both nodes were somehow acting as Slaves and were in standby status.  Because I couldn't wait, I ended up reloading everything from scratch.

    I kind of blame myself for not doing a backup before clicking the update button (I will not ever do that again). I did have an old config backup from about a year ago that I used, which had most of the settings I needed. It was good to refresh my documentation (silver lining moment).

    One thing to point out is that Sophos's documentation is very lacking on the order of operations for configuring their High Availability (HA) feature. They do not explain which system to start with, etc.  I wish they would update that.

    To help others that might read this blog, at a high level, here's what I did to recover and configure HA:

    1. Reinstalled Sophos UTM from the .iso download on the Master (assigning the permanent IP that will be used for production).

    2. I accessed the WebAdmin page, set the node name and initial admin password, and chose the option to restore the settings from my backup.

    3. After that it returned me to the login page but stopped responding.  After waiting for about 15 minutes, I then rebooted the system and that corrected the issue. 

    4. I then repeated the process on the Slave node but this time I used a temporary IP address and did not take the option to restore.  After walking through the wizard, and taking all the defaults, I logged in to the WebAdmin page.

    5. From the Master node I enabled HA as Hot Standby (active-passive), set the node ID to 1, gave it the node name, and filled in the encryption key.

    6. From the Slave node I enabled HA, set the node ID to 2, gave it the node name, and filled in the encryption key.  The Slave node was then completed.  I opened the WebAdmin page back on the Master node and it showed the Slave status as Syncing.  Once complete, it then showed the Master status as Active and the Slave status as Ready.
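
    The settings entered in steps 5 and 6 can be written down and sanity-checked before they are typed into WebAdmin. The following is only a sketch in Python: the `HANode` class and `check_ha_pair` function are hypothetical and do not talk to the UTM at all. It assumes that the two nodes need different node IDs and must use the same operation mode and the same encryption key.

```python
from dataclasses import dataclass


@dataclass
class HANode:
    """HA values entered on one node (hypothetical record, not a Sophos API)."""
    name: str
    node_id: int
    mode: str
    encryption_key: str


def check_ha_pair(master, slave):
    """Return a list of problems found in the planned HA settings."""
    problems = []
    if master.node_id == slave.node_id:
        problems.append("node IDs must differ")
    if master.mode != slave.mode:
        problems.append("operation mode must match on both nodes")
    if not master.encryption_key or not slave.encryption_key:
        problems.append("encryption key must not be empty")
    elif master.encryption_key != slave.encryption_key:
        # Assumption: both nodes are expected to share one key.
        problems.append("encryption key must match on both nodes")
    return problems


if __name__ == "__main__":
    # Placeholder values only; use your own node names and key.
    master = HANode("node-a", 1, "Hot Standby", "example-key")
    slave = HANode("node-b", 2, "Hot Standby", "example-key")
    print(check_ha_pair(master, slave) or "settings look consistent")
```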

    I know the system automatically backs up the config before applying firmware. I wish they would add a hotkey or a console command that would list all the backups on disk and let you pick one to restore.  That way there would be a way out if the WebAdmin GUI isn't available.

  •  Hi Steve,

    I have the config backup emailed to me every day.  It's under Management->Backup/Restore->Automatic Backups. You can set it to a shorter interval as needed.

    Regards
    Damien