Can someone please explain to me what is going on with the releases? I'm on 9.355-1 and had 4 updates listed. I didn't apply these because I saw some horror stories from people running HA who applied them. I noticed emails recently about some new releases, but when I look now I have only 2 updates listed. Is Sophos periodically pulling (removing) some of these?
From the command line: ls /var/up2date/sys
If you have 9.355-to-9.356 and 9.356-to-9.404, that's because the Up2Date process deleted 9.355-to-9.400, 9.400-to-9.401, 9.401-to-9.402 and 9.402-to-9.403. On 6/23, these were deleted and then replaced by 9.355-to-9.356 and 9.356-to-9.404.
Cheers - Bob
I had some issues last week because, as Bob explained, the update chain was updated on 6/23 and, for some reason, my master nodes had their update files updated, but my slave nodes did not.
Yesterday I had a slave node with this behavior, and it corrected itself after I rebooted the slave node. I think this forced a resync, and the up2date files were redownloaded, this time with the correct chain of updates. To check, do this:
1) SSH into your master node
2) run ls /var/up2date/sys
3) there should be two files: one for update 9.355-to-9.356 and another to update 9.356-to-9.404
4) run ha_utils ssh (this will log you into the slave node)
5) run ls /var/up2date/sys again. This time, the output will be the up2date files on your slave node
6) if the files are the same as on the master node, go ahead and up2date, it will work just fine
7) if the up2date files differ on master and slave node, try rebooting your slave node through "Management / High Availability" on WebAdmin. This will force a resync, and the up2date files should be redownloaded on the slave node
8) Just to be sure, repeat steps 1, 4 and 5. The up2date files on the slave node should match the ones on the master node by now, and you can safely update your cluster.
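If you'd rather not compare the two listings by eye, here is a minimal sketch of the comparison in steps 2 to 6. It assumes you saved the output of ls /var/up2date/sys from each node into a text file first; the function name compare_u2d_lists and the file names master.txt and slave.txt are made up for illustration and are not part of the UTM.

```shell
# Compare two saved directory listings (one file name per line).
# Prints "match" if both nodes list the same up2date files, "differ" otherwise.
# Usage: compare_u2d_lists MASTER_LIST SLAVE_LIST
compare_u2d_lists() {
    master_files=$(sort "$1")
    slave_files=$(sort "$2")
    if [ "$master_files" = "$slave_files" ]; then
        echo "match"
    else
        echo "differ"
        return 1
    fi
}
```

For example, run ls /var/up2date/sys > master.txt on the master and ls /var/up2date/sys > slave.txt on the slave, bring both files to the same machine, and call compare_u2d_lists master.txt slave.txt. Only go ahead with the up2date if it prints "match".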
Regards - Giovani
I have a similar problem with two pairs of UTMs in HA mode with reserved configuration.
The slave node (node2) upgraded to 9.403-4 and became master. The new slave (node1) then remains in RESERVED, and if I try "UPGRADE NODE" it gets stuck.
In the /var/up2date/sys of node1 (slave) I found only the file u2d-sys-9.356003-404005.tgz.gpg.
I also tried rebooting the slave node and submitting "UPGRADE NODE" again, but the problem is always the same...
Any suggestions before the support call?
That's exactly what happened to me when I upgraded my first cluster. First of all, if you have a commercial license, I strongly suggest you contact support and have a ticket opened. What I'll lay out next could void your warranty as per Sophos support policy. That being said, let's get to it:
1) SSH into your master node
2) run ha_utils ssh to SSH into your slave as loginuser, and then su to root
3) run cat /etc/version and collect which version your slave UTM is on
4) run rm /var/up2date/sys/* (careful with this one, make sure you type it correctly!)
Keep this session open and connected, you will need it later
5) based on the UTM version from your slave node, access http://download.astaro.com/UTM/v9/up2date/ and get the links necessary to redownload the right up2date files. In my case, my slave UTM was on version 9.403004, so I only needed to download u2d-sys-9.403004-404005.tgz.gpg to get it to 9.404005
Since your slave UTM will not have any internet access, you'll need to download the up2date files into your master node and then scp it into your slave node
6) SSH into your master node again
7) run wget with the link to the up2date file, for every file necessary to get your slave UTM version to 9.404005. For example, if your slave UTM is on 9.403004, wget http://download.astaro.com/UTM/v9/up2date/u2d-sys-9.403004-404005.tgz.gpg
8) after you download all necessary up2date files, still on your master node, run scp downloaded-up2date-file.gpg email@example.com:., for example: scp u2d-sys-9.403004-404005.tgz.gpg firstname.lastname@example.org:. (the "dot" at the end is necessary!) (If the slave is Node 1, replace the .2 in the foregoing with .1.)
Do that for every file you have downloaded
9) in the slave SSH session that you should have left open, run cp /home/login/u2d-sys-* /var/up2date/sys/
This will copy the up2date files to the right location so up2date will work
10) run auisys.plx --simulation
Remember, this is on your slave node. This will run up2date in simulation mode. Nothing will actually happen, but the up2date process will check whether an up2date would be successful or not
11) run tail -f /var/log/up2date.log and make sure the simulation completes with no errors.
12) if simulation has no errors, run auisys.plx
Remember, this is on your slave node. This will run up2date for real, and your slave should be updated to 9.404005. It will reboot, resync, and will still be in RESERVED.
13) Log in to WebAdmin and start up2date again. It should skip updating your slave node and start updating your master node. After the master node is updated to 9.404005, your slave node should resync and exit RESERVED.
Word of advice: do this after hours. Your slave node WILL NOT become master, so you WILL LOSE internet connectivity for a while.
Word of advice 2: Again, if you have a commercial license, contact support and let them deal with it.
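For steps 5 and 7, the file name can be worked out from the two version numbers. Below is a minimal sketch based only on the names seen in this thread (for example, 9.403-4 shows up as 9.403004 in u2d-sys-9.403004-404005.tgz.gpg). The zero-padding rule is an assumption inferred from those examples, and the helper names u2d_version, u2d_filename and u2d_url are made up for illustration. Whether a given file actually exists on the download server still has to be checked by hand.

```shell
# Base URL quoted in step 5 of the procedure above.
U2D_BASE_URL="http://download.astaro.com/UTM/v9/up2date/"

# Convert a version as displayed (e.g. 9.403-4) to the form used in
# up2date file names (e.g. 9.403004). Assumption: the part after the
# dash is zero-padded to three digits.
u2d_version() {
    base=${1%-*}     # part before the dash, e.g. 9.403
    patch=${1##*-}   # part after the dash, e.g. 4
    printf '%s%03d\n' "$base" "$patch"
}

# Build the package file name for an update from version $1 to version $2,
# both given in file-name form (e.g. 9.403004 and 9.404005).
u2d_filename() {
    printf 'u2d-sys-%s-%s.tgz.gpg\n' "$1" "${2#*.}"
}

# Full download link for the same pair of versions.
u2d_url() {
    printf '%s%s\n' "$U2D_BASE_URL" "$(u2d_filename "$1" "$2")"
}
```

On the master you could then run something like wget "$(u2d_url 9.403004 9.404005)" in place of typing the link by hand.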
Thanks Bob for the answer & Giovani for the neurosurgery required to correct.
Thanks for your reply and detailed explanation.
Unfortunately, in my case the master is now at 9.403-4 and the slave (reserved node) is at 9.356-3,
so my tgz file should be u2d-sys-9.356003-403004.tgz.gpg, but on http://download.astaro.com/UTM/v9/up2date/
I can find only u2d-sys-9.356003-404005.tgz.gpg.
However, I have already opened a support case... I'll let you know how they solve it.
Regards - Max.
Max, you can upgrade your slave using u2d-sys-9.356003-404005.tgz.gpg. Your master should upgrade to 9.404-5 afterwards, as it probably already has an up2date file from 9.403-4 to 9.404-5 in /var/up2date/sys/.
That being said, the smart move here is to contact support. Let them into your system and they should fix it for you.
Max & Giovani, I just had another thought and replied a bit differently to what I now see is the same question: https://community.sophos.com/products/unified-threat-management/f/53/p/78462/300711#300711
Thanks for your reply.
While I'm waiting for a solution from support, I can say that your solution defeats the purpose of using an HA configuration.
The problem here is a bug in the upgrade functionality of Sophos UTM :-(
However, while waiting for support, I analyzed other HA configurations that can lead to the same situation, in order to help other users avoid it.
Here the scenario:
Master v. 9.355-1 /var/up2date/sys content:
-rw-r--r-- 1 root root 43781654 May 25 12:57 u2d-sys-9.355001-356003.tgz.gpg
-rw-r--r-- 1 root root 417408543 Jun 28 13:03 u2d-sys-9.356003-404005.tgz.gpg
Slave v. 9.355-1 /var/up2date/sys content:
-rw-r--r-- 1 root root 43781654 May 25 12:57 u2d-sys-9.355001-356003.tgz.gpg
-rw-r--r-- 1 root root 402123204 May 25 13:03 u2d-sys-9.356003-401011.tgz.gpg
-rw-r--r-- 1 root root 417408543 Jun 28 13:03 u2d-sys-9.356003-404005.tgz.gpg
-rw-r--r-- 1 root root 158211444 May 10 13:05 u2d-sys-9.401011-402007.tgz.gpg
-rw-r--r-- 1 root root 6758490 May 25 13:03 u2d-sys-9.402007-403004.tgz.gpg
(Notice that the files are not the same on both units, and that this can't be seen via the WebAdmin interface!)
2) After the upgrade, the slave node becomes master at version 9.403-4 and the new slave can't upgrade to 9.403-4 because its /var/up2date/sys contains only u2d-sys-9.356003-404005.tgz.gpg
To avoid this situation, you have to upgrade to 9.356-3 first.
After that, both units will be at 9.356-3. Then you have to force a manual check for firmware downloads (or wait for the scheduled one) so that the master can also download the u2d-sys-9.356003-404005.tgz.gpg file, and you can continue with the upgrade normally.
This is my solution for other pairs of HA units having this issue.
I hope this can help.
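To see in advance where a node can get to with the files it has, here is a rough sketch that follows the chain of up2date files in a directory, starting from the node's current version. The function name u2d_chain is made up for illustration, and picking the file with the highest target version at each step is only an assumption. I don't know how the real Up2Date process chooses between two files that start from the same version.

```shell
# Follow the chain of up2date packages in directory $1, starting from the
# version $2 in file-name form (e.g. 9.355001). Prints one file name per hop.
# Assumption: when several packages start at the same version, the one with
# the highest target version is used.
u2d_chain() {
    dir=$1
    ver=$2
    while :; do
        next=""
        # Globs expand in sorted order, so the last match has the highest target.
        for f in "$dir"/u2d-sys-"$ver"-*.tgz.gpg; do
            if [ -e "$f" ]; then
                next=${f##*/}
            fi
        done
        [ -n "$next" ] || break
        echo "$next"
        target=${next#u2d-sys-"$ver"-}   # e.g. 356003.tgz.gpg
        target=${target%.tgz.gpg}        # e.g. 356003
        ver="${ver%%.*}.$target"         # e.g. 9.356003
    done
}
```

Run on each node as u2d_chain /var/up2date/sys followed by that node's own version in file-name form (cat /etc/version shows the version). If a node prints nothing, or its chain stops short of the version the other node ends up at, you are in the situation described above.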