UPDATE: scroll down for fix.
big thanks to: dirkkotte and solae
tl;dr: i need access to the following UTM u2date package u2d-sys-9.711005-712012.tgz.gpg which was removed by sophos from the download page.
our customer bricked his SG450 A/S cluster today by trying to upgrade from 9.711005 to 9.712013.
The full upgrade path was shown as:
9.711005 to 9.712012
9.712012 to 9.712013
Unit2 managed to upgrade to 9.712012
Unit1 is stuck in the up2date from 9.711005 to 9.712012, since the download was removed.
neither unit can go to 9.712013! it's not possible to upgrade from 9.711005 to 9.712013 while one box is already on 9.712012:
unit2 is blocked from going to 9.712013 because unit1 is still trying to upgrade to 9.712012.
unit1 can't go directly to 9.712013 because unit2 already has 9.712012 installed.
after some research it looks like sophos removed 9.712012 and replaced it with 9.712013
Both release notes:
https://community.sophos.com/utm-firewall/b/blog/posts/utm-update-9-712-13-released
https://community.sophos.com/utm-firewall/b/blog/posts/utm-up2date-9-712-released-1300703171
refer to the new package: u2d-sys-9.711005-712013.tgz.gpg
but to fix the cluster, i would first need to install u2d-sys-9.711005-712012.tgz.gpg on both appliances.
2022:09:26-19:02:10 ictrz-fw-01-2 auisys[30172]: Showdesc ok.
2022:09:26-19:02:10 ictrz-fw-01-2 auisys[30172]: [INFO-301] New Firmware Up2Date is ready for installation
2022:09:26-19:02:16 ictrz-fw-01-2 audld[29943]: id="3701" severity="info" sys="system" sub="up2date" name="Authentication successful"
2022:09:26-19:02:16 ictrz-fw-01-2 audld[29943]: Using static download server list in HA mode
2022:09:26-19:02:16 ictrz-fw-01-2 audld[29943]: id="3707" severity="info" sys="system" sub="up2date" name="Successfully synchronized fileset" status="success" action="download" package="sys"
2022:09:26-19:02:17 ictrz-fw-01-2 auisys[30295]: running on HA master system or cluster node
2022:09:26-19:02:17 ictrz-fw-01-2 auisys[30295]: >=========================================================================
2022:09:26-19:02:17 ictrz-fw-01-2 auisys[30295]: Another instance of auisys is already running.
2022:09:26-19:02:17 ictrz-fw-01-2 auisys[30295]: Aappending job to queue! Exiting
2022:09:26-19:02:22 ictrz-fw-01-2 auisys[30341]: running on HA master system or cluster node
2022:09:26-19:02:22 ictrz-fw-01-2 auisys[30341]: >=========================================================================
2022:09:26-19:02:22 ictrz-fw-01-2 auisys[30341]: Another instance of auisys is already running.
2022:09:26-19:02:22 ictrz-fw-01-2 auisys[30341]: Aappending job to queue! Exiting
2022:09:26-19:02:31 ictrz-fw-01-2 auisys[30172]: No suitable packages of type <man9> found, skipping
2022:09:26-19:02:31 ictrz-fw-01-2 auisys[30172]: No suitable packages of type <aws> found, skipping
2022:09:26-19:02:31 ictrz-fw-01-2 auisys[30172]: No suitable packages of type <clvbrowser> found, skipping
2022:09:26-19:02:31 ictrz-fw-01-2 auisys[30172]: No suitable packages of type <appctrl43> found, skipping
2022:09:26-19:02:31 ictrz-fw-01-2 auisys[30172]: No suitable packages of type <ohelp9> found, skipping
2022:09:26-19:02:31 ictrz-fw-01-2 auisys[30172]: No suitable packages of type <aptp> found, skipping
2022:09:26-19:02:31 ictrz-fw-01-2 auisys[30172]: No suitable packages of type <cadata> found, skipping
2022:09:26-19:02:31 ictrz-fw-01-2 auisys[30172]: No suitable packages of type <geoip> found, skipping
2022:09:26-19:02:31 ictrz-fw-01-2 auisys[30172]: No suitable packages of type <man9> found, skipping
2022:09:26-19:02:31 ictrz-fw-01-2 auisys[30172]: No suitable packages of type <aws> found, skipping
2022:09:26-19:02:31 ictrz-fw-01-2 auisys[30172]: No suitable packages of type <clvbrowser> found, skipping
2022:09:26-19:02:31 ictrz-fw-01-2 auisys[30172]: No suitable packages of type <appctrl43> found, skipping
2022:09:26-19:02:31 ictrz-fw-01-2 auisys[30172]: No suitable packages of type <ohelp9> found, skipping
2022:09:26-19:02:31 ictrz-fw-01-2 auisys[30172]: No suitable packages of type <aptp> found, skipping
2022:09:26-19:02:31 ictrz-fw-01-2 auisys[30172]: No suitable packages of type <cadata> found, skipping
2022:09:26-19:02:31 ictrz-fw-01-2 auisys[30172]: No suitable packages of type <geoip> found, skipping
2022:09:26-19:02:31 ictrz-fw-01-2 auisys[30172]: Install u2d packages <sys>
2022:09:26-19:02:31 ictrz-fw-01-2 auisys[30172]: Starting installing up2date packages for type 'sys'
2022:09:26-19:02:31 ictrz-fw-01-2 auisys[30172]: unpacking up2date package: /var/up2date/sys/u2d-sys-9.712012-712013.tgz.gpg
2022:09:26-19:02:31 ictrz-fw-01-2 auisys[30172]: unpacking up2date package version: 9.712013
2022:09:26-19:02:31 ictrz-fw-01-2 auisys[30172]: Verifying up2date package signature
2022:09:26-19:02:32 ictrz-fw-01-2 auisys[30172]: Unpacking installation instructions
2022:09:26-19:02:32 ictrz-fw-01-2 auisys[30172]: parsing installation instructions
2022:09:26-19:02:32 ictrz-fw-01-2 auisys[30172]: Showdesc ok.
2022:09:26-19:02:32 ictrz-fw-01-2 auisys[30172]: [INFO-301] New Firmware Up2Date is ready for installation
2022:09:26-19:02:53 ictrz-fw-01-2 auisys[30172]: Doing HA sync
2022:09:26-19:02:53 ictrz-fw-01-2 auisys[30172]: calling: </usr/local/bin/up2date_sync.sh>
2022:09:26-19:02:53 ictrz-fw-01-2 auisys[30172]: id="3720" severity="info" sys="system" sub="up2date" name="Successfully triggered up2date sync" status="success" action="sync"
2022:09:26-19:02:53 ictrz-fw-01-2 auisys[30172]: Up2Date Package Installer finished, exiting
2022:09:26-19:02:53 ictrz-fw-01-2 auisys[30172]: id="3716" severity="info" sys="system" sub="up2date" name="Up2Date Package Installer finished, exiting"
2022:09:26-19:04:00 ictrz-fw-01-1 audld[11557]: running on HA slave system or cluster node
2022:09:26-19:04:00 ictrz-fw-01-1 audld[11557]: patch up2date possible
2022:09:26-19:04:00 ictrz-fw-01-1 audld[11557]: Starting Secured Up2Date Package Downloader
2022:09:26-19:04:00 ictrz-fw-01-1 audld[11557]: Using static update server list in HA mode
2022:09:26-19:04:01 ictrz-fw-01-1 audld[11557]: Secured Up2date Authentication
2022:09:26-19:04:02 ictrz-fw-01-1 audld[11557]: id="3701" severity="info" sys="system" sub="up2date" name="Authentication successful"
2022:09:26-19:04:02 ictrz-fw-01-1 audld[11557]: Using static download server list in HA mode
2022:09:26-19:04:02 ictrz-fw-01-1 auisys[11640]: running on HA slave system or cluster node
2022:09:26-19:04:02 ictrz-fw-01-1 auisys[11640]: running on slave/cluster node, skipping license check
2022:09:26-19:04:02 ictrz-fw-01-1 auisys[11640]: waiting for db_verify to return (30 seconds max)
2022:09:26-19:04:03 ictrz-fw-01-1 auisys[11640]: removing '/var/up2date/sys-install'
2022:09:26-19:04:03 ictrz-fw-01-1 auisys[11640]: Starting Up2Date Package Installer
2022:09:26-19:04:03 ictrz-fw-01-1 auisys[11640]: version of package '/var/up2date/sys/u2d-sys-9.711005-712013.tgz.gpg' doesn't fit, skipping
2022:09:26-19:04:03 ictrz-fw-01-1 auisys[11640]: No suitable packages of type <sys> found, skipping
2022:09:26-19:04:04 ictrz-fw-01-1 auisys[11640]: Up2Date Package Installer finished, exiting
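One way to read the "doesn't fit, skipping" message in the log: the package file name encodes a start and an end version (u2d-sys-9.711005-712013.tgz.gpg goes from 9.711005 to 9.712013), and the slave apparently only accepts a package that ends at the version the master already runs. The sketch below illustrates that reading only. The matching rule is an assumption drawn from this thread, not documented Sophos behaviour, and `u2d_versions` and `u2d_fits` are made-up helper names, not UTM tools.

```shell
# Hypothetical helpers for illustration; not part of the UTM tooling.

# Print "<from> <to>" for a name like u2d-sys-9.711005-712012.tgz.gpg.
# Assumes the naming scheme seen in this thread; prints nothing otherwise.
u2d_versions() {
    name=${1##*/}   # strip any leading directory
    printf '%s\n' "$name" |
        sed -n 's/^u2d-sys-\([0-9][0-9]*\)\.\([0-9][0-9]*\)-\([0-9][0-9]*\)\.tgz\.gpg$/\1.\2 \1.\3/p'
}

# Succeed only if the package goes from INSTALLED to exactly TARGET.
# Assumption: on the slave, TARGET is the version the master runs.
u2d_fits() {
    installed=$1
    target=$2
    versions=$(u2d_versions "$3")
    [ "$versions" = "$installed $target" ]
}
```

Under that assumption, `u2d_fits 9.711005 9.712012 u2d-sys-9.711005-712013.tgz.gpg` fails, while the same call with the 712012 package succeeds, which matches what the slave did here.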
STEPS to fix broken slave:
Requirements:
- SSH access to master with loginuser
- root password.
0) check if you have a preferred master setting in the GUI and make sure it's not the slave node. that way you avoid a failback after the upgrade is finished.
1) Log in to the current master node via SSH as user "loginuser" and download the up2date file
ictrz-fw-01:~ # cd /home/login/
ictrz-fw-01:/home/login # wget https://www.show-run.ch/u2d-sys-9.711005-712012.tgz.gpg
--2022-09-28 15:45:47-- https://www.show-run.ch/u2d-sys-9.711005-712012.tgz.gpg
Resolving www.show-run.ch... 149.126.4.86, 2a01:ab20:0:4::86
Connecting to www.show-run.ch|149.126.4.86|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 268409635 (256M) [application/octet-stream]
Saving to: `u2d-sys-9.711005-712012.tgz.gpg.1'
100%[===============================================================================================================================================================================================>] 268,409,635 12.4M/s in 23s
2022-09-28 15:46:10 (11.3 MB/s) - `u2d-sys-9.711005-712012.tgz.gpg.1' saved [268409635/268409635]
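Two things are worth a quick check after the download. The transcript shows wget saving to a name ending in `.1`, which it does when a file with the original name already exists in the directory, so make sure you continue with the file you actually just downloaded. And the size should match the `Length:` value wget reported. A minimal sketch for the size check follows; `size_matches` is a made-up helper name, and a matching size says nothing about integrity (the installer does its own signature verification later).

```shell
# Hypothetical helper: succeed if FILE exists and is exactly EXPECTED bytes long.
size_matches() {
    [ -f "$1" ] || return 1
    # Arithmetic expansion strips any padding some wc implementations add.
    actual=$(( $(wc -c < "$1") ))
    [ "$actual" -eq "$2" ]
}

# Example call on the master (byte count taken from the wget output above):
#   size_matches /home/login/u2d-sys-9.711005-712012.tgz.gpg 268409635 && echo "size ok"
```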
NOTE from Sophos: We don't endorse the use of 3rd party links, the official download link is
https://download.astaro.com/UTM/v9/up2date/u2d-sys-9.711005-712013.tgz.gpg
The command to download from the CLI of the UTM would be:
# wget https://download.astaro.com/UTM/v9/up2date/u2d-sys-9.711005-712013.tgz.gpg
2) SSH to the slave box via the HA link. (it's either node1 or node2, depending on which box is currently the slave; in my case it's node1)
ictrz-fw-01:/home/login # ssh loginuser@node1
loginuser@node1's password:
Last login: Wed Sep 28 13:22:41 2022 from node2
Sophos UTM
(C) Copyright 2000-2022 Sophos Limited and others. All rights reserved.
Sophos is a registered trademark of Sophos Limited and Sophos Group.
All other product and company names mentioned are trademarks or registered trademarks of their respective owners.
For more copyright information look at /doc/astaro-license.txt or http://www.astaro.com/doc/astaro-license.txt
NOTE: If not explicitly approved by Sophos support, any modifications done by root will void your support.
<M> loginuser@ictrz-fw-01:/home/login >
3) now copy the u2d file from the master node over to the slave node
<S> loginuser@fw-01:/home/login > scp loginuser@node2:/home/login/u2d-sys-9.711005-712012.tgz.gpg /home/login/u2d-sys-9.711005-712012.tgz.gpg
loginuser@node2's password:
u2d-sys-9.711005-712012.tgz.gpg
<M> loginuser@ictrz-fw-01:/home/login > ls -lha
total 512M
drwxr-xr-x 3 loginuser users 4.0K Sep 28 13:08 .
drwxr-xr-x 3 root root 4.0K Jul 27 2010 ..
-rw------- 1 loginuser users 1.8K Sep 28 14:02 .bash_history
drwx------ 2 loginuser users 4.0K Sep 28 13:01 .ssh
-rw-r--r-- 1 loginuser users 256M Sep 28 13:02 u2d-sys-9.711005-712012.tgz.gpg
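If you want more than the `ls` size to confirm the copy arrived intact, comparing a hash on both nodes is one option. The sketch below assumes `sha256sum` and `awk` are available on the appliance, which this thread does not confirm; `file_hash` and `same_content` are made-up helper names.

```shell
# Hypothetical helpers; assume sha256sum and awk exist on the box.

# Print only the hash field of sha256sum's output for FILE.
file_hash() {
    sha256sum "$1" | awk '{print $1}'
}

# Succeed if both files exist and have identical content hashes.
same_content() {
    [ -f "$1" ] && [ -f "$2" ] &&
        [ "$(file_hash "$1")" = "$(file_hash "$2")" ]
}

# Across nodes, print the hash on each side and compare the two strings by eye:
#   on the slave:   file_hash /home/login/u2d-sys-9.711005-712012.tgz.gpg
#   on the master:  file_hash /home/login/u2d-sys-9.711005-712012.tgz.gpg
```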
4) sudo to root and copy the file to the up2date folder
sudo su
cp /home/login/u2d-sys-9.711005-712012.tgz.gpg /var/up2date/sys
cd /var/up2date/sys
fw-01:/var/up2date/sys # ls -lha
total 512M
drwxr-xr-x 2 root root 4.0K Sep 28 13:04 .
drwxr-xr-x 13 root root 4.0K Sep 26 19:04 ..
-rw-r--r-- 1 root root 256M Sep 28 13:04 u2d-sys-9.711005-712012.tgz.gpg
-rw-r--r-- 1 root root 256M Sep 26 07:18 u2d-sys-9.711005-712013.tgz.gpg
5) delete all files in this folder other than u2d-sys-9.711005-712012.tgz.gpg!
fw-01:/var/up2date/sys # rm u2d-sys-9.711005-712013.tgz.gpg
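If the folder holds more than one extra package, it can help to list what would be removed before running `rm` as root. A minimal sketch, with `list_other_files` as a made-up helper name; it only prints paths and deletes nothing.

```shell
# Hypothetical helper: print every entry in DIR except the file named KEEP.
# Deliberately prints only, so the list can be reviewed before any rm.
list_other_files() {
    dir=$1
    keep=$2
    for f in "$dir"/*; do
        [ -e "$f" ] || continue                      # empty directory: glob did not match
        [ "${f##*/}" = "$keep" ] || printf '%s\n' "$f"
    done
}

# Example call on the slave:
#   list_other_files /var/up2date/sys u2d-sys-9.711005-712012.tgz.gpg
```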
6) Optionally run an up2date simulation first:
fw-01:/var/up2date/sys # auisys.plx -simulation --verbose
'simulation' mode implicits sets noqueue!
running on HA slave system or cluster node
running on slave/cluster node, skipping license check
removing '/var/up2date/appctrl43-install'
removed directory: `/var/up2date/appctrl43-install'
removing '/var/up2date/aptp-install'
removed directory: `/var/up2date/aptp-install'
removing '/var/up2date/aws-install'
removed directory: `/var/up2date/aws-install'
removing '/var/up2date/cadata-install'
removed directory: `/var/up2date/cadata-install'
removing '/var/up2date/clvbrowser-install'
removed directory: `/var/up2date/clvbrowser-install'
removing '/var/up2date/geoip-install'
removed directory: `/var/up2date/geoip-install'
removing '/var/up2date/man9-install'
removed directory: `/var/up2date/man9-install'
removing '/var/up2date/ohelp9-install'
removed directory: `/var/up2date/ohelp9-install'
removing '/var/up2date/sys-install'
removed `/var/up2date/sys-install/u2d-sys-9.712012/install-sys-9.712012.xml'
removed directory: `/var/up2date/sys-install/u2d-sys-9.712012'
removed directory: `/var/up2date/sys-install'
<<<<---- Simulation enabled ---->>>>
(simulation) Starting Up2Date Package Installer
(simulation) No suitable packages of type <man9> found, skipping
(simulation) No suitable packages of type <aws> found, skipping
(simulation) No suitable packages of type <clvbrowser> found, skipping
(simulation) No suitable packages of type <appctrl43> found, skipping
(simulation) No suitable packages of type <ohelp9> found, skipping
(simulation) No suitable packages of type <aptp> found, skipping
(simulation) No suitable packages of type <cadata> found, skipping
(simulation) No suitable packages of type <geoip> found, skipping
(simulation) Install u2d packages <sys>
(simulation) Starting installing up2date packages for type 'sys'
(simulation) Installing up2date package: /var/up2date/sys/u2d-sys-9.711005-712012.tgz.gpg
(simulation) Verifying up2date package signature
(simulation) Unpacking installation instructions
(simulation) parsing installation instructions
(simulation) Unpacking up2date package container
(simulation) Running pre-installation checks
(simulation) Not installing optional aws-cfn-bootstrap
....and so on....
Would do 7, 0 [ENV 300] sh -c exec /var/up2date/sys-install/u2d-sys-9.712012/update9.712012post_start
Would do 9, 0 [NOENV no] rm /var/up2date/sys/u2d-sys-9.711005-712012.tgz.gpg
Would do 9, 1 [NOENV no] sync
Would touch '/tmp/.u2d-sys-9.711-9.712-5.12.1.tgz'
(simulation) New system version: 9.711005
(simulation) Up2Date Package Installer finished, exiting
(simulation) Simulation enabled. Would do a reboot now
only continue if the simulation reported no errors
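The simulation output is long, so a quick filter can help decide whether to go on. The sketch below only looks for two lines that appear in the successful run above: the "Installing up2date package" line for the expected file, and the final "Would do a reboot now". It does not know what a failed simulation prints, so treat it as a rough first check and still read the output. `simulation_ok` and the log path are made up for the example.

```shell
# Hypothetical helper: succeed if a saved simulation log picked up PKG
# and ran through to the final "Would do a reboot now" line.
simulation_ok() {
    log=$1
    pkg=$2
    grep -qF "(simulation) Installing up2date package: $pkg" "$log" &&
        grep -qF "Would do a reboot now" "$log"
}

# Example on the slave (redirect target is an arbitrary choice):
#   auisys.plx -simulation --verbose > /tmp/sim.log 2>&1
#   simulation_ok /tmp/sim.log /var/up2date/sys/u2d-sys-9.711005-712012.tgz.gpg && echo "looks fine"
```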
7) run the update
fw-01:/var/up2date/sys # auisys.plx --rpmargs --force --verbose
'verbose' mode implicits set noqueue option!
running on HA slave system or cluster node
running on slave/cluster node, skipping license check
waiting for db_verify to return (30 seconds max)
removing '/var/up2date/appctrl43-install'
removed directory: `/var/up2date/appctrl43-install'
removing '/var/up2date/aptp-install'
removed directory: `/var/up2date/aptp-install'
removing '/var/up2date/aws-install'
removed directory: `/var/up2date/aws-install'
removing '/var/up2date/cadata-install'
removed directory: `/var/up2date/cadata-install'
removing '/var/up2date/clvbrowser-install'
removed directory: `/var/up2date/clvbrowser-install'
removing '/var/up2date/geoip-install'
removed directory: `/var/up2date/geoip-install'
removing '/var/up2date/man9-install'
removed directory: `/var/up2date/man9-install'
removing '/var/up2date/ohelp9-install'
removed directory: `/var/up2date/ohelp9-install'
removing '/var/up2date/sys-install'
removed directory: `/var/up2date/sys-install'
Starting Up2Date Package Installer
No suitable packages of type <man9> found, skipping
No suitable packages of type <aws> found, skipping
No suitable packages of type <clvbrowser> found, skipping
No suitable packages of type <appctrl43> found, skipping
No suitable packages of type <ohelp9> found, skipping
No suitable packages of type <aptp> found, skipping
No suitable packages of type <cadata> found, skipping
No suitable packages of type <geoip> found, skipping
Install u2d packages <sys>
Starting installing up2date packages for type 'sys'
Installing up2date package: /var/up2date/sys/u2d-sys-9.711005-712012.tgz.gpg
Verifying up2date package signature
Unpacking installation instructions
parsing installation instructions
....
Installing rpm package: ep-webadmin-contentmanager-9.70-64.g56528fb.rb2.i686.rpm OK
Installing rpm package: chroot-reverseproxy-2.4.54-0.gdfdca5f.rb2.i686.rpm OK
Installing rpm package: ep-httpproxy-9.70-290.g6e88177f.rb3.i686.rpm OK
Installing rpm package: kernel-smp64-3.12.74-0.424574463.ge309b77.rb7.x86_64.rpm OK
Installing rpm package: ep-release-9.712-12.noarch.rpm OK
....
New system version: 9.712012
Up2Date Package Installer finished, exiting
Initiating reboot
Broadcast message from root (pts/0) (Wed Sep 28 13:15:51 2022):
The system is going down for reboot NOW!
now your ssh session should drop and you should be back on the master node. you can watch the process in the HA log from the CLI or go back to the GUI.
fw-01:/home/login # tail -f /var/log/high-availability.log
2022:09:28-13:16:34 fw-01-2 ha_daemon[5014]: id="38A3" severity="debug" sys="System" sub="ha" seq="M: 45 34.586" name="Netlink: Found link beat on eth16 again!"
2022:09:28-13:16:25 fw-01-2 conntrack-tools[5505]: no dedicated links available!<27>Sep 28 13:16:34 conntrack-tools[5505]: no dedicated links available!
2022:09:28-13:16:34 fw-01-2 ha_daemon[5014]: id="38A3" severity="debug" sys="System" sub="ha" seq="M: 46 34.879" name="Netlink: Found link beat on eth3 again!"
2022:09:28-13:16:35 fw-01-2 ha_daemon[5014]: id="38A3" severity="debug" sys="System" sub="ha" seq="M: 47 35.586" name="Netlink: Lost link beat on eth16!"
2022:09:28-13:16:35 fw-01-2 ha_daemon[5014]: id="38A3" severity="debug" sys="System" sub="ha" seq="M: 48 35.586" name="Netlink: Lost link beat on eth3!"
2022:09:28-13:16:35 fw-01-2 conntrack-tools[5505]: no dedicated links available!
2022:09:28-13:16:37 fw-01-2 conntrack-tools[5505]: no dedicated links available!
2022:09:28-13:16:37 fw-01-2 ha_daemon[5014]: id="38A3" severity="debug" sys="System" sub="ha" seq="M: 49 37.223" name="Netlink: Found link beat on eth16 again!"
2022:09:28-13:16:39 fw-01-2 conntrack-tools[5505]: no dedicated links available!
2022:09:28-13:16:39 fw-01-2 ha_daemon[5014]: id="38A3" severity="debug" sys="System" sub="ha" seq="M: 50 39.614" name="Netlink: Lost link beat on eth16!"
2022:09:28-13:16:44 fw-01-2 ha_daemon[5014]: id="38A3" severity="debug" sys="System" sub="ha" seq="M: 51 44.175" name="Netlink: Found link beat on eth3 again!"
2022:09:28-13:16:45 fw-01-2 ha_daemon[5014]: id="38A3" severity="debug" sys="System" sub="ha" seq="M: 52 45.703" name="Netlink: Found link beat on eth16 again!"
eventually you should see something like this:
2022:09:28-13:18:21 fw-01-1 ha_daemon[5025]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 43 21.924" name="HA control: cmd = 'build'"
2022:09:28-13:18:21 fw-01-1 ha_daemon[5025]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 44 21.926" name="HA control: cmd = 'up2date successful'"
2022:09:28-13:18:21 fw-01-1 ha_daemon[5025]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 45 21.926" name="Set UTM version to 9.712012
2022:09:28-13:18:21 fw-01-1 ha_daemon[5025]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 46 21.926" name="up2date to 9.712012 successful"
2022:09:28-13:18:21 fw-01-1 ha_daemon[5025]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 47 21.926" name="start/reset initial synchronization timer = 300"
2022:09:28-13:18:21 fw-01-1 ha_daemon[5025]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 48 21.926" name="state change UP2DATE(256) -> UP2DATE(258)"
2022:09:28-13:18:21 fw-01-1 ha_daemon[5025]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 49 21.926" name="state change UP2DATE(258) -> SYNCING(2)"
2022:09:28-13:18:21 fw-01-1 ha_daemon[5025]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 50 21.926" name="Executing (nowait) /etc/init.d/ha_mode enable"
2022:09:28-13:18:21 fw-01-1 ha_daemon[5025]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 51 21.926" name="--- Node is enabled ---"
2022:09:28-13:18:21 fw-01-1 ha_mode[8063]: calling enable
2022:09:28-13:18:21 fw-01-1 ha_mode[8063]: enable: waiting for last ha_mode done
2022:09:28-13:18:21 fw-01-1 ha_mode[8063]: Switching enable mode
2022:09:28-13:18:22 fw-01-1 ha_mode[8063]: repctl[8096]: [i] daemonize_check(1480): daemonized, see syslog for further messages
2022:09:28-13:18:22 fw-01-1 repctl[8096]: [i] daemonize_check(1480): daemonized, see syslog for further messages
2022:09:28-13:18:22 fw-01-1 ha_mode[8063]: enable done (started at 13:18:21)
2022:09:28-13:18:22 fw-01-2 ha_daemon[5014]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 60 22.095" name="Node 1 changed version! 9.711005 -> 9.712012"
2022:09:28-13:18:22 fw-01-2 ha_daemon[5014]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 61 22.096" name="Node 1 changed state: UP2DATE(256) -> SYNCING(2)"
2022:09:28-13:18:22 fw-01-2 ha_daemon[5014]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 62 22.096" name="Executing (nowait) /etc/init.d/ha_mode topology_changed"
2022:09:28-13:18:22 fw-01-2 ha_mode[29359]: calling topology_changed
2022:09:28-13:18:22 fw-01-2 ha_mode[29359]: topology_changed: waiting for last ha_mode done
2022:09:28-13:18:22 fw-01-1 repctl[8096]: [i] execute(1768): pg_ctl: server is running (PID: 5142)
2022:09:28-13:18:22 fw-01-1 repctl[8096]: [i] execute(1768): /usr/pgsql92-64/bin/postgres "-D" "/var/storage/pgsql92/data"
2022:09:28-13:18:22 fw-01-2 ha_mode[29359]: repctl[29376]: [i] daemonize_check(1480): daemonized, see syslog for further messages
2022:09:28-13:18:22 fw-01-2 repctl[29376]: [i] daemonize_check(1480): daemonized, see syslog for further messages
2022:09:28-13:18:22 fw-01-1 ha_daemon[5025]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 52 22.355" name="HA control: cmd = 'sync start 2 database'"
2022:09:28-13:18:22 fw-01-1 ha_daemon[5025]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 53 22.355" name="Activating sync process for database on node 2"
2022:09:28-13:18:22 fw-01-1 repctl[8096]: [i] execute(1768): waiting for server to shut down...
2022:09:28-13:18:22 fw-01-1 repctl[8096]: [i] execute(1768): .
2022:09:28-13:18:22 fw-01-2 ha_mode[29359]: topology_changed done (started at 13:18:22)
2022:09:28-13:18:22 fw-01-2 repctl[29376]: [i] execute(1768): pg_ctl: server is running (PID: 5170)
2022:09:28-13:18:22 fw-01-2 repctl[29376]: [i] execute(1768): /usr/pgsql92-64/bin/postgres "-D" "/var/storage/pgsql92/data"
2022:09:28-13:18:22 fw-01-2 repctl[29376]: [i] execute(1768): pg_ctl: server is running (PID: 5170)
2022:09:28-13:18:22 fw-01-2 repctl[29376]: [i] execute(1768): /usr/pgsql92-64/bin/postgres "-D" "/var/storage/pgsql92/data"
2022:09:28-13:18:22 fw-01-2 repctl[29376]: [i] setup_replication(278): checkinterval 300
2022:09:28-13:18:23 fw-01-1 repctl[8096]: [i] execute(1768): done
2022:09:28-13:18:23 fw-01-1 repctl[8096]: [i] execute(1768): server stopped
2022:09:28-13:18:25 fw-01-1 repctl[8096]: [i] start_backup_mode(744): starting backup mode at 00000001000006A600000025
2022:09:28-13:18:25 fw-01-1 ha_daemon[5025]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 54 25.281" name="HA control: cmd = 'sync start 2 database'"
2022:09:28-13:18:25 fw-01-1 ha_daemon[5025]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 55 25.281" name="Activating sync process for database on node 2"
2022:09:28-13:18:27 fw-01-1 ha_daemon[5025]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 56 27.562" name="Monitoring interfaces for link beat: lag0 eth17"
2022:09:28-13:18:28 fw-01-2 ha_daemon[5014]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 63 28.026" name="Set syncing.files for node 1"
2022:09:28-13:18:44 fw-01-2 ha_daemon[5014]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 64 44.091" name="Clear syncing.files for node 1"
2022:09:28-13:18:45 fw-01-1 ha_daemon[5025]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 57 45.950" name="Monitoring interfaces for link beat: lag0 eth17"
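Rather than reading the whole tail, the two things that matter in the log above can be pulled out with grep and sed: the `up2date to 9.712012 successful` message and the slave's most recent `state change`. A small sketch with made-up helper names; the patterns are taken from the log lines in this post and may look different on other firmware versions.

```shell
# Hypothetical helpers; patterns copied from the log lines shown above.

# Succeed if LOG contains the "up2date to <VERSION> successful" message.
up2date_succeeded() {
    grep -qF "name=\"up2date to $2 successful\"" "$1"
}

# Print the most recent "state change A -> B" found in LOG.
last_state_change() {
    sed -n 's/.*\(state change [A-Z0-9()]* -> [A-Z0-9()]*\).*/\1/p' "$1" | tail -n 1
}

# Example on the master:
#   up2date_succeeded /var/log/high-availability.log 9.712012 && echo "slave upgraded"
#   last_state_change /var/log/high-availability.log
```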
In my case node1 got stuck in "syncing". maybe i did not wait long enough, but i decided to rebuild the db on node1 to do a fresh resync.
DON'T DO THIS IF YOUR CLUSTER IS OKAY AT THIS POINT
On node1 (slave node):
fw-01:/var/up2date/sys # killall repctl
fw-01:/var/up2date/sys # /etc/init.d/postgresql92 rebuild
Rebuilding PostgreSQL database, all reporting data will be lost! Enter "yes" to continue...
yes
:: Stopping PostgreSQLpg_ctl: PID file "/var/storage/pgsql92/data/postmaster.pid" does not exist
Is server running? d
:: Initializing the PostgreSQL database e
:: Starting PostgreSQL done
:: Restarting SMTP Proxy
:: Stopping SMTP Proxy [ ok ]
:: Starting SMTP Proxy [ ok ]
[ ok ]
fw-01:/var/up2date/sys # /usr/local/bin/repctl
repctl[14832]: [i] daemonize_check(1480): daemonized, see syslog for further messages
it took around 45 min to sync the complete logs (around 80 GB), but eventually the cluster was active/standby again.
Hi Samuel
I have exactly the same issue with a Cluster at one of our Customers.
I opened a case with Sophos asking for the u2d-sys-9.711005-712012.tgz.gpg file.
As soon as i get a response i will get back to you.
Regards,
Michael
thanks for your feedback.
i too opened a case this morning, but did not get a response yet.
i'll also keep you updated.
if you still have no success ... i have downloaded the file.
Dirk
Systema Gesellschaft für angewandte Datentechnik mbH // Sophos Platinum Partner
Sophos Solution Partner since 2003
If a post solves your question, click the 'Verify Answer' link at this post.
Exact same problem here with HA A/S SG230 units. Working with support on it.
I would appreciate if you could share the file with us.
I have a case open, but it's really frustrating when I provide a detailed error description and all support asks me is nonsense like "is HA enabled" or "please enable remote access".
We have the same problem here. I would also appreciate it if you could share the file.
Exact same issue here on a SG310 cluster :(
i tried to upload the file and initiate the up2date using the <upgrade node> button and rebooting the slave too ... without success.
i'll send you a PM ... if possible.
Hi Dirk, could you send me the file as well? I still have not received a response from Sophos.
Hi Dirk, we have the same problem. It would be nice if you could send me the update file too.