Hello,
We have two SG550 UTM 9 firewalls in active-passive mode. They were running version 9.713-19 and we wanted to upgrade to 9.714-4.
Since the upgrade they have been stuck in an endless syncing loop.
This is what the high-availability log says:
2023:02:01-09:36:33 firewall-1 ha_daemon[7053]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 996 33.481" name="Clear syncing.files for node 2"
2023:02:01-09:36:42 firewall-2 repctl[19331]: [i] stop_backup_mode(765): stopped backup mode at 000000010000001F000000D4
2023:02:01-09:36:42 firewall-2 repctl[19331]: [c] standby_clone(950): standby_clone failed: sync aborted (never executed successfully)
2023:02:01-09:36:42 firewall-2 repctl[19331]: [e] prepare_secondary(346): prepare_secondary: clone failed
2023:02:01-09:36:42 firewall-2 repctl[19331]: [c] prepare_secondary(360): failed to get database up, waiting for retry
2023:02:01-09:36:42 firewall-2 repctl[19331]: [c] setup_replication(274): setup_replication was not properly executed
2023:02:01-09:36:42 firewall-2 repctl[19331]: [i] setup_replication(278): checkinterval 300
2023:02:01-09:41:25 firewall-1 ha_daemon[7053]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 997 25.126" name="Executing (wait) /usr/local/bin/confd-setha mode master master_ip IP_1 slave_ip IP_2"
2023:02:01-09:41:25 firewall-1 ha_daemon[7053]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 998 25.815" name="Executing (nowait) /etc/init.d/ha_mode check"
2023:02:01-09:41:25 firewall-1 ha_mode[26575]: calling check
2023:02:01-09:41:25 firewall-1 ha_mode[26575]: check: waiting for last ha_mode done
2023:02:01-09:41:25 firewall-1 ha_mode[26575]: check_ha() role=MASTER, status=ACTIVE
2023:02:01-09:41:26 firewall-1 ha_mode[26575]: check done (started at 09:41:25)
2023:02:01-09:41:31 firewall-1 ha_daemon[7053]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 999 31.126" name="Set syncing.files for node 2"
2023:02:01-09:41:33 firewall-1 ha_daemon[7053]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 1000 33.469" name="Clear syncing.files for node 2"
2023:02:01-09:41:42 firewall-2 repctl[19331]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 0
2023:02:01-09:41:42 firewall-2 repctl[19331]: [i] execute(1768): pg_ctl: no server running
2023:02:01-09:41:43 firewall-2 ha_daemon[7101]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 945 43.944" name="HA control: cmd = 'sync start 1 database'"
2023:02:01-09:41:43 firewall-2 ha_daemon[7101]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 946 43.944" name="Activating sync process for database on node 1"
2023:02:01-09:41:43 firewall-2 repctl[19331]: [i] execute(1768): pg_ctl: PID file "/var/storage/pgsql92/data/postmaster.pid" does not exist
2023:02:01-09:41:43 firewall-2 repctl[19331]: [i] execute(1768): Is server running?
2023:02:01-09:41:45 firewall-2 repctl[19331]: [i] start_backup_mode(744): starting backup mode at 000000010000001F000000DC
2023:02:01-09:41:45 firewall-2 ha_daemon[7101]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 947 45.934" name="HA control: cmd = 'sync start 1 database'"
2023:02:01-09:41:45 firewall-2 ha_daemon[7101]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 948 45.934" name="Activating sync process for database on node 1"
2023:02:01-09:41:46 firewall-2 repctl[19331]: [i] execute(1768): rsync: failed to connect to IP_1: Connection refused (111)
2023:02:01-09:41:46 firewall-2 repctl[19331]: [i] execute(1768): rsync error: error in socket IO (code 10) at clientserver.c(122) [receiver=3.0.4]
2023:02:01-09:41:46 firewall-2 repctl[19331]: [c] standby_clone(936): rsync failed on $VAR1 = {
2023:02:01-09:41:46 firewall-2 repctl[19331]: [c] standby_clone(936): 'path' => '/postgres.default',
2023:02:01-09:41:46 firewall-2 repctl[19331]: [c] standby_clone(936): 'dst' => '/var/storage/pgsql92/',
2023:02:01-09:41:46 firewall-2 repctl[19331]: [c] standby_clone(936): 'module' => 'postgres-default'
2023:02:01-09:41:46 firewall-2 repctl[19331]: [c] standby_clone(936): };
2023:02:01-09:41:46 firewall-2 repctl[19331]: [c] standby_clone(936): (Attempt #:1)
2023:02:01-09:41:56 firewall-2 repctl[19331]: [i] execute(1768): rsync: failed to connect to IP_1: Connection refused (111)
2023:02:01-09:41:56 firewall-2 repctl[19331]: [i] execute(1768): rsync error: error in socket IO (code 10) at clientserver.c(122) [receiver=3.0.4]
2023:02:01-09:41:56 firewall-2 repctl[19331]: [c] standby_clone(936): rsync failed on $VAR1 = {
2023:02:01-09:41:56 firewall-2 repctl[19331]: [c] standby_clone(936): 'path' => '/postgres.default',
2023:02:01-09:41:56 firewall-2 repctl[19331]: [c] standby_clone(936): 'dst' => '/var/storage/pgsql92/',
2023:02:01-09:41:56 firewall-2 repctl[19331]: [c] standby_clone(936): 'module' => 'postgres-default'
2023:02:01-09:41:56 firewall-2 repctl[19331]: [c] standby_clone(936): };
2023:02:01-09:41:56 firewall-2 repctl[19331]: [c] standby_clone(936): (Attempt #:2)
2023:02:01-09:42:06 firewall-2 repctl[19331]: [i] execute(1768): rsync: failed to connect to IP_1: Connection refused (111)
2023:02:01-09:42:06 firewall-2 repctl[19331]: [i] execute(1768): rsync error: error in socket IO (code 10) at clientserver.c(122) [receiver=3.0.4]
2023:02:01-09:42:06 firewall-2 repctl[19331]: [c] standby_clone(936): rsync failed on $VAR1 = {
2023:02:01-09:42:06 firewall-2 repctl[19331]: [c] standby_clone(936): 'path' => '/postgres.default',
2023:02:01-09:42:06 firewall-2 repctl[19331]: [c] standby_clone(936): 'dst' => '/var/storage/pgsql92/',
2023:02:01-09:42:06 firewall-2 repctl[19331]: [c] standby_clone(936): 'module' => 'postgres-default'
2023:02:01-09:42:06 firewall-2 repctl[19331]: [c] standby_clone(936): };
2023:02:01-09:42:06 firewall-2 repctl[19331]: [c] standby_clone(936): (Attempt #:3)
2023:02:01-09:42:16 firewall-2 repctl[19331]: [i] execute(1768): rsync: failed to connect to IP_1: Connection refused (111)
2023:02:01-09:42:16 firewall-2 repctl[19331]: [i] execute(1768): rsync error: error in socket IO (code 10) at clientserver.c(122) [receiver=3.0.4]
2023:02:01-09:42:16 firewall-2 repctl[19331]: [c] standby_clone(936): rsync failed on $VAR1 = {
2023:02:01-09:42:16 firewall-2 repctl[19331]: [c] standby_clone(936): 'path' => '/postgres.default',
2023:02:01-09:42:16 firewall-2 repctl[19331]: [c] standby_clone(936): 'dst' => '/var/storage/pgsql92/',
2023:02:01-09:42:16 firewall-2 repctl[19331]: [c] standby_clone(936): 'module' => 'postgres-default'
2023:02:01-09:42:16 firewall-2 repctl[19331]: [c] standby_clone(936): };
2023:02:01-09:42:16 firewall-2 repctl[19331]: [c] standby_clone(936): (Attempt #:4)
2023:02:01-09:42:26 firewall-2 repctl[19331]: [i] execute(1768): rsync: failed to connect to IP_1: Connection refused (111)
2023:02:01-09:42:26 firewall-2 repctl[19331]: [i] execute(1768): rsync error: error in socket IO (code 10) at clientserver.c(122) [receiver=3.0.4]
2023:02:01-09:42:26 firewall-2 repctl[19331]: [c] standby_clone(936): rsync failed on $VAR1 = {
2023:02:01-09:42:26 firewall-2 repctl[19331]: [c] standby_clone(936): 'path' => '/postgres.default',
2023:02:01-09:42:26 firewall-2 repctl[19331]: [c] standby_clone(936): 'dst' => '/var/storage/pgsql92/',
2023:02:01-09:42:26 firewall-2 repctl[19331]: [c] standby_clone(936): 'module' => 'postgres-default'
2023:02:01-09:42:26 firewall-2 repctl[19331]: [c] standby_clone(936): };
2023:02:01-09:42:26 firewall-2 repctl[19331]: [c] standby_clone(936): (Attempt #:5)
2023:02:01-09:42:36 firewall-2 repctl[19331]: [i] execute(1768): rsync: failed to connect to IP_1: Connection refused (111)
2023:02:01-09:42:36 firewall-2 repctl[19331]: [i] execute(1768): rsync error: error in socket IO (code 10) at clientserver.c(122) [receiver=3.0.4]
2023:02:01-09:42:36 firewall-2 repctl[19331]: [c] standby_clone(936): rsync failed on $VAR1 = {
2023:02:01-09:42:36 firewall-2 repctl[19331]: [c] standby_clone(936): 'path' => '/postgres.default',
2023:02:01-09:42:36 firewall-2 repctl[19331]: [c] standby_clone(936): 'dst' => '/var/storage/pgsql92/',
2023:02:01-09:42:36 firewall-2 repctl[19331]: [c] standby_clone(936): 'module' => 'postgres-default'
2023:02:01-09:42:36 firewall-2 repctl[19331]: [c] standby_clone(936): };
2023:02:01-09:42:36 firewall-2 repctl[19331]: [c] standby_clone(936): (Attempt #:6)
2023:02:01-09:42:46 firewall-2 repctl[19331]: [i] execute(1768): rsync: failed to connect to IP_1: Connection refused (111)
2023:02:01-09:42:46 firewall-2 repctl[19331]: [i] execute(1768): rsync error: error in socket IO (code 10) at clientserver.c(122) [receiver=3.0.4]
2023:02:01-09:42:46 firewall-2 repctl[19331]: [c] standby_clone(936): rsync failed on $VAR1 = {
2023:02:01-09:42:46 firewall-2 repctl[19331]: [c] standby_clone(936): 'path' => '/postgres.default',
2023:02:01-09:42:46 firewall-2 repctl[19331]: [c] standby_clone(936): 'dst' => '/var/storage/pgsql92/',
2023:02:01-09:42:46 firewall-2 repctl[19331]: [c] standby_clone(936): 'module' => 'postgres-default'
2023:02:01-09:42:46 firewall-2 repctl[19331]: [c] standby_clone(936): };
2023:02:01-09:42:46 firewall-2 repctl[19331]: [c] standby_clone(936): (Attempt #:7)
2023:02:01-09:42:56 firewall-2 repctl[19331]: [i] execute(1768): rsync: failed to connect to IP_1: Connection refused (111)
2023:02:01-09:42:56 firewall-2 repctl[19331]: [i] execute(1768): rsync error: error in socket IO (code 10) at clientserver.c(122) [receiver=3.0.4]
2023:02:01-09:42:56 firewall-2 repctl[19331]: [c] standby_clone(936): rsync failed on $VAR1 = {
2023:02:01-09:42:56 firewall-2 repctl[19331]: [c] standby_clone(936): 'path' => '/postgres.default',
2023:02:01-09:42:56 firewall-2 repctl[19331]: [c] standby_clone(936): 'dst' => '/var/storage/pgsql92/',
2023:02:01-09:42:56 firewall-2 repctl[19331]: [c] standby_clone(936): 'module' => 'postgres-default'
2023:02:01-09:42:56 firewall-2 repctl[19331]: [c] standby_clone(936): };
2023:02:01-09:42:56 firewall-2 repctl[19331]: [c] standby_clone(936): (Attempt #:8)
2023:02:01-09:43:06 firewall-2 repctl[19331]: [i] execute(1768): rsync: failed to connect to IP_1: Connection refused (111)
2023:02:01-09:43:06 firewall-2 repctl[19331]: [i] execute(1768): rsync error: error in socket IO (code 10) at clientserver.c(122) [receiver=3.0.4]
2023:02:01-09:43:06 firewall-2 repctl[19331]: [c] standby_clone(936): rsync failed on $VAR1 = {
2023:02:01-09:43:06 firewall-2 repctl[19331]: [c] standby_clone(936): 'path' => '/postgres.default',
2023:02:01-09:43:06 firewall-2 repctl[19331]: [c] standby_clone(936): 'dst' => '/var/storage/pgsql92/',
2023:02:01-09:43:06 firewall-2 repctl[19331]: [c] standby_clone(936): 'module' => 'postgres-default'
2023:02:01-09:43:06 firewall-2 repctl[19331]: [c] standby_clone(936): };
2023:02:01-09:43:06 firewall-2 repctl[19331]: [c] standby_clone(936): (Attempt #:9)
2023:02:01-09:43:16 firewall-2 repctl[19331]: [i] execute(1768): rsync: failed to connect to IP_1: Connection refused (111)
2023:02:01-09:43:16 firewall-2 repctl[19331]: [i] execute(1768): rsync error: error in socket IO (code 10) at clientserver.c(122) [receiver=3.0.4]
2023:02:01-09:43:16 firewall-2 repctl[19331]: [c] standby_clone(936): rsync failed on $VAR1 = {
2023:02:01-09:43:16 firewall-2 repctl[19331]: [c] standby_clone(936): 'path' => '/postgres.default',
2023:02:01-09:43:16 firewall-2 repctl[19331]: [c] standby_clone(936): 'dst' => '/var/storage/pgsql92/',
2023:02:01-09:43:16 firewall-2 repctl[19331]: [c] standby_clone(936): 'module' => 'postgres-default'
2023:02:01-09:43:16 firewall-2 repctl[19331]: [c] standby_clone(936): };
2023:02:01-09:43:16 firewall-2 repctl[19331]: [c] standby_clone(936): (Attempt #:10)
2023:02:01-09:43:28 firewall-2 repctl[19331]: [i] stop_backup_mode(765): stopped backup mode at 000000010000001F000000DD
2023:02:01-09:43:28 firewall-2 repctl[19331]: [c] standby_clone(950): standby_clone failed: sync aborted (never executed successfully)
2023:02:01-09:43:28 firewall-2 repctl[19331]: [e] prepare_secondary(346): prepare_secondary: clone failed
2023:02:01-09:43:30 firewall-2 repctl[19331]: [i] start_backup_mode(744): starting backup mode at 000000010000001F000000DF
2023:02:01-09:43:30 firewall-2 ha_daemon[7101]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 949 30.161" name="HA control: cmd = 'sync start 1 database'"
2023:02:01-09:43:30 firewall-2 ha_daemon[7101]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 950 30.161" name="Activating sync process for database on node 1"
2023:02:01-09:43:30 firewall-2 repctl[19331]: [i] execute(1768): rsync: failed to connect to IP_1: Connection refused (111)
2023:02:01-09:43:30 firewall-2 repctl[19331]: [i] execute(1768): rsync error: error in socket IO (code 10) at clientserver.c(122) [receiver=3.0.4]
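To make the pattern in the log easier to see, here is a minimal Python sketch that counts, per node, how often rsync was refused and how often the clone was aborted. It is not part of UTM; the names `ENTRY_RE` and `summarize` are made up for illustration, and the entry format is assumed from the log pasted above.

```python
import re
from collections import Counter

# Start of one log entry: "<timestamp> <host> <process>[<pid>]: "
# (format assumed from the pasted high-availability log)
ENTRY_RE = re.compile(r"(\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) (\S+) (\w+)\[\d+\]: ")


def summarize(log_text):
    """Return (refused, aborted): per-host counts of rsync refusals and clone aborts."""
    # Collapse all whitespace so entries that wrap across lines still match.
    text = " ".join(log_text.split())
    # With three capture groups, split() yields:
    # [prefix, timestamp, host, process, message, timestamp, host, ...]
    parts = ENTRY_RE.split(text)
    refused, aborted = Counter(), Counter()
    for i in range(1, len(parts), 4):
        _timestamp, host, _process, message = parts[i:i + 4]
        if "rsync: failed to connect" in message and "Connection refused" in message:
            refused[host] += 1
        if "standby_clone failed" in message:
            aborted[host] += 1
    return refused, aborted
```

Calling `summarize()` on the pasted log text returns two counters keyed by host name. In the excerpt above, every refused connection and every aborted clone is logged by firewall-2, which is the node trying to pull the database from IP_1.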
Right after the upgrade, E-Mail Protection on the master stopped working entirely. I restarted the master, after which the second firewall became master and the first one slave. The sync problem then persisted until the weekend.
On the weekend I rebuilt postgresql92 on both machines and restarted them one by one.
This only reversed the roles and did not solve the problem (before, the second firewall was the master and the first one the endlessly syncing slave; now the first one is the master and the second one is endlessly syncing). I had to restart one or the other to get back to the same situation, where high availability is not working but at least everything else is.
Currently everything is running correctly on the master. Interestingly, everything except E-Mail Protection was working on the broken firewall after the upgrade. No mail was sent or received by the firewall, and when I opened the Mail Manager it did not load the content of any tab, even though the tabs were there and I could switch between them.
Any idea how to solve this?