
Syncing between Master and Slave not working after latest upgrade

Hello,

We have two SG550 UTM 9 firewalls in active-passive mode. They were running version 9.713-19, and we wanted to upgrade to 9.714-4.

Since the upgrade, they have been stuck in an infinite syncing loop.

This is what the high-availability logs say:

2023:02:01-09:36:33 firewall-1 ha_daemon[7053]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 996 33.481" name="Clear syncing.files for node 2"
2023:02:01-09:36:42 firewall-2 repctl[19331]: [i] stop_backup_mode(765): stopped backup mode at 000000010000001F000000D4
2023:02:01-09:36:42 firewall-2 repctl[19331]: [c] standby_clone(950): standby_clone failed: sync aborted (never executed successfully)
2023:02:01-09:36:42 firewall-2 repctl[19331]: [e] prepare_secondary(346): prepare_secondary: clone failed
2023:02:01-09:36:42 firewall-2 repctl[19331]: [c] prepare_secondary(360): failed to get database up, waiting for retry
2023:02:01-09:36:42 firewall-2 repctl[19331]: [c] setup_replication(274): setup_replication was not properly executed
2023:02:01-09:36:42 firewall-2 repctl[19331]: [i] setup_replication(278): checkinterval 300
2023:02:01-09:41:25 firewall-1 ha_daemon[7053]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 997 25.126" name="Executing (wait) /usr/local/bin/confd-setha mode master master_ip IP_1 slave_ip IP_2"
2023:02:01-09:41:25 firewall-1 ha_daemon[7053]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 998 25.815" name="Executing (nowait) /etc/init.d/ha_mode check"
2023:02:01-09:41:25 firewall-1 ha_mode[26575]: calling check
2023:02:01-09:41:25 firewall-1 ha_mode[26575]: check: waiting for last ha_mode done
2023:02:01-09:41:25 firewall-1 ha_mode[26575]: check_ha() role=MASTER, status=ACTIVE
2023:02:01-09:41:26 firewall-1 ha_mode[26575]: check done (started at 09:41:25)
2023:02:01-09:41:31 firewall-1 ha_daemon[7053]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 999 31.126" name="Set syncing.files for node 2"
2023:02:01-09:41:33 firewall-1 ha_daemon[7053]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 1000 33.469" name="Clear syncing.files for node 2"
2023:02:01-09:41:42 firewall-2 repctl[19331]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 0
2023:02:01-09:41:42 firewall-2 repctl[19331]: [i] execute(1768): pg_ctl: no server running
2023:02:01-09:41:43 firewall-2 ha_daemon[7101]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 945 43.944" name="HA control: cmd = 'sync start 1 database'"
2023:02:01-09:41:43 firewall-2 ha_daemon[7101]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 946 43.944" name="Activating sync process for database on node 1"
2023:02:01-09:41:43 firewall-2 repctl[19331]: [i] execute(1768): pg_ctl: PID file "/var/storage/pgsql92/data/postmaster.pid" does not exist
2023:02:01-09:41:43 firewall-2 repctl[19331]: [i] execute(1768): Is server running?
2023:02:01-09:41:45 firewall-2 repctl[19331]: [i] start_backup_mode(744): starting backup mode at 000000010000001F000000DC
2023:02:01-09:41:45 firewall-2 ha_daemon[7101]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 947 45.934" name="HA control: cmd = 'sync start 1 database'"
2023:02:01-09:41:45 firewall-2 ha_daemon[7101]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 948 45.934" name="Activating sync process for database on node 1"
2023:02:01-09:41:46 firewall-2 repctl[19331]: [i] execute(1768): rsync: failed to connect to IP_1: Connection refused (111)
2023:02:01-09:41:46 firewall-2 repctl[19331]: [i] execute(1768): rsync error: error in socket IO (code 10) at clientserver.c(122) [receiver=3.0.4]
2023:02:01-09:41:46 firewall-2 repctl[19331]: [c] standby_clone(936): rsync failed on $VAR1 = {
2023:02:01-09:41:46 firewall-2 repctl[19331]: [c] standby_clone(936): 'path' => '/postgres.default',
2023:02:01-09:41:46 firewall-2 repctl[19331]: [c] standby_clone(936): 'dst' => '/var/storage/pgsql92/',
2023:02:01-09:41:46 firewall-2 repctl[19331]: [c] standby_clone(936): 'module' => 'postgres-default'
2023:02:01-09:41:46 firewall-2 repctl[19331]: [c] standby_clone(936): };
2023:02:01-09:41:46 firewall-2 repctl[19331]: [c] standby_clone(936): (Attempt #:1)
2023:02:01-09:41:56 firewall-2 repctl[19331]: [i] execute(1768): rsync: failed to connect to IP_1: Connection refused (111)
2023:02:01-09:41:56 firewall-2 repctl[19331]: [i] execute(1768): rsync error: error in socket IO (code 10) at clientserver.c(122) [receiver=3.0.4]
2023:02:01-09:41:56 firewall-2 repctl[19331]: [c] standby_clone(936): rsync failed on $VAR1 = {
2023:02:01-09:41:56 firewall-2 repctl[19331]: [c] standby_clone(936): 'path' => '/postgres.default',
2023:02:01-09:41:56 firewall-2 repctl[19331]: [c] standby_clone(936): 'dst' => '/var/storage/pgsql92/',
2023:02:01-09:41:56 firewall-2 repctl[19331]: [c] standby_clone(936): 'module' => 'postgres-default'
2023:02:01-09:41:56 firewall-2 repctl[19331]: [c] standby_clone(936): };
2023:02:01-09:41:56 firewall-2 repctl[19331]: [c] standby_clone(936): (Attempt #:2)
2023:02:01-09:42:06 firewall-2 repctl[19331]: [i] execute(1768): rsync: failed to connect to IP_1: Connection refused (111)
2023:02:01-09:42:06 firewall-2 repctl[19331]: [i] execute(1768): rsync error: error in socket IO (code 10) at clientserver.c(122) [receiver=3.0.4]
2023:02:01-09:42:06 firewall-2 repctl[19331]: [c] standby_clone(936): rsync failed on $VAR1 = {
2023:02:01-09:42:06 firewall-2 repctl[19331]: [c] standby_clone(936): 'path' => '/postgres.default',
2023:02:01-09:42:06 firewall-2 repctl[19331]: [c] standby_clone(936): 'dst' => '/var/storage/pgsql92/',
2023:02:01-09:42:06 firewall-2 repctl[19331]: [c] standby_clone(936): 'module' => 'postgres-default'
2023:02:01-09:42:06 firewall-2 repctl[19331]: [c] standby_clone(936): };
2023:02:01-09:42:06 firewall-2 repctl[19331]: [c] standby_clone(936): (Attempt #:3)
2023:02:01-09:42:16 firewall-2 repctl[19331]: [i] execute(1768): rsync: failed to connect to IP_1: Connection refused (111)
2023:02:01-09:42:16 firewall-2 repctl[19331]: [i] execute(1768): rsync error: error in socket IO (code 10) at clientserver.c(122) [receiver=3.0.4]
2023:02:01-09:42:16 firewall-2 repctl[19331]: [c] standby_clone(936): rsync failed on $VAR1 = {
2023:02:01-09:42:16 firewall-2 repctl[19331]: [c] standby_clone(936): 'path' => '/postgres.default',
2023:02:01-09:42:16 firewall-2 repctl[19331]: [c] standby_clone(936): 'dst' => '/var/storage/pgsql92/',
2023:02:01-09:42:16 firewall-2 repctl[19331]: [c] standby_clone(936): 'module' => 'postgres-default'
2023:02:01-09:42:16 firewall-2 repctl[19331]: [c] standby_clone(936): };
2023:02:01-09:42:16 firewall-2 repctl[19331]: [c] standby_clone(936): (Attempt #:4)
2023:02:01-09:42:26 firewall-2 repctl[19331]: [i] execute(1768): rsync: failed to connect to IP_1: Connection refused (111)
2023:02:01-09:42:26 firewall-2 repctl[19331]: [i] execute(1768): rsync error: error in socket IO (code 10) at clientserver.c(122) [receiver=3.0.4]
2023:02:01-09:42:26 firewall-2 repctl[19331]: [c] standby_clone(936): rsync failed on $VAR1 = {
2023:02:01-09:42:26 firewall-2 repctl[19331]: [c] standby_clone(936): 'path' => '/postgres.default',
2023:02:01-09:42:26 firewall-2 repctl[19331]: [c] standby_clone(936): 'dst' => '/var/storage/pgsql92/',
2023:02:01-09:42:26 firewall-2 repctl[19331]: [c] standby_clone(936): 'module' => 'postgres-default'
2023:02:01-09:42:26 firewall-2 repctl[19331]: [c] standby_clone(936): };
2023:02:01-09:42:26 firewall-2 repctl[19331]: [c] standby_clone(936): (Attempt #:5)
2023:02:01-09:42:36 firewall-2 repctl[19331]: [i] execute(1768): rsync: failed to connect to IP_1: Connection refused (111)
2023:02:01-09:42:36 firewall-2 repctl[19331]: [i] execute(1768): rsync error: error in socket IO (code 10) at clientserver.c(122) [receiver=3.0.4]
2023:02:01-09:42:36 firewall-2 repctl[19331]: [c] standby_clone(936): rsync failed on $VAR1 = {
2023:02:01-09:42:36 firewall-2 repctl[19331]: [c] standby_clone(936): 'path' => '/postgres.default',
2023:02:01-09:42:36 firewall-2 repctl[19331]: [c] standby_clone(936): 'dst' => '/var/storage/pgsql92/',
2023:02:01-09:42:36 firewall-2 repctl[19331]: [c] standby_clone(936): 'module' => 'postgres-default'
2023:02:01-09:42:36 firewall-2 repctl[19331]: [c] standby_clone(936): };
2023:02:01-09:42:36 firewall-2 repctl[19331]: [c] standby_clone(936): (Attempt #:6)
2023:02:01-09:42:46 firewall-2 repctl[19331]: [i] execute(1768): rsync: failed to connect to IP_1: Connection refused (111)
2023:02:01-09:42:46 firewall-2 repctl[19331]: [i] execute(1768): rsync error: error in socket IO (code 10) at clientserver.c(122) [receiver=3.0.4]
2023:02:01-09:42:46 firewall-2 repctl[19331]: [c] standby_clone(936): rsync failed on $VAR1 = {
2023:02:01-09:42:46 firewall-2 repctl[19331]: [c] standby_clone(936): 'path' => '/postgres.default',
2023:02:01-09:42:46 firewall-2 repctl[19331]: [c] standby_clone(936): 'dst' => '/var/storage/pgsql92/',
2023:02:01-09:42:46 firewall-2 repctl[19331]: [c] standby_clone(936): 'module' => 'postgres-default'
2023:02:01-09:42:46 firewall-2 repctl[19331]: [c] standby_clone(936): };
2023:02:01-09:42:46 firewall-2 repctl[19331]: [c] standby_clone(936): (Attempt #:7)
2023:02:01-09:42:56 firewall-2 repctl[19331]: [i] execute(1768): rsync: failed to connect to IP_1: Connection refused (111)
2023:02:01-09:42:56 firewall-2 repctl[19331]: [i] execute(1768): rsync error: error in socket IO (code 10) at clientserver.c(122) [receiver=3.0.4]
2023:02:01-09:42:56 firewall-2 repctl[19331]: [c] standby_clone(936): rsync failed on $VAR1 = {
2023:02:01-09:42:56 firewall-2 repctl[19331]: [c] standby_clone(936): 'path' => '/postgres.default',
2023:02:01-09:42:56 firewall-2 repctl[19331]: [c] standby_clone(936): 'dst' => '/var/storage/pgsql92/',
2023:02:01-09:42:56 firewall-2 repctl[19331]: [c] standby_clone(936): 'module' => 'postgres-default'
2023:02:01-09:42:56 firewall-2 repctl[19331]: [c] standby_clone(936): };
2023:02:01-09:42:56 firewall-2 repctl[19331]: [c] standby_clone(936): (Attempt #:8)
2023:02:01-09:43:06 firewall-2 repctl[19331]: [i] execute(1768): rsync: failed to connect to IP_1: Connection refused (111)
2023:02:01-09:43:06 firewall-2 repctl[19331]: [i] execute(1768): rsync error: error in socket IO (code 10) at clientserver.c(122) [receiver=3.0.4]
2023:02:01-09:43:06 firewall-2 repctl[19331]: [c] standby_clone(936): rsync failed on $VAR1 = {
2023:02:01-09:43:06 firewall-2 repctl[19331]: [c] standby_clone(936): 'path' => '/postgres.default',
2023:02:01-09:43:06 firewall-2 repctl[19331]: [c] standby_clone(936): 'dst' => '/var/storage/pgsql92/',
2023:02:01-09:43:06 firewall-2 repctl[19331]: [c] standby_clone(936): 'module' => 'postgres-default'
2023:02:01-09:43:06 firewall-2 repctl[19331]: [c] standby_clone(936): };
2023:02:01-09:43:06 firewall-2 repctl[19331]: [c] standby_clone(936): (Attempt #:9)
2023:02:01-09:43:16 firewall-2 repctl[19331]: [i] execute(1768): rsync: failed to connect to IP_1: Connection refused (111)
2023:02:01-09:43:16 firewall-2 repctl[19331]: [i] execute(1768): rsync error: error in socket IO (code 10) at clientserver.c(122) [receiver=3.0.4]
2023:02:01-09:43:16 firewall-2 repctl[19331]: [c] standby_clone(936): rsync failed on $VAR1 = {
2023:02:01-09:43:16 firewall-2 repctl[19331]: [c] standby_clone(936): 'path' => '/postgres.default',
2023:02:01-09:43:16 firewall-2 repctl[19331]: [c] standby_clone(936): 'dst' => '/var/storage/pgsql92/',
2023:02:01-09:43:16 firewall-2 repctl[19331]: [c] standby_clone(936): 'module' => 'postgres-default'
2023:02:01-09:43:16 firewall-2 repctl[19331]: [c] standby_clone(936): };
2023:02:01-09:43:16 firewall-2 repctl[19331]: [c] standby_clone(936): (Attempt #:10)
2023:02:01-09:43:28 firewall-2 repctl[19331]: [i] stop_backup_mode(765): stopped backup mode at 000000010000001F000000DD
2023:02:01-09:43:28 firewall-2 repctl[19331]: [c] standby_clone(950): standby_clone failed: sync aborted (never executed successfully)
2023:02:01-09:43:28 firewall-2 repctl[19331]: [e] prepare_secondary(346): prepare_secondary: clone failed
2023:02:01-09:43:30 firewall-2 repctl[19331]: [i] start_backup_mode(744): starting backup mode at 000000010000001F000000DF
2023:02:01-09:43:30 firewall-2 ha_daemon[7101]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 949 30.161" name="HA control: cmd = 'sync start 1 database'"
2023:02:01-09:43:30 firewall-2 ha_daemon[7101]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 950 30.161" name="Activating sync process for database on node 1"
2023:02:01-09:43:30 firewall-2 repctl[19331]: [i] execute(1768): rsync: failed to connect to IP_1: Connection refused (111)
2023:02:01-09:43:30 firewall-2 repctl[19331]: [i] execute(1768): rsync error: error in socket IO (code 10) at clientserver.c(122) [receiver=3.0.4]
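
For anyone reading a log like this, the repeated retry blocks can be condensed into a short summary. Below is a minimal sketch, assuming the line layout seen in the excerpt above (timestamp, host, `process[pid]:`, then an optional `[level] function(line):` prefix). The names `LINE_RE` and `summarize` are made up for illustration and are not part of any Sophos tool.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt above, e.g.:
# 2023:02:01-09:41:46 firewall-2 repctl[19331]: [c] standby_clone(936): (Attempt #:1)
LINE_RE = re.compile(
    r"^(?P<ts>\S+) (?P<host>\S+) (?P<proc>[\w-]+)\[\d+\]: "
    r"(?:\[(?P<level>[a-z])\] (?P<func>\w+)\(\d+\): )?(?P<msg>.*)$"
)

def summarize(lines):
    """Count repctl messages by (level, function) and tally distinct rsync errors."""
    counts = Counter()
    rsync_errors = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        # Skip unparseable lines and anything not written by repctl.
        if not m or m.group("proc") != "repctl":
            continue
        counts[(m.group("level"), m.group("func"))] += 1
        msg = m.group("msg")
        if msg.startswith("rsync: "):
            rsync_errors[msg] += 1
    return counts, rsync_errors

if __name__ == "__main__":
    import sys
    counts, rsync_errors = summarize(sys.stdin)
    for (level, func), n in counts.most_common():
        print(f"[{level}] {func}: {n}")
    for msg, n in rsync_errors.most_common():
        print(f"{n}x {msg}")
```

Fed the excerpt above on standard input, this would group every retry under one `rsync: failed to connect to IP_1: Connection refused (111)` entry, which makes it easier to see that the clone never gets past the connection step.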

Right after the upgrade, E-Mail Protection on the Master was not working at all. I restarted the Master, after which the second firewall became Master and the first one Slave. The problem then persisted until the weekend.

On the weekend I rebuilt postgresql92 on both machines and restarted them one by one.

This only reversed the roles and did not solve the problem (before, the second one was the Master and the first one the infinitely syncing Slave; now the first one is the Master and the second one is infinitely syncing). I had to restart one or the other to get back to the same situation, where high availability is not working but at least everything else is.

Currently everything is running correctly on the Master. Interestingly, everything except E-Mail Protection was working on the broken firewall after the upgrade. No mails were sent or received by the firewall, and when the Mail Manager was opened it would not load any tabs, even though they were there and I could switch between them.

Any idea how to solve this?



  • Hello MikR,

    My only idea is to take the Slave offline so you can re-image it.  Remember that it must have the same version as the current Master before you connect the sync interfaces.

    Any luck?

    Cheers - Bob

     
    Sophos UTM Community Moderator
    Sophos Certified Architect - UTM
    Sophos Certified Engineer - XG
    Gold Solution Partner since 2005
    MediaSoft, Inc. USA
  • How would I go about "re-imaging" it? What if I disconnect it, do a factory reset, and connect it back in? Would it not offer to update it from the web interface?

  • Maybe, MikR, but I prefer to be sure rather than fight issues that could have been avoided, so re-imaging is still my recommendation.

    Let me find the download link.  Do you have an external DVD device?

    Cheers - Bob

     
  • Yes, we have an external DVD device. Couldn't I re-image it from the web interface?

    Once I shut down the current Master, the second firewall becomes the new Master and everything works except E-Mail Protection.

    I will try to restore the non-working firewall using a backup made before the upgrade, then start the first one and tell the second one to do the upgrade. If that does not work, I will look into re-imaging it.

  • You can try unplugging all LAN cables from the Slave and doing a factory reset on it. After the factory reset, power off the Slave, plug all cables back in, and power it on again. Then check whether HA sync succeeds and the HA cluster works as expected.

  • Yes, that actually makes sense. I will try that.
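
Since the logs in the first post show `Connection refused (111)` from rsync, it may also be worth checking, before and after a reset, whether the Master accepts TCP connections on the rsync port at all. The sketch below is only a rough diagnostic under some assumptions: it needs Python 3 on a host that can reach the sync interface, `probe_tcp` is a made-up helper name, and port 873 is merely the standard rsync daemon default, which UTM HA may or may not use.

```python
import socket

def probe_tcp(host, port, timeout=3.0):
    """Try one TCP connect and return 'open', 'refused', 'timeout', or 'error: <reason>'."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "open"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port (errno 111 on Linux).
        return "refused"
    except socket.timeout:
        # No answer at all: more likely a filter or a cabling/routing problem.
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
    finally:
        sock.close()

if __name__ == "__main__":
    import sys
    # Usage: python probe.py <master-sync-ip> [port]
    # 873 is an assumed default, not a confirmed UTM setting.
    host = sys.argv[1]
    port = int(sys.argv[2]) if len(sys.argv) > 2 else 873
    print(f"{host}:{port} -> {probe_tcp(host, port)}")
```

A "refused" result would point at the rsync service not listening on the Master, whereas a "timeout" would suggest a network or filtering issue on the sync link instead.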