Up2Date HA-System SYNCING Problem

Hi,

Today I started an update of an active-passive system to software version 9.106-17. The first node was upgraded successfully and took over all the traffic.

The second node was in the "reserved for upgrade" state, so I started the upgrade on it as well. Since then, the HA status has shown SYNCING, and the following messages repeat periodically in the HA live log:

2013:10:20-20:37:20 primary-1 repctld[10485]: [e] do_monitor(1540): cannot get local database status

2013:10:20-20:38:20 primary-1 repctld[10485]: [e] db_connect(2697): error while connecting to database: could not connect to server: No such file or directory

2013:10:20-20:38:20 primary-1 repctld[10485]: [c] local_connection(2643): cannot connect to local database: could not connect to server: No such file or directory

2013:10:20-20:38:20 primary-1 repctld[10485]: [e] do_monitor(1540): cannot get local database status

2013:10:20-20:39:20 primary-1 repctld[10485]: [e] db_connect(2697): error while connecting to database: could not connect to server: No such file or directory

2013:10:20-20:39:20 primary-1 repctld[10485]: [c] local_connection(2643): cannot connect to local database: could not connect to server: No such file or directory

2013:10:20-20:39:20 primary-1 repctld[10485]: [e] do_monitor(1540): cannot get local database status
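
For readers who want to see at a glance which messages repeat in an excerpt like the one above, here is a minimal Python sketch that counts lines per node, process, level letter and function. The regular expression is only inferred from the lines pasted in this thread, and `LINE_RE` and `summarize` are made-up names, not part of any Sophos tool.

```python
import re
from collections import Counter

# Pattern inferred from the log lines in this thread (an assumption, not an
# official format): timestamp, node, process[pid], optional [level], then
# function(line): message.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2})\s+"
    r"(?P<node>\S+)\s+"
    r"(?P<proc>\w+)\[(?P<pid>\d+)\]:\s+"
    r"(?:\[(?P<level>\w)\]\s+)?"
    r"(?P<func>\w+)\((?P<line>\d+)\):\s+"
    r"(?P<msg>.*)$"
)


def summarize(lines):
    """Count log lines per (node, process, level, function)."""
    counts = Counter()
    for raw in lines:
        match = LINE_RE.match(raw.strip())
        if match is None:
            continue  # blank lines or lines in another format are skipped
        level = match.group("level") or "-"  # some lines carry no [x] marker
        key = (match.group("node"), match.group("proc"), level, match.group("func"))
        counts[key] += 1
    return counts


# Three lines copied from the excerpt above.
SAMPLE = """\
2013:10:20-20:38:20 primary-1 repctld[10485]: [e] db_connect(2697): error while connecting to database: could not connect to server: No such file or directory
2013:10:20-20:38:20 primary-1 repctld[10485]: [c] local_connection(2643): cannot connect to local database: could not connect to server: No such file or directory
2013:10:20-20:38:20 primary-1 repctld[10485]: [e] do_monitor(1540): cannot get local database status
"""

if __name__ == "__main__":
    for key, count in summarize(SAMPLE.splitlines()).most_common():
        print(count, *key)
```

Feeding it a longer excerpt shows quickly whether the same three repctld messages are simply repeating once per minute or whether something new has appeared.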

Could you please take a look and point me to a solution for getting the second node back into a working slave state?

Thanks in advance,
regards
Herbert

This thread was automatically locked due to age.

0 BAlfson over 11 years ago

Herbert, there's a known bug (27295) that was discovered in 9.104 when you specify a 'Preferred Master'. Set that to "None" until you have Up2Dated to 9.105 or later. Was that your issue?

Cheers - Bob

Sophos UTM Community Moderator
Sophos Certified Architect - UTM
Sophos Certified Engineer - XG
Gold Solution Partner since 2005

MediaSoft, Inc. USA
0 kloana over 11 years ago in reply to BAlfson

Hi,

I upgraded to 9.106 directly from version 9.105, but in any case I also changed the "Preferred Master" option to None and rebooted the slave machine.

The HA live log still shows the same problem:
2013:10:20-21:36:52 primary-1 repctl[3923]: [w] recheck(1253): re-initialising replication

2013:10:20-21:36:52 primary-1 repctl[3923]:  execute(2324): pg_ctl: no server running

2013:10:20-21:36:52 primary-1 repctl[3923]:  execute(2324): pg_ctl: PID file "/var/storage/pgsql92/data/postmaster.pid" does not exist

2013:10:20-21:36:53 primary-2 ha_daemon[3815]: id="38A0" severity="info" sys="System" sub="ha" name="Activating sync process for database on node 1"

2013:10:20-21:36:54 primary-1 repctl[3923]:  start_backup_mode(883): starting backup mode at 000000010000000C00000002

2013:10:20-21:36:54 primary-1 repctl[3923]:  execute(2324): rsync: failed to connect to 198.19.250.2: Connection refused (111)

2013:10:20-21:36:54 primary-1 repctl[3923]: [c] standby_clone(1065): rsync failed on $VAR1 = {

2013:10:20-21:36:55 primary-2 ha_daemon[3815]: id="38A0" severity="info" sys="System" sub="ha" name="Activating sync process for database on node 1"

2013:10:20-21:36:55 primary-1 repctl[3923]:  stop_backup_mode(904): stopped backup mode at 000000010000000C00000002

2013:10:20-21:36:55 primary-1 repctl[3923]: [c] standby_clone(1077): sync aborted

2013:10:20-21:36:55 primary-1 repctl[3923]: [e] prepare_secondary(579): clone failed

2013:10:20-21:36:55 primary-1 repctld[5515]: [e] db_connect(2697): error while connecting to database: could not connect to server: No such file or directory

2013:10:20-21:36:55 primary-1 repctld[5515]: [c] local_connection(2643): cannot connect to local database: could not connect to server: No such file or directory

2013:10:20-21:36:55 primary-1 repctld[5515]: [e] do_monitor(1540): cannot get local database status

2013:10:20-21:36:55 primary-1 repctl[3923]:  start_backup_mode(883): starting backup mode at 000000010000000C00000004

2013:10:20-21:36:55 primary-1 repctl[3923]:  execute(2324): rsync: failed to connect to 198.19.250.2: Connection refused (111)

2013:10:20-21:36:55 primary-1 repctl[3923]: [c] standby_clone(1065): rsync failed on $VAR1 = {

2013:10:20-21:36:56 primary-2 ha_daemon[3815]: id="38A0" severity="info" sys="System" sub="ha" name="Activating sync process for database on node 1"

2013:10:20-21:36:57 primary-1 repctl[3923]:  stop_backup_mode(904): stopped backup mode at 000000010000000C00000004

2013:10:20-21:36:57 primary-1 repctl[3923]: [c] standby_clone(1077): sync aborted

2013:10:20-21:36:57 primary-1 repctl[3923]: [e] prepare_secondary(579): clone failed

2013:10:20-21:36:57 primary-1 repctl[3923]:  start_backup_mode(883): starting backup mode at 000000010000000C00000005

2013:10:20-21:36:57 primary-1 repctl[3923]:  execute(2324): rsync: failed to connect to 198.19.250.2: Connection refused (111)

2013:10:20-21:36:57 primary-1 repctl[3923]: [c] standby_clone(1065): rsync failed on $VAR1 = {

2013:10:20-21:36:58 primary-2 ha_daemon[3815]: id="38A0" severity="info" sys="System" sub="ha" name="Activating sync process for database on node 1"

2013:10:20-21:36:58 primary-1 repctl[3923]:  stop_backup_mode(904): stopped backup mode at 000000010000000C00000005

2013:10:20-21:36:58 primary-1 repctl[3923]: [c] standby_clone(1077): sync aborted

2013:10:20-21:36:58 primary-1 repctl[3923]: [e] prepare_secondary(579): clone failed

2013:10:20-21:36:58 primary-1 repctl[3923]: [c] prepare_secondary(591): failed to get database up, waiting for retry

2013:10:20-21:36:58 primary-1 repctl[3923]: [e] start_monitor(1441): refusing to start second monitor process

2013:10:20-21:36:58 primary-1 repctl[3923]:  setup_replication(233): checkinterval 300
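
The rsync lines above report `Connection refused (111)`. On Linux, errno 111 is ECONNREFUSED, which indicates an active rejection (typically nothing listening on that port) rather than a timeout. Below is a minimal Python sketch for telling those cases apart by hand. The name `probe_tcp` is made up, and port 873 (the rsync daemon default) is only an assumption, since the thread does not say which port repctl's rsync uses; it also assumes shell access and a Python interpreter on the node that initiates the sync.

```python
import errno
import socket


def probe_tcp(host, port, timeout=3.0):
    """Try one TCP connection and return 'open', 'refused', 'timeout' or 'error: ...'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Same condition rsync reports as "Connection refused (111)" on Linux.
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. no route to host.
        return "error: {}".format(errno.errorcode.get(exc.errno, exc.errno))


# Hypothetical usage with the peer address from the log above
# (port 873 is an assumption, see the note before this block):
#   print(probe_tcp("198.19.250.2", 873))
```

A result of "refused" would be consistent with the log, where the peer is reachable but the rsync service is not accepting connections; "timeout" would point more towards a link or filtering problem.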

Thanks and Regards
Herbert
0 BAlfson over 11 years ago

Yeah, I think that now you're stuck with taking the node off-line and re-imaging it from ISO. You might be able to just do a factory reset on that node - I haven't tried it, so I'd be interested to know if that would work.

By the way, why did you set a preferred node?

Cheers - Bob

0 kloana over 11 years ago in reply to BAlfson

Hi,

I don't know why I set this preferred node during installation.

OK, I will try resetting the node to factory defaults and starting the sync again. I can't do this before the weekend, though, because I currently only have management access via the web and not locally.

Thanks
regards
Herbert
0 BAlfson over 11 years ago

You can force a factory reset in WebAdmin. Just set the High Availability Operation Mode to "Off", wait a minute, and then reactivate it.

Cheers - Bob

0 kloana over 11 years ago in reply to BAlfson

Hi,

How can I do this remotely?

When I set HA to Off, how can I factory-reset the second machine, and how can I then connect to the factory-reset machine remotely?

Thanks,
regards
Herbert
0 BAlfson over 11 years ago

The second machine is factory reset by turning off HA in the Master. I think the Slave will automatically rejoin HA when you enable HA again in the Master. At the very most, someone at the site would need to power-cycle the Slave after the reset finishes.

Cheers - Bob

0 kloana over 11 years ago in reply to BAlfson

Hi,

This weekend I tried both ways. First I reset the slave to factory defaults and brought it back into HA, which did not work.
After that I tried a fresh installation, and I still have the same problem. Both firewalls were installed from ISO, and I then restored the settings from a backup file, but the problem persists.

2013:10:27-18:33:19 primary-2 repctld[597]: [e] do_monitor(1540): cannot get local database status

2013:10:27-18:34:19 primary-2 repctld[597]: [e] db_connect(2697): error while connecting to database: could not connect to server: No such file or directory

2013:10:27-18:34:19 primary-2 repctld[597]: [c] local_connection(2643): cannot connect to local database: could not connect to server: No such file or directory

2013:10:27-18:34:19 primary-2 repctld[597]: [e] do_monitor(1540): cannot get local database status

2013:10:27-18:35:19 primary-2 repctld[597]: [e] db_connect(2697): error while connecting to database: could not connect to server: No such file or directory

2013:10:27-18:35:19 primary-2 repctld[597]: [c] local_connection(2643): cannot connect to local database: could not connect to server: No such file or directory

2013:10:27-18:35:19 primary-2 repctld[597]: [e] do_monitor(1540): cannot get local database status

2013:10:27-18:36:19 primary-2 repctld[597]: [e] db_connect(2697): error while connecting to database: could not connect to server: No such file or directory

2013:10:27-18:36:19 primary-2 repctld[597]: [c] local_connection(2643): cannot connect to local database: could not connect to server: No such file or directory

2013:10:27-18:36:19 primary-2 repctld[597]: [e] do_monitor(1540): cannot get local database status

Maybe you have some tips.

Thanks,
regards
Herbert
0 kloana over 11 years ago in reply to kloana

Hi,

Does anyone have any ideas? Otherwise I will go back to the old firmware with a fresh installation from ISO.

Thanks,
regards
Herbert
0 kloana over 10 years ago

Hi,

I have now Up2Dated to the 9.2 beta, and HA is working again without any trouble.

regards
Herbert