Up2Date HA-System SYNCING Problem

Hi,

Today I started an update of an active-passive system to software version 9.106-17. The first node was upgraded successfully and took over all the traffic.

The second node was in the "reserved for upgrade" state, so I started the upgrade on the second node as well. Since the upgrade, the HA status has shown SYNCING, and the following messages repeat periodically in the HA live log:

2013:10:20-20:37:20 primary-1 repctld[10485]: [e] do_monitor(1540): cannot get local database status
2013:10:20-20:38:20 primary-1 repctld[10485]: [e] db_connect(2697): error while connecting to database: could not connect to server: No such file or directory
2013:10:20-20:38:20 primary-1 repctld[10485]: [c] local_connection(2643): cannot connect to local database: could not connect to server: No such file or directory
2013:10:20-20:38:20 primary-1 repctld[10485]: [e] do_monitor(1540): cannot get local database status
2013:10:20-20:39:20 primary-1 repctld[10485]: [e] db_connect(2697): error while connecting to database: could not connect to server: No such file or directory
2013:10:20-20:39:20 primary-1 repctld[10485]: [c] local_connection(2643): cannot connect to local database: could not connect to server: No such file or directory
2013:10:20-20:39:20 primary-1 repctld[10485]: [e] do_monitor(1540): cannot get local database Status

Could you please take a look and point me to a solution for getting the second node back into a working slave state?

Thanks in advance,
Regards,
Herbert

This thread was automatically locked due to age.

BAlfson over 11 years ago

Herbert, there's a known bug (27295), discovered in 9.104, that occurs when you specify a 'Preferred Master'. Set that option to 'None' until you have Up2Dated to 9.105 or later. Was that your issue?

Cheers - Bob

Sophos UTM Community Moderator
Sophos Certified Architect - UTM
Sophos Certified Engineer - XG
Gold Solution Partner since 2005

MediaSoft, Inc. USA

kloana over 11 years ago in reply to BAlfson

Hi,

I upgraded to 9.106 directly from version 9.105, but I changed the "Preferred Master" option to None anyway and rebooted the slave machine.

The problem persists. The HA live log shows:
2013:10:20-21:36:52 primary-1 repctl[3923]: [w] recheck(1253): re-initialising replication
2013:10:20-21:36:52 primary-1 repctl[3923]:  execute(2324): pg_ctl: no server running
2013:10:20-21:36:52 primary-1 repctl[3923]:  execute(2324): pg_ctl: PID file "/var/storage/pgsql92/data/postmaster.pid" does not exist
2013:10:20-21:36:53 primary-2 ha_daemon[3815]: id="38A0" severity="info" sys="System" sub="ha" name="Activating sync process for database on node 1"
2013:10:20-21:36:54 primary-1 repctl[3923]:  start_backup_mode(883): starting backup mode at 000000010000000C00000002
2013:10:20-21:36:54 primary-1 repctl[3923]:  execute(2324): rsync: failed to connect to 198.19.250.2: Connection refused (111)
2013:10:20-21:36:54 primary-1 repctl[3923]: [c] standby_clone(1065): rsync failed on $VAR1 = {
2013:10:20-21:36:55 primary-2 ha_daemon[3815]: id="38A0" severity="info" sys="System" sub="ha" name="Activating sync process for database on node 1"
2013:10:20-21:36:55 primary-1 repctl[3923]:  stop_backup_mode(904): stopped backup mode at 000000010000000C00000002
2013:10:20-21:36:55 primary-1 repctl[3923]: [c] standby_clone(1077): sync aborted
2013:10:20-21:36:55 primary-1 repctl[3923]: [e] prepare_secondary(579): clone failed
2013:10:20-21:36:55 primary-1 repctld[5515]: [e] db_connect(2697): error while connecting to database: could not connect to server: No such file or directory
2013:10:20-21:36:55 primary-1 repctld[5515]: [c] local_connection(2643): cannot connect to local database: could not connect to server: No such file or directory
2013:10:20-21:36:55 primary-1 repctld[5515]: [e] do_monitor(1540): cannot get local database status
2013:10:20-21:36:55 primary-1 repctl[3923]:  start_backup_mode(883): starting backup mode at 000000010000000C00000004
2013:10:20-21:36:55 primary-1 repctl[3923]:  execute(2324): rsync: failed to connect to 198.19.250.2: Connection refused (111)
2013:10:20-21:36:55 primary-1 repctl[3923]: [c] standby_clone(1065): rsync failed on $VAR1 = {
2013:10:20-21:36:56 primary-2 ha_daemon[3815]: id="38A0" severity="info" sys="System" sub="ha" name="Activating sync process for database on node 1"
2013:10:20-21:36:57 primary-1 repctl[3923]:  stop_backup_mode(904): stopped backup mode at 000000010000000C00000004
2013:10:20-21:36:57 primary-1 repctl[3923]: [c] standby_clone(1077): sync aborted
2013:10:20-21:36:57 primary-1 repctl[3923]: [e] prepare_secondary(579): clone failed
2013:10:20-21:36:57 primary-1 repctl[3923]:  start_backup_mode(883): starting backup mode at 000000010000000C00000005
2013:10:20-21:36:57 primary-1 repctl[3923]:  execute(2324): rsync: failed to connect to 198.19.250.2: Connection refused (111)
2013:10:20-21:36:57 primary-1 repctl[3923]: [c] standby_clone(1065): rsync failed on $VAR1 = {
2013:10:20-21:36:58 primary-2 ha_daemon[3815]: id="38A0" severity="info" sys="System" sub="ha" name="Activating sync process for database on node 1"
2013:10:20-21:36:58 primary-1 repctl[3923]:  stop_backup_mode(904): stopped backup mode at 000000010000000C00000005
2013:10:20-21:36:58 primary-1 repctl[3923]: [c] standby_clone(1077): sync aborted
2013:10:20-21:36:58 primary-1 repctl[3923]: [e] prepare_secondary(579): clone failed
2013:10:20-21:36:58 primary-1 repctl[3923]: [c] prepare_secondary(591): failed to get database up, waiting for retry
2013:10:20-21:36:58 primary-1 repctl[3923]: [e] start_monitor(1441): refusing to start second monitor process
2013:10:20-21:36:58 primary-1 repctl[3923]:  setup_replication(233): checkinterval 300

Thanks and Regards
Herbert