HA stuck in Syncing

Hi,

Since the last server crash (our UTM is a virtual machine), we have been getting these error messages in the HA log, and the state has been "Syncing" for three days:

repctl[4062]: [e] db_connect(2058): error while connecting to database(DBI:Pg:dbname=repmgr;host=198.19.250.2): could not connect to server: Connection refused

repctl[4062]: [e] master_connection(1904): could not connect to server: Connection refused

repctl[4062]: [e] db_connect(2058): error while connecting to database(DBI:Pg:dbname=repmgr;host=198.19.250.2): could not connect to server: Connection refused

repctl[4062]: [e] master_connection(1904): could not connect to server: Connection refused

repctl[4062]: [e] db_connect(2058): error while connecting to database(DBI:Pg:dbname=repmgr;host=198.19.250.2): could not connect to server: Connection refused

repctl[4062]: [e] master_connection(1904): could not connect to server: Connection refused

repctl[4062]: [e] db_connect(2058): error while connecting to database(DBI:Pg:dbname=repmgr;host=198.19.250.2): could not connect to server: Connection refused

repctl[4062]: [e] master_connection(1904): could not connect to server: Connection refused

repctl[4062]: [i] execute(1627): pg_ctl: could not send stop signal (PID: 6256): No such process

repctl[4062]: [i] recover_master(2296): Using previous master 198.19.250.1 for recovery

repctl[4062]: [i] recover_master(2329): Testing SLAVE/WORKER nodes for rsyncd

repctl[4062]: [c] hasyncmsg(1468): this is a primary node

repctl[4062]: [i] recover_master(2402): MASTER: syncing folder /global/pg_control from 198.19.250.1

repctl[4062]: [i] execute(1627): rsync: failed to connect to 198.19.250.1: Connection refused (111)

repctl[4062]: [c] recover_master(2419): rsync failed on $VAR1 = {

repctl[4062]: [c] recover_master(2428): sync aborted
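For anyone working through a long HA log like the one above, a small script can make the pattern easier to see. The following is only a sketch in Python, assuming the line layout shown in this excerpt (`repctl[PID]: [level] function(line): message`); the names `LINE_RE`, `parse_repctl`, and `summarize` are made up for illustration and are not part of any Sophos tooling.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt above:
#   repctl[4062]: [e] db_connect(2058): error while connecting ...
LINE_RE = re.compile(
    r"repctl\[(?P<pid>\d+)\]: \[(?P<level>[a-z])\] "
    r"(?P<func>\w+)\((?P<line>\d+)\): (?P<msg>.*)"
)


def parse_repctl(lines):
    """Return one dict per line that matches the assumed repctl layout."""
    entries = []
    for raw in lines:
        match = LINE_RE.search(raw)
        if match:
            entries.append(match.groupdict())
    return entries


def summarize(entries):
    """Count entries per (level tag, function name) pair."""
    return Counter((e["level"], e["func"]) for e in entries)


# A few lines copied from the log in this post.
SAMPLE = [
    "repctl[4062]: [e] db_connect(2058): error while connecting to "
    "database(DBI:Pg:dbname=repmgr;host=198.19.250.2): could not connect "
    "to server: Connection refused",
    "repctl[4062]: [e] master_connection(1904): could not connect to "
    "server: Connection refused",
    "repctl[4062]: [i] execute(1627): rsync: failed to connect to "
    "198.19.250.1: Connection refused (111)",
    "repctl[4062]: [c] recover_master(2428): sync aborted",
]

if __name__ == "__main__":
    for (level, func), count in summarize(parse_repctl(SAMPLE)).items():
        print(f"[{level}] {func}: {count}")
```

Grouping by level tag and function makes it clear that both the database connection to 198.19.250.2 and the rsync connection to 198.19.250.1 are being refused, which is the part worth passing on to support.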

Is there a way for me to fix it without reinstalling the firewall?

Kind regards

This thread was automatically locked due to age.

itsfruity over 8 years ago

Yeah, there is, but I can't remember the commands off the top of my head.

I'll have to send them through on Monday. Do you have a Support contract? If so, get a call logged and the engineers will help you out.

BAlfson over 8 years ago

Rudolf, have you tried rebooting the Slave? If that doesn't work, try disabling HA, thereby forcing the Slave to do a Factory Reset, and then re-establish Hot-Standby. Any luck with that?

Cheers - Bob

Sophos UTM Community Moderator
Sophos Certified Architect - UTM
Sophos Certified Engineer - XG
Gold Solution Partner since 2005

MediaSoft, Inc. USA

papa_ over 8 years ago

I had the same problem after some power failures.

- shut down slave

- reboot master (you will have a downtime)

- power on slave

itsfruity over 8 years ago

You'll lose reporting data, but:

Step 1: Log in to the master node and su to root.
Step 2: Open a new SSH window, log in to the master again, and su to root.
Step 3: In the second window, enter: ha_utils ssh
Step 4: In the second window, log in to the slave as loginuser, then su to root.
Step 5: In both SSH windows, enter: killall repctl
Step 6: In both SSH windows, enter: /etc/init.d/postgresql92 rebuild
Step 7: After the database rebuilds, enter in both SSH windows: repctl

Cheerok over 7 years ago in reply to itsfruity

Hello,

As a member of Sophos Support, I have to address some very important points about this "tutorial":

1. Database rebuilds are to be done only at a certain support level and only by Sophos Support members.

2. Rebuilding a database on your own, without a support recommendation, "voids" your support until you re-image the machine.

3. Several unpredictable problems or errors can occur after rebuilding a database the wrong way or making mistakes, and we can't fix them, or rather won't fix them as per point 2, as this is the central database for most UTM services!

4. This is not the correct way to rebuild a database, and in many cases a rebuild is not even necessary. To find out whether one is needed, you can contact Support or your Partner / Distributor.

Regards,

Cheerok

Adrien Belcourt1 over 5 years ago in reply to itsfruity
Danny,

Thank you for this post. Just had to use the information in it - again.

Worth adding that

<M> utm:/home/login # tail -f /var/log/system.log

2019:05:02-15:20:12 utm-2 ulogd[7753]: pg1: connect: could not connect to server: No such file or directory
2019:05:02-15:20:17 utm-2 ulogd[7753]: pg1: connect: could not connect to server: No such file or directory
2019:05:02-15:20:22 utm-2 ulogd[7753]: pg1: connect: could not connect to server: No such file or directory

^C
<M> utm:/home/login # telnet 127.0.0.1 5432

Trying 127.0.0.1...
telnet: connect to address 127.0.0.1: Connection refused

is a good indication that the database is corrupt.
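The telnet test above can also be scripted, for example when checking several machines. This is a minimal sketch in Python, not anything shipped with the UTM; `port_open` is a hypothetical helper name, and 5432 is simply the port used in the telnet command above. Note that it only tells you whether something is accepting TCP connections on that port; it says nothing about why the database is down.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError or
        # a timeout) when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Same target as "telnet 127.0.0.1 5432" in the post above.
    if port_open("127.0.0.1", 5432):
        print("127.0.0.1:5432 is accepting connections")
    else:
        print("127.0.0.1:5432 is not reachable")
```

A "not reachable" result here corresponds to the "Connection refused" line in the telnet output.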

And to Cheerok's points I think it would be reasonable to add

Any customer has the right and frequently the need to access the root shell.

Any change at root always has the *possibility* of voiding support.

Nearly all customers have to use root access to effectively administer their UTMs.

If you administer enough UTMs, database rebuilds are a common need.

So as a customer you have to be careful. Be well informed. Contact support where possible. And when not possible - tread carefully.

So the rule is not in place to prevent customers from using root, only to dissuade people from doing "stupid stuff" like trying to install device drivers using root access, which actually happened back in the ASG V4 days and resulted in the "root changes may void support" rule.

All the best,

Adrien.

Efren Gonzalez over 5 years ago in reply to BAlfson

Hi Bob,

I know this is old, but could you please clarify? Is the Slave you mention the same as the Swarm instance? Is the HA disabling the ha_aws process? I believe I am not using HA, but I somehow have the same issue.

BAlfson over 5 years ago in reply to Efren Gonzalez

Hi Efren and welcome to the UTM Community!

There are too many unknowns for me to make any suggestions. I recommend you get a case open with Sophos Support. When you get your issue resolved, please come back here and describe the solution.

Cheers - Bob

Sophos UTM Community Moderator
Sophos Certified Architect - UTM
Sophos Certified Engineer - XG
Gold Solution Partner since 2005

MediaSoft, Inc. USA