HA replication error

Hello everyone,

today I set up a hot-standby system for my UTM. The sync has now been pending for over an hour. Both UTMs run version 9.209-8. I found the following in the HA log:

2014:11:13-19:43:25 rz-fw1-1 repctl[7847]: [e] trim_history(1773): delete failed
2014:11:13-19:48:25 rz-fw1-1 repctl[7847]: [c] sql_execute(2234): SQL execute: ERROR: schema "repmgr_asg" does not exist
2014:11:13-19:48:25 rz-fw1-1 repctl[7847]: [e] trim_history(1773): delete failed
2014:11:13-19:53:25 rz-fw1-1 repctl[7847]: [c] sql_execute(2234): SQL execute: ERROR: schema "repmgr_asg" does not exist
2014:11:13-19:53:25 rz-fw1-1 repctl[7847]: [e] trim_history(1773): delete failed
2014:11:13-19:58:25 rz-fw1-1 repctl[7847]: [c] sql_execute(2234): SQL execute: ERROR: schema "repmgr_asg" does not exist
2014:11:13-19:58:25 rz-fw1-1 repctl[7847]: [e] trim_history(1773): delete failed
2014:11:13-20:02:50 rz-fw1-1 ha_daemon[7227]: id="38A0" severity="info" sys="System" sub="ha" name="Reading cluster configuration"
2014:11:13-20:03:25 rz-fw1-1 repctl[7847]: [c] sql_execute(2234): SQL execute: ERROR: schema "repmgr_asg" does not exist
2014:11:13-20:03:25 rz-fw1-1 repctl[7847]: [e] trim_history(1773): delete failed
2014:11:13-20:03:54 rz-fw1-1 ha_daemon[7227]: id="38A0" severity="info" sys="System" sub="ha" name="Monitoring interfaces for link beat: eth0 eth1 eth3 eth2 eth4 eth5 "
2014:11:13-20:08:25 rz-fw1-1 repctl[7847]: [c] sql_execute(2234): SQL execute: ERROR: schema "repmgr_asg" does not exist
2014:11:13-20:08:25 rz-fw1-1 repctl[7847]: [e] trim_history(1773): delete failed
2014:11:13-20:13:25 rz-fw1-1 repctl[7847]: [c] sql_execute(2234): SQL execute: ERROR: schema "repmgr_asg" does not exist
2014:11:13-20:13:25 rz-fw1-1 repctl[7847]: [e] trim_history(1773): delete failed
2014:11:13-20:18:25 rz-fw1-1 repctl[7847]: [c] sql_execute(2234): SQL execute: ERROR: schema "repmgr_asg" does not exist
2014:11:13-20:18:25 rz-fw1-1 repctl[7847]: [e] trim_history(1773): delete failed
2014:11:13-20:23:25 rz-fw1-1 repctl[7847]: [c] sql_execute(2234): SQL execute: ERROR: schema "repmgr_asg" does not exist
2014:11:13-20:23:25 rz-fw1-1 repctl[7847]: [e] trim_history(1773): delete failed

I have also already tried the command postgresql92 rebuild, but the sync is still running. Does anyone have an idea?

Thanks in advance... [;)]
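
For anyone who wants to see at a glance which repctl messages dominate such an excerpt, here is a minimal Python sketch. It is not a Sophos tool; the line layout in the regular expression is an assumption inferred only from the log lines quoted above, and `summarize_repctl` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed layout, inferred from the excerpt in this thread:
# <timestamp> <host> repctl[<pid>]: [<level>] <function>(<line>): <message>
REPCTL_LINE = re.compile(
    r"^(?P<ts>\S+) (?P<host>\S+) repctl\[(?P<pid>\d+)\]: "
    r"\[(?P<level>\w)\] (?P<func>\w+)\((?P<line>\d+)\): (?P<msg>.*)$"
)


def summarize_repctl(lines):
    """Count repctl entries per (level, function, message); skip other lines."""
    counts = Counter()
    for raw in lines:
        match = REPCTL_LINE.match(raw.strip())
        if match:
            counts[(match["level"], match["func"], match["msg"])] += 1
    return counts


# Sample lines copied from the HA log above.
sample = [
    '2014:11:13-19:43:25 rz-fw1-1 repctl[7847]: [e] trim_history(1773): delete failed',
    '2014:11:13-19:48:25 rz-fw1-1 repctl[7847]: [c] sql_execute(2234): SQL execute: ERROR: schema "repmgr_asg" does not exist',
    '2014:11:13-19:48:25 rz-fw1-1 repctl[7847]: [e] trim_history(1773): delete failed',
    '2014:11:13-20:02:50 rz-fw1-1 ha_daemon[7227]: id="38A0" severity="info" sys="System" sub="ha" name="Reading cluster configuration"',
]

summary = summarize_repctl(sample)
for (level, func, msg), count in summary.most_common():
    print(f"{count:3d}x [{level}] {func}: {msg}")
```

Grouping by message makes it obvious that the excerpt consists of a single failing SQL statement and its follow-up error, repeated over and over, rather than many different problems.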


0 GuyFawkes over 10 years ago

Hello knokatorat93,
first question: why did you run a rebuild?
I'm not surprised that the sync "still" doesn't work.

How about resetting the slave to factory defaults and adding it again? If that doesn't work, reinstall it; that is quickly done.

Nice greetings

0 knokatorat93 over 10 years ago in reply to GuyFawkes

Hello GuyFawkes,

I suspected that PostgreSQL might be the cause, but apparently that is not the case. After resetting the slave, it does get synchronized and now shows Ready. However, I found further errors in the log:

2014:11:14-07:06:53 rz-fw1-1 repctl[3874]: [e] trim_history(1756): cannot connect to database

2014:11:14-07:11:53 rz-fw1-1 repctl[3874]: [e] db_connect(2554): error while connecting to database: could not connect to server: Connection refused

2014:11:14-07:11:53 rz-fw1-1 repctl[3874]: [e] master_connection(2458): could not connect to server: Connection refused

2014:11:14-07:11:53 rz-fw1-1 repctl[3874]: [e] trim_history(1756): cannot connect to database

2014:11:14-07:16:53 rz-fw1-1 repctl[3874]: [e] db_connect(2554): error while connecting to database: could not connect to server: Connection refused

2014:11:14-07:16:53 rz-fw1-1 repctl[3874]: [e] master_connection(2458): could not connect to server: Connection refused

2014:11:14-07:16:53 rz-fw1-1 repctl[3874]: [e] trim_history(1756): cannot connect to database

2014:11:14-07:21:53 rz-fw1-1 repctl[3874]: [e] db_connect(2554): error while connecting to database: could not connect to server: Connection refused

2014:11:14-07:21:53 rz-fw1-1 repctl[3874]: [e] master_connection(2458): could not connect to server: Connection refused

2014:11:14-07:21:53 rz-fw1-1 repctl[3874]: [e] trim_history(1756): cannot connect to database

2014:11:14-07:26:53 rz-fw1-1 repctl[3874]: [e] db_connect(2554): error while connecting to database: could not connect to server: Connection refused

2014:11:14-07:26:53 rz-fw1-1 repctl[3874]: [e] master_connection(2458): could not connect to server: Connection refused

2014:11:14-07:26:53 rz-fw1-1 repctl[3874]: [e] trim_history(1756): cannot connect to database

Could this possibly be related to the rebuild? For now I won't try out the HA function [:D]

But thanks for your help so far [;)]
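
The timestamps in the excerpt above suggest a fixed retry cycle. A small Python sketch, assuming only the timestamp format visible in the log (`gaps_in_seconds` is a hypothetical helper, not part of the UTM), measures the gap between repeated occurrences of one message:

```python
from datetime import datetime

# Timestamp format as it appears in the log, e.g. 2014:11:14-07:11:53
TS_FORMAT = "%Y:%m:%d-%H:%M:%S"


def gaps_in_seconds(lines, needle):
    """Return the seconds between consecutive lines that contain `needle`."""
    stamps = [
        datetime.strptime(line.split(" ", 1)[0], TS_FORMAT)
        for line in lines
        if needle in line
    ]
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(stamps, stamps[1:])]


# Sample lines copied from the log above.
sample = [
    "2014:11:14-07:06:53 rz-fw1-1 repctl[3874]: [e] trim_history(1756): cannot connect to database",
    "2014:11:14-07:11:53 rz-fw1-1 repctl[3874]: [e] master_connection(2458): could not connect to server: Connection refused",
    "2014:11:14-07:11:53 rz-fw1-1 repctl[3874]: [e] trim_history(1756): cannot connect to database",
    "2014:11:14-07:16:53 rz-fw1-1 repctl[3874]: [e] trim_history(1756): cannot connect to database",
]

gaps = gaps_in_seconds(sample, "trim_history")
print(gaps)  # [300, 300]
```

A constant 300-second gap points to a periodic job that fails on every run, which fits the pattern of the earlier "delete failed" entries as well.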

0 GuyFawkes over 10 years ago

Do you have the "Preferred Master" option enabled?

Since you have already done a Postgres rebuild anyway, I would proceed as follows.
Back up the logs - save the backup(s) - reinstall "Node2" - restore the backup onto Node2 - import the logs

Cable up Node2, switch off Node1, switch on Node2.
Optionally rename Node2, then reinstall the old master and add it as a slave.

Done.

I recently had similar problems for quite a while on two systems (recurring after a ticket). Factory defaults did not help in the long run there; only a reinstallation did. Since it had to be quick, we chose this route. It solves the problem, but of course the root cause has not been found.

Nice greetings

0 knokatorat93 over 10 years ago in reply to GuyFawkes

Thanks a lot for the quick reply. I'll try it that way now and report back afterwards [;)]

Luckily today is Friday, so I can afford a short downtime [:D]

On that note, have a nice weekend [:)]

0 knokatorat93 over 10 years ago in reply to knokatorat93

I have now reinstalled both UTMs. The sync is currently running on the slave. It feels like it is already taking rather long. However, I can't see anything suspicious in the log:

2014:11:14-14:51:36 rz-fw1-2 ha_daemon[9229]: id="38A0" severity="info" sys="System" sub="ha" name="Reading cluster configuration"

2014:11:14-14:51:50 rz-fw1-2 ha_daemon[9229]: id="38A0" severity="info" sys="System" sub="ha" name="Access granted to remote node 1!"

2014:11:14-14:51:50 rz-fw1-2 ha_daemon[9229]: id="38A0" severity="info" sys="System" sub="ha" name="Reading cluster configuration"

2014:11:14-14:51:50 rz-fw1-2 ha_daemon[9229]: id="38A0" severity="info" sys="System" sub="ha" name="Starting use of backup interface 'eth1'"

2014:11:14-14:51:57 rz-fw1-2 ha_daemon[9229]: id="38A0" severity="info" sys="System" sub="ha" name="Node 1 joined with version 9.209008"

2014:11:14-14:51:57 rz-fw1-2 ha_daemon[9229]: id="38C0" severity="info" sys="System" sub="ha" name="Node 1 is alive!"

2014:11:14-14:51:57 rz-fw1-2 ha_daemon[9229]: id="38A0" severity="info" sys="System" sub="ha" name="Node 1 changed state: DEAD -> SYNCING"

2014:11:14-14:51:57 rz-fw1-2 repctl[25619]:  daemonize_check(1864): trying to signal daemon

2014:11:14-14:52:21 rz-fw1-2 ha_daemon[9229]: id="38A0" severity="info" sys="System" sub="ha" name="Monitoring interfaces for link beat: eth0 eth1 eth3 eth2 eth4 eth5 "

Bandwidth can't be the problem; the two nodes see each other at 1 GBit/s. I'll wait a bit longer *getting coffee* [:D]
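
To follow the sync progress without reading every line, the node state changes can be pulled out of the ha_daemon entries. This is a rough Python sketch under the assumption that the `key="value"` layout and the "Node N changed state: A -> B" wording always look like they do in the excerpt above; `state_changes` is a hypothetical helper, and only the states DEAD and SYNCING actually appear in this log.

```python
import re

# key="value" pairs as they appear in the ha_daemon lines above
FIELD = re.compile(r'(\w+)="([^"]*)"')
# Wording taken from the excerpt: Node 1 changed state: DEAD -> SYNCING
TRANSITION = re.compile(r"Node (\d+) changed state: (\w+) -> (\w+)")


def state_changes(lines):
    """Return (timestamp, node, old_state, new_state) for each state change."""
    changes = []
    for line in lines:
        if " ha_daemon[" not in line:
            continue  # ignore repctl and other daemons
        fields = dict(FIELD.findall(line))
        match = TRANSITION.search(fields.get("name", ""))
        if match:
            timestamp = line.split(" ", 1)[0]
            changes.append(
                (timestamp, int(match.group(1)), match.group(2), match.group(3))
            )
    return changes


# Sample lines copied from the log above.
sample = [
    '2014:11:14-14:51:57 rz-fw1-2 ha_daemon[9229]: id="38C0" severity="info" sys="System" sub="ha" name="Node 1 is alive!"',
    '2014:11:14-14:51:57 rz-fw1-2 ha_daemon[9229]: id="38A0" severity="info" sys="System" sub="ha" name="Node 1 changed state: DEAD -> SYNCING"',
    '2014:11:14-14:51:57 rz-fw1-2 repctl[25619]:  daemonize_check(1864): trying to signal daemon',
]

changes = state_changes(sample)
for timestamp, node, old, new in changes:
    print(f"{timestamp} node {node}: {old} -> {new}")
```

If the last reported transition stays at SYNCING for a long time and no further state change is logged, that confirms the sync has not finished yet, without saying anything about why.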