
Sophos UTM HA cluster: slave node detects master as dead, plus high CPU load

Hello everyone,

At my company we run an active-passive cluster of two SG230 appliances, each fitted with a "4x 10 GbE SFP+ FleXi Portmodul [SGIZTCHF4]". New VLANs and new switches (25 Gbit) were added recently. The new switches are connected to the port module as a LAG (lag2), and the old switches are connected as a LAG (lag0) to the 1 Gbit copper ports of the Sophos.

Since the new VLANs went live and started carrying load, the CPU repeatedly spikes to full utilization for short periods (normal state: 20-30%), which makes the REDs, VPN, and VoIP unstable. In addition, the slave intermittently stops detecting the master even though the master is active.

The HA link runs directly from one firewall to the other, with no switch in between. Intrusion Prevention is disabled. Both firewalls have 9.712-13 installed.

As a test, we limited QoS on the new interfaces to 2000 Mbit symmetric; unfortunately this fixes neither the CPU problem nor the HA problem. Restarting both firewalls one after the other did not help either.

2022:11:08-08:33:48 gw01-2 repctl[4345]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-08:35:22 gw01-2 ha_daemon[4298]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  708 22.630" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-08:35:32 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  491 32.491" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-08:37:14 gw01-1 repctl[4337]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-08:38:48 gw01-2 repctl[4345]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-08:42:14 gw01-1 repctl[4337]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-08:43:48 gw01-2 repctl[4345]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-08:44:21 gw01-1 ha_daemon[4290]: id="38A1" severity="warn" sys="System" sub="ha" seq="M:  492 21.583" name="Timer event count = 4 (we are too late)"
2022:11:08-08:44:45 gw01-2 ha_daemon[4298]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  709 45.341" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-08:45:00 gw01-2 ha_daemon[4298]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  710 00.560" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-08:45:08 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  493 08.638" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-08:47:14 gw01-1 repctl[4337]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-08:48:48 gw01-2 repctl[4345]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-08:49:10 gw01-2 ha_daemon[4298]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  711 10.736" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-08:49:20 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  494 20.567" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-08:51:00 gw01-2 ha_daemon[4298]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  712 00.576" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-08:51:10 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  495 10.438" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-08:52:14 gw01-1 repctl[4337]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-08:52:19 gw01-2 ha_daemon[4298]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  714 19.528" name="Executing (nowait) /etc/init.d/ha_mode check"
2022:11:08-08:52:19 gw01-2 ha_mode[13228]: calling check
2022:11:08-08:52:19 gw01-2 ha_mode[13228]: check: waiting for last ha_mode done
2022:11:08-08:52:19 gw01-2 ha_mode[13228]: check_ha() role=SLAVE, status=ACTIVE
2022:11:08-08:52:19 gw01-2 ha_mode[13228]: check done (started at 08:52:19)
2022:11:08-08:53:48 gw01-2 repctl[4345]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-08:57:14 gw01-1 repctl[4337]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-08:58:48 gw01-2 repctl[4345]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-09:02:15 gw01-1 repctl[4337]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-09:03:48 gw01-2 repctl[4345]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-09:07:15 gw01-1 repctl[4337]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-09:08:48 gw01-2 repctl[4345]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-09:12:15 gw01-1 repctl[4337]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-09:13:48 gw01-2 repctl[4345]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-09:17:15 gw01-1 repctl[4337]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-09:18:48 gw01-2 repctl[4345]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-09:19:10 gw01-2 ha_daemon[4298]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  715 10.017" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-09:19:19 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  496 19.850" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-09:22:15 gw01-1 repctl[4337]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-09:23:05 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  497 05.399" name="Executing (wait) /usr/local/bin/confd-setha mode master master_ip 198.19.250.1 slave_ip 198.19.250.2"
2022:11:08-09:23:05 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  498 05.562" name="Executing (nowait) /etc/init.d/ha_mode check"
2022:11:08-09:23:05 gw01-1 ha_mode[25363]: calling check
2022:11:08-09:23:05 gw01-1 ha_mode[25363]: check: waiting for last ha_mode done
2022:11:08-09:23:05 gw01-1 ha_mode[25363]: check_ha() role=MASTER, status=ACTIVE
2022:11:08-09:23:05 gw01-1 ha_mode[25363]: check done (started at 09:23:05)
2022:11:08-09:23:48 gw01-2 repctl[4345]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-09:26:21 gw01-2 ha_daemon[4298]: id="38C1" severity="error" sys="System" sub="ha" seq="S:  716 21.410" name="Node 1 is dead, received no heart beats"
2022:11:08-09:26:21 gw01-2 ha_daemon[4298]: id="38B5" severity="info" sys="System" sub="ha" seq="S:  717 21.410" name="Master is dead, taking over"
2022:11:08-09:26:21 gw01-2 ha_daemon[4298]: id="38B0" severity="info" sys="System" sub="ha" seq="S:  718 21.410" name="Switching to Master mode"
2022:11:08-09:26:21 gw01-2 ha_daemon[4298]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  719 21.410" name="start/reset initial synchronization timer = 0"
2022:11:08-09:26:21 gw01-2 ha_daemon[4298]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  720 21.412" name="Executing (wait) /usr/local/bin/confd-setha mode master master_ip 198.19.250.2 slave_ip ''"
2022:11:08-09:26:18 gw01-1 ha_daemon[4290]: id="38A1" severity="warn" sys="System" sub="ha" seq="M:  499 18.421" name="Timer event count = 5 (we are too late)"
2022:11:08-09:26:21 gw01-1 ha_daemon[4290]: id="38A1" severity="warn" sys="System" sub="ha" seq="M:  500 21.887" name="Another master around!"
2022:11:08-09:26:21 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  501 21.920" name="Node 2 changed role: SLAVE -> MASTER"
2022:11:08-09:26:21 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  502 21.963" name="Executing (wait) /usr/local/bin/confd-setha mode master master_ip 198.19.250.1 slave_ip ''"
2022:11:08-09:26:32 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  503 32.652" name="Executing (nowait) /etc/init.d/ha_mode topology_changed"
2022:11:08-09:26:32 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  504 32.671" name="Enforce MASTER, Resending gratuitous arp"
2022:11:08-09:26:32 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  505 32.672" name="Executing (nowait) /etc/init.d/ha_mode enforce_master"
2022:11:08-09:26:32 gw01-1 ha_daemon[4290]: id="38A1" severity="warn" sys="System" sub="ha" seq="M:  506 32.672" name="Timer event count = 11 (we are too late)"
2022:11:08-09:26:32 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  507 32.672" name="Enforce MASTER, Resending gratuitous arp"
2022:11:08-09:26:32 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  508 32.672" name="Executing (nowait) /etc/init.d/ha_mode enforce_master"
2022:11:08-09:26:32 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  509 32.672" name="Node 2 changed state: ACTIVE(0) -> SYNCING(2)"
2022:11:08-09:26:32 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  510 32.672" name="Node 2 changed role: MASTER -> SLAVE"
2022:11:08-09:26:32 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  511 32.672" name="Executing (wait) /usr/local/bin/confd-setha mode master master_ip 198.19.250.1 slave_ip 198.19.250.2"
2022:11:08-09:26:34 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  512 34.533" name="Executing (nowait) /etc/init.d/ha_mode topology_changed"
2022:11:08-09:26:34 gw01-1 ha_daemon[4290]: id="38A1" severity="warn" sys="System" sub="ha" seq="M:  513 34.534" name="Timer event count = 2 (we are too late)"
2022:11:08-09:26:34 gw01-1 ha_mode[28110]: calling enforce_master
2022:11:08-09:26:34 gw01-1 ha_mode[28105]: calling topology_changed
2022:11:08-09:26:34 gw01-1 ha_mode[28105]: topology_changed: waiting for last ha_mode done
2022:11:08-09:26:34 gw01-1 ha_mode[28107]: calling enforce_master
2022:11:08-09:26:35 gw01-1 ha_mode[28145]: calling topology_changed
2022:11:08-09:26:39 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  514 39.400" name="Set syncing.files for node 2"
2022:11:08-09:26:45 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  515 45.289" name="Reading cluster configuration"
2022:11:08-09:26:47 gw01-1 ha_mode[28105]: repctl[28393]: [i] daemonize_check(1480): daemonized, see syslog for further messages
2022:11:08-09:26:48 gw01-1 ha_mode[28105]: topology_changed done (started at 09:26:34)
2022:11:08-09:26:47 gw01-1 repctl[4337]: [i] recheck(1057): got HUP: replication recheck triggered Setup_replication_done = 1
2022:11:08-09:26:49 gw01-1 ha_mode[28107]: enforce_master: waiting for last ha_mode done
2022:11:08-09:26:49 gw01-1 ha_mode[28107]: enforce_master
2022:11:08-09:26:52 gw01-1 ha_mode[28107]: /var/mdw/scripts/confd-sync: /usr/local/bin/confd-sync stopped
2022:11:08-09:26:53 gw01-1 ha_mode[28107]: /var/mdw/scripts/confd-sync: /usr/local/bin/confd-sync started
2022:11:08-09:26:53 gw01-1 ha_mode[28107]: enforce_master done (started at 09:26:34)
2022:11:08-09:26:54 gw01-1 ha_mode[28110]: enforce_master: waiting for last ha_mode done
2022:11:08-09:26:54 gw01-1 ha_mode[28110]: enforce_master
2022:11:08-09:26:47 gw01-1 repctl[28393]: [i] daemonize_check(1480): daemonized, see syslog for further messages
2022:11:08-09:26:47 gw01-1 repctl[28393]: [i] daemonize_check(1497): trying to signal daemon and exit
2022:11:08-09:26:54 gw01-1 ha_mode[28110]: /var/mdw/scripts/confd-sync: /usr/local/bin/confd-sync stopped
2022:11:08-09:26:55 gw01-1 ha_mode[28110]: /var/mdw/scripts/confd-sync: /usr/local/bin/confd-sync started
2022:11:08-09:26:55 gw01-1 ha_mode[28110]: enforce_master done (started at 09:26:34)
2022:11:08-09:26:55 gw01-1 ha_mode[28145]: topology_changed: waiting for last ha_mode done
2022:11:08-09:26:58 gw01-1 ha_mode[28145]: repctl[28730]: [i] daemonize_check(1480): daemonized, see syslog for further messages
2022:11:08-09:26:58 gw01-1 repctl[28730]: [i] daemonize_check(1480): daemonized, see syslog for further messages
2022:11:08-09:26:58 gw01-1 repctl[28730]: [i] daemonize_check(1497): trying to signal daemon and exit
2022:11:08-09:26:58 gw01-1 repctl[4337]: [i] recheck(1057): got HUP: replication recheck triggered Setup_replication_done = 1
2022:11:08-09:26:58 gw01-1 ha_mode[28145]: topology_changed done (started at 09:26:35)
2022:11:08-09:27:06 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  516 06.764" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-09:27:08 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  517 08.151" name="Reading cluster configuration"
2022:11:08-09:27:16 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  518 16.139" name="Clear syncing.files for node 2"
2022:11:08-09:27:20 gw01-2 ha_daemon[4298]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  744 20.477" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-09:27:23 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  519 23.753" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-09:27:27 gw01-2 conntrack-tools[4684]: flushing kernel conntrack table (scheduled)
2022:11:08-09:28:48 gw01-2 repctl[4345]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-09:31:22 gw01-2 ha_daemon[4298]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  745 22.410" name="Initial synchronization finished!"
2022:11:08-09:31:22 gw01-2 ha_daemon[4298]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  746 22.410" name="state change SYNCING(2) -> ACTIVE(0)"
2022:11:08-09:31:22 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  520 22.555" name="Node 2 changed state: SYNCING(2) -> ACTIVE(0)"
2022:11:08-09:32:00 gw01-1 repctl[4337]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-09:33:48 gw01-2 repctl[4345]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-09:37:01 gw01-1 repctl[4337]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-09:38:48 gw01-2 repctl[4345]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-09:42:04 gw01-1 repctl[4337]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-09:42:37 gw01-2 ha_daemon[4298]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  747 37.235" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-09:42:47 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  521 47.164" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-09:42:47 gw01-2 ha_daemon[4298]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  748 47.831" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-09:43:48 gw01-2 repctl[4345]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-09:47:05 gw01-1 repctl[4337]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-09:48:48 gw01-2 repctl[4345]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-09:50:03 gw01-2 ha_daemon[4298]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  749 03.909" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-09:50:14 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  522 14.030" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-09:52:05 gw01-1 repctl[4337]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-09:53:48 gw01-2 repctl[4345]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-09:57:05 gw01-1 repctl[4337]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-09:58:48 gw01-2 repctl[4345]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-09:59:20 gw01-1 ha_daemon[4290]: id="38A1" severity="warn" sys="System" sub="ha" seq="M:  523 20.724" name="Timer event count = 2 (we are too late)"
2022:11:08-09:59:25 gw01-1 ha_daemon[4290]: id="38A1" severity="warn" sys="System" sub="ha" seq="M:  524 25.260" name="Timer event count = 4 (we are too late)"
2022:11:08-10:00:01 gw01-1 ha_daemon[4290]: id="38A1" severity="warn" sys="System" sub="ha" seq="M:  525 01.336" name="Timer event count = 4 (we are too late)"
2022:11:08-10:00:22 gw01-2 ha_daemon[4298]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  750 22.887" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-10:00:34 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  526 34.969" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-10:00:35 gw01-2 ha_daemon[4298]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  751 35.359" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-10:00:51 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  527 51.034" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-10:01:09 gw01-2 ha_daemon[4298]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  752 09.784" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-10:01:19 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  528 19.829" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-10:02:05 gw01-1 repctl[4337]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-10:03:00 gw01-2 ha_daemon[4298]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  753 00.276" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-10:03:10 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  529 10.338" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-10:03:48 gw01-2 repctl[4345]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-10:05:02 gw01-2 ha_daemon[4298]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  754 02.639" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-10:05:15 gw01-2 ha_daemon[4298]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  755 15.351" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-10:05:16 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  530 16.760" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-10:07:05 gw01-1 repctl[4337]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-10:08:48 gw01-2 repctl[4345]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-10:12:06 gw01-1 repctl[4337]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-10:13:48 gw01-2 repctl[4345]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-10:16:21 gw01-2 ha_daemon[4298]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  756 21.410" name="Executing (wait) /usr/local/bin/confd-setha mode slave"
2022:11:08-10:16:21 gw01-2 ha_daemon[4298]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  757 21.532" name="Executing (nowait) /etc/init.d/ha_mode check"
2022:11:08-10:16:21 gw01-2 ha_mode[29086]: calling check
2022:11:08-10:16:21 gw01-2 ha_mode[29086]: check: waiting for last ha_mode done
2022:11:08-10:16:21 gw01-2 ha_mode[29086]: check_ha() role=SLAVE, status=ACTIVE
2022:11:08-10:16:21 gw01-2 ha_mode[29086]: check done (started at 10:16:21)
2022:11:08-10:16:40 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  531 40.399" name="Executing (wait) /usr/local/bin/confd-setha mode master master_ip 198.19.250.1 slave_ip 198.19.250.2"
2022:11:08-10:16:40 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  532 40.559" name="Executing (nowait) /etc/init.d/ha_mode check"
2022:11:08-10:16:40 gw01-1 ha_mode[29337]: calling check
2022:11:08-10:16:40 gw01-1 ha_mode[29337]: check: waiting for last ha_mode done
2022:11:08-10:16:40 gw01-1 ha_mode[29337]: check_ha() role=MASTER, status=ACTIVE
2022:11:08-10:16:40 gw01-1 ha_mode[29337]: check done (started at 10:16:40)
2022:11:08-10:17:06 gw01-1 repctl[4337]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-10:18:48 gw01-2 repctl[4345]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-10:19:33 gw01-1 ha_daemon[4290]: id="38A1" severity="warn" sys="System" sub="ha" seq="M:  533 33.380" name="Timer event count = 2 (we are too late)"
2022:11:08-10:19:48 gw01-1 ha_daemon[4290]: id="38A1" severity="warn" sys="System" sub="ha" seq="M:  534 48.251" name="Timer event count = 2 (we are too late)"
2022:11:08-10:19:56 gw01-1 ha_daemon[4290]: id="38A1" severity="warn" sys="System" sub="ha" seq="M:  535 56.949" name="Timer event count = 2 (we are too late)"
2022:11:08-10:20:25 gw01-2 ha_daemon[4298]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  758 25.112" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-10:20:31 gw01-2 ha_daemon[4298]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  759 31.110" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-10:20:37 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  536 37.514" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-10:22:06 gw01-1 repctl[4337]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-10:22:06 gw01-2 ha_daemon[4298]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  760 06.887" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-10:22:16 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  537 16.759" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-10:23:48 gw01-2 repctl[4345]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-10:27:06 gw01-1 repctl[4337]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-10:28:48 gw01-2 repctl[4345]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-10:32:06 gw01-1 repctl[4337]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-10:33:40 gw01-2 ha_daemon[4298]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  761 40.080" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-10:33:48 gw01-2 repctl[4345]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-10:33:49 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  538 49.957" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-10:34:15 gw01-2 ha_daemon[4298]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  762 15.521" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-10:34:25 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  539 25.598" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-10:37:06 gw01-1 repctl[4337]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-10:38:48 gw01-2 repctl[4345]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-10:38:50 gw01-2 ha_daemon[4298]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  763 50.395" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-10:39:00 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  540 00.403" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-10:42:07 gw01-1 repctl[4337]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-10:43:48 gw01-2 repctl[4345]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-10:44:42 gw01-1 ha_daemon[4290]: id="38A1" severity="warn" sys="System" sub="ha" seq="M:  541 42.373" name="Timer event count = 3 (we are too late)"
2022:11:08-10:44:49 gw01-1 ha_daemon[4290]: id="38A1" severity="warn" sys="System" sub="ha" seq="M:  542 49.319" name="Timer event count = 2 (we are too late)"
2022:11:08-10:45:00 gw01-1 ha_daemon[4290]: id="38A1" severity="warn" sys="System" sub="ha" seq="M:  543 00.095" name="Timer event count = 2 (we are too late)"
2022:11:08-10:45:25 gw01-2 ha_daemon[4298]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  764 25.639" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-10:45:30 gw01-2 ha_daemon[4298]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  765 30.947" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-10:45:42 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  544 42.462" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-10:47:07 gw01-1 repctl[4337]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-10:48:48 gw01-2 repctl[4345]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-10:52:07 gw01-1 repctl[4337]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-10:53:48 gw01-2 repctl[4345]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-10:54:40 gw01-2 ha_daemon[4298]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  766 40.087" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-10:54:47 gw01-2 ha_daemon[4298]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  767 47.858" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-10:54:50 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  545 50.043" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-10:57:07 gw01-1 repctl[4337]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-10:58:48 gw01-2 repctl[4345]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-11:02:02 gw01-2 ha_daemon[4298]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  768 02.726" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-11:02:07 gw01-1 repctl[4337]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-11:02:12 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  546 12.697" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-11:03:48 gw01-2 repctl[4345]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:11:08-11:04:43 gw01-1 ha_daemon[4290]: id="38A1" severity="warn" sys="System" sub="ha" seq="M:  547 43.791" name="Timer event count = 4 (we are too late)"
2022:11:08-11:04:48 gw01-1 ha_daemon[4290]: id="38A1" severity="warn" sys="System" sub="ha" seq="M:  548 48.552" name="Timer event count = 3 (we are too late)"
2022:11:08-11:05:06 gw01-1 ha_daemon[4290]: id="38A1" severity="warn" sys="System" sub="ha" seq="M:  549 06.952" name="Timer event count = 5 (we are too late)"
2022:11:08-11:05:38 gw01-1 ha_daemon[4290]: id="38A1" severity="warn" sys="System" sub="ha" seq="M:  550 38.399" name="Current load average 70.90 is too high (>50), warning 1 of 29 before trying to switch to slave role"
2022:11:08-11:05:39 gw01-2 ha_daemon[4298]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  769 39.105" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-11:05:52 gw01-2 ha_daemon[4298]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  770 52.918" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-11:05:53 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  551 53.167" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-11:06:12 gw01-2 ha_daemon[4298]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  771 12.534" name="Monitoring interfaces for link beat: lag2 lag0"
2022:11:08-11:06:22 gw01-1 ha_daemon[4290]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  552 22.198" name="Monitoring interfaces for link beat: lag2 lag0"
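
To make the pattern in the excerpt above easier to spot, the ha_daemon warnings and errors can be filtered out of the surrounding repctl and link-beat noise. The following is only a minimal Python sketch: the regular expression is derived purely from the line format visible in this log, and the names `ha_events` and `SAMPLE` are illustrative, not part of any Sophos tooling.

```python
import re

# Pattern based only on the ha_daemon line format seen in the excerpt above
# (timestamp, node name, ha_daemon[pid], key="value" fields ending in name="...").
LINE_RE = re.compile(
    r'^(?P<ts>\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) '
    r'(?P<node>\S+) ha_daemon\[\d+\]: .*?'
    r'severity="(?P<sev>\w+)".*?name="(?P<msg>.*)"$'
)


def ha_events(lines, severities=("warn", "error")):
    """Return (timestamp, node, severity, message) for matching ha_daemon lines."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        # Lines from other daemons (e.g. repctl) do not match and are skipped.
        if m and m.group("sev") in severities:
            events.append(
                (m.group("ts"), m.group("node"), m.group("sev"), m.group("msg"))
            )
    return events


# Four lines copied from the log excerpt above.
SAMPLE = [
    '2022:11:08-09:23:48 gw01-2 repctl[4345]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1',
    '2022:11:08-09:26:21 gw01-2 ha_daemon[4298]: id="38C1" severity="error" sys="System" sub="ha" seq="S:  716 21.410" name="Node 1 is dead, received no heart beats"',
    '2022:11:08-09:26:21 gw01-2 ha_daemon[4298]: id="38B5" severity="info" sys="System" sub="ha" seq="S:  717 21.410" name="Master is dead, taking over"',
    '2022:11:08-09:26:18 gw01-1 ha_daemon[4290]: id="38A1" severity="warn" sys="System" sub="ha" seq="M:  499 18.421" name="Timer event count = 5 (we are too late)"',
]

for event in ha_events(SAMPLE):
    print(" | ".join(event))
```

For the four sample lines this keeps only the "Node 1 is dead" error from gw01-2 and the "Timer event count = 5 (we are too late)" warning from gw01-1, which sit three seconds apart. Run against the full log, the same filter should also surface the later load-average warning on gw01-1.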

What could be causing the slave to intermittently lose sight of the master? And why might the CPU utilization spike temporarily even though the interfaces are not saturated?

Many thanks for your help.


