Sophos UTM 9.5 X2
ESXi 6.5
I have two UTMs running on different servers, connected with a direct cable for HA. Last year I had to turn off one of the servers because the logs were filling up with messages like the ones below.
I thought the second UTM might be broken, so this morning I wiped it, reinstalled from the ISO, and turned HA back on. The errors started again the moment HA was enabled. Also, the traffic on the HA interface constantly runs at 500 to 1000 Mbit/s, which I don't think is normal. Another set of UTMs in the company with the same setup has never had this issue. For HA, is it better to use a direct cable or to go through a switch? Is there anything I need to change in ESXi to make this work better?
Below is a graph of the traffic on ETH2, the HA interface, in Mbit/s.
2018:05:06-03:52:46 gateway-2 ha_daemon[7734]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 78 46.243" name="Activating sync process for database on node 1"
2018:05:06-03:51:47 gateway-1 ha_daemon[20000]: id="38A1" severity="warn" sys="System" sub="ha" seq="M: 53 47.064" name="Lost heartbeat message from node 2! Expected 550 but got 551"
2018:05:06-03:53:01 gateway-2 ha_daemon[7734]: id="38A1" severity="warn" sys="System" sub="ha" seq="S: 79 01.206" name="Lost heartbeat message from node 1! Expected 747 but got 748"
2018:05:06-03:53:18 gateway-2 ha_daemon[7734]: id="38A1" severity="warn" sys="System" sub="ha" seq="S: 80 18.371" name="Lost heartbeat message from node 1! Expected 764 but got 765"
2018:05:06-03:52:11 gateway-1 ha_daemon[20000]: id="38A1" severity="warn" sys="System" sub="ha" seq="M: 54 11.242" name="Lost heartbeat message from node 2! Expected 574 but got 575"
2018:05:06-03:53:22 gateway-2 ha_daemon[7734]: id="38A1" severity="warn" sys="System" sub="ha" seq="S: 81 22.273" name="Lost heartbeat message from node 1! Expected 768 but got 769"
2018:05:06-03:53:25 gateway-2 ha_daemon[7734]: id="38A1" severity="warn" sys="System" sub="ha" seq="S: 82 25.283" name="Lost heartbeat message from node 1! Expected 771 but got 772"
2018:05:06-03:52:25 gateway-1 ha_daemon[20000]: id="38A1" severity="warn" sys="System" sub="ha" seq="M: 55 25.287" name="Lost heartbeat message from node 2! Expected 588 but got 589"
2018:05:06-03:52:27 gateway-1 ha_daemon[20000]: id="38A1" severity="warn" sys="System" sub="ha" seq="M: 56 27.289" name="Lost heartbeat message from node 2! Expected 590 but got 591"
2018:05:06-03:53:37 gateway-2 ha_daemon[7734]: id="38A1" severity="warn" sys="System" sub="ha" seq="S: 83 37.397" name="Lost heartbeat message from node 1! Expected 783 but got 784"
2018:05:06-03:52:34 gateway-1 repctl[21098]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2018:05:06-03:53:18 gateway-1 ha_daemon[20000]: id="38A1" severity="warn" sys="System" sub="ha" seq="M: 57 18.550" name="Lost heartbeat message from node 2! Expected 641 but got 642"
2018:05:06-03:54:26 gateway-2 ha_daemon[7734]: id="38A1" severity="warn" sys="System" sub="ha" seq="S: 84 26.304" name="Lost heartbeat message from node 1! Expected 832 but got 833"
2018:05:06-03:53:22 gateway-1 ha_daemon[20000]: id="38A1" severity="warn" sys="System" sub="ha" seq="M: 58 22.554" name="Lost heartbeat message from node 2! Expected 645 but got 646"
2018:05:06-03:54:30 gateway-2 ha_daemon[7734]: id="38A1" severity="warn" sys="System" sub="ha" seq="S: 85 30.297" name="Lost heartbeat message from node 1! Expected 836 but got 837"
2018:05:06-03:54:35 gateway-2 ha_daemon[7734]: id="38A1" severity="warn" sys="System" sub="ha" seq="S: 86 35.299" name="Lost heartbeat message from node 1! Expected 841 but got 842"
2018:05:06-03:53:35 gateway-1 ha_daemon[20000]: id="38A1" severity="warn" sys="System" sub="ha" seq="M: 59 35.592" name="Lost heartbeat message from node 2! Expected 658 but got 659"
2018:05:06-03:54:44 gateway-2 ha_daemon[7734]: id="38A1" severity="warn" sys="System" sub="ha" seq="S: 87 44.542" name="Lost heartbeat message from node 1! Expected 849 but got 851"
2018:05:06-03:53:40 gateway-2 ha_daemon[7734]: id="38A1" severity="warn" sys="System" sub="ha" seq="S: 88 40.491" name="Lost heartbeat message from node 1! Expected 854 but got 855"
2018:05:06-03:53:47 gateway-1 ha_daemon[20000]: id="38A1" severity="warn" sys="System" sub="ha" seq="M: 60 47.745" name="Lost heartbeat message from node 2! Expected 668 but got 671"
2018:05:06-03:53:49 gateway-1 ha_daemon[20000]: id="38A1" severity="warn" sys="System" sub="ha" seq="M: 61 49.748" name="Lost heartbeat message from node 2! Expected 672 but got 673"
2018:05:06-03:53:50 gateway-2 ha_daemon[7734]: id="38A1" severity="warn" sys="System" sub="ha" seq="S: 89 50.637" name="Lost heartbeat message from node 1! Expected 864 but got 865"
2018:05:06-03:53:53 gateway-2 ha_daemon[7734]: id="38A1" severity="warn" sys="System" sub="ha" seq="S: 90 53.462" name="Lost heartbeat message from node 1! Expected 867 but got 868"
2018:05:06-03:53:54 gateway-1 ha_daemon[20000]: id="38A1" severity="warn" sys="System" sub="ha" seq="M: 62 54.754" name="Lost heartbeat message from node 2! Expected 677 but got 678"
2018:05:06-03:53:54 gateway-1 ha_daemon[20000]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 63 54.848" name="Reading cluster configuration"
2018:05:06-03:53:59 gateway-2 ha_daemon[7734]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 91 59.195" name="Reading cluster configuration"
2018:05:06-03:53:59 gateway-2 ha_daemon[7734]: id="38A1" severity="warn" sys="System" sub="ha" seq="S: 92 59.723" name="Lost heartbeat message from node 1! Expected 872 but got 874"
2018:05:06-03:54:00 gateway-1 ha_daemon[20000]: id="38A1" severity="warn" sys="System" sub="ha" seq="M: 64 00.787" name="Lost heartbeat message from node 2! Expected 683 but got 684"
2018:05:06-03:54:01 gateway-2 ha_daemon[7734]: id="38A1" severity="warn" sys="System" sub="ha" seq="S: 93 01.507" name="Lost heartbeat message from node 1! Expected 875 but got 876"
2018:05:06-03:54:02 gateway-1 ha_daemon[20000]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 65 02.756" name="Reading cluster configuration"
2018:05:06-03:54:02 gateway-1 ha_daemon[20000]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 66 02.756" name="Starting use of backup interface 'eth1'"
2018:05:06-03:54:04 gateway-2 ha_daemon[7734]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 94 04.904" name="Monitoring interfaces for link beat: eth1"
2018:05:06-03:54:05 gateway-2 ha_daemon[7734]: id="38A1" severity="warn" sys="System" sub="ha" seq="S: 95 05.473" name="Lost heartbeat message from node 1! Expected 879 but got 880"
2018:05:06-03:54:09 gateway-1 ha_daemon[20000]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 67 09.934" name="Monitoring interfaces for link beat: eth1"
2018:05:06-03:54:15 gateway-1 ha_daemon[20000]: id="38A1" severity="warn" sys="System" sub="ha" seq="M: 68 15.858" name="Lost heartbeat message from node 2! Expected 698 but got 699"
2018:05:06-03:54:28 gateway-1 ha_daemon[20000]: id="38A1" severity="warn" sys="System" sub="ha" seq="M: 69 28.901" name="Lost heartbeat message from node 2! Expected 711 but got 712"
2018:05:06-03:54:31 gateway-2 ha_daemon[7734]: id="38A1" severity="warn" sys="System" sub="ha" seq="S: 96 31.733" name="Lost heartbeat message from node 1! Expected 905 but got 906"
2018:05:06-03:54:39 gateway-2 ha_daemon[7734]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 97 39.199" name="Reading cluster configuration"
2018:05:06-03:54:39 gateway-2 ha_daemon[7734]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 98 39.199" name="Starting use of backup interface 'eth1'"
2018:05:06-03:54:43 gateway-1 ha_daemon[20000]: id="38A1" severity="warn" sys="System" sub="ha" seq="M: 70 43.939" name="Lost heartbeat message from node 2! Expected 725 but got 727"
2018:05:06-03:54:44 gateway-2 ha_daemon[7734]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 99 44.814" name="Monitoring interfaces for link beat: eth1"
2018:05:06-03:54:46 gateway-2 ha_daemon[7734]: id="38A1" severity="warn" sys="System" sub="ha" seq="S: 100 46.590" name="Lost heartbeat message from node 1! Expected 920 but got 921"
2018:05:06-03:54:52 gateway-2 ha_daemon[7734]: id="38A1" severity="warn" sys="System" sub="ha" seq="S: 101 52.516" name="Lost heartbeat message from node 1! Expected 924 but got 927"
2018:05:06-03:54:53 gateway-2 ha_daemon[7734]: id="38A1" severity="warn" sys="System" sub="ha" seq="S: 102 53.808" name="Received no backup heartbeats at interface 'eth1'"
2018:05:06-03:55:12 gateway-2 ha_daemon[7734]: id="38A1" severity="warn" sys="System" sub="ha" seq="S: 103 12.554" name="Lost heartbeat message from node 1! Expected 946 but got 947"
2018:05:06-03:55:19 gateway-2 ha_daemon[7734]: id="38A1" severity="warn" sys="System" sub="ha" seq="S: 104 19.542" name="Lost heartbeat message from node 1! Expected 953 but got 954"
2018:05:06-03:55:22 gateway-2 ha_daemon[7734]: id="38A1" severity="warn" sys="System" sub="ha" seq="S: 105 22.547" name="Lost heartbeat message from node 1! Expected 956 but got 957"
2018:05:06-03:55:39 gateway-2 ha_daemon[7734]: id="38A1" severity="warn" sys="System" sub="ha" seq="S: 106 39.574" name="Lost heartbeat message from node 1! Expected 972 but got 974"
2018:05:06-03:55:47 gateway-2 ha_daemon[7734]: id="38A1" severity="warn" sys="System" sub="ha" seq="S: 107 47.572" name="Lost heartbeat message from node 1! Expected 981 but got 982"
2018:05:06-03:56:13 gateway-1 ha_daemon[20000]: id="38A1" severity="warn" sys="System" sub="ha" seq="M: 71 13.600" name="Lost heartbeat message from node 2! Expected 815 but got 816"
2018:05:06-03:56:30 gateway-1 ha_daemon[20000]: id="38A1" severity="warn" sys="System" sub="ha" seq="M: 72 30.635" name="Lost heartbeat message from node 2! Expected 831 but got 833"
2018:05:06-03:56:33 gateway-2 ha_daemon[7734]: id="38A1" severity="warn" sys="System" sub="ha" seq="S: 110 33.646" name="Lost heartbeat message from node 1! Expected 1027 but got 1028"
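To get a feel for how many heartbeats are actually going missing on each side, the "Lost heartbeat" warnings above can be tallied with a short script. This is only a rough sketch in Python: it assumes the "Expected N but got M" numbers are heartbeat sequence counters, so that M minus N is the number of heartbeats skipped, which is my reading of the message and not something confirmed by Sophos. The names `LOST`, `missed_heartbeats` and `sample` are made up for this example.

```python
import re
from collections import Counter

# One "Lost heartbeat" warning as it appears in the HA log above.
# The attribute group only steps over key="value" pairs, so a match cannot
# run from one log entry into the next; \s+ tolerates a line wrap after "!".
LOST = re.compile(
    r'(?P<host>\S+) ha_daemon\[\d+\]:(?: \w+="[^"]*")*?'
    r' name="Lost heartbeat message from node (?P<node>\d+)!\s+'
    r'Expected (?P<expected>\d+) but got (?P<got>\d+)"'
)

def missed_heartbeats(log_text):
    """Return {reporting host: skipped heartbeats}.

    Assumption: "got - expected" is the count of skipped sequence numbers.
    """
    missed = Counter()
    for m in LOST.finditer(log_text):
        missed[m.group("host")] += int(m.group("got")) - int(m.group("expected"))
    return dict(missed)

# A few entries copied from the log above, one of them wrapped mid-entry.
sample = '''2018:05:06-03:51:47 gateway-1 ha_daemon[20000]: id="38A1" severity="warn" sys="System" sub="ha" seq="M: 53 47.064" name="Lost heartbeat message from node 2! Expected 550 but got 551"
2018:05:06-03:54:44 gateway-2 ha_daemon[7734]: id="38A1" severity="warn" sys="System" sub="ha" seq="S: 87 44.542" name="Lost heartbeat message from node 1! Expected 849 but got 851"
2018:05:06-03:53:47 gateway-1 ha_daemon[20000]: id="38A1" severity="warn" sys="System" sub="ha" seq="M: 60 47.745" name="Lost heartbeat message from node 2!
Expected 668 but got 671"
2018:05:06-03:53:54 gateway-1 ha_daemon[20000]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 63 54.848" name="Reading cluster configuration"
'''

print(missed_heartbeats(sample))
```

Running it over the full HA log from both nodes would show whether the losses are roughly symmetric or concentrated on one side of the link.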