HA - Passive node unstable

Question

Hey,

I've got two UTM v9.307-6 boxes which exist as VMs on two separate Hyper-V servers.

I had been running one in a non-HA configuration for a while, so I'm happy that's stable.

A couple of nights ago I built a new UTM virtual machine and used the HA automatic configuration to set it up in a Hot/Standby configuration.

This went okay, and it's synced its config to the new server and placed the slave node in 'READY'. It's intended that the original VM remain the Primary Master, so I've confirmed that, and also set a Backup Interface.

My problem is that according to the master, the Slave node keeps changing its status to DEAD every half hour or so. After about a minute of it disappearing, the slave VM returns and the status moves to SYNCING and then back to READY at which point it seems to be working again happily.

What's not fine is that it keeps happening, roughly every 20 to 30 minutes.

When the slave is in READY mode and I intentionally kill connectivity to the Master node, the slave node takes over as expected, and when the preferred master comes back online, control transitions back to the original node successfully.

When the slave node fails there's no impact on the network as such: the master node happily continues serving requests. Clearly, though, that wouldn't be the case if the master node failed at the same moment the slave node was marked as DEAD.

The Hyper-V server of the slave isn't as powerful as the one which is primary master, however, they're both sitting on average at

2015:02:13-00:41:10 home-1 ha_daemon[10438]: id="38C1" severity="error" sys="System" sub="ha" seq="M:  959 10.875" name="Node 2 is dead, received no heart beats"

2015:02:13-00:41:11 home-1 ha_mode[782]: daemonized...

2015:02:13-00:41:11 home-1 repctl[797]:  daemonize_check(1958): trying to signal daemon

2015:02:13-00:41:13 home-1 ha_daemon[10438]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  960 13.207" name="Reading cluster configuration"

2015:02:13-00:41:28 home-1 ha_daemon[10438]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  961 28.698" name="Monitoring interfaces for link beat: eth1 eth0"

2015:02:13-00:41:28 home-1 ha_daemon[10438]: id="38A3" severity="debug" sys="System" sub="ha" seq="M:  962 28.699" name="Netlink: Found link beat on eth2 again!"

2015:02:13-00:41:34 home-1 ha_daemon[10438]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  963 34.205" name="Node 2 changed version! 0.000000 -> 9.307006"

2015:02:13-00:41:34 home-1 ha_daemon[10438]: id="38A1" severity="warn" sys="System" sub="ha" seq="M:  964 34.205" name="Lost heartbeat message from node 2! Expected 4474 but got 4499"

2015:02:13-00:41:34 home-1 ha_daemon[10438]: id="38C0" severity="info" sys="System" sub="ha" seq="M:  965 34.205" name="Node 2 is alive"

2015:02:13-00:41:34 home-1 ha_daemon[10438]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  966 34.205" name="Node 2 changed state: DEAD(2048) -> ACTIVE(0)"

2015:02:13-00:41:34 home-1 ha_daemon[10438]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  967 34.206" name="Node 2 changed role: DEAD -> SLAVE"

2015:02:13-00:41:34 home-2 ha_daemon[3652]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  242 34.719" name="Node 1 changed version! 0.000000 -> 9.307006"

2015:02:13-00:41:34 home-2 ha_daemon[3652]: id="38A1" severity="warn" sys="System" sub="ha" seq="S:  243 34.720" name="Lost heartbeat message from node 1! Expected 82903 but got 82911"

2015:02:13-00:41:34 home-2 ha_daemon[3652]: id="38C0" severity="info" sys="System" sub="ha" seq="S:  244 34.720" name="Node 1 is alive"

2015:02:13-00:41:34 home-2 ha_daemon[3652]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  245 34.720" name="Node 1 changed state: DEAD(2048) -> ACTIVE(0)"

2015:02:13-00:41:34 home-2 ha_daemon[3652]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  246 34.720" name="Node 1 changed role: DEAD -> MASTER"

2015:02:13-00:41:35 home-2 ha_mode[18691]: daemonized...

2015:02:13-00:41:35 home-2 repctl[18712]:  daemonize_check(1958): trying to signal daemon

2015:02:13-00:41:35 home-2 ha_mode[18702]: daemonized...

2015:02:13-00:41:35 home-2 repctl[18720]:  daemonize_check(1958): trying to signal daemon

2015:02:13-00:41:36 home-1 ha_mode[986]: daemonized...

2015:02:13-00:41:36 home-1 repctl[1002]:  daemonize_check(1958): trying to signal daemon

2015:02:13-00:41:37 home-1 ha_daemon[10438]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  968 37.867" name="Reading cluster configuration"

2015:02:13-00:41:43 home-2 ha_daemon[3652]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  247 43.469" name="Monitoring interfaces for link beat: eth1 eth0"

2015:02:13-00:41:53 home-1 ha_daemon[10438]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  969 53.422" name="Monitoring interfaces for link beat: eth1 eth0"
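To pin down how often the slave actually drops out, the "Node 2 is dead" entries can be pulled out of the HA log and the gaps between them measured. This is only a rough sketch: it assumes every relevant line has the same layout as the excerpt above (timestamp, hostname, `ha_daemon[pid]`, then a `name="..."` field), and the function names `dead_events` and `intervals_minutes` are made up for illustration.

```python
import re
from datetime import datetime

# Assumed layout, taken from the excerpt above:
# "YYYY:MM:DD-HH:MM:SS host ha_daemon[pid]: ... name="message""
LINE_RE = re.compile(
    r'^(\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) (\S+) ha_daemon\[\d+\]: .*name="([^"]*)"'
)

def dead_events(lines, node=2):
    """Return the timestamps at which ha_daemon reported the given node dead."""
    needle = f"Node {node} is dead"
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group(3).startswith(needle):
            events.append(datetime.strptime(m.group(1), "%Y:%m:%d-%H:%M:%S"))
    return events

def intervals_minutes(events):
    """Return the gap in minutes between each pair of consecutive events."""
    return [(b - a).total_seconds() / 60 for a, b in zip(events, events[1:])]

# Two lines copied from the excerpt above; a full log would be fed in the same way.
sample = [
    '2015:02:13-00:41:10 home-1 ha_daemon[10438]: id="38C1" severity="error" '
    'sys="System" sub="ha" seq="M:  959 10.875" '
    'name="Node 2 is dead, received no heart beats"',
    '2015:02:13-00:41:34 home-1 ha_daemon[10438]: id="38C0" severity="info" '
    'sys="System" sub="ha" seq="M:  965 34.205" name="Node 2 is alive"',
]

for ts in dead_events(sample):
    print("node 2 reported dead at", ts)
```

Run over a full day's log from the master, `intervals_minutes(dead_events(lines))` should show whether the dropouts are on a fixed timer or just irregular, which would hint at a scheduled job versus a resource or network hiccup.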
