Up2date installation on HA killed slave

Hi

After investigating how to do the Up2Date installation in an active/passive HA environment, I finally felt safe doing it.

15 minutes after clicking "Upgrade to latest version now" I ended up with a DEAD slave. 30 minutes have now passed and the slave is still dead.

The log says:

2010:02:17-22:14:35 (none) ha_daemon[3539]: id="38A2" severity="error" sys="System" sub="ha" name="Node 2 died during up2date process!"
2010:02:17-22:14:35 (none) ha_daemon[3539]: id="38C1" severity="info" sys="System" sub="ha" name="Node 2 is dead, received no heart beats!"
2010:02:17-22:14:37 (none) slon_control[3668]: Killing slon reporting [21816]
2010:02:17-22:14:37 (none) slon_control[3668]: Killing slon pop3 [21817]
2010:02:17-22:14:45 (none) ha_daemon[3539]: id="38A3" severity="debug" sys="System" sub="ha" name="Netlink: Lost link beat on eth5!"
2010:02:17-22:14:47 (none) ha_daemon[3539]: id="38A3" severity="debug" sys="System" sub="ha" name="Netlink: Found link beat on eth5 again!"
2010:02:17-22:14:56 (none) ha_daemon[3539]: id="38A3" severity="debug" sys="System" sub="ha" name="Netlink: Lost link beat on eth5!"
2010:02:17-22:14:57 (none) slon_control[3668]: Slon reporting exited with value 0!
2010:02:17-22:14:57 (none) slon_control[3668]: Slon pop3 exited with value 0!
2010:02:17-22:14:59 (none) ha_daemon[3539]: id="38A3" severity="debug" sys="System" sub="ha" name="Netlink: Found link beat on eth5 again!"
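For anyone reading a longer HA log like this, it can help to separate the ha_daemon events from the slon_control noise and drop the debug entries. The following is only a rough sketch: it assumes the `key="value"` layout visible in the lines above, and the function names `parse_ha_line` and `non_debug_events` are made up for this illustration, not part of any Astaro tool.

```python
import re

# Matches the key="value" pairs seen in the ha_daemon lines above
# (id, severity, sys, sub, name). Assumption: values never contain a double quote.
FIELD_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_ha_line(line):
    """Return the fields of an ha_daemon line as a dict, or None for other lines."""
    if "ha_daemon[" not in line:
        return None
    fields = dict(FIELD_RE.findall(line))
    # The timestamp is the first space-separated token on the line.
    fields["timestamp"] = line.split(" ", 1)[0]
    return fields


def non_debug_events(lines):
    """Keep only ha_daemon events whose severity is not "debug"."""
    events = []
    for line in lines:
        fields = parse_ha_line(line)
        if fields and fields.get("severity") != "debug":
            events.append(fields)
    return events


# Three lines copied from the log excerpt above.
SAMPLE = [
    '2010:02:17-22:14:35 (none) ha_daemon[3539]: id="38A2" severity="error" '
    'sys="System" sub="ha" name="Node 2 died during up2date process!"',
    '2010:02:17-22:14:37 (none) slon_control[3668]: Killing slon reporting [21816]',
    '2010:02:17-22:14:45 (none) ha_daemon[3539]: id="38A3" severity="debug" '
    'sys="System" sub="ha" name="Netlink: Lost link beat on eth5!"',
]

if __name__ == "__main__":
    for event in non_debug_events(SAMPLE):
        print(event["timestamp"], event["severity"], event["name"])
```

Run against the full log file, this leaves only the error and info entries, which makes it easier to see when the node was declared dead relative to the link beat messages.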

What am I doing wrong?


BAlfson over 13 years ago

That looks like the log during an Up2Date. If you're convinced that it's not working, then you can disconnect the slave and factory reset it before reconnecting it:

Log in as root
Type cc [enter]
RAW [enter]
system_factory_reset [enter]

Angelo commented recently on how a normal Up2Date proceeds in HA.

Cheers - Bob

Sophos UTM Community Moderator
Sophos Certified Architect - UTM
Sophos Certified Engineer - XG
Gold Solution Partner since 2005

MediaSoft, Inc. USA
lindin over 13 years ago in reply to BAlfson

Thanks for your answer.

I had to reinstall the cluster the last time this happened, and that is doable this time too. But I would really like to know why the slave ends up dead. It has been 4 hours since I started the Up2Date and the slave node is still dead. I am convinced it will not recover on its own.

I am sure a factory reset or reinstallation will get me back on track, but it does not solve the main problem: why does this happen?

I would like to fix this without doing a factory reset. Something is wrong, and I guess solving it is of interest to Astaro as well.

I have to get the HA back on track this week, preferably tomorrow. I have registered a support case and hope to get some clarity on this problem, which keeps occurring when we update the FW.
