This discussion has been locked.

UTM slave node unlicensed because it was left in "reserved" state?

Hi,

We are running an active/passive cluster of two SG 135 appliances. Some time ago we performed an update to 9.702-1 and left one node in the "reserved" state. Unfortunately, we left it in this state for a couple of days.

Today at midnight, the slave went to "DEAD". Today is also the end date of a license that has already been renewed, but I guess that because the slave was reserved there was no replication anymore, so the renewed license was never replicated to it. As a result, the slave became unlicensed and went into the DEAD state.

How can I get the slave back into the licensed cluster?

 

Thanks for any advice.

Cheers

Philipp



  • Hey Philipp,

    I would try to reset / remove the dead node and then get the node back in the cluster via auto configuration.
    That can be done without any downtime. Until the end of the process I would detach all Ethernet cables except eth3 from the dead node.

    Best regards 

    Alex

    -

  • Hey Alex,

    thanks for your advice. Do you think this will work even though the dead slave node isn't on the same firmware release as the current master?

    Regards,

    Philipp

  • Yes, the 'new' node will perform the Up2Date while it joins the cluster. I already considered that.

    Best regards 

    Alex 

    -

  • Perfect! I'll try that ASAP and keep you posted. Many thanks.

  • Hi Alex,

    I just tried that together with the staff of our data center, where the appliance is housed.

    Unfortunately, this approach did not work. All interfaces on the slave were unlinked except the heartbeat interface.

    Log messages on the master:

    2020:04:14-17:19:03 xxx-2 ha_daemon[4280]: id="38A3" severity="debug" sys="System" sub="ha" seq="M: 516 03.563" name="Netlink: Found link beat on eth3 again!"
    2020:04:14-17:19:08 xxx-2 ha_daemon[4280]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 517 08.376" name="Access granted to remote node 1!"
    2020:04:14-17:19:08 xxx-2 ha_daemon[4280]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 518 08.587" name="Node 1 changed version! 0.000000 -> 9.701006"
    2020:04:14-17:19:08 xxx-2 ha_daemon[4280]: id="38C0" severity="info" sys="System" sub="ha" seq="M: 519 08.587" name="Node 1 is alive"
    2020:04:14-17:19:08 xxx-2 ha_daemon[4280]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 520 08.587" name="Node 1 changed state: DEAD(2048) -> RESERVED(4096)"
    2020:04:14-17:19:08 xxx-2 ha_daemon[4280]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 521 08.587" name="Node 1 changed role: DEAD -> SLAVE"
    2020:04:14-17:19:08 xxx-2 ha_daemon[4280]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 522 08.589" name="Executing (wait) /usr/local/bin/confd-setha mode master master_ip 198.19.250.2 slave_ip 198.19.250.1"
    2020:04:14-17:19:08 xxx-2 ha_daemon[4280]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 523 08.811" name="Executing (nowait) /etc/init.d/ha_mode topology_changed"
    2020:04:14-17:19:08 xxx-2 ha_mode[2161]: calling topology_changed
    2020:04:14-17:19:08 xxx-2 ha_mode[2161]: topology_changed: waiting for last ha_mode done
    2020:04:14-17:19:09 xxx-2 ha_mode[2161]: repctl[2181]: [i] daemonize_check(1480): daemonized, see syslog for further messages
    2020:04:14-17:19:09 xxx-2 repctl[2181]: [i] daemonize_check(1480): daemonized, see syslog for further messages
    2020:04:14-17:19:09 xxx-2 repctl[2181]: [i] daemonize_check(1497): trying to signal daemon and exit
    2020:04:14-17:19:09 xxx-2 repctl[7290]: [i] recheck(1057): got HUP: replication recheck triggered Setup_replication_done = 1
    2020:04:14-17:19:09 xxx-2 ha_mode[2161]: topology_changed done (started at 17:19:08)
    2020:04:14-17:19:09 xxx-2 ha_daemon[4280]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 524 09.593" name="Reading cluster configuration"
    2020:04:14-17:19:22 xxx-1 ha_daemon[4234]: id="38A1" severity="warn" sys="System" sub="ha" seq="S: 32 22.516" name="No link on interface eth1"
    2020:04:14-17:19:22 xxx-1 ha_daemon[4234]: id="38A3" severity="debug" sys="System" sub="ha" seq="S: 33 22.516" name="Netlink: Lost link beat on eth0!"
    2020:04:14-17:19:22 xxx-1 ha_daemon[4234]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 35 22.516" name="state change RESERVED(4096) -> RESERVED(4097)"
    2020:04:14-17:19:22 xxx-1 ha_daemon[4234]: id="38A3" severity="debug" sys="System" sub="ha" seq="S: 34 22.516" name="Netlink: Lost link beat on eth0!"
    2020:04:14-17:19:22 xxx-1 ha_daemon[4234]: id="38A3" severity="debug" sys="System" sub="ha" seq="S: 37 22.517" name="Netlink: Lost link beat on eth1!"
    2020:04:14-17:19:22 xxx-2 ha_daemon[4280]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 525 22.601" name="Node 1 changed state: RESERVED(4096) -> RESERVED(4097)"
    2020:04:14-17:19:25 xxx-1 ha_daemon[4234]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 52 25.000" name="HA daemon shutting down (SIGTERM)"
    2020:04:14-17:19:25 xxx-1 ha_daemon[4234]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 53 25.000" name="Executing (nowait) /etc/init.d/ha_mode shutdown"
    2020:04:14-17:19:25 xxx-1 ha_mode[6683]: calling shutdown
    2020:04:14-17:19:25 xxx-1 ha_mode[6683]: shutdown: waiting for last ha_mode done
    2020:04:14-17:19:25 xxx-1 ha_mode[6683]: /var/mdw/scripts/confd-sync: /usr/local/bin/confd-sync stopped
    2020:04:14-17:19:25 xxx-1 ha_daemon[4234]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 54 25.013" name="HA daemon exits (SIGTERM)"
    2020:04:14-17:19:25 xxx-1 ha_mode[6683]: shutdown done (started at 17:19:25)
    2020:04:14-17:19:25 xxx-2 ha_daemon[4280]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 526 25.105" name="Monitoring interfaces for link beat: eth1 eth0"
    2020:04:14-17:19:27 xxx-2 ha_daemon[4280]: id="38C1" severity="error" sys="System" sub="ha" seq="M: 527 27.856" name="Node 1 is dead, received no heart beats"
    2020:04:14-17:19:27 xxx-2 ha_daemon[4280]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 528 27.858" name="Executing (wait) /usr/local/bin/confd-setha mode master master_ip 198.19.250.2 slave_ip ''"
    2020:04:14-17:19:28 xxx-2 ha_daemon[4280]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 529 28.088" name="Executing (nowait) /etc/init.d/ha_mode topology_changed"
    2020:04:14-17:19:28 xxx-2 ha_mode[2550]: calling topology_changed
    2020:04:14-17:19:28 xxx-2 ha_mode[2550]: topology_changed: waiting for last ha_mode done
    2020:04:14-17:19:28 xxx-2 ha_mode[2550]: repctl[2567]: [i] daemonize_check(1480): daemonized, see syslog for further messages
    2020:04:14-17:19:28 xxx-2 repctl[2567]: [i] daemonize_check(1480): daemonized, see syslog for further messages
    2020:04:14-17:19:28 xxx-2 repctl[2567]: [i] daemonize_check(1497): trying to signal daemon and exit
    2020:04:14-17:19:28 xxx-2 repctl[7290]: [i] recheck(1057): got HUP: replication recheck triggered Setup_replication_done = 1
    2020:04:14-17:19:28 xxx-2 ha_mode[2550]: topology_changed done (started at 17:19:28)
    2020:04:14-17:19:28 xxx-2 ha_daemon[4280]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 530 28.818" name="Reading cluster configuration"
    2020:04:14-17:19:44 xxx-2 ha_daemon[4280]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 531 44.374" name="Monitoring interfaces for link beat: eth1 eth0"
    2020:04:14-17:19:49 xxx-2 conntrack-tools[4701]: no dedicated links available!
    2020:04:14-17:19:49 xxx-2 ha_daemon[4280]: id="38A3" severity="debug" sys="System" sub="ha" seq="M: 532 49.290" name="Netlink: Lost link beat on eth3!"

    After that, the slave remained switched off. We tried to switch it on again, and the same thing happened.

    For a couple of seconds I could see the node as "Reserved" in the web interface, with the option to click on "update now". I tried this as well.

    Result was:

    name="HA control: cmd = 'upgrade 1'"
    name="perform upgrade to 9.702001 on node 1"
    name="Node 1 changed state: RESERVED(4096) -> UP2DATE(256)"
    name="state change UP2DATE(256) -> UP2DATE(257)"
    name="HA daemon of node 1 is restarting, waiting 900 seconds before declaring node as dead"
    name="HA daemon shutting down (SIGTERM)"

    Slave stayed powered off.

    Last try: I removed the node from the master after 900 seconds. (Until then it was still visible as "dead".)
    The result was the same as on the second try: the slave won't join but shuts down instead.

    I will now have the unit shipped to me and manually apply the update to 9.702-1. Hopefully it will then join again.


    Cheers,

    Philipp
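
For anyone reading a similar HA log, the node state changes and the version mismatch can be pulled out with a few lines of Python. This is only a minimal sketch: the regular expressions are inferred from the log lines quoted in this thread, not from any Sophos documentation, and the version mapping ("9.702001" meaning 9.702-1) is an assumption based on the two forms that appear above. The names `state_transitions` and `format_version` are hypothetical helpers.

```python
import re

# Timestamp, host, and name="..." payload of an ha_daemon line
# (pattern inferred from the log lines in this thread).
LINE_RE = re.compile(
    r'^(?P<ts>\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) (?P<host>\S+) '
    r'ha_daemon\[\d+\]: .*name="(?P<name>[^"]*)"'
)

# Covers both "Node 1 changed state: A(n) -> B(m)" (seen on the master)
# and "state change A(n) -> B(m)" (seen on the slave).
STATE_RE = re.compile(
    r'(?:Node \d+ changed state: |state change )'
    r'(?P<old>\w+\(\d+\)) -> (?P<new>\w+\(\d+\))'
)


def state_transitions(lines):
    """Return (timestamp, host, old_state, new_state) for each state change."""
    found = []
    for line in lines:
        line_match = LINE_RE.match(line.strip())
        if not line_match:
            continue  # not an ha_daemon line with a name="..." field
        state_match = STATE_RE.search(line_match.group("name"))
        if state_match:
            found.append((
                line_match.group("ts"),
                line_match.group("host"),
                state_match.group("old"),
                state_match.group("new"),
            ))
    return found


def format_version(raw):
    """Assumption: the last three digits are the patch level, e.g. 9.702001 -> 9.702-1."""
    return f"{raw[:-3]}-{int(raw[-3:])}"


sample = [
    '2020:04:14-17:19:08 xxx-2 ha_daemon[4280]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 520 08.587" name="Node 1 changed state: DEAD(2048) -> RESERVED(4096)"',
    '2020:04:14-17:19:08 xxx-2 ha_daemon[4280]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 521 08.587" name="Node 1 changed role: DEAD -> SLAVE"',
    '2020:04:14-17:19:22 xxx-1 ha_daemon[4234]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 35 22.516" name="state change RESERVED(4096) -> RESERVED(4097)"',
]

for transition in state_transitions(sample):
    print(transition)
print(format_version("9.701006"), "vs", format_version("9.702001"))
```

Run against the full excerpt, this would list only the state changes (role changes such as DEAD -> SLAVE are skipped on purpose) and show that the slave reported a different version string from the one the master tried to upgrade it to.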

  • Hi Philipp,

    sorry to hear that. Unfortunately the small units don’t have a display. So it’s a little difficult to see the status.
    Keep us informed about what you find.

    Best regards 

    Alex 

    -
