UTM slave node became unlicensed because it was left in "Reserved" state?

Hi,

We are running an active/passive cluster of two SG 135 appliances. Some time ago we updated to 9.702-1 and left one node in the "Reserved" state. Unfortunately, we left it in this state for a couple of days.

Today at midnight, the slave went to "DEAD". Today is also the expiry date of a license that has already been renewed. My guess is that because the slave was reserved, replication had stopped, so the renewed license was never replicated to it. As a result, the slave became unlicensed and went into the DEAD state.

How can I get the slave back into the licensed cluster?

Thanks for any advice.

Cheers

Philipp

Alexander Busch over 4 years ago

Hey Philipp,

I would try to reset/remove the dead node and then bring it back into the cluster via automatic configuration.
This can be done without any downtime. Until the process is complete, I would disconnect all Ethernet cables from the dead node except eth3.

Best regards

Alex

Philipp Thielke1 over 4 years ago in reply to Alexander Busch

Hey Alex,

Thanks for your advice. Do you think this will work even though the dead slave node isn't on the same firmware release as the current master?

Regards,

Philipp

Alexander Busch over 4 years ago in reply to Philipp Thielke1

Yes, the 'new' node will apply the Up2Date while it is joining the cluster. I had already taken that into account.

Best regards

Alex

Philipp Thielke1 over 4 years ago in reply to Alexander Busch

Perfect! I'll try that ASAP and keep you posted. Many thanks.

Philipp Thielke1 over 4 years ago in reply to Alexander Busch

Hi Alex,

I just tried that together with the staff of the data center where the appliance is housed.

Unfortunately, this approach did not work. All interfaces on the slave were unplugged except the heartbeat interface.

Log messages on the master:

2020:04:14-17:19:03 xxx-2 ha_daemon[4280]: id="38A3" severity="debug" sys="System" sub="ha" seq="M: 516 03.563" name="Netlink: Found link beat on eth3 again!"
2020:04:14-17:19:08 xxx-2 ha_daemon[4280]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 517 08.376" name="Access granted to remote node 1!"
2020:04:14-17:19:08 xxx-2 ha_daemon[4280]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 518 08.587" name="Node 1 changed version! 0.000000 -> 9.701006"
2020:04:14-17:19:08 xxx-2 ha_daemon[4280]: id="38C0" severity="info" sys="System" sub="ha" seq="M: 519 08.587" name="Node 1 is alive"
2020:04:14-17:19:08 xxx-2 ha_daemon[4280]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 520 08.587" name="Node 1 changed state: DEAD(2048) -> RESERVED(4096)"
2020:04:14-17:19:08 xxx-2 ha_daemon[4280]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 521 08.587" name="Node 1 changed role: DEAD -> SLAVE"
2020:04:14-17:19:08 xxx-2 ha_daemon[4280]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 522 08.589" name="Executing (wait) /usr/local/bin/confd-setha mode master master_ip 198.19.250.2 slave_ip 198.19.250.1"
2020:04:14-17:19:08 xxx-2 ha_daemon[4280]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 523 08.811" name="Executing (nowait) /etc/init.d/ha_mode topology_changed"
2020:04:14-17:19:08 xxx-2 ha_mode[2161]: calling topology_changed
2020:04:14-17:19:08 xxx-2 ha_mode[2161]: topology_changed: waiting for last ha_mode done
2020:04:14-17:19:09 xxx-2 ha_mode[2161]: repctl[2181]: [i] daemonize_check(1480): daemonized, see syslog for further messages
2020:04:14-17:19:09 xxx-2 repctl[2181]: [i] daemonize_check(1480): daemonized, see syslog for further messages
2020:04:14-17:19:09 xxx-2 repctl[2181]: [i] daemonize_check(1497): trying to signal daemon and exit
2020:04:14-17:19:09 xxx-2 repctl[7290]: [i] recheck(1057): got HUP: replication recheck triggered Setup_replication_done = 1
2020:04:14-17:19:09 xxx-2 ha_mode[2161]: topology_changed done (started at 17:19:08)
2020:04:14-17:19:09 xxx-2 ha_daemon[4280]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 524 09.593" name="Reading cluster configuration"
2020:04:14-17:19:22 xxx-1 ha_daemon[4234]: id="38A1" severity="warn" sys="System" sub="ha" seq="S: 32 22.516" name="No link on interface eth1"
2020:04:14-17:19:22 xxx-1 ha_daemon[4234]: id="38A3" severity="debug" sys="System" sub="ha" seq="S: 33 22.516" name="Netlink: Lost link beat on eth0!"
2020:04:14-17:19:22 xxx-1 ha_daemon[4234]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 35 22.516" name="state change RESERVED(4096) -> RESERVED(4097)"
2020:04:14-17:19:22 xxx-1 ha_daemon[4234]: id="38A3" severity="debug" sys="System" sub="ha" seq="S: 34 22.516" name="Netlink: Lost link beat on eth0!"
2020:04:14-17:19:22 xxx-1 ha_daemon[4234]: id="38A3" severity="debug" sys="System" sub="ha" seq="S: 37 22.517" name="Netlink: Lost link beat on eth1!"
2020:04:14-17:19:22 xxx-2 ha_daemon[4280]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 525 22.601" name="Node 1 changed state: RESERVED(4096) -> RESERVED(4097)"
2020:04:14-17:19:25 xxx-1 ha_daemon[4234]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 52 25.000" name="HA daemon shutting down (SIGTERM)"
2020:04:14-17:19:25 xxx-1 ha_daemon[4234]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 53 25.000" name="Executing (nowait) /etc/init.d/ha_mode shutdown"
2020:04:14-17:19:25 xxx-1 ha_mode[6683]: calling shutdown
2020:04:14-17:19:25 xxx-1 ha_mode[6683]: shutdown: waiting for last ha_mode done
2020:04:14-17:19:25 xxx-1 ha_mode[6683]: /var/mdw/scripts/confd-sync: /usr/local/bin/confd-sync stopped
2020:04:14-17:19:25 xxx-1 ha_daemon[4234]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 54 25.013" name="HA daemon exits (SIGTERM)"
2020:04:14-17:19:25 xxx-1 ha_mode[6683]: shutdown done (started at 17:19:25)
2020:04:14-17:19:25 xxx-2 ha_daemon[4280]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 526 25.105" name="Monitoring interfaces for link beat: eth1 eth0"
2020:04:14-17:19:27 xxx-2 ha_daemon[4280]: id="38C1" severity="error" sys="System" sub="ha" seq="M: 527 27.856" name="Node 1 is dead, received no heart beats"
2020:04:14-17:19:27 xxx-2 ha_daemon[4280]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 528 27.858" name="Executing (wait) /usr/local/bin/confd-setha mode master master_ip 198.19.250.2 slave_ip ''"
2020:04:14-17:19:28 xxx-2 ha_daemon[4280]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 529 28.088" name="Executing (nowait) /etc/init.d/ha_mode topology_changed"
2020:04:14-17:19:28 xxx-2 ha_mode[2550]: calling topology_changed
2020:04:14-17:19:28 xxx-2 ha_mode[2550]: topology_changed: waiting for last ha_mode done
2020:04:14-17:19:28 xxx-2 ha_mode[2550]: repctl[2567]: [i] daemonize_check(1480): daemonized, see syslog for further messages
2020:04:14-17:19:28 xxx-2 repctl[2567]: [i] daemonize_check(1480): daemonized, see syslog for further messages
2020:04:14-17:19:28 xxx-2 repctl[2567]: [i] daemonize_check(1497): trying to signal daemon and exit
2020:04:14-17:19:28 xxx-2 repctl[7290]: [i] recheck(1057): got HUP: replication recheck triggered Setup_replication_done = 1
2020:04:14-17:19:28 xxx-2 ha_mode[2550]: topology_changed done (started at 17:19:28)
2020:04:14-17:19:28 xxx-2 ha_daemon[4280]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 530 28.818" name="Reading cluster configuration"
2020:04:14-17:19:44 xxx-2 ha_daemon[4280]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 531 44.374" name="Monitoring interfaces for link beat: eth1 eth0"
2020:04:14-17:19:49 xxx-2 conntrack-tools[4701]: no dedicated links available!
2020:04:14-17:19:49 xxx-2 ha_daemon[4280]: id="38A3" severity="debug" sys="System" sub="ha" seq="M: 532 49.290" name="Netlink: Lost link beat on eth3!"
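When reading a log like the one above, it can help to pull out only the node state transitions. The following is a minimal Python sketch, assuming the syslog layout shown in this thread (timestamp, host, process, then key="value" pairs). The helper name `state_changes` is hypothetical and not part of any UTM tooling; it only reformats what the log already says and does not interpret the numeric state codes.

```python
import re

# Hypothetical helper, assuming the ha_daemon log layout shown in this thread.
# Leading "timestamp host " prefix, e.g. "2020:04:14-17:19:08 xxx-2 ".
PREFIX_RE = re.compile(r'^(\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) (\S+) ')
# key="value" pairs such as id="38A0" or name="...".
KV_RE = re.compile(r'(\w+)="([^"]*)"')
# Master-side "Node 1 changed state: DEAD(2048) -> RESERVED(4096)" and
# slave-side "state change RESERVED(4096) -> RESERVED(4097)".
STATE_RE = re.compile(
    r'(?:Node (\d+) changed state|state change):?\s*'
    r'(\w+)\((\d+)\) -> (\w+)\((\d+)\)'
)


def state_changes(lines):
    """Yield (timestamp, host, node, old_state, new_state) for each state change.

    node is None for slave-side lines, which do not name a node;
    timestamp and host are None for lines without the syslog prefix.
    """
    for line in lines:
        fields = dict(KV_RE.findall(line))
        m = STATE_RE.search(fields.get("name", ""))
        if not m:
            continue  # not a state-change message
        prefix = PREFIX_RE.match(line)
        timestamp, host = prefix.groups() if prefix else (None, None)
        node = int(m.group(1)) if m.group(1) else None
        old = f"{m.group(2)}({m.group(3)})"
        new = f"{m.group(4)}({m.group(5)})"
        yield timestamp, host, node, old, new


sample = [
    '2020:04:14-17:19:08 xxx-2 ha_daemon[4280]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 520 08.587" name="Node 1 changed state: DEAD(2048) -> RESERVED(4096)"',
    '2020:04:14-17:19:22 xxx-1 ha_daemon[4234]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 35 22.516" name="state change RESERVED(4096) -> RESERVED(4097)"',
    '2020:04:14-17:19:27 xxx-2 ha_daemon[4280]: id="38C1" severity="error" sys="System" sub="ha" seq="M: 527 27.856" name="Node 1 is dead, received no heart beats"',
]
for change in state_changes(sample):
    print(change)
```

Keeping the numeric code next to the state name preserves transitions such as RESERVED(4096) -> RESERVED(4097), where only the number changes.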

After that, the slave remained switched off. I tried to switch it on again, and the same thing happened.

For a couple of seconds, the node appeared as "Reserved" in the web interface with the option to click "Update now". I tried this as well.

Result was:

name="HA control: cmd = 'upgrade 1'"
name="perform upgrade to 9.702001 on node 1"
name="Node 1 changed state: RESERVED(4096) -> UP2DATE(256)"
name="state change UP2DATE(256) -> UP2DATE(257)"
name="HA daemon of node 1 is restarting, waiting 900 seconds before declaring node as dead"
name="HA daemon shutting down (SIGTERM)"

The slave stayed powered off.

Last try: I removed the node from the master after 900 seconds (until then it was still shown as "DEAD").
The result was the same as on the second try: the slave would not join and shut down instead.

I will now have the unit shipped to me and manually apply the update to 9.702-1. Hopefully it will then rejoin the cluster.

Cheers,

Philipp

Alexander Busch over 4 years ago in reply to Philipp Thielke1

Hi Philipp,

Sorry to hear that. Unfortunately, the small units don't have a display, so it's a little difficult to see their status.
Keep us informed about what you find.

Best regards

Alex

BAlfson over 4 years ago in reply to Philipp Thielke1

Hello Philipp,

Here's my standard instruction set for doing what you're doing:

1. If needed, do a quick, temporary install so that the new device can download Up2Dates.
2. Apply Up2Dates up to the same version as the current unit, then do a factory reset and shut down.
3. On the current UTM in use, on the 'Configuration' tab of 'High Availability':
   a. Disable and then enable Hot-Standby
   b. Select eth3 as the Sync NIC
   c. Configure it as Node_1
   d. Enter an encryption key (I've never found a need to remember it)
   e. Select 'Enable automatic configuration of new devices'
   f. I prefer to use 'Preferred Master: None' and 'Backup interface: Internal'
4. Cable eth3 to eth3 on the new device.
5. Cable all of the other NICs exactly as they are on the original UTM.
6. Power up the new device and wait for the good news. ;)

Cheers - Bob

Sophos UTM Community Moderator
Sophos Certified Architect - UTM
Sophos Certified Engineer - XG
Gold Solution Partner since 2005

MediaSoft, Inc. USA
Philipp Thielke1 over 4 years ago in reply to BAlfson

Hi Bob, hi Alex,

It turned out to be even simpler. The slave node was still booting with its old configuration. I was able to log in, install the extended license, and perform the update. Afterwards, I was able to rejoin it to the cluster using the existing settings on the master.

Thanks.

Cheers

Philipp