HA-Node Replacement Procedure

Hello,

We have two Sophos UTM SG450 nodes running in a Hot Standby HA cluster (active-passive).

One of the nodes reports a degraded RAID:

[CRIT-060] Raid degraded: harddisk replacement needed

A degradation of the harddisk raid was detected

Now we have to replace the node with the degraded RAID. It is currently the SLAVE node.

My replacement procedure would be:

1. Check the hardware revision (rev. 2 on both nodes, so that is fine)

2. Check the UTM firmware version on both nodes (both must run the same version)

3. Shut down the node with the degraded RAID (the SLAVE node) via the High Availability overlay

4. Remove the SLAVE node from the cluster configuration

5. Set the preferred master on the current MASTER node

6. Tick "Enable automatic configuration of new devices"

7. Remove the node with the degraded RAID from the server rack and install the replacement node

8. Connect only the HA NIC and start the replacement node

9. After a successful HA sync, connect all other network cables
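
Steps 1 and 2 are a comparison of two facts per node, so they can be written down as a small pre-flight check. This is only a sketch: `NodeInfo` and `preflight_problems` are hypothetical names, and the values have to be read off each node by hand (nothing here talks to the UTM).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class NodeInfo:
    """Facts about one node, collected manually (hypothetical structure)."""
    name: str
    hardware_rev: int
    firmware: str


def preflight_problems(running: NodeInfo, replacement: NodeInfo) -> list[str]:
    """Return the mismatches to fix before the replacement joins the cluster."""
    problems = []
    if running.hardware_rev != replacement.hardware_rev:
        problems.append(
            f"hardware revision differs: {running.name} is rev. "
            f"{running.hardware_rev}, {replacement.name} is rev. "
            f"{replacement.hardware_rev}"
        )
    if running.firmware != replacement.firmware:
        problems.append(
            f"firmware differs: {running.name} runs {running.firmware}, "
            f"{replacement.name} runs {replacement.firmware}"
        )
    return problems
```

An empty list means both checks passed; any entry is something to resolve before step 3.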

Can you please evaluate and check the procedure? Thanks!



This thread was automatically locked due to age.
  • Here's the procedure I give to my clients:

    1. If needed, do a quick, temporary install so that the new device can download Up2Dates.
    2. Apply the Up2Dates to the same version as the current unit, do a factory reset and shutdown.
    3. On the current UTM in use, on the 'Configuration' tab of 'High Availability':
       a. Disable and then enable Hot-Standby
       b. Select eth3 as the Sync NIC
       c. Configure it as Node_1
       d. Enter an encryption key (I've never found a need to remember it)
       e. Select 'Enable automatic configuration of new devices'
       f. I prefer to use 'Preferred Master: None' and 'Backup interface: Internal'
    4. Cable eth3 to eth3 on the new device.
    5. Cable all of the other NICs exactly as they are on the original UTM.
    6. Power up the new device and wait for the good news. ;-)
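
Steps 1 and 2 depend on which way the firmware versions differ. The sketch below compares two version strings and names the likely action. It assumes that Up2Date only moves firmware forward, so a replacement that is newer than the running unit needs a reinstall instead. `parse_version` and `alignment_action` are hypothetical helper names, not part of any Sophos tool.

```python
import re


def parse_version(text: str) -> tuple:
    """Turn a version string such as "9.602" into a tuple of ints."""
    parts = re.findall(r"\d+", text)
    if not parts:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(p) for p in parts)


def alignment_action(replacement: str, running: str) -> str:
    """Say how to bring the replacement to the running unit's version."""
    new, current = parse_version(replacement), parse_version(running)
    if new == current:
        return "match"      # nothing to do before the factory reset
    if new < current:
        return "up2date"    # replacement is older: apply Up2Dates
    return "reinstall"      # replacement is newer (assumption: no downgrade via Up2Date)
```

The comparison is numeric per component, so "9.1000" would sort after "9.602", which a plain string comparison would get wrong.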

    Cheers - Bob

     
    Sophos UTM Community Moderator
    Sophos Certified Architect - UTM
    Sophos Certified Engineer - XG
    Gold Solution Partner since 2005
    MediaSoft, Inc. USA
  • Hello Bob,

    thanks for your message. 

    The Sophos UTM Cluster is running fine.

    My procedure yesterday:

    1. Checked the hardware revision and firmware version first

    --> Here I had to downgrade the replacement node from 9.702 to 9.602, because the running node in the cluster was on 9.602

    --> Downgraded via SUSI stick (Sophos Smart Installer)

    2. Ran a factory reset on the replacement node

    3. Took a backup of the firewall (running cluster) and checked the system health status

    4. Shut down the node that had to be replaced (we call it Node DEAD)

    5. Removed Node DEAD from the cluster after the shutdown had completed

    6. Configured HA on the running MASTER node (this is the healthy node)

    --> Tick: "Enable automatic configuration of new devices"

    --> Preferred Master: Node (1)

    --> SYNC NIC: eth3

    7. Removed Node DEAD from the server rack

    8. Installed the replacement node in the server rack and booted it (without any network links)

    9. Connected a laptop to the replacement node via ETH0 (192.168.0.1) and configured HA

    --> Operation Mode: Automatic Configuration

    --> SYNC NIC: eth3

    10. Connected the replacement node to the running MASTER node via ETH3

    11. Waited until HA showed the replacement node as: UNLINKED

    12. Connected all other network cables to the replacement node and waited until SYNCING had completed

    13. Now the cluster is running fine ;-)
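
The two waiting steps near the end (wait for UNLINKED, then wait for SYNCING to finish) follow the same pattern: poll the HA status until it reaches a target or a time limit expires. A minimal sketch of that pattern follows. `wait_for_status` is a hypothetical helper, and `get_status` is a callable you would have to supply yourself, for example by reading the HA overlay in WebAdmin by hand. The post names UNLINKED and SYNCING; any status name for the state after syncing is an assumption.

```python
import time
from typing import Callable


def wait_for_status(get_status: Callable[[], str], target: str,
                    timeout_s: float = 600.0, interval_s: float = 5.0) -> bool:
    """Poll get_status() until it returns target; False if the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if get_status() == target:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Passing the status source in as a callable keeps the sketch free of any invented UTM API, and the timeout keeps a node that never links from blocking the procedure indefinitely.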
