Sophos XG Firewall: Limiting downtime when re-imaging devices in HA

Note: Please contact Sophos Professional Services if you require direct assistance with your specific environment.


Hi Community,

This thread outlines how to limit downtime when re-imaging Sophos XG hardware appliances that are configured in HA.

Warning: This process will still result in an outage. Please plan ahead accordingly to account for this downtime.

Preparation

First, download the latest firmware installation image from the Sophos Licensing Portal: https://www.sophos.com/mysophos

If the firmware version you are looking for is unavailable in the Sophos Licensing Portal and you prefer to stay on that version, contact Sophos Technical Support to request the image. This may take at least two business days.

Note: Sophos Support recommends using the latest firmware version. More info: Sophos XG Firewall Release Notes & News

The steps below assume the following:

  • Node 1 is the current HA primary Node, and it is also the initial HA primary Node.
  • Node 2 is the current HA auxiliary Node.

To check if an XG firewall in HA is the initial HA primary Node:

  1. Log in to the XG firewall over SSH using the admin account. Once authenticated, you will be presented with the Sophos Firewall console menu.
  2. Go to 5. Device Management > 3. Advanced Shell
  3. Run the following commands:
    • nvram get "#li.serial"
      • The serial number of the XG firewall is then displayed
    • nvram get "#li.master"
      • If the output of nvram get "#li.master" is YES, as shown below, then the XG firewall is the initial HA primary node:
        XG210_WP02_SFOS 17.5.9 MR-9# nvram get "#li.master"
        YES
    • Note: The command nvram get "#li.master" is only used on XG firewalls in active-passive HA, to identify which node is the initial primary node. There is no concept of an initial primary node in active-active HA.
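
As a rough sketch of the check above, the result of the two nvram queries can be fed into a small shell helper. It relies only on what this thread states (nvram get "#li.master" prints YES on the initial primary node); any other value is simply treated as "not the initial primary". The function names is_initial_primary and report_ha_role are made up for illustration and are not part of SFOS.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; they are not SFOS commands.

# is_initial_primary VALUE
# Succeeds (exit status 0) only when VALUE is exactly YES, which is the
# output this thread documents for the initial HA primary node.
is_initial_primary() {
    [ "$1" = "YES" ]
}

# report_ha_role SERIAL MASTER_VALUE
# Prints a one-line summary for the node with the given serial number.
report_ha_role() {
    if is_initial_primary "$2"; then
        echo "$1: initial HA primary node"
    else
        echo "$1: not the initial HA primary node"
    fi
}

# Possible use from the Advanced Shell (active-passive HA only):
#   report_ha_role "$(nvram get '#li.serial')" "$(nvram get '#li.master')"
```

Keeping the comparison in its own function means the same check can be reused on both nodes before deciding which one to reimage first.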

Scenario 1: reimage HA auxiliary node and keep both XG firewalls on previous firmware version

  1. If the XG firewall is registered to Sophos Central, deregister it. Then check in Sophos Central that the XG firewall has been removed.
  2. Download a configuration backup from Node 1 and save it to a local computer.
  3. On the Node 1 webadmin GUI, disable HA.
    • If Node 2 is connected to Node 1 when HA is disabled on Node 1, Node 2 reboots with factory default settings, except for the admin password and the peer administration IP.
    • Don't disable HA on the Node 2 webadmin GUI.
  4. Reimage Node 2 to the same firmware version as Node 1. Follow the reimage guide.
  5. Initialize Node 2, then configure its WAN interface and connect the WAN cable so that it can access the Internet. Don't configure any LAN or DMZ interfaces yet.
  6. Make sure Node 2 has the same pattern and hotfix versions as Node 1. See "Check pattern and hotfix version" in the Appendix.
  7. Make sure the msync service is UNTOUCHED or STOPPED on both Node 1 and Node 2. See "Check msync service status" in the Appendix.
  8. Configure active-passive HA as per the KBA "How to configure High Availability".
    • Make sure Node 1 is configured as the primary node and Node 2 as the auxiliary node.
    • Note: An outage will occur.

Scenario 2: reimage HA auxiliary node and upgrade both XG firewalls to latest firmware

  1. If the XG firewall is registered to Sophos Central, deregister it. Then check in Sophos Central that the XG firewall has been removed.
  2. Download a configuration backup from Node 1 and save it to a local computer.
  3. On the Node 1 webadmin GUI, perform an HA failover to make Node 2 the primary node.
    • On v18, the option is "Switch to passive device".
    • On v17, the option is "Put to Standby".
    • Note: An outage will occur.
  4. Reimage Node 1 with the latest firmware. Follow the reimage guide.
  5. Initialize Node 1, then configure its WAN interface and connect the WAN cable so that it can access the Internet. Don't configure any LAN or DMZ interfaces yet.
  6. Unplug all cables from Node 1 except the one connecting to your laptop.
  7. Restore the configuration backup to Node 1.
  8. Cut traffic over from Node 2 to Node 1.
    • Note: A second outage will occur.
  9. Reimage Node 2 with the same firmware as Node 1.
  10. Initialize Node 2, then configure its WAN interface and connect the WAN cable so that it can access the Internet. Don't configure any LAN or DMZ interfaces yet.
    • Register the Node 2 serial number if this has not been done already.
  11. Make sure Node 2 has the same pattern and hotfix versions as Node 1. See "Check pattern and hotfix version" in the Appendix.
  12. Make sure the msync service is UNTOUCHED or STOPPED on both Node 1 and Node 2. See "Check msync service status" in the Appendix.
  13. Configure active-passive HA as per the KBA "How to configure High Availability".
    • Make sure Node 1 is configured as the primary node and Node 2 as the auxiliary node.
    • Note: A third outage will occur.

Scenario 3: reimage HA primary node

  1. If the XG firewall is registered to Sophos Central, deregister it. Then check in Sophos Central that the XG firewall has been removed.
  2. Download a configuration backup from Node 1 and save it to a local computer.
  3. On the Node 1 webadmin GUI, perform an HA failover to make Node 2 the primary node.
    • Note: An outage will occur.
  4. Reimage Node 1 with the same firmware as Node 2. Follow the reimage guide.
  5. Initialize Node 1, then configure its WAN interface and connect the WAN cable so that it can access the Internet. Don't configure any LAN or DMZ interfaces yet.
  6. Unplug all cables from Node 1 except the one connecting to your laptop.
  7. Restore the configuration backup to Node 1.
  8. Cut traffic over from Node 2 to Node 1.
    • Note: A second outage will occur.
  9. Factory reset Node 2.
  10. Initialize Node 2, then configure its WAN interface and connect the WAN cable so that it can access the Internet. Don't configure any LAN or DMZ interfaces yet.
    • Register the Node 2 serial number if this has not been done already.
  11. Make sure Node 2 has the same pattern and hotfix versions as Node 1. See "Check pattern and hotfix version" in the Appendix.
  12. Make sure the msync service is UNTOUCHED or STOPPED on both Node 1 and Node 2. See "Check msync service status" in the Appendix.
  13. Configure active-passive HA as per the KBA "How to configure High Availability".
    • Make sure Node 1 is configured as the primary node and Node 2 as the auxiliary node.
    • Note: A third outage will occur.

Steps 9 and 10 prepare Node 2 for HA. They are not necessary if you can correctly reconfigure the IP addresses of all interfaces on Node 2.

Scenario 4: rebuild HA after RMA of primary node

Assume the following:

  • Primary Node 1 has been returned under RMA, and the RMA replacement has arrived.
  • Auxiliary Node 2 is running as HA standalone.

Here are the steps to rebuild HA after RMA of the primary node:

  1. Run the disk health test (30 minutes) and the memory test (2 or more hours) on the RMA replacement.
  2. If the XG firewall is registered to Sophos Central, deregister it. Then check in Sophos Central that the XG firewall has been removed.
  3. Download a configuration backup from Node 2 and save it to a local computer.
  4. Reimage the RMA replacement with the same firmware as Node 2. Follow the reimage guide.
  5. Initialize the RMA replacement, then configure its WAN interface and connect the WAN cable so that it can access the Internet. Don't configure any LAN or DMZ interfaces yet.
  6. Restore the configuration backup to the RMA replacement.
  7. Cut traffic over from Node 2 to the RMA replacement.
    • Note: An outage will occur.
  8. Factory reset Node 2.
  9. Initialize Node 2, then configure its WAN interface and connect the WAN cable so that it can access the Internet. Don't configure any LAN or DMZ interfaces yet.
    • Register the Node 2 serial number if this has not been done already.
  10. Make sure Node 2 has the same pattern and hotfix versions as the RMA replacement. See "Check pattern and hotfix version" in the Appendix.
  11. Make sure HA is disabled on the RMA replacement.
  12. Make sure the msync service is UNTOUCHED or STOPPED on both the RMA replacement and Node 2. See "Check msync service status" in the Appendix.
  13. Configure active-passive HA as per the KBA "How to configure High Availability".
    • Make sure the RMA replacement is configured as the primary node and Node 2 as the auxiliary node.
    • Note: A second outage will occur.
  14. Transfer the license from Node 1 to the RMA replacement. See the KBA for license transfer.

Steps 8 and 9 prepare Node 2 for HA. They are not necessary if you can correctly reconfigure the IP addresses of all interfaces on Node 2.

Regards,

Appendix

Check pattern and hotfix version

  • Run the following Advanced Shell command on each node:
    cish -c "system diag sh ver"
  • The following pattern and hotfix versions must match on both HA nodes:
    AP Firmware
    ATP
    Avira AV
    Authentication Clients
    Geoip ip2country DB
    IPS and Application signatures
    Sophos Connect Clients
    RED Firmware
    Sophos AV
    SSLVPN Clients
    Hot Fix version
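
One possible way to compare the two nodes is to save the output of the command above from each node into a text file and compare the files on your computer. The sketch below assumes nothing about the output layout: it only shows which lines differ. The function name compare_versions and the file names are hypothetical, and the output may contain lines outside the list above that legitimately differ between nodes, so review the result by hand.

```shell
#!/bin/sh
# Hypothetical helper for illustration only; it is not an SFOS command.

# compare_versions FILE1 FILE2
# FILE1 and FILE2 hold the saved output of
#   cish -c "system diag sh ver"
# from Node 1 and Node 2. Prints differing lines (if any) using diff.
# Exit status: 0 = identical, 1 = differences found, 2 = file not readable.
compare_versions() {
    if [ ! -r "$1" ] || [ ! -r "$2" ]; then
        echo "Cannot read one of the input files" >&2
        return 2
    fi
    if diff "$1" "$2"; then
        echo "No differences found"
    else
        echo "Differences found: review the lines above"
        return 1
    fi
}

# Example call with hypothetical file names:
#   compare_versions node1_versions.txt node2_versions.txt
```

Printing the raw diff instead of parsing individual fields keeps the helper independent of the exact output format, at the cost of leaving the final judgment to the reader.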

Check msync service status

Run the following Advanced Shell command on each node:
service -S | grep msync
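
The check described in the scenarios (msync must be UNTOUCHED or STOPPED) could be scripted roughly as follows. This is a sketch that assumes the status word appears somewhere in the line printed by the command above; the function name msync_ok is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper for illustration only; it is not an SFOS command.

# msync_ok LINE
# LINE is the text printed by: service -S | grep msync
# Succeeds only when the line contains UNTOUCHED or STOPPED, the two
# states this thread accepts before configuring HA. An empty line
# (no msync entry found) is treated as a failure.
msync_ok() {
    case "$1" in
        *UNTOUCHED*|*STOPPED*) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible use from the Advanced Shell:
#   if msync_ok "$(service -S | grep msync)"; then
#       echo "msync state is fine for HA setup"
#   else
#       echo "msync is not UNTOUCHED or STOPPED"
#   fi
```

Run the check on both nodes, since the scenarios require the state to be acceptable on each of them.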

Edition History

2021-09-07, fixed typo

2021-02-17, major update

2020-02-07, first edition



[edited by: taowang at 6:41 AM (GMT -7) on 7 Sep 2021]