This article describes general steps you can take on a Sophos UTM in an emergency.
Applies to the following Sophos products and versions: Sophos UTM
Several things can be done in the event of a total system failure, a failure to boot (reboot loop), or a general lockout. Some depend on your environment's configuration, but most of the steps below apply in most emergency situations.
If a production UTM won't boot, it's important to determine whether the issue is caused by hardware (a physically defective component, a dead power supply or motherboard, etc.), by software (such as a kernel problem or missing system files), or by configuration (such as a license issue).
Some older UTM firmware versions contain known issues that can cause a kernel panic, which makes the UTM restart itself. For this reason, keep your UTM updated to the latest firmware version. If possible, update before attempting any of the troubleshooting steps in this article.
In any emergency, contact Sophos Support immediately so that one of our engineers can assist you.
If you are completely locked out of the UTM (missing WebAdmin/SSH passwords), please see the following KB for instructions on password recovery:
Sophos UTM: How to recover access in the event of password loss
With only one UTM available, the options for a quick workaround are limited. If another gateway is available, a good first step is to bypass the UTM so that the entire network isn't down. Otherwise, you'll have to troubleshoot the issue on the UTM itself.
1. Check the console output while the UTM boots up
Connect a monitor to the UTM and examine the console output during boot; this is the first step in determining the cause of the issue. Depending on the output, it may be clear whether the issue is software- or hardware-related. Contact Sophos Support for help interpreting the console output.
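If you can capture the console output as text (for example over a serial console), a quick keyword scan can highlight lines worth sending to Support. The following is a minimal sketch under stated assumptions: the name flag_boot_lines is hypothetical, and the patterns are illustrative examples of common Linux kernel messages, not an official Sophos list. A kernel panic can also have a hardware cause, so treat the result as a hint, not a diagnosis.

```python
import re
from typing import Dict, List

# Illustrative patterns only (assumption): common Linux kernel messages,
# not an official Sophos list. Extend them from your own console output.
PATTERNS = {
    "hardware": [
        re.compile(r"machine check", re.IGNORECASE),
        re.compile(r"hardware error", re.IGNORECASE),
        re.compile(r"i/o error", re.IGNORECASE),
        re.compile(r"ata\d+(\.\d+)?: failed command", re.IGNORECASE),
    ],
    "software": [
        re.compile(r"kernel panic", re.IGNORECASE),
        re.compile(r"unable to mount root fs", re.IGNORECASE),
        re.compile(r"segfault", re.IGNORECASE),
    ],
}


def flag_boot_lines(console_log: str) -> Dict[str, List[str]]:
    """Group console lines that match a hardware or software pattern."""
    findings: Dict[str, List[str]] = {category: [] for category in PATTERNS}
    for line in console_log.splitlines():
        for category, patterns in PATTERNS.items():
            # Record each matching line once per category.
            if any(p.search(line) for p in patterns):
                findings[category].append(line.strip())
    return findings
```

Lines flagged under both categories, or under neither, still need a human to interpret them, so include the full console output when you contact Support.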
2. Restore a backup configuration
To rule out a configuration problem, you can restore a backup configuration from a time when the UTM is known to have been working correctly. There are a few ways of doing so:
Restoring a backup from WebAdmin:
If you are able to access WebAdmin, you can restore a backup by browsing to Management > Backup/Restore and clicking the green Restore icon next to an available backup.
Restoring a backup from console:
If your automatic backups are encrypted, please see the following KB:
Sophos UTM: How to restore an encrypted backup from the command line
For non-encrypted (default) backups, please see either of the following KBs:
Restoring a Backup from Command Line
Handling UTM configuration backups via command line
Restoring a backup automatically:
If you have a backup configuration file (.abf) available, or receive backups automatically each week via email, you can have the UTM restore a backup automatically when it boots.
To do so, copy the .abf backup file to the root of a FAT32-formatted USB stick and connect the USB stick to the UTM. On boot, the UTM should automatically restore the backup contained on the USB stick.
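If you keep several saved backups, preparing the USB stick from another machine can be scripted. The sketch below is one possible approach, not a Sophos tool: the function names are hypothetical, it assumes the backups sit in one local folder and that each file's modification time reflects when the backup was taken (a file saved from an email may instead carry the save time), and it expects the stick to be FAT32-formatted and mounted already, since formatting is platform-specific and not handled here.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional, Union


def pick_known_good_backup(backup_dir: str, known_good: datetime) -> Optional[Path]:
    """Return the newest .abf in backup_dir modified at or before known_good.

    Assumes file modification times reflect when each backup was taken.
    """
    candidates = []
    for path in Path(backup_dir).glob("*.abf"):
        mtime = datetime.fromtimestamp(path.stat().st_mtime)
        if mtime <= known_good:
            candidates.append((mtime, path))
    if not candidates:
        return None
    # Choose the candidate with the latest modification time.
    return max(candidates, key=lambda item: item[0])[1]


def stage_backup_on_usb(backup_file: Union[str, Path], usb_mount: str) -> Path:
    """Copy one .abf file to the root of an already-mounted USB stick."""
    src = Path(backup_file)
    if src.suffix.lower() != ".abf":
        raise ValueError(f"{src.name} is not an .abf backup file")
    root = Path(usb_mount)
    if not root.is_dir():
        raise FileNotFoundError(f"USB mount point {root} does not exist")
    # Cautious design choice: refuse if a different .abf is already on the
    # stick, so there is no doubt about which backup the UTM will find.
    others = [p.name for p in root.glob("*.abf") if p.name != src.name]
    if others:
        raise RuntimeError(f"remove other backups from the stick first: {others}")
    dest = root / src.name
    shutil.copy2(src, dest)
    return dest
```

For example, you might call pick_known_good_backup with your backup folder and the last time the UTM was known to work, then pass the result to stage_backup_on_usb together with the stick's mount point (both paths are placeholders for your own environment).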
3. Run a factory reset
Factory resetting the device will return all configuration settings to the factory-default state. This can be helpful if the problem turns out to be related to configuration, and restoring a backup doesn't help (or if the only available backups contain settings that cause the failure).
Factory reset from WebAdmin:
If you can access WebAdmin, a factory reset can be run as follows:
Management > System Settings > Reset Configuration or Passwords > Run factory reset now
Factory reset from console:
Console access generally becomes available before other services, such as WebAdmin. If the restart occurs after console access is possible but before WebAdmin is available, you can log in to the console as root and factory reset the device by entering the following command:
cc system_factory_reset
For more information, see: How to reset Sophos UTM to factory settings
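To tell from another machine whether WebAdmin comes up at all before the UTM restarts, you could poll its port. This is a minimal sketch assuming WebAdmin listens on its default port 4444 (adjust it if you changed the port); the function name is hypothetical. A successful TCP connection only shows that something is listening, not that WebAdmin is fully working.

```python
import socket
import time


def wait_for_port(host: str, port: int = 4444, timeout: float = 300.0,
                  interval: float = 5.0) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout.

    port=4444 assumes the default WebAdmin port.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Connect and immediately close; we only test reachability.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            pass  # refused, unreachable, or timed out: try again
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

If the port never opens before the unit restarts, the console is likely your only access path, which is the situation the command above is meant for.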
4. Re-image the device
If all other options fail, one option remains: reinstalling the UTM's operating system. The following KB article contains instructions on doing so:
Sophos UTM: How to re-image with a USB CD-ROM drive
With multiple UTMs (an HA pair or cluster), more workarounds are available, especially since often only one node experiences an issue. HA is designed to bypass failed UTMs automatically: if the master node fails, another node normally takes over, and you can then attempt to recover the failed node using the following steps:
1. Isolate the problem node and rejoin it to the cluster
Manually resyncing a node clears out most configuration-related issues, which makes it a good first troubleshooting step.
First, disable HA on the master node in WebAdmin by browsing to Management > High Availability > Configuration, and setting Operation Mode to 'off'.
If you can't access WebAdmin, you can also disable HA from the console by entering the following command:
cc set ha status off
If the slave node is connected and online, disabling HA should factory reset the slave node and shut it down.
Afterwards, power the node back on, and let it resync.
2. Factory reset the node
If the automatic factory reset fails, you'll have to reset the node manually. If you're able to log in to the console before the node restarts, you can run a manual factory reset by entering the following command as root:
cc system_factory_reset
After the reset, connect the HA port; the slave node will rejoin the existing HA cluster after it reboots.
Note: After the reboot triggered by cc system_factory_reset, the HA daemon can restart even without a valid license.
3. Restore a backup configuration
If rejoining the node fails because of a licensing problem, first restore a backup (containing the license file) to the node, using the same process as for a single UTM:
Sophos UTM: How to restore an encrypted backup from the command line
Handling UTM configuration backups via command line
If you have a backup configuration file (.abf) available, or receive backups automatically each week via email, you can also have the node restore a backup automatically when it boots from a FAT32-formatted USB stick.
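If your weekly backups arrive by email, the .abf attachment has to be saved to disk before it can be copied to a USB stick. The sketch below assumes the backup email has been exported from your mail client as a standard .eml file; the function name is hypothetical, and only Python's standard email package is used.

```python
from email import policy
from email.parser import BytesParser
from pathlib import Path
from typing import List


def extract_abf_attachments(eml_path: str, out_dir: str) -> List[Path]:
    """Save every .abf attachment found in a saved .eml file to out_dir."""
    with open(eml_path, "rb") as fh:
        # policy.default yields an EmailMessage, which has iter_attachments().
        msg = BytesParser(policy=policy.default).parse(fh)
    saved = []
    for part in msg.iter_attachments():
        name = part.get_filename()
        if name and name.lower().endswith(".abf"):
            # Keep only the base name to avoid writing outside out_dir.
            dest = Path(out_dir) / Path(name).name
            dest.write_bytes(part.get_payload(decode=True))
            saved.append(dest)
    return saved
```

The returned paths can then be copied to the root of the USB stick as described for a single UTM.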
4. Re-image the node
If the device is still inaccessible or keeps rebooting itself after you have tried all of the above, some of the core operating system files have likely become corrupted, and re-imaging the device is the only alternative. Follow the same re-imaging procedure described for a single UTM.
If the issue continues to occur after re-imaging, the only remaining option may be to replace the unit under warranty. Contact Sophos Support to submit a request for RMA.
5. Restore HA after receiving an RMA'd device