
Sophos XGS v20.0.2 - Heartbeat service dead - Decryption of passphrase is failed

Hello,

We migrated from an XG450 to an XGS4500 last weekend. The firewalls are in an HA configuration. The migration itself went smoothly: the primary firewall is working without issues and all services started. To verify that HA failover works and everything is connected correctly, we switched over to the auxiliary device. This also worked without issues, except that the heartbeat service does not start.

Steps taken to resolve the issue:

1. Tried to restart it via the UI by turning the heartbeat function off and on. (System->Sophos Central->Security Heartbeat)

2. Tried to restart it via the CLI.

XGS4500_AM02_SFOS 20.0.2 MR-2-Build378 HA-Primary# service heartbeat:status -ds nosync
200 DEAD
XGS4500_AM02_SFOS 20.0.2 MR-2-Build378 HA-Primary# service heartbeat:start -ds nosync
503 Service Failed
XGS4500_AM02_SFOS 20.0.2 MR-2-Build378 HA-Primary# service heartbeat:status -ds nosync
200 STOPPED

The heartbeatd.log shows that the daemon fails to decrypt a passphrase:

[2024-11-18 13:02:43.295Z] INFO HbdModuleBuilder.cpp[23674]:202 initLogger - Word size of architecture: 64
[2024-11-18 13:02:43.295Z] INFO HbdModuleBuilder.cpp[23674]:203 initLogger - Heartbeat daemon build time: 15:19:37 Jul  5 2024
[2024-11-18 13:02:43.295Z] INFO HbdModuleBuilder.cpp[23674]:97 intializeAndRunHbd - Heartbeat daemon starting
[2024-11-18 13:02:44.030Z] INFO EndpointStorage.cpp[23674]:37 EndpointStorage - Working with persistent endpoint storage
[2024-11-18 13:02:44.030Z] INFO EndpointStorage.cpp[23674]:39 EndpointStorage - Calling EndpointStorageBackend::get_all_endpoints
[2024-11-18 13:02:44.042Z] ERROR HBSessionHandler.cpp[23674]:261 dbCallbackEncryptedPassphrase - Decryption of passphrase is failed
[2024-11-18 13:02:44.042Z] FATAL HbdModuleBuilder.cpp[23674]:143 intializeAndRunHbd - Password missing to decrypt the key
[2024-11-18 13:02:44.042Z] INFO HbdModuleBuilder.cpp[23674]:148 intializeAndRunHbd - Heartbeat daemon halted
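
For anyone who wants to pull just the failing entries out of a longer log, here is a minimal sketch. It assumes the line format shown in the excerpt above (`[timestamp] LEVEL source[pid]:line function - message`). The function name `hb_errors` is made up for illustration, and the log path in the usage note is an assumption about where the appliance keeps the file.

```shell
#!/bin/sh
# Print only ERROR and FATAL entries from a heartbeatd-style log.
# hb_errors is a hypothetical helper name, not a Sophos tool.
hb_errors() {
    # $1: path to the log file
    # Match "[<anything but ]>] ERROR " or "[...] FATAL " at line start.
    grep -E '^\[[^]]+\] (ERROR|FATAL) ' "$1"
}
```

Possible usage, assuming the log lives at /log/heartbeatd.log: `hb_errors /log/heartbeatd.log | tail -n 20`.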

Steps taken during the migration:

1. Backup of the XG450. (v20.0.1)(System->Backup&firmware->Backup now -> Download)
2. Starting the XGS4500. (Update to latest firmware v20.0.2, connect to Sophos Central, apply licenses)
3. Shutdown of the XG450.
4. Import of the Backup from the XG450.
5. Reconnect to Sophos Central. (Import removes that configuration)
6. Setup of HA.
7. Add firewall to a group in Sophos Central. (Sync all settings from Sophos Central)

Has anyone had this issue before and has an idea how to fix it?



Added TAGs
[edited by: Raphael Alganes at 1:48 PM (GMT -8) on 18 Nov 2024]
  • I would recommend deregistering the cluster after the backup/restore and then joining it to Central again.


  • The heartbeat service is running without issues now, but this created a new problem. The heartbeatd.log is getting flooded with "Cannot create ID for application, because appId range is exhausted. Application will be ignored.".

    I found a different discussion with the same issue: "heartbeat log: Cannot create ID for application, because appId range is exhausted. Application will be ignored."

    There you recommended turning on the automatic cleanup of the application database, which we already have enabled. It shows that we only have 849 applications.

    That discussion also mentions a way to flush the database manually by deleting from four different database tables. Before I use those commands, I wanted to ask whether they are still valid.

  • We've had a case open for that. Support can run DB commands to flush old app entries.
    For your reference: 06348145 / Firewall Heartbeat: appId range is exhausted

    I suggest opening a case and letting them do the task.

  • If anyone else is facing this issue, the commands described in the previously mentioned discussion are still valid.

    This issue should only occur if you have a firewall which was running a version prior to v20.

    Commands to wipe the registered applications from the database:

    psql -U pgroot -d corporate -c "DELETE FROM tbleacapplications"

    psql -U pgroot -d corporate -c "DELETE FROM tbleacappcache"

    psql -U pgroot -d corporate -c "DELETE FROM tblappstoeps"

    psql -U pgroot -d corporate -c "DELETE FROM tbleacendpoints"

    service heartbeat:restart -ds nosync


    Keep in mind that this will affect your live heartbeat authenticated users.

    For reference our case id was 02029324.
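
Since the DELETE statements are destructive, it may help to see the row counts first and to preview exactly what would be run. The following is only a sketch under assumptions: it reuses the table names, psql options, and restart command from the post above, while `run`, `hb_table_counts`, `hb_flush`, and the `DRY_RUN` variable are hypothetical names introduced here, not part of SFOS.

```shell
#!/bin/sh
# Tables named in the cleanup commands above, in the same order.
HB_TABLES="tbleacapplications tbleacappcache tblappstoeps tbleacendpoints"

# Hypothetical helper: execute a command, or only print it when DRY_RUN=1.
# The printed form loses the original quoting; it is for review only.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Show how many rows each table holds before anything is deleted.
hb_table_counts() {
    for t in $HB_TABLES; do
        run psql -U pgroot -d corporate -c "SELECT count(*) FROM $t"
    done
}

# Issue the same DELETE statements and service restart as in the post above.
hb_flush() {
    for t in $HB_TABLES; do
        run psql -U pgroot -d corporate -c "DELETE FROM $t"
    done
    run service heartbeat:restart -ds nosync
}
```

With `DRY_RUN=1` set, calling `hb_flush` only prints the five commands. Without it, the functions run them for real, with the same effect on live heartbeat users as described above.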

  • Today I am seeing this on our firewall again:

    [2024-12-17 07:41:34.286Z] INFO SacProcessor.cpp[19872]:64 discardApp - Sent switchOffConnectionInfo request to endpoint: <3d9917c9-309c-4832-b2f9-22449f9897b5>, Application path :C:\134program files\134mozilla firefox\134firefox.exe
    [2024-12-17 07:41:34.521Z] ERROR SacProcessor.cpp[19872]:100 handleApp - Cannot create ID for application, because appId range is exhausted. Application will be ignored.
    [2024-12-17 07:41:34.911Z] ERROR SacProcessor.cpp[19872]:100 handleApp - Cannot create ID for application, because appId range is exhausted. Application will be ignored.

    I wouldn't expect this to happen with this number of apps: 6353.
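
To get a feel for how badly the log is being flooded, the messages can be counted overall and per hour. This is a rough sketch that assumes the timestamp format shown in the excerpt above. The function names are invented for illustration, and the log path in the usage note is an assumption.

```shell
#!/bin/sh
# Count all "appId range is exhausted" entries in a heartbeatd-style log.
# Both function names are hypothetical helpers.
hb_exhausted_count() {
    # $1: path to the log file
    grep -c 'appId range is exhausted' "$1"
}

# Break the same entries down per hour.
# Characters 2-14 of each line are "YYYY-MM-DD HH" in the format shown above.
hb_exhausted_per_hour() {
    grep 'appId range is exhausted' "$1" | cut -c2-14 | sort | uniq -c
}
```

Possible usage, assuming the log lives at /log/heartbeatd.log: `hb_exhausted_per_hour /log/heartbeatd.log`.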

  • You can check the database with the following command:

    psql -U pgroot -d corporate -c "SELECT count(*) FROM tbleacapplications"

    It should print the same number as the GUI. My guess is that it will show more than 10000, which would explain that error message. If it does, the only solution I'm aware of is the manual cleanup of the database with the commands provided by support.
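
The comparison described above can be scripted as a small check. This is a sketch only: the default of 10000 is the guess from this post, not a documented limit, and `hb_check_app_count` is a hypothetical helper name.

```shell
#!/bin/sh
# Compare an application count against a limit.
# The default limit of 10000 is the guess from the post above,
# not a documented value.
hb_check_app_count() {
    count=$1
    limit=${2:-10000}
    if [ "$count" -gt "$limit" ]; then
        echo "over limit: $count > $limit"
        return 1
    fi
    echo "within limit: $count <= $limit"
}
```

Possible usage on the firewall, with psql's `-t -A` options so that only the bare number is printed: `hb_check_app_count "$(psql -U pgroot -d corporate -t -A -c "SELECT count(*) FROM tbleacapplications")"`.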
