Hello community,
we have 4 hardware sensors running in our environment, with many more waiting to be deployed. All of them were set up with version 1.7.1-2263. The 4 sensors already in the field were working without issues. On the 15th of October they all received the update to version 1.8.0-2366. 2 of them came back up, are online and are working without issues, but the other 2 stayed offline in Sophos Central.
Even though they show as offline in Sophos Central, we can ping them and can reach the login page of the Appliance Management Console. The problem is that when we try to log in, it fails with the message "Invalid username or password". SSH access works fine with the same password. The upgrade_progress.log file shows that the update to the new version completed, and this file is identical on all 4 sensors. In the syslog file, however, we see the following error messages on the 2 non-working sensors:
Oct 15 00:05:54 localhost ndrsensorapi[1980653]: Network settings are not configured correctly
Oct 15 00:05:54 localhost ndrsensorapi[1980653]: cni0 Interface is not available, falling back to localhost
Oct 15 00:05:54 localhost ndrsensorapi[1980653]: Starting the Sensor API on localhost...
Oct 15 00:05:54 localhost ndrsensorapi[1981175]: {"level":"info","message":"Reading Interface Mapping","timestamp":"2024-10-15T00:05:54Z"}
On the working sensors the following messages appear instead:
Oct 15 00:07:17 localhost ndrsensorapi[1465992]: 5: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default qlen 1000
Oct 15 00:07:17 localhost ndrsensorapi[1465899]: cni0 Interface is up
Oct 15 00:07:17 localhost ndrsensorapi[1465899]: Starting the Sensor API Server on cni0...
Oct 15 00:07:17 localhost ndrsensorapi[1465996]: {"level":"info","message":"Reading Interface Mapping","timestamp":"2024-10-15T00:07:17Z"}
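For anyone who wants to check many sensors quickly, here is a rough Python sketch that looks for the two ndrsensorapi messages quoted above and reports which state a sensor was last in. The message strings are copied from our syslog excerpts; the function name and the state labels are made up for illustration, and this only reflects what we observed on our own sensors, not any documented Sophos behaviour.

```python
# Message fragments copied from the syslog excerpts in this post.
FALLBACK_MARKER = "cni0 Interface is not available, falling back to localhost"
HEALTHY_MARKER = "cni0 Interface is up"


def classify_sensor_api(syslog_lines):
    """Return 'fallback', 'healthy' or 'unknown' (hypothetical labels).

    The result is based on the most recent ndrsensorapi line that
    contains one of the two markers, so later entries win.
    """
    state = "unknown"
    for line in syslog_lines:
        # Only look at entries written by the ndrsensorapi process.
        if "ndrsensorapi[" not in line:
            continue
        if FALLBACK_MARKER in line:
            state = "fallback"
        elif HEALTHY_MARKER in line:
            state = "healthy"
    return state
```

To use it, read the syslog file on the sensor (on our appliances we assume it is /var/log/syslog, adjust as needed) and pass the lines in, for example `classify_sensor_api(open("/var/log/syslog", errors="replace"))`.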
The 2 broken sensors also show version 2.2.0 in the datalake, which is probably a default or placeholder value.
Since they were all working before the update and there was no change in the config our guess is that it has to do with the update.
Is anyone else having this issue? Is there a known way to fix it, for example by restarting the ndrsensorapi service? Our guess is that a restart will fix the sensors, but we don't want to make the issue worse: they are in the field, and we would have to drive there if they became unreachable via SSH.
UPDATE 18.10.2024
Both broken sensors came back online around 6:30pm yesterday. Both of them received some kind of update just before they came online again, and our guess is that this update triggered a restart of the ndrsensorapi service. The other 2 sensors received the same update but did not restart the ndrsensorapi service, so this may just be a coincidence. Restarting the ndrsensorapi service probably fixes the issue, but we are still waiting to hear back from our Sophos Partner and Sophos Support. Once we receive more information I will document it here in case anyone else is facing the same issue.
Oct 17 18:26:46 localhost systemd[1]: Starting Update APT News...
Oct 17 18:26:46 localhost systemd[1]: Starting Update the local ESM caches...
Oct 17 18:26:46 localhost systemd[1]: apt-news.service: Succeeded.
Oct 17 18:26:46 localhost systemd[1]: Finished Update APT News.
Oct 17 18:26:46 localhost systemd[1]: esm-cache.service: Succeeded.
Oct 17 18:26:46 localhost systemd[1]: Finished Update the local ESM caches.
Oct 17 18:26:47 localhost dbus-daemon[800]: [system] Activating via systemd: service name='org.freedesktop.PackageKit' unit='packagekit.service' requested by ':1.3059' (uid=0 pid=3174970 comm="/usr/bin/gdbus call --system --dest org.freedeskto" label="unconfined")
Oct 17 18:26:47 localhost systemd[1]: Starting PackageKit Daemon...
Oct 17 18:26:47 localhost PackageKit: daemon start
Oct 17 18:26:47 localhost dbus-daemon[800]: [system] Successfully activated service 'org.freedesktop.PackageKit'
Oct 17 18:26:47 localhost systemd[1]: Started PackageKit Daemon.
Oct 17 18:26:52 localhost systemd[1]: Stopping Start/stop NDR Sensor API...
Oct 17 18:26:52 localhost systemd[1]: ndrsensorapi.service: Killing process 1981184 (ndrsensorapi) with signal SIGKILL.
Oct 17 18:26:52 localhost systemd[1]: ndrsensorapi.service: Succeeded.
Oct 17 18:26:52 localhost systemd[1]: Stopped Start/stop NDR Sensor API.
Oct 17 18:26:52 localhost systemd[1]: Started Start/stop NDR Sensor API.
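To compare the 4 sensors, it helps to see whether and when systemd actually restarted the service. The sketch below scans syslog lines for the "Stopped" / "Started" pair shown in the excerpt above and returns the timestamps. The unit description "Start/stop NDR Sensor API" is taken from our log; the function name is hypothetical, and the timestamp slicing assumes the classic 15-character syslog format seen in our files.

```python
def find_service_restarts(syslog_lines, description="Start/stop NDR Sensor API"):
    """Return a list of (stopped_at, started_at) timestamp pairs.

    A pair is recorded whenever systemd logs "Stopped <description>."
    and later logs "Started <description>.".
    """
    restarts = []
    stopped_at = None
    for line in syslog_lines:
        # Only systemd writes the Stopped/Started lines we care about.
        if "systemd[" not in line:
            continue
        # Classic syslog timestamp, e.g. "Oct 17 18:26:52" (15 characters).
        timestamp = line[:15]
        if f"Stopped {description}." in line:
            stopped_at = timestamp
        elif f"Started {description}." in line and stopped_at is not None:
            restarts.append((stopped_at, timestamp))
            stopped_at = None
    return restarts
```

On a sensor where the service was never restarted after the upgrade, this should return an empty list for the relevant time window, which is how we would tell the two groups of sensors apart.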
[edited by: Jens Frankiewicz at 9:10 AM (GMT -7) on 18 Oct 2024]