Since November, an increasing number of endpoints are being reported by Central with "Sophos Firewall SN reported computer not sending heartbeat signals".
We upgraded our HQ XG from 18.5.4 to 19.0.1 on Nov 12th, but the issue had already started before that, as you can see from the screenshots.
Before that, we only received these alerts occasionally. Sometimes the message comes multiple times per day for a machine, then no message is created for a few days even though the computer is still in use.
What is the issue here?
Central Region is Central Europe
Are you able to see any similar errors in the logs located at "C:\ProgramData\Sophos\Heartbeat\Logs"?
Could the device be entering a hibernate or sleep state at the times when these events are generated?
I was on the computer and it was in standby.
I could see that the Intel network driver was dumping something frequently during standby.
Netwtw10 7026 7026 - Dump after return from D3 after cmd
Netwtw10 7025 7025 - Dump after return from D3 before cmd
Probably causing network flapping which triggers Heartbeat Change.
In the heartbeat log I could see many, many events during standby mode: network has changed - firewall may disconnect
2022-11-16T09:21:38.596Z [ 5212: 6340] A Sending network status
2022-11-16T09:21:38.596Z [ 5212: 6340] A The network status has changed, the Firewall may disconnect.
2022-11-16T09:21:38.598Z [ 5212: 6340] A Connection closed (network error).
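For anyone who wants to check how often this happens on their own machines, here is a rough Python sketch that counts the "network status has changed" entries per hour. It assumes the line layout shown in the excerpt above (timestamp, [pid: tid], level letter, message). The function names are made up for illustration, and the real log may contain other line formats, which this sketch simply skips.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout, taken from the excerpt above:
# 2022-11-16T09:21:38.596Z [ 5212: 6340] A Connection closed (network error).
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z\s+"
    r"\[\s*(?P<pid>\d+):\s*(?P<tid>\d+)\]\s+"
    r"(?P<level>\S+)\s+(?P<msg>.*)$"
)

NEEDLE = "The network status has changed"


def parse_line(line):
    """Return (timestamp, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%S.%f")
    return ts, m.group("msg")


def network_changes_per_hour(lines):
    """Count 'network status has changed' messages, bucketed by hour."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip lines in any other format
        ts, msg = parsed
        if NEEDLE in msg:
            counts[ts.strftime("%Y-%m-%d %H:00")] += 1
    return counts


sample = [
    "2022-11-16T09:21:38.596Z [ 5212: 6340] A Sending network status",
    "2022-11-16T09:21:38.596Z [ 5212: 6340] A The network status has changed, the Firewall may disconnect.",
    "2022-11-16T09:21:38.598Z [ 5212: 6340] A Connection closed (network error).",
]
print(dict(network_changes_per_hour(sample)))
```

To run it against a real log, pass the lines of a file from C:\ProgramData\Sophos\Heartbeat\Logs instead of the sample list. The file name in that folder is not given in this thread, so check it on your own system. Comparing the busy hours with the times the machine was in standby should show whether the two line up.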
As a first step, I updated the (network) drivers and BIOS and will monitor the situation.
Can the heartbeat module be tweaked so that it is compatible with standby?
Everyone talks about saving energy - it would be non-PC to disable standby just for heartbeat to work.
I was able to get some additional feedback on this from our team.
The decision about when these alerts are generated is made entirely on the firewall. The alert is generated only if network traffic continues to be routed to the firewall without periodic heartbeat traffic.
Do you know if the NIC on the affected device remains active/communicating on the network while the system is in hibernate mode? What could also help is checking the power saver settings in Device Manager to check if the NIC is configured to stop communicating when the device enters a sleep state.
There are a couple of options available from the XG Console which can limit the frequency at which these alerts/events are generated in Sophos Central. I will follow up with you via PM to share these.
Hi Qoosh and thanks for your message.
The NICs, LAN or WiFi, may be shut down by the OS to save energy. That option is not disabled.
As written earlier, it looks to me like the NIC driver is doing something useless / dumping something, which shows up as those Netwtw10 events every few seconds. That probably wakes the NIC, which then sends some packets that the firewall sees after the heartbeat driver has already told the firewall that the device is now off.
After the driver and BIOS updates, the issue has not recurred on the computer that was most frequently affected before. Probably Intel / Fujitsu changed something in the network behaviour during hibernate.
Will continue to monitor the behaviour on our side.
Interestingly the events almost stopped completely after Nov 16th.
Thanks for following up with us here. Hopefully, this will provide some further insight for others that may encounter similar issues or have these same concerns.
Hey guys, I'm facing this same issue. Can I also get the info on how to update the options on the XG console for the frequency of the alerts?
Also, to add a little more on this: I have 3 sites, HQ with a 3300 and HA (always had HA Active and Passive) and two remote sites with a 2100 and no HA. I've got two more 2100 boxes to have HA on the remote sites, and I upgraded to SFOS 19.0.1 MR-1-Build365 a few days before activating HA on the sites. I started to get these alerts on a remote site only after I enabled HA on that site. I still have one site with a 2100 and no HA, and this site is not reporting any missing heartbeats, even though it has more users than the one that is reporting them. Lastly, no changes have been made on the client computers: no updates (we do this once a month), no changes in configs, etc. All client computers are the same Dell 3420 (we recently replaced all computers on site) with Windows 11 and the same settings. The site that still does not have HA has the same computers and settings, as they were all replaced at the same time.
For now my understanding is that it has something to do with HA being enabled or not. LHerzog, do you have HA too?
We have HA. But this situation is always complicated because you also have uncontrolled Sophos Endpoint updates and Windows updates. In our case it may also have to do with the November Windows updates.
The firewall upgrade to 19.0.1 and the release of the November update happened at about the same time.
What is strange is that it completely stopped, as can be seen above, and I only did the driver updates on one client machine. That does point a bit to Sophos Endpoint changes made in the backend.
There is currently a Bug ID under investigation: NC-111152 - Missing Heartbeat behavior for endpoints generating alerts in Central
It came back. I suspect the Sophos endpoints have been getting program updates since Monday and that this is causing the issues. Unfortunately it is impossible to see that easily from Sophos Central. There is only a generic "Update succeeded". You need to dig through the log files on the endpoints. I don't like that.
Those mails are coming more frequently now and it is annoying. Will Sophos fix NC-111152?
@sophos, you'll never fix it, will you?
Those alerts when the computers go to hibernate / sleep are so frequent and useless that everyone is ignoring them. So I'm going to disable them. Good work.
Forgot to ask, did you adjust those parameters on the firewall?
console> system synchronized-security
central_registration delay-missing-heartbeat-detection suppress-missing-heartbeat-to-central
console> system synchronized-security
Yes, I already increased at least some of them. I cannot log in to verify at the moment due to SSH authentication issues - see my other post in the FW forum.
The firewall is constantly reporting some computers with missing heartbeat. When I check them, they are not online on our network at that moment because they are asleep.
And it is not a firewall issue from my point of view - the endpoint heartbeat agent should be able to see the upcoming event of Windows entering hibernate and then quickly report to the firewall that it will temporarily disconnect from heartbeat.
Because this is not happening, the firewall just sees an abandoned heartbeat session and correctly reports the issue to Central.
Tweaking delays will not change this when the client (usually) is in hibernate state longer than the delay.
The reason for those hits is quite simple: in hibernation, the client will still send data, but the Sophos daemon is already closed because the subsystem has already shut down. So the firewall sees the data and interprets it as a missing heartbeat.
If you switch to hibernate, the endpoint likely cannot pick up this information fast enough before Windows shuts the system down. Therefore the missing heartbeat is generated.
The bug ID above likely covers a different case, where the client is not in hibernate but the daemon has shut down for whatever reason.
By adding the delay value on the firewall, you will likely reduce those alerts, as the firewall will give the client IP some time before it reacts. This will lead to better reporting, as most Windows clients stop interacting with the gateway 1-2 minutes after going into hibernation.
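To make the reasoning about the delay easier to follow, here is a toy Python model of it. This is not how the firewall is actually implemented. The Event class, the function name, the 60-second heartbeat timeout and the delay values are all assumptions picked for illustration. The only rule taken from this thread is that an alert needs traffic from the client to keep arriving while heartbeats are absent.

```python
from dataclasses import dataclass


@dataclass
class Event:
    t: float   # seconds since the start of the observation
    kind: str  # "heartbeat" or "traffic"


def missing_heartbeat_alerts(events, heartbeat_timeout=60.0, delay=0.0):
    """Return the times at which this toy firewall would raise an alert.

    An alert is raised when traffic is seen from a client whose last
    heartbeat is older than heartbeat_timeout + delay (both hypothetical).
    Only one alert is raised per heartbeat gap.
    """
    alerts = []
    last_heartbeat = None
    alerted = False
    for ev in sorted(events, key=lambda e: e.t):
        if ev.kind == "heartbeat":
            last_heartbeat = ev.t
            alerted = False  # heartbeat is back, start a fresh gap
        elif ev.kind == "traffic" and last_heartbeat is not None:
            silent_for = ev.t - last_heartbeat
            if silent_for > heartbeat_timeout + delay and not alerted:
                alerts.append(ev.t)
                alerted = True
    return alerts


# Client hibernates at t=30: heartbeats stop, but the NIC keeps
# sending some packets for roughly another 90 seconds.
events = [
    Event(0, "heartbeat"), Event(15, "heartbeat"), Event(30, "heartbeat"),
    Event(60, "traffic"), Event(90, "traffic"), Event(120, "traffic"),
]

print(missing_heartbeat_alerts(events))             # [120]
print(missing_heartbeat_alerts(events, delay=120))  # []

# If the NIC wakes again later and sends a packet, the delay no longer helps.
late = events + [Event(300, "traffic")]
print(missing_heartbeat_alerts(late, delay=120))    # [300]
```

In this model a larger delay suppresses the alert only when the client really goes quiet within the delay window. If something (for example a NIC driver that keeps waking up) sends packets long after the heartbeat stopped, the alert still fires.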
It is probably something that also depends on the NIC vendor. As it is a standard Intel NIC on business notebooks in our case, this should be quite common. The screenshots from Event Viewer above show that something is happening on the NIC while it is in hibernate. This may revive heartbeat or cause some other issues with it. Anyway, my opinion is that this should work unless a customer uses exotic hardware with old drivers. Disabling hibernate or the energy-saving feature on the NIC is not a solution.
If you run a tcpdump / packet capture on the IP and then hibernate the device, what kind of traffic do you still see? Maybe you will find the reason by researching this traffic further.
DEV has a binary ready for NC-111152. I would recommend that you open a case with Support and mention NC-111152. The case will go to GES, and they can confirm whether your issue matches NC-111152 and install the binary to see if this resolves your issue.
If you do this, share the Case ID.
Thank you emmosophos - is that binary for endpoint or firewall?