Hi,
We are having Heartbeat issues. Several times a day, computers report a missing or at-risk status, apparently because the endpoints are unable to communicate with the firewall.
On the endpoints we have already reinstalled Sophos Central, and they run Windows 10 Professional, always kept up to date.
We opened a ticket with Sophos, which instructed us to re-image, but almost every case we open ends with the same advice, and we find it hard to believe that this is really the solution. We are on version 17.5.9.
Similar errors occur on new installations we have already made that are operating normally, so we are not sure whether the errors are related. They also mention core dumps, but judging by the dates they are old and come from other firmware versions.
After Sophos's testing, we now have no endpoints connected via Heartbeat, at least none are shown in the console.
I would like some guidance on how to proceed in this case. In the experience of colleagues here, is re-imaging really the only way out?
This was the response from Sophos support:
- Database-related services have been restarted:
XG115_XN03_SFOS 17.5.9 MR-9# service postgres:restart -ds nosync
200 OK
XG115_XN03_SFOS 17.5.9 MR-9# service sigdb:restart -ds nosync
503 Service Failed
XG115_XN03_SFOS 17.5.9 MR-9# service reportdb:restart -ds nosync
503 Service Failed
XG115_XN03_SFOS 17.5.9 MR-9# service garner:restart -ds nosync
200 OK
XG115_XN03_SFOS 17.5.9 MR-9# service heartbeatd:restart -ds nosync
400 Service not found
XG115_XN03_SFOS 17.5.9 MR-9# service heartbeat:restart -ds nosync
200 OK
XG115_XN03_SFOS 17.5.9 MR-9#
- We checked the heartbeat log:
gr_io: Broken pipe, Offset => 0
2019-11-27 15:21:39 INFO Main.cpp[23648]:140 initLogger - Heartbeat daemon build time: 16:17:07 Nov 1 2019
2019-11-27 15:21:39 INFO Main.cpp[23648]:219 main - Heartbeat daemon starting
2019-11-27 15:21:39 INFO Main.cpp[23648]:241 main - Maximum connected clients: 10000
2019-11-27 15:21:39 INFO EndpointStorage.cpp[23648]:41 EndpointStorage - Working with persistent endpoint storage
2019-11-27 15:21:39 INFO EndpointStorage.cpp[23648]:43 EndpointStorage - Calling EndpointStorageBackend::get_all_endpoints
2019-11-27 15:21:39 INFO Main.cpp[23648]:418 main - Heartbeat daemon running
2019-11-27 15:21:39 INFO EacEventReader.cpp[23648]:128 start - EacEventReader has been successfully started
2019-11-27 15:21:39 INFO Main.cpp[23648]:115 dropPrivileges - Privdrop to uid 5 with gid 1007 successful
2019-11-27 15:21:39 INFO Main.cpp[23648]:118 dropPrivileges - reduced capabilities: effective=net_admin, sys_resource, permitted=net_admin, sys_resource
2019-11-27 15:21:39 INFO Main.cpp[23648]:189 sendHeartbeatReadyOpcode - heartbeat_ready opcode sent.
2019-11-27 15:21:45 INFO ModuleEac.cpp[23648]:115 handOverEacState - Send EacSwitchRequest to all directly connected endpoints (state=1)
2019-11-27 15:28:04 INFO GarnerEventReader.cpp[23648]:129 acceptConnectionHandler - Garner plugin connected. Ready to receive garner events.
2019-11-27 15:28:04 ERROR ModuleStatus.cpp[23648]:111 update - mac address is invalid
2019-11-27 15:28:29 ERROR ModuleStatus.cpp[23648]:111 update - mac address is invalid
2019-11-27 15:28:31 ERROR ModuleStatus.cpp[23648]:111 update - mac address is invalid
2019-11-27 15:28:31 ERROR ModuleStatus.cpp[23648]:111 update - mac address is invalid
XG115_XN03_SFOS 17.5.9 MR-9#
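The repeated "mac address is invalid" entries are easy to quantify over a longer period; a simple count gives a feel for how often endpoints report a malformed MAC (a sketch; the log path is the one used later in this thread):
#grep -c "mac address is invalid" /log/heartbeatd.log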
- The Garner log has been verified:
===========================
nov 27 16:15:09: OPPOSTGRES: oppostgres_output: log event couldn't inserted
nvram_get failed with -12
ERROR Nov 27 16:15:09 [4123261760]: [SCM::get_is_password_random] '/bin/nvram get scm.RandomAdminPass' failed
ERROR Nov 27 16:15:09 [4123261760]: [SCM::who_was_killer] '/bin/nvram get scm.RandomAdminPass' terminated with exit code 244
nvram_get(): failed with -16
ERROR Nov 27 16:15:09 [4123261760]: [SCM::scm_get_expire_days] scm_get_expire_days: lic_get_details failed for 'li.epsup'
nvram_get(): failed with -16
ERROR Nov 27 16:15:10 [4123261760]: [SCM::scm_get_module_status] scm_get_module_status: lic_get_details failed for 'li.epsup'
==================================
- We checked the Heartbeat status
- We restarted the log settings
- We checked the disk usage and see that /var is 86% used:
XG115_XN03_SFOS 17.5.9 MR-9# df -h
Filesystem            Size      Used  Available  Use%  Mounted on
rootfs              301.5M      2.6M     279.0M    1%  /
df: /newroot: No such file or directory
df: /newroot/dev: No such file or directory
df: /newrootrw: No such file or directory
none                301.5M      2.6M     279.0M    1%  /
none                  1.9G     36.0K       1.9G    0%  /dev
none                  1.9G     36.2M       1.8G    2%  /tmp
none                  1.9G     14.7M       1.8G    1%  /dev/shm
/dev/conf           385.4M     74.4M     311.0M   19%  /conf
/dev/content          5.6G    384.7M       5.2G    7%  /content
/dev/var             46.6G     40.3G       6.3G   86%  /var
XG115_XN03_SFOS 17.5.9 MR-9#
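To see which directories account for most of the 40.3G used on /var, a per-directory size summary can be taken from the same shell (a sketch using standard BusyBox tools; the subdirectory layout under /var varies by firmware):
#du -sk /var/* | sort -n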
- We can identify some core dumps:
xrwx         2 root  0    4.0K  Apr 13  2019 .
drwxr-xr-x  37 root  0    4.0K  Nov 27 15:20 ..
-rw-------   1 root  0  482.1M  Apr 13  2019 core.avd
-rw-------   1 root  0   35.2M  Oct 20  2018 core.awed
XG115_XN03_SFOS 17.5.9 MR-9#
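To confirm whether these dumps really predate the current firmware, every core file on the /var partition can be located and dated in a single pass (a sketch; the exact directory holding the dumps is not shown above, so searching all of /var is an assumption):
#find /var -name 'core.*' -exec ls -lh {} \;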
Looking at the logs, the nvram failures, and the core dumps, we recommend a re-image.
Thanks
Hi Christovam
Sorry for the inconvenience caused! Thank you for the detailed post. It would be great if you could PM us the service request number so we can check the history and provide you further assistance.
For a more detailed investigation, you may start the "heartbeat" service in debug and trace mode to capture detailed logs. Command to start the service in debug mode:
#service -t json -b '{"debug":"2"}' -ds nosync heartbeat:debug
To stop the debug, use the command below:
# service -t json -b '{"debug":"0"}' -ds nosync heartbeat:debug
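While debug mode is on, the extra detail goes into the heartbeat log referenced below, so it can be watched live while reproducing a missing-heartbeat event (a sketch; the log path is taken from the grep command that follows):
#tail -f /log/heartbeatd.log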
For any machine where you see a missing event or status change, you may check the logs in the heartbeatd.log file:
#grep " Connectivity changed for" /log/heartbeatd.log
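If the log is large, only the most recent connectivity changes are usually of interest; they can be pulled with a quick filter (a sketch using standard BusyBox tools):
#grep " Connectivity changed for" /log/heartbeatd.log | tail -n 20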
Also, on the firewall you may keep a packet capture running on the port below, and as soon as you get a notification for a missing heartbeat, you can check the heartbeat log and the captured packets:
#tcpdump 'port 8347'
Also, what notification are you getting in Sophos Central? Can you share a screenshot?
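If only a few machines are affected, the capture above can be narrowed to a single endpoint on the heartbeat port using a standard BPF filter (a sketch; 10.0.0.50 is a placeholder for the affected endpoint's IP address):
#tcpdump 'port 8347 and host 10.0.0.50'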
It appears that your report database and signature database services are failing to restart.
I recommend rebooting to try to recover the services to a normal state. However, please ensure you have a backup first in case the device goes into failsafe mode; you will be able to restore easily from it.
If the device comes up fine, check whether any services are in a "DEAD" or "STOPPED" state by running the command service -S from the advanced console of an SSH session.
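If the status list is long, any unhealthy services can be pulled out directly (a sketch, assuming the service -S output prints one service and its state per line):
#service -S | grep -iE 'DEAD|STOPPED'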
Regarding your original problem of endpoints going into a "missing" state for heartbeat, please check whether the devices are going into hibernation/sleep mode at that specific time.
You should also be receiving only one notification per day per endpoint.
Will wait for your response.
Thanks!