This discussion has been locked.

Heartbeat is missing or at risk on random endpoints

Hi,

We are having Heartbeat issues. Several times a day, computers report a missing or at-risk status. This seems to happen because the endpoints are unable to communicate with the firewall, but we are not sure of the exact cause.

On the endpoints we have already reinstalled Sophos Central, and they run Windows 10 Professional, which is always kept up to date.

We opened a ticket with Sophos, and support instructed us to re-image the firewall. However, almost every case we open ends with the same advice to re-image, and we find it hard to believe that this is really the solution.
We are on version 17.5.9.

The same or similar errors also occur on new installations that we have already deployed and that are operating normally, so we are not sure the errors are related to this problem.
Support also mentions core dumps, but judging by their dates they are old and come from earlier firmware versions.

After the testing by Sophos, we now have no endpoints connected to Heartbeat, or at least the console does not show any.

I would like some guidance on how to proceed in this case. In your experience, is a re-image really the only way out?

 

This was the response from Sophos support:

 

- Database-related services have been restarted:

XG115_XN03_SFOS 17.5.9 MR-9# service postgres:restart -ds nosync
200 OK
XG115_XN03_SFOS 17.5.9 MR-9# service sigdb:restart -ds nosync
503 Service Failed
XG115_XN03_SFOS 17.5.9 MR-9# service reportdb:restart -ds nosync
503 Service Failed
XG115_XN03_SFOS 17.5.9 MR-9# service garner:restart -ds nosync
200 OK
XG115_XN03_SFOS 17.5.9 MR-9# service heartbeatd:restart -ds nosync
400 Service not found
XG115_XN03_SFOS 17.5.9 MR-9# service heartbeat:restart -ds nosync
200 OK
XG115_XN03_SFOS 17.5.9 MR-9#
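
For anyone who wants to triage a transcript like the one above without reading it line by line, here is a minimal sketch in Python. It only parses text pasted from the console and does not run anything on the firewall. The function names (`restart_results`, `failed`) are made up for this example, and treating every status other than 200 as a failure is an assumption.

```python
import re

# Console transcript pasted verbatim from the support session above.
TRANSCRIPT = """\
XG115_XN03_SFOS 17.5.9 MR-9# service postgres:restart -ds nosync
200 OK
XG115_XN03_SFOS 17.5.9 MR-9# service sigdb:restart -ds nosync
503 Service Failed
XG115_XN03_SFOS 17.5.9 MR-9# service reportdb:restart -ds nosync
503 Service Failed
XG115_XN03_SFOS 17.5.9 MR-9# service garner:restart -ds nosync
200 OK
XG115_XN03_SFOS 17.5.9 MR-9# service heartbeatd:restart -ds nosync
400 Service not found
XG115_XN03_SFOS 17.5.9 MR-9# service heartbeat:restart -ds nosync
200 OK
XG115_XN03_SFOS 17.5.9 MR-9#
"""

# Matches the command part of a prompt line, e.g. "# service sigdb:restart".
COMMAND = re.compile(r"#\s*service\s+(\S+):restart\b")
# Matches a reply line, e.g. "503 Service Failed".
STATUS = re.compile(r"^(\d{3})\s+(.*)$")


def restart_results(transcript):
    """Pair each 'service NAME:restart' command with the status line after it."""
    results = []
    pending = None  # service name still waiting for its reply line
    for line in transcript.splitlines():
        command = COMMAND.search(line)
        if command:
            pending = command.group(1)
            continue
        status = STATUS.match(line.strip())
        if status and pending is not None:
            results.append((pending, int(status.group(1)), status.group(2)))
            pending = None
    return results


def failed(results):
    """Assumption: anything other than 200 counts as a failed restart."""
    return [(name, code, text) for name, code, text in results if code != 200]


if __name__ == "__main__":
    for name, code, text in failed(restart_results(TRANSCRIPT)):
        print(f"{name}: {code} {text}")
```

On the transcript above this lists sigdb and reportdb (503 Service Failed) and heartbeatd (400 Service not found). The last one is only a wrong service name, since the following `heartbeat:restart` returned 200, so the two database services are the ones worth asking support about.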

 

- We checked the Heartbeat log:

gr_io: Broken pipe, Offset => 0
2019-11-27 15:21:39 INFO Main.cpp[23648]:140 initLogger - Heartbeat daemon build time: 16:17:07 Nov 1 2019
2019-11-27 15:21:39 INFO Main.cpp[23648]:219 main - Heartbeat daemon starting
2019-11-27 15:21:39 INFO Main.cpp[23648]:241 main - Maximum connected clients: 10000
2019-11-27 15:21:39 INFO EndpointStorage.cpp[23648]:41 EndpointStorage - Working with persistent endpoint storage
2019-11-27 15:21:39 INFO EndpointStorage.cpp[23648]:43 EndpointStorage - Calling EndpointStorageBackend::get_all_endpoints
2019-11-27 15:21:39 INFO Main.cpp[23648]:418 main - Heartbeat daemon running
2019-11-27 15:21:39 INFO EacEventReader.cpp[23648]:128 start - EacEventReader has been successfully started
2019-11-27 15:21:39 INFO Main.cpp[23648]:115 dropPrivileges - Privdrop to uid 5 with gid 1007 successful
2019-11-27 15:21:39 INFO Main.cpp[23648]:118 dropPrivileges - reduced capabilities: effective=net_admin, sys_resource, permitted=net_admin, sys_resource
2019-11-27 15:21:39 INFO Main.cpp[23648]:189 sendHeartbeatReadyOpcode - heartbeat_ready opcode sent.
2019-11-27 15:21:45 INFO ModuleEac.cpp[23648]:115 handOverEacState - Send EacSwitchRequest to all directly connected endpoints (state=1)
2019-11-27 15:28:04 INFO GarnerEventReader.cpp[23648]:129 acceptConnectionHandler - Garner plugin connected. Ready to receive garner events.
2019-11-27 15:28:04 ERROR ModuleStatus.cpp[23648]:111 update - mac address is invalid
2019-11-27 15:28:29 ERROR ModuleStatus.cpp[23648]:111 update - mac address is invalid
2019-11-27 15:28:31 ERROR ModuleStatus.cpp[23648]:111 update - mac address is invalid
2019-11-27 15:28:31 ERROR ModuleStatus.cpp[23648]:111 update - mac address is invalid
XG115_XN03_SFOS 17.5.9 MR-9#
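
To see at a glance which errors dominate a Heartbeat log like this one, the lines can be grouped by source file and message. The following is a rough sketch in Python that works on pasted log text only. The line layout in the regular expression is inferred from the excerpt above, not from any Sophos documentation, and `error_summary` is a name invented for this example.

```python
import re
from collections import Counter

# Excerpt pasted from the Heartbeat log above (shortened).
LOG = """\
gr_io: Broken pipe, Offset => 0
2019-11-27 15:21:39 INFO Main.cpp[23648]:219 main - Heartbeat daemon starting
2019-11-27 15:28:04 INFO GarnerEventReader.cpp[23648]:129 acceptConnectionHandler - Garner plugin connected. Ready to receive garner events.
2019-11-27 15:28:04 ERROR ModuleStatus.cpp[23648]:111 update - mac address is invalid
2019-11-27 15:28:29 ERROR ModuleStatus.cpp[23648]:111 update - mac address is invalid
2019-11-27 15:28:31 ERROR ModuleStatus.cpp[23648]:111 update - mac address is invalid
2019-11-27 15:28:31 ERROR ModuleStatus.cpp[23648]:111 update - mac address is invalid
"""

# Layout inferred from the excerpt:
# "<date> <time> <LEVEL> <file>[<pid>]:<line> <function> - <message>"
LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<source>\S+)\[(?P<pid>\d+)\]:(?P<lineno>\d+) "
    r"(?P<func>\S+) - (?P<msg>.*)$"
)


def error_summary(log_text):
    """Count ERROR lines per (source file, message); also return unparsed lines."""
    counts = Counter()
    unparsed = []
    for line in log_text.splitlines():
        match = LINE.match(line)
        if match is None:
            if line.strip():
                unparsed.append(line)
            continue
        if match.group("level") == "ERROR":
            counts[(match.group("source"), match.group("msg"))] += 1
    return counts, unparsed


if __name__ == "__main__":
    counts, unparsed = error_summary(LOG)
    for (source, message), count in counts.most_common():
        print(f"{count:4d}  {source}: {message}")
    print(f"{len(unparsed)} line(s) did not match the assumed layout")
```

In the excerpt, the only error is "mac address is invalid" from ModuleStatus.cpp, repeated four times within about half a minute of the Garner plugin connecting. Lines that do not fit the assumed layout, such as the initial `gr_io: Broken pipe` message, are returned separately so they are not silently dropped.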

 

- We checked the Garner log:

===========================

nov 27 16:15:09: OPPOSTGRES: oppostgres_output: log event couldn't inserted
nvram_get failed with -12
ERROR Nov 27 16:15:09 [4123261760]: [SCM::get_is_password_random] '/bin/nvram get scm.RandomAdminPass' failed
ERROR Nov 27 16:15:09 [4123261760]: [SCM::who_was_killer] '/bin/nvram get scm.RandomAdminPass' terminated with exit code 244
nvram_get(): failed with -16
ERROR Nov 27 16:15:09 [4123261760]: [SCM::scm_get_expire_days] scm_get_expire_days: lic_get_details failed for 'li.epsup'

nvram_get(): failed with -16
ERROR Nov 27 16:15:10 [4123261760]: [SCM::scm_get_module_status] scm_get_module_status: lic_get_details failed for 'li.epsup'

==================================

- We checked Heartbeat

- We reset the log settings

- We checked disk usage and found that /var is 86% used:

XG115_XN03_SFOS 17.5.9 MR-9# df -h
Filesystem      Size     Used     Available  Use%  Mounted on
rootfs          301.5M   2.6M     279.0M     1%    /
df: /newroot: No such file or directory
df: /newroot/dev: No such file or directory
df: /newrootrw: No such file or directory
none            301.5M   2.6M     279.0M     1%    /
none            1.9G     36.0K    1.9G       0%    /dev
none            1.9G     36.2M    1.8G       2%    /tmp
none            1.9G     14.7M    1.8G       1%    /dev/shm
/dev/conf       385.4M   74.4M    311.0M     19%   /conf
/dev/content    5.6G     384.7M   5.2G       7%    /content
/dev/var        46.6G    40.3G    6.3G       86%   /var
XG115_XN03_SFOS 17.5.9 MR-9#
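
If you collect `df -h` output from several firewalls, a small script can pick out the partitions that are filling up. This is only a sketch in Python that parses pasted output. The 80% threshold and the names `parse_df` and `over_threshold` are assumptions made for this example, not anything Sophos recommends.

```python
# `df -h` output pasted from the session above, including the error lines.
DF_OUTPUT = """\
Filesystem Size Used Available Use% Mounted on
rootfs 301.5M 2.6M 279.0M 1% /
df: /newroot: No such file or directory
df: /newroot/dev: No such file or directory
df: /newrootrw: No such file or directory
none 301.5M 2.6M 279.0M 1% /
none 1.9G 36.0K 1.9G 0% /dev
none 1.9G 36.2M 1.8G 2% /tmp
none 1.9G 14.7M 1.8G 1% /dev/shm
/dev/conf 385.4M 74.4M 311.0M 19% /conf
/dev/content 5.6G 384.7M 5.2G 7% /content
/dev/var 46.6G 40.3G 6.3G 86% /var
"""


def parse_df(output):
    """Return one dict per data row; header and 'df:' error lines are skipped."""
    rows = []
    for line in output.splitlines():
        parts = line.split()
        # A data row has exactly six fields and a numeric percentage in the fifth.
        if len(parts) != 6 or not parts[4].endswith("%"):
            continue
        try:
            use_pct = int(parts[4][:-1])
        except ValueError:
            continue
        rows.append(
            {
                "filesystem": parts[0],
                "size": parts[1],
                "used": parts[2],
                "available": parts[3],
                "use_pct": use_pct,
                "mount": parts[5],
            }
        )
    return rows


def over_threshold(rows, threshold_pct=80):
    """Assumed threshold: report mounts at or above `threshold_pct` percent used."""
    return [(row["mount"], row["use_pct"]) for row in rows if row["use_pct"] >= threshold_pct]


if __name__ == "__main__":
    for mount, pct in over_threshold(parse_df(DF_OUTPUT)):
        print(f"{mount} is {pct}% used")
```

With the output above, only /var (86%) is reported. Every other partition is at 19% or less.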


- We can identify some core dumps:

xrwx 2 root 0 4.0K Apr 13 2019 .
drwxr-xr-x 37 root 0 4.0K Nov 27 15:20 ..
-rw------- 1 root 0 482.1M Apr 13 2019 core.avd
-rw------- 1 root 0 35.2M Oct 20 2018 core.awed
XG115_XN03_SFOS 17.5.9 MR-9#
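
The question of whether these core dumps belong to the current firmware can be checked from dates that are already in this thread. The sketch below, in Python, compares the dates in the listing with the build time printed in the Heartbeat log. Using the daemon build time as a stand-in for the age of the running firmware is an assumption, and the function names are invented for this example.

```python
import re
from datetime import datetime

# Build-time line from the Heartbeat log and the core dump listing, both pasted from above.
BUILD_LINE = "2019-11-27 15:21:39 INFO Main.cpp[23648]:140 initLogger - Heartbeat daemon build time: 16:17:07 Nov 1 2019"
LISTING = """\
drwxr-xr-x 37 root 0 4.0K Nov 27 15:20 ..
-rw------- 1 root 0 482.1M Apr 13 2019 core.avd
-rw------- 1 root 0 35.2M Oct 20 2018 core.awed
"""


def build_date(line):
    """Extract the date from '... build time: HH:MM:SS Mon D YYYY'."""
    match = re.search(r"build time: \d{2}:\d{2}:\d{2} (\w{3} \d{1,2} \d{4})", line)
    if match is None:
        raise ValueError("no build time found in line")
    return datetime.strptime(match.group(1), "%b %d %Y")


def core_dumps(listing):
    """Return (name, date, size) for entries named core.* with a 'Mon D YYYY' date."""
    dumps = []
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) != 9 or not parts[8].startswith("core."):
            continue
        date = datetime.strptime(" ".join(parts[5:8]), "%b %d %Y")
        dumps.append((parts[8], date, parts[4]))
    return dumps


def older_than_build(listing, build_line):
    """Core dumps written before the daemon build date (assumed firmware proxy)."""
    built = build_date(build_line)
    return [(name, date) for name, date, _size in core_dumps(listing) if date < built]


if __name__ == "__main__":
    for name, date in older_than_build(LISTING, BUILD_LINE):
        print(f"{name} ({date:%Y-%m-%d}) predates the build date")
```

Both files, core.avd (Apr 13 2019) and core.awed (Oct 20 2018), are older than the Nov 1 2019 build time, so under that assumption neither could have been written by the build that is currently running.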


Given the logs, the nvram failures, and the core dumps, we recommend a re-image.

 

Thanks

 

