Every few weeks, we lose the ability to SSH to one of our CentOS 8 Linux servers. Monitoring shows that the system load increases to unreasonable levels and systemd-logind stops responding.
Here are the last messages in /var/log/messages before we had to reboot:
Dec 9 12:42:15 rml-dev06 systemd[1]: Starting "Sophos Anti-Virus update"...
Dec 9 12:42:23 rml-dev06 savd[1203]: update.updated: Updating from versions - SAV: 10.5.2, Engine: 3.79.0, Data: 5.80
Dec 9 12:42:23 rml-dev06 savd[1203]: update.updated: Updating Sophos Anti-Virus....#012Updating SAVScan on-demand scanner#012Updating Virus Engine and Data#012Updating Manifest#012Update completed.
Dec 9 12:42:23 rml-dev06 savd[1203]: update.updated: Updated to versions - SAV: 10.5.2, Engine: 3.79.0, Data: 5.80
Dec 9 12:42:23 rml-dev06 savd[1203]: update.updated: Successfully updated Sophos Anti-Virus from sdds:SOPHOS
Dec 9 12:42:23 rml-dev06 systemd[1]: Started "Sophos Anti-Virus update".
I don't have screenshots of the system load and CPU usage, but at around 12:42:30 there was a brief spike to 100% CPU usage for the sav-protect service (presumably corresponding to the update and restart logged above). Immediately afterwards the system load started climbing, and it didn't come back down until we rebooted. Over a few hours it reached values greater than 100.
Some processes keep running, e.g. web servers, but since we can't log in we have to reboot.
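Because we can't log in once this starts, one idea for next time is to leave a small logger running that records the load average together with any processes in uninterruptible sleep (state D). On Linux the load average counts tasks in D state as well as runnable ones, so a load that keeps climbing can mean tasks piling up blocked in the kernel rather than burning CPU. This is only a sketch, assuming Python 3 is available on the server; it only reads /proc, and the function names and the 60-second interval are my own choices, not anything from Sophos or CentOS.

```python
#!/usr/bin/env python3
"""Periodically log the load average and any tasks stuck in D state."""
import os
import time


def parse_stat(line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so the state field is taken from after the *last* ')'.
    """
    lparen = line.index("(")
    rparen = line.rindex(")")
    pid = int(line[:lparen].strip())
    comm = line[lparen + 1:rparen]
    state = line[rparen + 1:].split()[0]
    return pid, comm, state


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg."""
    fields = text.split()
    return float(fields[0]), float(fields[1]), float(fields[2])


def wchan(pid, proc="/proc"):
    """Kernel function a task is waiting in ("?" if unavailable).

    Some kernels report "0" here unless run as root.
    """
    try:
        with open(os.path.join(proc, str(pid), "wchan")) as f:
            return f.read().strip() or "?"
    except OSError:
        return "?"


def blocked_tasks(proc="/proc"):
    """Return (pid, comm) for every process currently in D state."""
    stuck = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # process exited (or became unreadable) after listdir()
        if state == "D":
            stuck.append((pid, comm))
    return stuck


def main(interval=60):
    while True:
        with open("/proc/loadavg") as f:
            load1, load5, load15 = parse_loadavg(f.read())
        stuck = blocked_tasks()
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        # flush so each sample is written out immediately, not buffered
        print(f"{stamp} load={load1:.2f} {load5:.2f} {load15:.2f} "
              f"D-state={len(stuck)}", flush=True)
        for pid, comm in stuck:
            print(f"    {pid} {comm} [{wchan(pid)}]", flush=True)
        time.sleep(interval)


if __name__ == "__main__":
    main()
```

Started in the background with its output appended to a file on local disk (for example `nohup python3 dstate_log.py >> /var/tmp/dstate.log 2>&1 &`, where the script and log names are hypothetical), the log should still be there after the reboot and show which processes were blocked, and in which kernel function, while the load was climbing.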
Any ideas what's causing this or how we can investigate further?