HitmanPro.Alert service IO brought VMware cluster to a standstill

Hi,

We just had an incident where all the servers on a customer's VMware cluster became almost unusable. I have traced the issue to high write IO generated by the "HitmanPro.Alert service" to the file "excalibur.db-wal". The files in that folder look like this:

 Directory of C:\ProgramData\HitmanPro.Alert

17/12/2020  03:21 PM    <DIR>          .
17/12/2020  03:21 PM    <DIR>          ..
16/12/2020  07:48 AM    <DIR>          drop
17/12/2020  02:31 AM     5,692,821,504 excalibur.db
17/12/2020  03:18 PM         4,718,592 excalibur.db-shm
17/12/2020  03:21 PM     2,420,450,592 excalibur.db-wal
16/12/2020  07:17 AM             9,737 hmpalert.bf
17/12/2020  01:53 PM    <DIR>          Logs
09/12/2020  03:14 AM    <DIR>          MCS
02/12/2020  08:11 AM    <DIR>          reports
               4 File(s)  8,118,000,425 bytes
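The three excalibur.* files in the listing add up to 8,117,990,688 bytes (5,692,821,504 + 4,718,592 + 2,420,450,592), roughly 8.1 GB, and adding hmpalert.bf (9,737 bytes) gives the 8,118,000,425 total shown. To re-check these sizes, for example before and after a cleanup, the minimal Python sketch below sums every regular file matching excalibur.* in a folder. It is an illustration only: the helper name excalibur_usage is hypothetical, the default path is simply the folder from the listing, and reading that folder may require an elevated prompt.

```python
import glob
import os


def excalibur_usage(directory=r"C:\ProgramData\HitmanPro.Alert"):
    """Return ({file name: size in bytes}, total bytes) for excalibur.* files.

    Hypothetical helper; the default path is the folder from the listing.
    Subdirectories that happen to match the pattern are skipped.
    """
    sizes = {}
    for path in glob.glob(os.path.join(directory, "excalibur.*")):
        if os.path.isfile(path):  # ignore directories, count only files
            sizes[os.path.basename(path)] = os.path.getsize(path)
    return sizes, sum(sizes.values())


if __name__ == "__main__":
    sizes, total = excalibur_usage()
    for name, size in sorted(sizes.items()):
        print(f"{name:<20} {size:>16,}")  # thousands separators, like dir
    print(f"{'total':<20} {total:>16,}")
```

Running it once before stopping the service and again after the cleanup gives a quick before/after comparison of the database size.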

As soon as I disabled Tamper Protection and stopped the service, performance immediately returned to normal across the entire cluster. I could probably use IO QoS in VMware to limit the impact on the rest of the cluster, but clearly something has gone wrong with HitmanPro...

Anecdotal forum posts suggest that deleting the excalibur.* files and restarting the service might make the problem go away, but I haven't tested this yet; the users have work to do! The server was also only rebooted this morning.

Any comments/suggestions?

James



  • Hi,

    Thanks for responding. I had already reviewed that forum thread, but the issue reported there was disk space rather than the performance problem our customer had. In our case the db files totalled only around 8GB, so we weren't at risk of running the server out of space, but the IOPS exceeded the capacity of the SAN, so it impacted all servers.

    I will clean up those files and restart the service once the users have gone home for the day. I can't do it now: if it brings the cluster to a standstill again during business hours, we're going to have a very unhappy customer.

    I will follow up here once I have completed these steps.

    Thanks

    James

  • Thank you, James. When you perform these checks, could you also confirm whether HitmanPro is consuming your system resources? Were any changes implemented in the days before you observed this behavior? Was a new application running on your VM cluster?

    Glenn ArchieSeñas (GlennSen)
    Global Community Support Engineer
