multiple rrdtool high (100%) cpu usage

Since 02:20 this morning, three separate systems I look after for friends have shown this problem.

Each is running at 100% CPU load and the firewall is slow.

After SSH'ing in, I found many (20+) instances of rrdtool running.
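
For anyone wanting to check their own box, here is a minimal sketch for counting the instances from the shell. The function name `count_procs` is just my own label, and it assumes the appliance has a procps-style `ps` that accepts `-e -o comm=`, which I have not verified on every firmware version.

```shell
#!/bin/sh
# Count running processes whose command name is exactly "$1".
# Assumption: a procps-style ps that accepts "-e -o comm=".
count_procs() {
    # grep -c prints 0 and exits non-zero when nothing matches,
    # so "|| true" keeps the function's exit status clean.
    ps -e -o comm= | grep -c -x -- "$1" || true
}

echo "rrdtool instances: $(count_procs rrdtool)"
```

On a healthy system I would expect the count to be zero or very small most of the time.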

This problem looks identical to the one in the thread "rrdtool high cpu usage".

I tried what is suggested there, which kills the rrdtool tasks, but after a while the instances start again. For now I have commented out the lines in /etc/crontab.rrd.
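
In case it helps, this is roughly how the commenting-out can be done while keeping a backup. Treat it as a sketch: the function name `comment_out_cron` and the `.bak` suffix are my own choices, and it assumes the file uses ordinary crontab syntax where a leading `#` disables a line. Whether cron picks up the change without a restart is something to check on your own system.

```shell
#!/bin/sh
# Comment out every active line in a crontab-style file, keeping a backup.
# Hypothetical helper; nothing here ships with the appliance.
comment_out_cron() {
    file=$1
    # Keep the original next to the file so it can be restored later.
    cp -p "$file" "$file.bak"
    # Prefix "#" to every non-empty line that does not already start with "#".
    # Writing through the redirect keeps the file's owner and permissions.
    sed 's/^\([^#]\)/#\1/' "$file.bak" > "$file"
}

# Usage (as root):
#   comment_out_cron /etc/crontab.rrd
# To undo:
#   cp -p /etc/crontab.rrd.bak /etc/crontab.rrd
```

Lines that are already comments and blank lines are left untouched, so running it twice does no harm beyond overwriting the backup.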

What is the permanent solution to this?

Jeff



This thread was automatically locked due to age.
  • We are a support company with several hundred devices and several thousand users, all of which are affected.

    Edit: All devices are running 9.714.004

    All appliances are set to Europe/London. We have had to log in manually via WebAdmin and force a reboot, which is painfully slow, taking 10-15 minutes per device because of 100% CPU usage from multiple rrdtool instances.

    On devices where SSH is enabled, you can SSH in and run the following to force a reboot:
    su root
    restart -r -t 1 now

    We have now spent over 10 man-hours of support time rebooting devices.

    Please note that we have not had to change the time zone on any device; the devices that have been rebooted are behaving normally.

  • I was facing a complete outage, so before Sophos Support came back to me I followed the advice above and ran the following commands, first on the slave node and then on the master node.

    killall /usr/local/bin/create_rrd_graphs.plx
    killall rrdtool

    CPU usage dropped immediately once both commands had been run on both nodes.

    I didn't amend the crontab, as I wasn't seeing the rrdtool process respawn over an hour of monitoring, and I was prepared to re-run the two commands if needed.

    I spoke to Sophos Support 2 hours after logging the call, and they recommended either changing the time zone and rebooting, or killing rrdtool.

    I am scheduling a change for this evening, out of hours, to set the time zone to UTC and reboot both nodes just in case.

    As per Thomas, the nodes seem to be ok at the moment, and time zone is still set to Europe/London.

    Is this issue only present at a daylight saving change on firmware 9.714-4? If so, the next potential occurrence would be in the autumn, if you were still running this firmware. Or is the rrdtool process respawned automatically during the day, triggering multiple rrdtool processes unless you change the time zone to UTC?
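
For the hour of watching for respawns mentioned in this reply, a small polling loop saves retyping the same command. This is only a sketch: `watch_procs` and its three parameters are names I made up, and it assumes a procps-style `ps` that accepts `-e -o comm=`.

```shell
#!/bin/sh
# Print a timestamped count of processes named "$1" every "$2" seconds,
# "$3" times in total. Hypothetical helper; all parameters are illustrative.
watch_procs() {
    name=$1
    interval=$2
    rounds=$3
    i=0
    while [ "$i" -lt "$rounds" ]; do
        # grep -c prints 0 (and exits non-zero) when nothing matches.
        count=$(ps -e -o comm= | grep -c -x -- "$name" || true)
        echo "$(date '+%H:%M:%S') $name: $count"
        i=$((i + 1))
        # Skip the sleep after the final sample.
        if [ "$i" -lt "$rounds" ]; then
            sleep "$interval"
        fi
    done
}

# Example: sample once a minute for an hour.
#   watch_procs rrdtool 60 60
```

If the count starts climbing again, that is the cue to re-run the two killall commands.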