This discussion has been locked.

SUM core daemon not running - restarted

Hi,

We've been getting very frequent emails saying that SUM's core daemon has been restarted. In an attempt to resolve this, we created a new VMware instance of SUM 4.106-2, installed from scratch. The new system ran without issues until recently, when it started going through the same loop of restarting the core daemon.

An excerpt from accd.log of a more recent restart:
2014:04:29-14:34:42 sum4vm accd: 1943907 [0xdf5e3b70] WARN  server.device.DeviceSession null - DeviceSession::clear() IO error during recv [device;guid:3b5a8847-81e7-3217-883a-b818c07de1e1;ip:removed]
2014:04:29-14:34:42 sum4vm accd: 1943908 [0xddde0b70] ERROR libs.io.Session null - send attempted after previous error [device;guid:3b5a8847-81e7-3217-883a-b818c07de1e1;ip:removed]
2014:04:29-14:34:42 sum4vm accd: 1943908 [0xddde0b70] WARN  server.device.DeviceSession null - DeviceSession::clear() IO error during sendDone [device;guid:3b5a8847-81e7-3217-883a-b818c07de1e1;ip:removed]
2014:04:29-14:34:52 sum4vm accd: 1954132 [0xf2f54b70] WARN  server.device.DeviceCache null - DeviceCache::login() device is already connected 92[device;guid:3b5a8847-81e7-3217-883a-b818c07de1e1;ip:removed]
2014:04:29-14:34:52 sum4vm accd: 1954132 [0xdf5e3b70] WARN  server.device.DeviceSession null - DeviceSession::clear() IO error during recv [device;guid:3b5a8847-81e7-3217-883a-b818c07de1e1;ip:removed]
2014:04:29-14:34:52 sum4vm accd: 1954132 [0xe95f7b70] ERROR libs.io.Session null - send attempted after previous error [device;guid:3b5a8847-81e7-3217-883a-b818c07de1e1;ip:removed]
2014:04:29-14:34:52 sum4vm accd: 1954132 [0xe95f7b70] WARN  server.device.DeviceSession null - DeviceSession::clear() IO error during sendDone [device;guid:3b5a8847-81e7-3217-883a-b818c07de1e1;ip:removed]
2014:04:29-14:35:02 sum4vm accd: 1964211 [0xe4deeb70] WARN  server.device.DeviceCache null - DeviceCache::login() device is already connected 92[device;guid:3b5a8847-81e7-3217-883a-b818c07de1e1;ip:removed]
2014:04:29-14:35:20 sum4vm accd: 13   [0xf571a720] INFO  server.accd null - Starting accd
2014:04:29-14:35:20 sum4vm accd: 16   [0xf571a720] INFO  server.accd null - Starting services
2014:04:29-14:35:20 sum4vm accd: 63   [0xf571a720] INFO  libs.store.DataValidator null - Created 10 validators in 1 items


The removed IP is the same in all entries.
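In case it helps anyone comparing notes, here is a rough Python sketch for tallying the WARN and ERROR entries in accd.log per device GUID, to check whether a single device accounts for the errors leading up to a restart. The regular expression is only inferred from the excerpt above, and the function name `count_by_device` is made up for illustration; other accd.log lines may not fit the pattern.

```python
import re
from collections import Counter

# Pattern inferred from the accd.log excerpt in this post (an assumption,
# not a documented log format). Lines without a device GUID are skipped.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) \S+ accd: \d+\s+"
    r"\[0x[0-9a-f]+\] (?P<level>[A-Z]+)\s+(?P<logger>\S+) "
    r".*?guid:(?P<guid>[0-9a-f-]+)"
)


def count_by_device(lines):
    """Count WARN/ERROR entries per (device GUID, level) pair."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group("level") in ("WARN", "ERROR"):
            counts[(m.group("guid"), m.group("level"))] += 1
    return counts
```

It can be fed an open file object, for example `count_by_device(open("accd.log"))`, adjusting the path to wherever the log lives on your system.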

selfmon.log:

2014:04:29-14:02:07 sum4vm selfmonng[4498]: I check Failed increment accd_running counter 1 - 3
2014:04:29-14:02:12 sum4vm selfmonng[4498]: I check Failed increment accd_running counter 2 - 3
2014:04:29-14:02:17 sum4vm selfmonng[4498]: W check Failed increment accd_running counter 3 - 3
2014:04:29-14:02:17 sum4vm selfmonng[4498]: SUM core daemon not running - restarted
2014:04:29-14:02:17 sum4vm selfmonng[4498]: W NOTIFYEVENT Name=accd_running Level=INFO Id=132 sent
2014:04:29-14:02:17 sum4vm selfmonng[4498]: W triggerAction: 'cmd'
2014:04:29-14:02:17 sum4vm selfmonng[4498]: W actionCmd(+):  '/var/mdw/scripts/accd restart'
2014:04:29-14:02:18 sum4vm selfmonng[4498]: W child returned status: exit='0' signal='0'


I haven't found much regarding this anywhere else on the forums, but whenever it crashes, all users are logged out of SUM and forced to log back in.
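To get a picture of how often the restarts happen, something like the following Python sketch could pull the restart timestamps out of selfmon.log and compute the gaps between them. It assumes the timestamp layout and the "SUM core daemon not running - restarted" message seen in the excerpt above; `restart_times` and `gaps_in_seconds` are names chosen for this example only.

```python
import re
from datetime import datetime

# Matches the restart notice as it appears in the selfmon.log excerpt
# (format assumed from that excerpt, not from any documentation).
RESTART_RE = re.compile(
    r"^(?P<ts>\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) \S+ selfmonng\[\d+\]: "
    r".*SUM core daemon not running - restarted"
)


def restart_times(lines):
    """Return the timestamp of every logged core daemon restart."""
    times = []
    for line in lines:
        m = RESTART_RE.match(line.strip())
        if m:
            times.append(datetime.strptime(m.group("ts"), "%Y:%m:%d-%H:%M:%S"))
    return times


def gaps_in_seconds(times):
    """Return the seconds elapsed between consecutive restarts."""
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]
```

Regular gaps would point toward something periodic, such as a device reconnecting on a timer, while irregular ones would suggest load or network conditions, though either reading is only a guess without more data.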


  • Hi tboelke,

    Did you ever get to the bottom of this issue? We started seeing the same problem and decided to upgrade to the latest version, 4.307-4, yesterday evening, but we got the same error after the upgrade.

    2018:03:01-03:51:40 xxx-yyy-sum-01 selfmonng[3910]: I check Failed increment accd_running counter 1 - 3
    2018:03:01-03:51:45 xxx-yyy-sum-01 selfmonng[3910]: I check Failed increment accd_running counter 2 - 3
    2018:03:01-03:51:50 xxx-yyy-sum-01 selfmonng[3910]: W check Failed increment accd_running counter 3 - 3
    2018:03:01-03:51:50 xxx-yyy-sum-01 selfmonng[3910]: [INFO-132] SUM core daemon not running - restarted
    2018:03:01-03:51:50 xxx-yyy-sum-01 selfmonng[3910]: W NOTIFYEVENT Name=accd_running Level=INFO Id=132 sent
    2018:03:01-03:51:50 xxx-yyy-sum-01 selfmonng[3910]: W triggerAction: 'cmd'
    2018:03:01-03:51:50 xxx-yyy-sum-01 selfmonng[3910]: W actionCmd(+):  '/var/mdw/scripts/accd restart'
    2018:03:01-03:51:51 xxx-yyy-sum-01 selfmonng[3910]: W child returned status: exit='0' signal='0'

     

    Regards

    Darren
