We've been getting very frequent emails reporting that SUM's core daemon has been restarted. In an attempt to resolve this, we created a new VMware instance of SUM 4.106-2, installed from scratch. The new system ran without issues until recently, when it started going through the same loop of restarting the core daemon.
An excerpt from accd.log around one of the more recent restarts:
2014:04:29-14:34:42 sum4vm accd: 1943907 [0xdf5e3b70] WARN server.device.DeviceSession null - DeviceSession::clear() IO error during recv [device;guid:3b5a8847-81e7-3217-883a-b818c07de1e1;ip:removed]
2014:04:29-14:34:42 sum4vm accd: 1943908 [0xddde0b70] ERROR libs.io.Session null - send attempted after previous error [device;guid:3b5a8847-81e7-3217-883a-b818c07de1e1;ip:removed]
2014:04:29-14:34:42 sum4vm accd: 1943908 [0xddde0b70] WARN server.device.DeviceSession null - DeviceSession::clear() IO error during sendDone [device;guid:3b5a8847-81e7-3217-883a-b818c07de1e1;ip:removed]
2014:04:29-14:34:52 sum4vm accd: 1954132 [0xf2f54b70] WARN server.device.DeviceCache null - DeviceCache::login() device is already connected 92[device;guid:3b5a8847-81e7-3217-883a-b818c07de1e1;ip:removed]
2014:04:29-14:34:52 sum4vm accd: 1954132 [0xdf5e3b70] WARN server.device.DeviceSession null - DeviceSession::clear() IO error during recv [device;guid:3b5a8847-81e7-3217-883a-b818c07de1e1;ip:removed]
2014:04:29-14:34:52 sum4vm accd: 1954132 [0xe95f7b70] ERROR libs.io.Session null - send attempted after previous error [device;guid:3b5a8847-81e7-3217-883a-b818c07de1e1;ip:removed]
2014:04:29-14:34:52 sum4vm accd: 1954132 [0xe95f7b70] WARN server.device.DeviceSession null - DeviceSession::clear() IO error during sendDone [device;guid:3b5a8847-81e7-3217-883a-b818c07de1e1;ip:removed]
2014:04:29-14:35:02 sum4vm accd: 1964211 [0xe4deeb70] WARN server.device.DeviceCache null - DeviceCache::login() device is already connected 92[device;guid:3b5a8847-81e7-3217-883a-b818c07de1e1;ip:removed]
2014:04:29-14:35:20 sum4vm accd: 13 [0xf571a720] INFO server.accd null - Starting accd
2014:04:29-14:35:20 sum4vm accd: 16 [0xf571a720] INFO server.accd null - Starting services
2014:04:29-14:35:20 sum4vm accd: 63 [0xf571a720] INFO libs.store.DataValidator null - Created 10 validators in 1 items
The removed IP is the same in all entries.
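To check whether the same device shows up before every restart, something like the following could be run over the full accd.log. This is only a rough sketch: the line layout is assumed from the excerpt above, and the function name `summarize_accd` is made up for illustration. It records the timestamp of each "Starting accd" line and counts WARN/ERROR lines per device GUID.

```python
import re
from collections import Counter

# Line layout assumed from the excerpt above:
# <timestamp> <host> accd: <counter> [<thread>] <LEVEL> <logger> null - <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) \S+ accd: \d+ "
    r"\[0x[0-9a-f]+\] (?P<level>[A-Z]+) (?P<logger>\S+) \S+ - (?P<msg>.*)$"
)
GUID_RE = re.compile(r"guid:([0-9a-f-]{36})")


def summarize_accd(lines):
    """Return (restart timestamps, Counter of WARN/ERROR lines per device GUID)."""
    restarts = []
    per_device = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not follow the assumed layout
        if m["msg"] == "Starting accd":
            restarts.append(m["ts"])
        elif m["level"] in ("WARN", "ERROR"):
            g = GUID_RE.search(m["msg"])
            if g:
                per_device[g.group(1)] += 1
    return restarts, per_device
```

Usage would be along the lines of `restarts, per_device = summarize_accd(open("accd.log"))`, with the path adjusted to wherever the log actually lives. If a single GUID dominates `per_device`, that device is a reasonable first suspect.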
selfmon.log:
2014:04:29-14:02:07 sum4vm selfmonng[4498]: I check Failed increment accd_running counter 1 - 3
2014:04:29-14:02:12 sum4vm selfmonng[4498]: I check Failed increment accd_running counter 2 - 3
2014:04:29-14:02:17 sum4vm selfmonng[4498]: W check Failed increment accd_running counter 3 - 3
2014:04:29-14:02:17 sum4vm selfmonng[4498]: SUM core daemon not running - restarted
2014:04:29-14:02:17 sum4vm selfmonng[4498]: W NOTIFYEVENT Name=accd_running Level=INFO Id=132 sent
2014:04:29-14:02:17 sum4vm selfmonng[4498]: W triggerAction: 'cmd'
2014:04:29-14:02:17 sum4vm selfmonng[4498]: W actionCmd(+): '/var/mdw/scripts/accd restart'
2014:04:29-14:02:18 sum4vm selfmonng[4498]: W child returned status: exit='0' signal='0'
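In the excerpt above, selfmonng's accd_running check fails three times at 5-second intervals and then runs '/var/mdw/scripts/accd restart'. To see how often that happens, the restart lines can be pulled out of selfmon.log and the gaps between them computed. A minimal sketch, assuming the line format shown above; `restart_times` and `gaps_in_seconds` are hypothetical helper names, not part of SUM.

```python
import re
from datetime import datetime

# Matches the "not running - restarted" line as it appears in the excerpt above.
RESTART_RE = re.compile(
    r"^(\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) \S+ selfmonng\[\d+\]: "
    r".*core daemon not running - restarted"
)


def restart_times(lines):
    """Return a datetime for every restart that selfmonng reports."""
    times = []
    for line in lines:
        m = RESTART_RE.match(line.strip())
        if m:
            times.append(datetime.strptime(m.group(1), "%Y:%m:%d-%H:%M:%S"))
    return times


def gaps_in_seconds(times):
    """Seconds between consecutive restarts, in log order."""
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]
```

Running `gaps_in_seconds(restart_times(open("selfmon.log")))` (path adjusted as needed) gives a quick view of whether the restarts come at regular intervals or in bursts.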
I haven't found much about this anywhere else on the forums. Whenever the daemon crashes, all users are logged out of SUM and forced to log back in.