
"The Sophos Agent service terminated unexpectedly" in event logs on Citrix servers

On our Citrix servers we are monitoring the Sophos Agent service with Kaseya, and we are noticing that the service is constantly restarting. I have uninstalled and reinstalled Sophos Enterprise on the servers and we are still having the issue.

  • I would start with the Sophos Agent log files to see if there is anything obvious when the process terminates.

    Sophos Remote Management System

    File: Agent-yyyymmdd-hhmmss.log
    Location (Windows 2000/XP/2003): C:\Documents and Settings\All Users\Application Data\Sophos\Remote Management System\3\Agent\Logs
    Location (Windows Vista and above): C:\ProgramData\Sophos\Remote Management System\3\Agent\Logs
    Description: Remote Management System agent log
    Rotation: a maximum of 8 logs are kept; rotation occurs on each start of the Sophos Agent service.
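
    To find the most recent agent log quickly, a minimal Python sketch could look like the following. It assumes the Agent-yyyymmdd-hhmmss.log naming and the Vista-and-above path listed above; `log_start_time` and `newest_log` are hypothetical helper names, not part of any Sophos tool.

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Iterable, Optional

# Log directory on Windows Vista and above (path taken from the post above).
LOG_DIR = Path(r"C:\ProgramData\Sophos\Remote Management System\3\Agent\Logs")

# Assumed exact file name pattern, e.g. Agent-20170512-073334.log
LOG_NAME = re.compile(r"^Agent-(\d{8})-(\d{6})\.log$")


def log_start_time(name: str) -> Optional[datetime]:
    """Return the timestamp encoded in an agent log file name, or None if it doesn't match."""
    m = LOG_NAME.match(name)
    if not m:
        return None
    return datetime.strptime(m.group(1) + m.group(2), "%Y%m%d%H%M%S")


def newest_log(names: Iterable[str]) -> Optional[str]:
    """Pick the most recently started agent log from a list of file names."""
    dated = [(t, n) for n in names if (t := log_start_time(n)) is not None]
    return max(dated)[1] if dated else None


if __name__ == "__main__":
    # Only runs against the real directory if it exists on this machine.
    if LOG_DIR.is_dir():
        names = [p.name for p in LOG_DIR.iterdir()]
        print(f"{len(names)} files; newest agent log: {newest_log(names)}")
```

    Because a new log is started each time the service starts, the start times encoded in the (at most 8) file names also give a rough history of recent restarts.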


    If nothing obvious shows up there, it might be worth setting LogLevel 2 for the agent service as per: https://community.sophos.com/kb/30496.

    I would also install Microsoft ProcDump as the default post-mortem debugger in order to get a full process dump of the agent when it does crash.

    Procdump.exe - https://technet.microsoft.com/en-gb/sysinternals/dd996900.aspx.

    In an elevated (administrator) command prompt with procdump.exe in the path, run:

    mkdir C:\dumps
    procdump -ma -i c:\dumps

    Wait for or reproduce the issue, and you should get a full dump of the agent process in the dumps directory.

    Once done, you can uninstall ProcDump as the default debugger by running:

    procdump -u

    It might be worth getting a few dumps to prove it's the same issue each time.

    It would be worth capturing trace logs and dumps that cover the same time period, so that the logs show which threads in the dumps are which, and what they were doing.
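
    To line the logs up with a dump, something like the following Python sketch could help. It assumes the line layout seen in the agent log excerpt later in this thread (dd.mm.yyyy date, time, a hex column that appears to be the thread ID, a level letter, then the message); `parse` and `near` are hypothetical helper names, not part of any Sophos or Sysinternals tool.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, List, NamedTuple

# Assumed line layout, based on the agent log excerpt in this thread:
# "dd.mm.yyyy hh:mm:ss <thread id in hex> <level letter> <message>"
LINE = re.compile(
    r"^(\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2}) ([0-9A-Fa-f]+) ([A-Z]) (.*)$"
)


class Entry(NamedTuple):
    when: datetime
    thread: int   # assumed thread ID, parsed from the hex column
    level: str    # e.g. "I" (information) or "E" (error)
    message: str


def parse(lines: Iterable[str]) -> List[Entry]:
    """Parse agent log lines; continuation lines without a timestamp are skipped."""
    out = []
    for line in lines:
        m = LINE.match(line.strip())
        if m:
            when = datetime.strptime(m.group(1), "%d.%m.%Y %H:%M:%S")
            out.append(Entry(when, int(m.group(2), 16), m.group(3), m.group(4)))
    return out


def near(entries: Iterable[Entry], dump_time: datetime, seconds: int = 60) -> List[Entry]:
    """Return the log entries within +/- `seconds` of a dump's timestamp."""
    window = timedelta(seconds=seconds)
    return [e for e in entries if abs(e.when - dump_time) <= window]
```

    A dump's time could be taken from its file modification time, e.g. `datetime.fromtimestamp(os.path.getmtime(path))`, assuming the dump files and the log use the same local clock. The `thread` values can then be compared with the thread IDs a debugger shows for the dump.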

    Regards,

    Jak

  • On the servers that are restarting, I am finding this:

    12.05.2017 07:38:46 5D30 I Refreshing certificate. Agent will be restarted.

    12.05.2017 07:28:54 5A68 I SendStatus: Sent EM-GetStatus-Reply (id=0115AA86) to EM
    12.05.2017 07:33:34 3EC0 E CORBA::Exception: Caught CORBA system exception, ID 'IDL:omg.org/CORBA/OBJECT_NOT_EXIST:1.0'
    OMG minor code (2), described as '*unknown description*', completed = NO
    ClientConnection::ProcessHeartbeat()

    12.05.2017 07:33:34 3E80 I Lost connection to router...
    12.05.2017 07:33:34 3EC0 I Initializing ...
    12.05.2017 07:33:34 3EC0 I Running certificate verification...
    12.05.2017 07:33:34 3EC0 I Non-compliant certificate hashing algorithm.
    12.05.2017 07:33:34 3EC0 I Still waiting for certificate delivery. Last request time was 77067308.
    12.05.2017 07:33:34 3E80 I Connected to router...
    12.05.2017 07:33:54 2620 I computer name is VEN-CTX-08
    12.05.2017 07:33:54 2620 I This computer is part of the domain ******
    12.05.2017 07:33:54 2620 I workgroup/domain name is ******
    12.05.2017 07:33:54 2620 I computer description is
    12.05.2017 07:33:54 2620 I This computer is part of the domain ******
    12.05.2017 07:33:54 2620 I SendStatus: Sent EM-GetStatus-Reply (id=0115ABB2) to EM
    12.05.2017 07:37:46 5EFC I SAUAdapter - SAU IPCListener::Wait received message: <?xml version="1.0" encoding="utf-8" ?><Config type="RMSStartUpdate" />
    12.05.2017 07:37:46 5EFC I SAUAdapter - SAU StartingUpdate has been set
    12.05.2017 07:37:46 5EFC I SAUAdapter - SAU IPCListener::Wait Waiting for more messages
    12.05.2017 07:38:46 5D30 I Refreshing certificate. Agent will be restarted.
    12.05.2017 07:38:46 5D30 I Shutting down...
    12.05.2017 07:38:46 5D30 I Stopping AdapterManager ...
    12.05.2017 07:38:49 5F34 I Terminating the AdapterMonitor thread ...
    12.05.2017 07:38:49 5D30 I Unloading adapter ALC ...
    12.05.2017 07:38:49 5D30 I SAUAdapter - SAU DeRegisterStateObserver : 00E52320
    12.05.2017 07:38:49 5D30 I SAUAdapter - SAU DeRegisterConfigStateObserver : 00E52324
    12.05.2017 07:38:49 5D30 I SAUAdapter - SAU DeRegisterEventObserver : 00E52348
    12.05.2017 07:38:49 5D30 I SAUAdapter - SAU Adapter is being deleted: 004CA688
    12.05.2017 07:38:49 5D30 I SAUAdapter - SAU ~AdapterImpl
    12.05.2017 07:38:49 5D30 I SAUAdapter - SAU Update status information saved to C:\ProgramData\Sophos\AutoUpdate\data\status\AUAdapter.xml
    12.05.2017 07:38:49 5EFC I SAUAdapter - SAU IPCListener::Wait exiting
    12.05.2017 07:38:49 5D30 I Unloading adapter SAV ...
    12.05.2017 07:38:49 5D30 I Unloading adapter SED ...
    12.05.2017 07:38:49 5D30 I SED adapter unregistering state observer
    12.05.2017 07:38:49 5D30 I Unloading adapter SWC ...
    12.05.2017 07:38:49 5D30 I Unregistering state observer
    12.05.2017 07:38:49 5D30 I Unregistering event observer
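
    Since this looks like a deliberate restart rather than a crash, counting the restart messages across the rotated logs may show how often it happens. A minimal Python sketch, assuming the timestamp format and the exact message text from the excerpt above; `restart_times` and `gaps` are hypothetical helper names.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, List

# Timestamp prefix as seen in the excerpt: "12.05.2017 07:38:46 ..."
STAMP = re.compile(r"^(\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2}) ")
# Message text copied from the excerpt; assumed to be stable across agent versions.
RESTART_MARKER = "Refreshing certificate. Agent will be restarted."


def restart_times(lines: Iterable[str]) -> List[datetime]:
    """Timestamps of every certificate-refresh restart recorded in the given log lines."""
    times = []
    for line in lines:
        line = line.strip()
        m = STAMP.match(line)
        if m and line.endswith(RESTART_MARKER):
            times.append(datetime.strptime(m.group(1), "%d.%m.%Y %H:%M:%S"))
    return times


def gaps(times: List[datetime]) -> List[timedelta]:
    """Intervals between consecutive restarts."""
    return [b - a for a, b in zip(times, times[1:])]
```

    Run it over all the rotated Agent-*.log files on an affected server, sorted by name (the file encoding is an assumption; adjust it if the logs turn out to be UTF-16). Gaps of a consistent length might hint at a timer-driven cycle rather than a random failure.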

  • I had ProcDump installed before and it never captured anything. That might be because the agent isn't crashing but restarting, per the logs above.

  • Yes, the behaviour appears to be as designed given the restart message.

    I would suggest to:

    1. Make sure the Sophos Certification Manager service is started on the management server. It may be worth restarting it, as this makes it log back on to the local router.

    2. On the problematic client, I would delete both the router's and the agent's pkc and pkp values in the registry under:

    Router:
    HKEY_LOCAL_MACHINE\SOFTWARE\[Wow6432Node]\Sophos\Messaging System\Router\Private

    Agent:
    HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Sophos\Remote Management System\ManagementAgent\Private

    (The Wow6432Node part of the path only exists on 64-bit Windows; omit it on 32-bit systems.)

    Then restart the Sophos Message Router and Sophos Agent services. If the certification process is working, within a few seconds the router's pkc and pkp values should reappear under the router key. At this point the router has its certificate, and the local Sophos Agent service can log on to the local router and request its own certificate. As long as TCP port 8194 is open on the client, the agent should also get its pkc and pkp values within a few seconds. If the management server cannot connect to the client on this port, it may take a little longer: about 2 minutes, as this is the short polling interval used for certification, before it shifts to 15 minutes ±50% (roughly 7.5 to 22.5 minutes) once managed.
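
    To check the port 8194 condition from the management server, a minimal Python sketch that simply attempts a TCP connection to the client could be used. It only shows that something accepts connections on that port (assumed to be the Sophos Message Router), not that certification will succeed; `port_open` is a hypothetical helper name.

```python
import socket


def port_open(host: str, port: int = 8194, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    # Example target: the client name that appears in the agent log in this thread.
    print(port_open("VEN-CTX-08"))
```

    If this returns False, the agent should still get its certificate, but only on the roughly 2-minute polling interval rather than within seconds.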

    With the refreshed certificates in place, do you still see the problem?

    Regards,
    Jak

  • Still having the problem.

  • Anything else to try? One of my Citrix servers, which is now hardly used by Citrix users (about 2 users a day compared with 24), seems to be working fine and does not have the agent restarting issue. The Sophos event logs are good for this server.