This discussion has been locked.

Endpoints show as disconnected in Enterprise Console

Hi,

My setup is as follows:

  • 10 servers on a domain
  • 1 standalone server, not on the domain
  • Enterprise Console installed on the server not in the domain, all servers protected
  • I installed Sophos on all servers by manually browsing to x.x.x.x\SophosUpdate\CIDs\S000\SAVSCFXP and running the setup file.
  • All servers show up and are arranged into their respective policy groups in Enterprise Console and all can update fine.

My issue is that after a couple of days the servers show as disconnected, with a red X through them. They can still update fine, but they show as disconnected. If I re-run the setup file, this keeps them connected in the Console for another couple of days, until they eventually drop off again.

I am not sure where to start troubleshooting. I was told that installing SEC on a non-member server would not be an issue, so I am not sure what is causing the endpoints to drop off. Any ideas would be appreciated.

:48214


  • Hi,

    That setup should work fine; communication relies only on a TCP/IP connection between the endpoints and the server.

    Does this article apply?

    http://www.sophos.com/en-us/support/knowledgebase/113293.aspx

    Essentially, are the computers sending in either a status or an entity event message at least once every 24 hours? If not, the management service sets them as offline due to a stale last message time. The last message time column in SEC should reveal whether that's the case.

    Otherwise, ensure that the clients can reach TCP ports 8192 and 8194 on the management server, and that the server can reach TCP port 8194 on the clients.
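
    The port check above can be scripted. This is only a minimal sketch using Python's standard `socket` module; the function names are made up for illustration, and a successful TCP connect only shows that the port is reachable, not that the Sophos services behind it are healthy.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_ports(host: str, ports, timeout: float = 3.0) -> dict:
    """Map each port to whether a TCP connection to it could be opened."""
    return {port: can_connect(host, port, timeout) for port in ports}
```

    From a client you would call something like `check_ports("<management server>", [8192, 8194])`, and from the management server `check_ports("<client>", [8194])`, substituting your own host names or addresses.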

    Regards,

    Jak

    :48218
  • Thanks for the response. It does look like an issue in sending the logoff message. I have found this from 3 separate endpoints:

    ENDPOINT 1:

    05.03.2014 14:53:27 09CC E GetterWorker: Caught CORBA system exception, ID 'IDL:omg.org/CORBA/TRANSIENT:1.0'
    OMG minor code (2), described as '*unknown description*', completed = NO
    05.03.2014 14:53:27 09CC E Failed to get messages, logging Router$XXXX off
    05.03.2014 14:53:53 09AC E Failed to send message (id=01169EA1) because of unknown exception, adding message back to queue
    05.03.2014 14:53:53 09AC E Failed to send messages, logging Router$XXXX off
    05.03.2014 14:53:53 09AC E SenderWorker: Caught CORBA user exception, ID 'IDL:SophosMessaging/NotLoggedOn:1.0' during logoff
    05.03.2014 14:53:53 09AC E SenderWorker: Caught CORBA system exception, ID 'IDL:omg.org/CORBA/OBJECT_NOT_EXIST:1.0'
    Unknown vendor minor code id (0), minor code = 0, completed = NO

    ENDPOINT2:
    05.03.2014 14:51:10 0C28 E Failed to send message (id=01169CAD) because of unknown exception, adding message back to queue
    05.03.2014 14:51:10 0C28 E Failed to send messages, logging Router$XXXX off
    05.03.2014 14:51:10 0C28 E SenderWorker: Caught CORBA system exception, ID 'IDL:omg.org/CORBA/TRANSIENT:1.0'
    OMG minor code (2), described as '*unknown description*', completed = NO

    ENDPOINT 3:
    05.03.2014 14:43:00 0F84 I Routing to parent: id=01169D44, origin=Router$XXXXX:9012.Agent, dest=EM, type=EM-GetStatus-Reply
    05.03.2014 14:53:22 0564 E Failed to send message (id=01169D31) because of unknown exception, adding message back to queue
    05.03.2014 14:53:22 0564 E Failed to send messages, logging Router$XXXX off
    05.03.2014 14:53:22 0564 E SenderWorker: Caught CORBA system exception, ID 'IDL:omg.org/CORBA/TRANSIENT:1.0'
    OMG minor code (2), described as '*unknown description*', completed = NO

    Have you seen this error before?

    Given that they all occurred around the same time, I am assuming that would indicate a network issue?
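
    To compare error times across endpoints, the router log lines can be parsed with a short script. This is a rough sketch only: the line layout (date, time, thread ID, level letter, message) is inferred from the excerpts above, the date is assumed to be day.month.year, and the function names are hypothetical.

```python
import re
from datetime import datetime

# Layout inferred from the excerpts: "05.03.2014 14:53:27 09CC E message text"
LINE_RE = re.compile(
    r"^\s*(\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2}) ([0-9A-Fa-f]{4}) ([A-Z]) (.*)$"
)


def parse_line(line):
    """Return (timestamp, thread_id, level, message), or None for continuation lines."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    stamp, thread_id, level, message = match.groups()
    # Assumption: the date is written day.month.year.
    return datetime.strptime(stamp, "%d.%m.%Y %H:%M:%S"), thread_id, level, message


def error_window(lines):
    """Return (first, last) timestamp of the 'E' lines, or None if there are none."""
    parsed = (parse_line(line) for line in lines)
    times = [entry[0] for entry in parsed if entry is not None and entry[2] == "E"]
    if not times:
        return None
    return min(times), max(times)
```

    Running `error_window` over each endpoint's log and comparing the results shows whether the failures cluster in the same few minutes, which would point towards the network or the parent router rather than any single client.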

    :48234
  • Hi,

    It looks like there was either a network problem or an issue with these clients' parent Sophos Message Router.

    I assume the parent router service is on the management server, unless you use message relays. If you restart it, do these clients automatically reconnect and start checking for messages again?

    Regards,

    Jak

    :48238
    I just restarted that service and yes, they all come back online.

    I wonder what could be causing the drop-off. I had a look at the parent router logs from that time and all I see is:

    05.03.2014 14:50:32 08E4 I This computer is part of the workgroup WORKGROUP
    05.03.2014 14:50:32 08E4 E ACE_DLL::open failed for TAO_ImR_Client: Error: check log for details.
    05.03.2014 14:50:32 08E4 E Unable to find service: ImR_Client_Adapter

    But then it seems to come back online and send and receive messages again. I will keep an eye on it and see if it happens again. Maybe I will need to reinstall SEC?

    :48288
    Did you ever get this resolved? I have exactly the same situation, same logs, everything.

    We are using SEC 5.2.1 with the latest endpoint version. It has been doing this since it was first installed. If I restart the Sophos Message Router service on the management server, all the endpoints come back; approximately two days later they all disconnect again.

    I have eliminated firewalls and am still getting the problem. The endpoints are not reporting any errors, and the logs show they are transmitting "keep alive" messages of one sort or another at least once a day. The management server appears to be just ignoring them!

    Really bugging me now, any ideas how to proceed?

    :53013
    Ours turned out to be packet loss between two of our switches. We fixed the networking issues and the clients now stay connected.

    :53053
    Just as an FYI:

    Client routers log on to the parent router, so it's the client router that initiates the connection.

    If the client router shuts down (for example, if the computer shuts down), the client router sends a logoff message to the parent, which causes the computer to show as disconnected.

    Note: If the computer shuts down abruptly and doesn't get a chance to send the logoff message, it is possible for the computer to show as connected for a short period of time when it is not. Details of how are below.

    The client router is configured by default to poll the parent router every 15 minutes (±50%, randomized), so anywhere between 7.5 and 22.5 minutes.

    A "GetterInterval" DWORD value under:

    HKEY_LOCAL_MACHINE\SOFTWARE\[Wow6432Node]\Sophos\Messaging System\Router\

    controls this. The default is 900 (900 seconds = 15 minutes). This polling action essentially looks for outstanding messages waiting for the client. The registry value may not exist by default, in which case the router uses the default value.
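
    As a quick worked example of the randomized polling window, here is a small sketch of the arithmetic. The function name and the `jitter` parameter are only for illustration; the 900-second default and the ±50% spread are the figures given above.

```python
def polling_window(getter_interval_s: float = 900, jitter: float = 0.5):
    """Return the (shortest, longest) gap in seconds between two polls."""
    shortest = getter_interval_s * (1 - jitter)
    longest = getter_interval_s * (1 + jitter)
    return shortest, longest


shortest, longest = polling_window()
# Default GetterInterval of 900 s: between 450 s and 1350 s (7.5 to 22.5 minutes).
print(f"{shortest / 60} to {longest / 60} minutes")
```

    With a GetterInterval of 3600 the same calculation gives a window of 30 to 90 minutes.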

    This frequent polling mechanism ensures that even if the server router can't notify the client router to come and get messages (e.g. a firewall is blocking TCP 8194 on the client), the client will still get the downstream messages (e.g. set-config, update now, scan now, etc.), albeit delayed by the client's polling interval.

    Once the client router is connected, the parent router keeps track of the logged-on client: it checks that the client has polled (by default) twice in 30 minutes. If the client hasn't checked in, the parent router logs it off. Evidence of this is recorded in the parent router log, as an informational line stating that it is logging the client off due to a communication timeout. This mechanism should handle the abrupt shutdown scenario where the client does not send a logoff message.

    Note: There is a registry key on the server router that controls this 30-minute timeout. It only needs to be changed if the GetterInterval on the clients is changed, so that the disconnect timeout stays aligned with it. E.g. if you changed the GetterInterval on the clients to poll every 60 minutes while the parent router was still checking for 2 polls in 30 minutes, the clients would keep being logged off due to the timeout.
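
    The alignment between the two settings can be sanity-checked with a few lines of arithmetic. This is a deliberately simplified model, not the router's actual logic: it ignores the ±50% randomization and just asks whether the nominal polling interval fits the "2 polls in 30 minutes" rule described above. All names are hypothetical.

```python
DEFAULT_TIMEOUT_WINDOW_S = 1800  # the 30-minute window described above
REQUIRED_POLLS = 2               # the parent expects two polls per window


def polls_in_window(getter_interval_s, timeout_window_s=DEFAULT_TIMEOUT_WINDOW_S):
    """Nominal number of polls that fit into the parent's timeout window."""
    return timeout_window_s / getter_interval_s


def client_stays_logged_on(getter_interval_s, timeout_window_s=DEFAULT_TIMEOUT_WINDOW_S):
    """Simplified check: does the client poll often enough for the window?"""
    return polls_in_window(getter_interval_s, timeout_window_s) >= REQUIRED_POLLS


def aligned_timeout(getter_interval_s):
    """Window length (seconds) that fits REQUIRED_POLLS polls at this interval."""
    return REQUIRED_POLLS * getter_interval_s
```

    Under this model the default pairing (900-second polls, 30-minute window) passes, while a 3600-second GetterInterval against the same window fails until the window is raised to around `aligned_timeout(3600)`, i.e. 7200 seconds.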

    As a fail-safe for maintaining state outside of RMS, the Sophos Management Service runs a purge task every 24 hours (from startup) that disconnects all computers with a last message time older than 24 hours. This is documented here:

    http://www.sophos.com/support/knowledgebase/113293.aspx


    It's worth pointing out that only status messages and entity events count towards changing the last message time for the computer. Under typical conditions, if a client checks the update location every 10 minutes and there is an update roughly 7 times a day, then there will always be a status message being sent in, which updates the last message time. The only problem with this mechanism arises if the clients check for updates very infrequently, or sit behind an air gap where they may not get updates within 24 hours. In this scenario there is no guarantee that the clients are sending status messages, and they will not necessarily send events if there is nothing to alert on. In that case, computers may be connected in terms of RMS but show as disconnected because the management service has disconnected them. A workaround for this scenario is in the article.
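
    To illustrate the 24-hour rule, here is a small sketch that flags computers whose last message time is older than 24 hours. It assumes you have already obtained the last message times somehow (for example, copied from the console); the function name and the input format are made up for the example.

```python
from datetime import datetime, timedelta

STALE_AFTER = timedelta(hours=24)  # the purge threshold described above


def stale_computers(last_message_times: dict, now: datetime) -> list:
    """Return the names of computers whose last message is older than 24 hours."""
    return sorted(
        name
        for name, last_seen in last_message_times.items()
        if now - last_seen > STALE_AFTER
    )
```

    Any computer returned here would be a candidate for being marked as disconnected by the purge task, even if its router is still logged on to the parent.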


    I hope this helps with some extra details.

    Regards,

    Jak

    :53055