Endpoints show as disconnected in Enterprise Console

Hi,

My setup is as follows:

  • 10 servers on a domain
  • 1 standalone server, not on the domain
  • Enterprise Console installed on the server not in the domain, all servers protected
  • I installed Sophos on all servers by manually browsing to x.x.x.x\SophosUpdate\CIDs\S000\SAVSCFXP and running the setup file.
  • All servers show up and are arranged into their respective policy groups in Enterprise Console and all can update fine.

My issue is that after a couple of days the servers show as disconnected, with a red X through them. They can still update fine, but the Console shows them as disconnected. If I re-run the setup file, they stay connected in the Console for another couple of days until they eventually drop off again.

I am not sure where to start troubleshooting. I was told that installing SEC on a non-member server would not be an issue, so I am not sure what is causing the endpoints to drop off. Any ideas would be appreciated.



This thread was automatically locked due to age.
  • Just as an FYI:

    Client routers log on to the parent router, so it is the client router that initiates the connection.

    If the client router shuts down (for example, because the computer shuts down), it sends a logoff message to the parent, which causes the computer to show as disconnected.


    Note: If the computer shuts down abruptly and the client router does not get a chance to send the logoff message, the computer may show as connected for a short period when it is not. Details below.

    By default, the client router polls the parent router every 15 minutes (randomized by +-50%), so somewhere between 7.5 and 22.5 minutes.

    There is a "GetterInterval" DWORD value under: 

    HKEY_LOCAL_MACHINE\SOFTWARE\[Wow6432Node]\Sophos\Messaging System\Router\
    The default is 900 (900 seconds = 15 minutes), and this value controls the polling interval. Each poll essentially checks for outstanding messages waiting for the client. This registry value may not exist by default, in which case the router uses the default value.
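
To make the numbers above concrete, here is a rough Python sketch of how the polling window falls out of the interval. It is only an illustration, not the router's actual code: the function names are made up, and the uniform distribution is an assumption (the post only says "+-50% randomized").

```python
import random
from typing import Optional, Tuple

DEFAULT_GETTER_INTERVAL = 900  # seconds (15 minutes), the default quoted above
JITTER_FRACTION = 0.5          # the +-50% randomisation quoted above


def effective_interval(registry_value: Optional[int]) -> int:
    """Return the configured interval, or the default if the value is absent."""
    return DEFAULT_GETTER_INTERVAL if registry_value is None else registry_value


def poll_window(interval: int) -> Tuple[float, float]:
    """Return the (shortest, longest) delay between polls, in seconds."""
    return interval * (1 - JITTER_FRACTION), interval * (1 + JITTER_FRACTION)


def next_poll_delay(interval: int, rng: random.Random) -> float:
    """Pick one delay inside the window (uniform choice is an assumption)."""
    low, high = poll_window(interval)
    return rng.uniform(low, high)
```

With no GetterInterval value present, `poll_window(effective_interval(None))` gives 450 to 1350 seconds, which is the 7.5 to 22.5 minute range mentioned above.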

    This frequent polling ensures that even if the server router cannot notify the client router to come and get messages (e.g. a firewall is blocking TCP 8194 on the client), the client will still receive downstream messages (e.g. set-config, update now, scan now, etc.), albeit delayed by the client's polling interval.

    Once the client router is connected, the parent router keeps track of the logged-on client by checking that it has polled (by default) twice in 30 minutes. If the client has not checked in, the parent router logs it off. Evidence of this is recorded in the parent router log, as an informational line stating that the client is being logged off due to a communication timeout. This mechanism should handle the abrupt shutdown scenario, where the client does not send a logoff message.

    Note: There is a registry key on the server router that controls this 30-minute timeout. It only needs to be changed if the GetterInterval on the clients is changed, so that the disconnect timeout stays aligned with it. E.g. if you changed the GetterInterval on the clients to poll every 60 minutes while the parent router was still checking for 2 polls in 30 minutes, the clients would keep being logged off due to the timeout.
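
As a quick sanity check on that alignment, something like the following Python sketch can be used. It assumes a simplified reading of the rule (the timeout should cover at least two full polling intervals) and ignores the +-50% jitter; the function names are hypothetical and the name of the server-side registry value is not given here, so it is not modelled.

```python
def minimum_timeout(getter_interval: int, polls_expected: int = 2) -> int:
    """Smallest server timeout (seconds) that fits the expected number of polls."""
    return getter_interval * polls_expected


def is_aligned(getter_interval: int, server_timeout: int,
               polls_expected: int = 2) -> bool:
    """True if the server timeout leaves room for the expected polls."""
    return server_timeout >= minimum_timeout(getter_interval, polls_expected)
```

For the defaults, `is_aligned(900, 1800)` holds. For the 60-minute example, `is_aligned(3600, 1800)` does not, and `minimum_timeout(3600)` suggests the timeout would need to be at least 7200 seconds under this reading.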

    As a fail-safe for maintaining state outside of RMS, the Sophos Management Service runs a purge task every 24 hours (counted from startup) that disconnects all computers with a last message time older than 24 hours. This is documented here:

    http://www.sophos.com/support/knowledgebase/113293.aspx


    It's worth pointing out that only status messages and entity events update the last message time for a computer. Under typical conditions, if a client checks the update location every 10 minutes and there is an update roughly 7 times a day, there will always be a status message coming in to update the last message time. The mechanism only becomes a problem if the clients check for updates very infrequently, or sit behind an air gap where they may not get updates within 24 hours. In that scenario there is no guarantee that the clients are sending status messages, and they will not necessarily send events if there is nothing to alert on. Computers may then be connected in terms of RMS but show as disconnected because the management service has disconnected them. A workaround for this scenario is in the article.
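
The effect of the purge can be pictured with a small Python sketch. This is only a model of the rule described above, not how the management service is implemented: the function name, the dictionary of last message times, and the server names are all made up for illustration.

```python
from datetime import datetime, timedelta
from typing import Dict, List

PURGE_THRESHOLD = timedelta(hours=24)  # the 24-hour cut-off described above


def stale_computers(last_message: Dict[str, datetime], now: datetime,
                    threshold: timedelta = PURGE_THRESHOLD) -> List[str]:
    """Return the computers whose last message time is older than the threshold."""
    return sorted(name for name, seen in last_message.items()
                  if now - seen > threshold)
```

A server that last sent a status message 30 hours ago would be returned here, and so shown as disconnected, even if its router is still polling every 15 minutes and it is updating normally.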


    I hope this helps with some extra details.

    Regards,

    Jak
