This discussion has been locked.

PCs migrated to Win10 not reporting to Sophos server

Hey Sophos Community. We have a mix of Windows 7 and Windows 10 clients that report to a Windows Server 2016 machine running SEC. The Windows 7 clients are reporting to the SEC server, but the Windows 10 clients are not. Both batches of clients are getting their updates and all are up to date, but the Win 10 clients won't report in.

I've gone through the normal troubleshooting process and verified that ports 8192 and 8194 are not being blocked by the Windows firewall by checking netstat -a on the clients and the server. Reviewing the ReportData from a couple of Win 10 PCs:

01.12.2019 13:21:15 1564 I SAUAdapter - SAU ReportStatus::FinishedUpdate: Failed to read UpdateSource value in the UpdateStatus registry key.

That error is the one item common to all the log files. The SEC shows:

12/5/2019 1:46:26 PM fffffffd This computer is not yet managed. It is protected but has not yet reported back its status.

From the log file in the RMS\3\Agent\Logs directory of a Win 10 PC that isn't reporting in:

01.12.2019 13:11:15 1564 I SAUAdapter - SAU StartingUpdate has been set
01.12.2019 13:11:15 1564 I SAUAdapter - SAU IPCListener::Wait Waiting for more messages
01.12.2019 13:11:15 1564 I SAUAdapter - SAU IPCListener::Wait received message: <?xml version="1.0" encoding="utf-8" ?><Config type="RMSEndUpdate" />
01.12.2019 13:11:15 1564 I SAUAdapter - SAU FinishedUpdate has been set
01.12.2019 13:11:15 1564 I SAUAdapter - SAU ReportStatus::FinishedUpdate: Failed to read UpdateSource value in the UpdateStatus registry key.
01.12.2019 13:11:15 1564 I SAUAdapter - SAU Update status information saved to C:\ProgramData\Sophos\AutoUpdate\data\status\AUAdapter.xml
01.12.2019 13:11:15 1564 I SAUAdapter - SAU IPCListener::Wait Waiting for more messages

I'm at a loss as to what the problem is and how to troubleshoot it further. Any assistance would be appreciated.
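A note on the netstat check: `netstat -a` lists the sockets on the local machine; it does not show whether a firewall lets a connection from the other side through. A more direct test is to attempt a TCP connection from a Win 10 client to the SEC server on 8192 and 8194 (and from the server to the endpoint's 8194). A minimal standard-library sketch; the 3-second timeout is an arbitrary choice and the function name is hypothetical:

```python
import socket

# Ports discussed in this thread; adjust if your SEC uses different ones.
SEC_PORTS = (8192, 8194)


def check_tcp_ports(host: str, ports=SEC_PORTS, timeout: float = 3.0) -> dict:
    """Try a plain TCP connect to each port; True means the handshake completed."""
    results = {}
    for port in ports:
        try:
            # create_connection resolves the name and completes the TCP handshake
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # refused, timed out, unreachable, or name resolution failed
            results[port] = False
    return results
```

Calling `check_tcp_ports("sec-server.example")` (a placeholder name; substitute your SEC server) returns a dict mapping each port to True or False. A True result only shows the network path is open; it says nothing about whether RMS itself is healthy.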

*Edit*

This is the entry I have in the Windows Firewall group policy as an exception for ports 8192 and 8194:

8192:TCP:localsubnet:enabled:sophos  
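The single entry quoted covers TCP 8192 only; if the policy really contains just this line, nothing in it opens 8194. A small sketch that parses entries in the Port:Transport:Scope:Status:Name form (assumed here to come from the "Windows Firewall: Define inbound port exceptions" Group Policy setting) and reports which required ports lack an enabled TCP exception. It assumes IPv4 scopes only, since IPv6 addresses would contain extra colons:

```python
from typing import Iterable, List, NamedTuple

# Ports the SEC clients need, as discussed in this thread.
REQUIRED_PORTS = (8192, 8194)


class PortException(NamedTuple):
    port: int
    transport: str
    scope: str
    enabled: bool
    name: str


def parse_port_exception(entry: str) -> PortException:
    """Parse one 'Port:Transport:Scope:Status:Name' string (IPv4 scopes only)."""
    # maxsplit=4 keeps any extra colons inside the free-text name field
    port, transport, scope, status, name = entry.strip().split(":", 4)
    return PortException(int(port), transport.upper(), scope,
                         status.lower() == "enabled", name)


def missing_ports(entries: Iterable[str], required=REQUIRED_PORTS) -> List[int]:
    """Return the required TCP ports not covered by an enabled exception."""
    covered = {
        e.port
        for e in map(parse_port_exception, entries)
        if e.enabled and e.transport == "TCP"
    }
    return sorted(set(required) - covered)


print(missing_ports(["8192:TCP:localsubnet:enabled:sophos"]))  # prints [8194]
```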



  • Running Wireshark on a test PC and on our SEC server. On the SEC server I see a lot of port 8194 traffic, but port 8192 doesn't seem to be flowing (port 8192 shows a lot of retransmissions). Nothing on either port on the test PC.

    I see that source ports 50630-50634 (on the test PC) are not making it through. Checking into that and will update accordingly.

    *updated*

    I forced an update, and the packets are reaching the server and responses are coming back. Digging deeper.

  • **UPDATE**

    After monitoring Wireshark and tweaking group policies with firewall settings, I approached it from a different direction.

    The issue comes down to the RMS client. Once I uninstalled the RMS client, restarted, and then "protected the computer" from the SEC server, the test PC checked in.

    Here's my question.

    How can I uninstall the RMS client if tamper protection is enabled on the clients? I have the MSI script done and ready to roll, but I don't know how to disable tamper protection. I need your help with this part.

  • Hello PC_Junkie,

    in a SEC-managed installation TP is off by default. Furthermore, it's Enhanced Tamper Protection that prevents "fiddling" with the services. Unless you have customized the CID with a TP policy (which is unlikely), if (E)TP is enabled the endpoint must have received a policy, and therefore RMS must have worked. The question is why it stopped working.

    I used the "Authenticate User" option under the tamper protection and still was unable to stop/start Sophos services
    In addition to authenticating, you must disable TP using Configure tamper protection. This still requires using the GUI, though.

    The issue comes down to the RMS client
    As mentioned above, if ETP is on, RMS must have worked, and it's not clear why it stopped doing so (particularly on Windows 10). Coincidentally, yesterday I stumbled over an endpoint where the Message Router service was missing (RMS was installed and the Agent was there). It looks like the computer stopped (for whatever reason) just as the RMS install completed. But this is a one-off accident, not a systematic error.

    Once I uninstalled the RMS client, restarted and then "protected the computer"
    I think it's not the uninstall that's crucial (Protect would implicitly have done it) but disabling TP. There's no documented way to disable TP programmatically on SEC-managed endpoints.

    Christian

  • Thanks for the information and explanation. It helps immensely.

    With Windows 7 quickly running out of time, and being knee-deep in this Windows 10 migration, the only culprit I could find affecting the communication is the RMS client. I created a Windows 10 image, which includes Office 16, Sophos, etc., and it is being deployed to all PCs. The deployed clients are getting the Sophos policies, but they are not reporting back to the SEC. When I uninstall and reinstall the RMS client on the test PC, the test PC checks in.

    I'm not sure what else to try to resolve this. If reinstalling the Sophos RMS client allows the PC to check in, I'm led to believe that the Windows firewall policies are allowing the communication. Why the RMS client from the image doesn't let the test PC check in is beyond me.

    • Is there anything else I can test to see whether the RMS client can be fixed without an uninstall/reinstall?
    • What would prevent me from accessing the Sophos services?
    • I can't change or delete the SEDEnabled registry value, either via a script or manually. Is that because the Sophos services are running?

    Again, thank you for your time and help with this.

  • Hello PC_Junkie,

    I'm probably going out on a limb, but is Sophos correctly prepared in the image?
    I can't imagine a mistake that could cause the situation you describe (except inadvertently damaging the Message Router or Agent service). That TP is on suggests that Sophos was fully installed on the image and received the policies (as said, unless you have policies configured in the CID). If you then deploy the image without performing the necessary steps, all machines end up with the same identity. In that case at least one Windows 10 computer should appear connected, though its name would constantly change. If you have a significant number of endpoints, it's possible to miss this.

    Upon reinstalling RMS the endpoint obtains a new (and unique) identity. It's possible to fix incorrectly cloned machines (if this is indeed the cause), but you'd still have the TP problem.

    The Router log in %ProgramData%\Sophos\Remote Management System\3\Router\Logs\ from the test PC that seemingly doesn't check in might provide some insight.

    Christian 

  • 09.12.2019 14:48:56 17B0 I SOF: C:\ProgramData/Sophos/Remote Management System/3/Router/Logs/Router-20191209-194856.log
    09.12.2019 14:48:56 17B0 I Sophos Messaging Router 4.1.2.24 starting...
    09.12.2019 14:48:56 17B0 I Setting ACE_FD_SETSIZE to 138
    09.12.2019 14:48:56 17B0 I Initializing CORBA...
    09.12.2019 14:48:56 17B0 I Connection cache limit is 10
    09.12.2019 14:48:56 17B0 I Router::ConfigureSslContext: keeping legacy compatibility of TLS 1 and TLS 1.1.
    09.12.2019 14:48:56 17B0 I Creating ORB runner with 4 threads
    09.12.2019 14:48:57 17B0 I Compliant certificate hashing algorithm.
    09.12.2019 14:48:57 17B0 I This computer is part of the domain xxxx
    09.12.2019 14:48:57 17B0 I This router's IOR:
    IOR:010000002600000049444c3a536f70686f734d6573736167696e672f4d657373616765526f757465723a312e300000000100000000000000a4000000010102000e0000003137322e32342e38302e3139350001204100000014010f004e5550000000210000000001000000526f6f74504f4100526f7574657250657273697374656e740003000000010000004d657373616765526f757465720000000300000000000000080000000100af01004f415401000000180000000100af01010001000100000001000105090101000000000014000000080000000100a60086000220
    09.12.2019 14:48:57 17B0 I Successfully validated this router's IOR
    09.12.2019 14:48:57 17B0 I Reading router table file
    09.12.2019 14:48:57 17B0 I Host name: TRN3
    09.12.2019 14:48:57 17B0 I Local IP addresses: x.x.80.195
    09.12.2019 14:48:57 17B0 I Resolved name: TRN3.domain
    09.12.2019 14:48:57 17B0 I Resolved alias/es:
    09.12.2019 14:48:57 17B0 I Resolved IP addresses: x.x.80.195
    09.12.2019 14:48:57 17B0 I Resolved reverse names/aliases: TRN3.domain 
    09.12.2019 14:48:57 17B0 I Waiting for messages...
    09.12.2019 14:48:57 0848 I Getting parent router IOR from x.x.80.23:8192
    09.12.2019 14:48:57 17B0 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 7, max number of user ports 15360
    09.12.2019 14:48:57 0848 I Received parent router's IOR:
    IOR:010000002600000049444c3a536f70686f734d6573736167696e672f4d657373616765526f757465723a312e300000000100000000000000a4000000010102000d0000003137322e32342e38302e3233000001204100000014010f004e5550000000210000000001000000526f6f74504f4100526f7574657250657273697374656e740003000000010000004d657373616765526f7574657200000003000000000000000800000001baf901004f4154010000001800000001baf9010100010001000000010001050901010000000000140000000800000001baa60086000220
    09.12.2019 14:48:57 0848 I Successfully validated parent router's IOR
    09.12.2019 14:48:57 0848 I Accessing parent
    09.12.2019 14:48:57 0848 I SSL handshake done, local IP address = x.x.80.195
    09.12.2019 14:48:57 0848 I Parent is Router$W2K16
    09.12.2019 14:48:57 0848 I RouterTableEntry::LogonToParentRouter() - logging on as active consumer
    09.12.2019 14:48:57 0848 I RouterTableEntry state (router, logging on): Router$W2K16 is passive consumer, passive supplier
    09.12.2019 14:48:57 0848 I Logged on to parent router as Router$WIN10V2:18003
    09.12.2019 14:48:57 0848 I This computer is part of the domain xxxx
    09.12.2019 14:48:57 0EA0 I SSL handshake done, local IP address = x.x.80.195
    09.12.2019 14:48:58 00B0 I Client::LogonPushPush() successfully called back to client
    09.12.2019 14:48:58 00B0 I Logged on Agent as a client
    09.12.2019 14:48:58 01F4 I Routing to Agent: id=03EEA52A, origin=Router$WIN10V2:18003, dest=Router$WIN10V2:18003.Agent, type=EM-ClientLogon
    09.12.2019 14:48:58 01F8 I Sent message (id=03EEA52A) to Agent
    09.12.2019 14:48:59 01F4 I Received message for this router
    09.12.2019 14:48:59 01F4 I EM-NotifyClientUpdates originator Router$WIN10V2:18003.Agent
    09.12.2019 14:48:59 01F4 I Routing to Agent: id=01EEA52B, origin=Router$WIN10V2:18003, dest=Router$WIN10V2:18003.Agent, type=EM-NotifyClientUpdates-Reply
    09.12.2019 14:48:59 08C4 I Sent message (id=01EEA52B) to Agent
    09.12.2019 14:49:19 01F4 I Routing to parent: id=01EEA53F, origin=Router$WIN10V2:18003.Agent, dest=EM, type=EM-GetStatus-Reply
    09.12.2019 14:49:19 14FC E Failed to send message (id=01EEA53F) because of unknown exception, adding message back to queue
    09.12.2019 14:49:19 14FC E Failed to send messages, logging Router$W2K16 off
    09.12.2019 14:49:19 14FC E SenderWorker: Caught CORBA system exception, ID 'IDL:omg.org/CORBA/OBJECT_NOT_EXIST:1.0'
    OMG minor code (2), described as '*unknown description*', completed = NO

    09.12.2019 14:49:49 0848 I RouterTableEntry::LogonToParentRouter() - logging on as active consumer
    09.12.2019 14:49:49 0848 I RouterTableEntry state (router, logging on): Router$W2K1 is passive consumer, passive supplier
    09.12.2019 14:49:49 0848 I Logged on to parent router as Router$WIN10V2:18003
    09.12.2019 14:49:49 0848 I This computer is part of the domain xxxx
    09.12.2019 14:49:49 01F8 I Sent message (id=01EEA53F) to Router$W2K16
    09.12.2019 14:51:51 01F4 I Routing to parent: id=01EEA5D7, origin=Router$WIN10V2:18003.Agent, dest=EM, type=EM-EntityEvent
    09.12.2019 14:51:51 08C4 E Failed to send message (id=01EEA5D7) because of unknown exception, adding message back to queue
    09.12.2019 14:51:51 08C4 E Failed to send messages, logging Router$W2K16 off
    09.12.2019 14:51:51 08C4 E SenderWorker: Caught CORBA system exception, ID 'IDL:omg.org/CORBA/OBJECT_NOT_EXIST:1.0'
    OMG minor code (2), described as '*unknown description*', completed = NO

    W2K16 - the SEC on site

    WIN10V2 - I can't locate this PC. It isn't pingable and does not have an account in the AD environment. The PC the image was made on did not have that name.
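The mismatch spotted here (the log's `Host name: TRN3` versus logging on to the parent as `Router$WIN10V2`) can be checked across many deployed PCs by scanning their Router logs. A minimal sketch, assuming the line layout seen in the excerpt above (date, time, thread ID, one-letter level, message) and the heuristic that a correctly installed router logs on under its own host name; both are inferred from this log, not from Sophos documentation. It also counts the CORBA `OBJECT_NOT_EXIST` errors that accompany each failed send:

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, Optional, Set

# Assumed line layout, inferred from the log excerpt:
# "DD.MM.YYYY HH:MM:SS <thread id> <level letter> <message>"
LINE_RE = re.compile(r"^(\d{2}\.\d{2}\.\d{4}) (\d{2}:\d{2}:\d{2}) ([0-9A-Fa-f]+) ([A-Z]) (.*)$")
HOST_RE = re.compile(r"^Host name: (\S+)")
LOGON_RE = re.compile(r"Logged on to parent router as Router\$([^:\s]+):(\d+)")


@dataclass
class RouterLogSummary:
    host_name: Optional[str] = None
    logon_names: Set[str] = field(default_factory=set)
    object_not_exist_errors: int = 0

    @property
    def identity_mismatch(self) -> bool:
        # Heuristic: the router normally logs on under the machine's own name
        if self.host_name is None:
            return False
        return any(n.upper() != self.host_name.upper() for n in self.logon_names)


def summarize_router_log(lines: Iterable[str]) -> RouterLogSummary:
    summary = RouterLogSummary()
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if m is None:
            continue  # continuation lines such as IOR dumps or OMG minor codes
        level, message = m.group(4), m.group(5)
        host = HOST_RE.match(message)
        if host:
            summary.host_name = host.group(1)
        logon = LOGON_RE.search(message)
        if logon:
            summary.logon_names.add(logon.group(1))
        if level == "E" and "OBJECT_NOT_EXIST" in message:
            summary.object_not_exist_errors += 1
    return summary
```

To run it, pass an open log file, e.g. `summarize_router_log(open(path, errors="replace"))`. The log file encoding is an assumption; adjust it if the files turn out to be UTF-16.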

    When I reviewed this log file from a few computers early on in this endeavor, I read a Q&A on the Sophos site indicating that modifying the router value in an .xml file is not a recommended option. Is that a correct conclusion?

    You are correct about the imaging process. I did not follow the process outlined in that document. I built the Windows 10 image exactly as I did the Windows 7 image that was used when we migrated from XP: installed the OS and Sophos AV and proceeded on. When it came time for the image, I ran sysprep and pushed forward. Before proceeding further with the deployment, I will modify the image to comply with the instructions in the imaging document you provided.

    In the meantime, what is the best procedure to fix the 50-75 PCs already deployed?

  • My apologies. I should have read your reply more closely; I missed this portion the first time through: https://community.sophos.com/kb/en-us/116635 

    I'm checking into this option now; however, I can't stop/start the Sophos services, even when logged in as the local administrator.

  • Hello PC_Junkie,

    Please note: you have sanitized the IPs, but be aware that the IORs in the Router log reveal the full address (no, I haven't decoded them).

    If the log is from the "first run" after deployment, then the endpoint clearly already has an identity (and likely the same one as the others). If you see some "unknown" name, there's an active PC with this name somewhere.

    From experience: the method given in article 116635 is safe; I've had to (don't ask) use it more than once.

    Christian
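To illustrate the point about IORs: a stringified IOR is "IOR:" followed by the hex of a CDR encapsulation whose first octet is the byte-order flag, so the host and port of an IIOP profile can be read straight out of it. A minimal sketch following the generic CORBA IIOP profile layout (IIOP 1.1+ with tagged components; tag 20 is the SSLIOP `TAG_SSL_SEC_TRANS` component that carries the SSL port). It is not Sophos tooling, only looks at the first IIOP profile, and does no validation beyond what the struct reads enforce:

```python
import struct

TAG_INTERNET_IOP = 0    # IIOP profile tag
TAG_SSL_SEC_TRANS = 20  # SSLIOP tagged component carrying the SSL port


class CdrReader:
    """Reads a CDR encapsulation; alignment is relative to its first octet."""

    def __init__(self, buf: bytes) -> None:
        self.buf = buf
        # Octet 0 is the byte-order flag: 1 = little-endian, 0 = big-endian
        self.fmt = "<" if buf[0] == 1 else ">"
        self.pos = 1

    def _align(self, n: int) -> None:
        self.pos += (-self.pos) % n

    def octet(self) -> int:
        value = self.buf[self.pos]
        self.pos += 1
        return value

    def ushort(self) -> int:
        self._align(2)
        (value,) = struct.unpack_from(self.fmt + "H", self.buf, self.pos)
        self.pos += 2
        return value

    def ulong(self) -> int:
        self._align(4)
        (value,) = struct.unpack_from(self.fmt + "I", self.buf, self.pos)
        self.pos += 4
        return value

    def octets(self) -> bytes:
        # sequence<octet>: ulong length followed by the raw bytes
        n = self.ulong()
        raw = self.buf[self.pos:self.pos + n]
        self.pos += n
        return raw

    def string(self) -> str:
        # CDR strings are length-prefixed and include a trailing NUL
        return self.octets().rstrip(b"\0").decode("ascii")


def decode_ior(ior: str) -> dict:
    """Return type ID, host, IIOP port and (if present) SSL port of the first IIOP profile."""
    if not ior.startswith("IOR:"):
        raise ValueError("not a stringified IOR")
    r = CdrReader(bytes.fromhex(ior[4:]))
    result = {"type_id": r.string()}
    for _ in range(r.ulong()):  # number of tagged profiles
        tag, body = r.ulong(), r.octets()
        if tag != TAG_INTERNET_IOP:
            continue
        p = CdrReader(body)
        major, minor = p.octet(), p.octet()
        result["host"] = p.string()
        result["port"] = p.ushort()
        p.octets()  # object key, not needed here
        if (major, minor) >= (1, 1):
            for _ in range(p.ulong()):  # tagged components
                ctag, cdata = p.ulong(), p.octets()
                if ctag == TAG_SSL_SEC_TRANS:
                    s = CdrReader(cdata)
                    s.ushort()  # target_supports
                    s.ushort()  # target_requires
                    result["ssl_port"] = s.ushort()
        break
    return result
```

For the router IORs quoted in the log above, the SSL component carries port 8194, consistent with the messaging port discussed in this thread.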

  • Finally got the services to stop, but as you alluded to, it required manually disabling TP by authenticating via the Sophos client. That isn't something I could do through a script. Regardless, I'm going to continue testing that KB article (116635) and see whether it resolves the check-in issue.

    QC said:

    Please note: you have sanitized the IPs but be aware that the IORs in the Router reveal the full address (no, I haven't decoded them).

    I highly doubt there is much you couldn't do with the log files provided, but I'm hopeful others will weigh whether the time needed to extract the info is worth what it would give them. I appreciate the disclaimer nonetheless.

  • Even a script from this site https://www.techsupportpk.com/2016/10/how-to-uninstall-tamper-protected-sophos-antivirus-with-powershell.html shows a prompt requiring user interaction.

    I guess I'm slipping into the sad realization that I will need to remote in and manually authenticate TP before making the changes outlined in the KB article. After authenticating, I suppose the team can run a script and then hop on to the next one.

  • QC said:

    From experience: the method given in article 116635 is safe; I've had to (don't ask) use it more than once.

    Christian

     

     
    So I've found the sequence of steps to get the clients checking in, which is time-consuming but evidently necessary. I'm following KB 116635 and restarting the PC.
     
    The PCs are checking in, but under Policy compliance in the SEC most are showing "awaiting policy transfer". Before I dive into the deep end and spend a lot of time on this "fix", how long should it take (on average) for a PC to show "same as policy"?
     
  • Hello PC_Junkie,

    a restart of the endpoint isn't necessary; starting the services is sufficient.

    awaiting policy transfer
    is a little bit tricky and somewhat obscure. I'll try to explain the different statuses (there might be more) you can encounter, how they originate and what they mean.

    • Awaiting policy from console - is the status after initial install. When the endpoint's Agent detects that its AdapterStorage (%ProgramData%\Sophos\Remote Management System\3\Agent\AdapterStorage\) is (totally or partially) empty [1], it informs the Management Server (not strictly correct, but I'll subsequently refer to it as SEC) that it needs the corresponding policy or policies. In response, SEC sends/enqueues the requested policies [2][3].
    • Same as policy - shown as the aggregated value in the Status tab (for the priority see [4] below) and for an individual policy in the associated tab. Compliance is calculated by the Agent on the endpoint, which compares the policy values to the components' settings.
    • Differs from policy - as above. Differs can be caused by local changes to the settings or by an error in a (sub-)component (e.g. the Device Control service not running).
    • Comparison failure - rare, usually caused by some infrastructure (i.e. internal communication) error.
    • Awaiting policy transfer - in addition to the compliance status, the Agent sends each policy's ID (a GUID value). An endpoint's group membership and the applicable policies determine the target IDs. When the target IDs change [5], the policy status changes to Awaiting and SEC enqueues/sends the policies in question [2][3].
    • Locally configured - an arcane setting for an endpoint in SEC's database (not settable via the Console GUI) that tells SEC to discontinue policy management for this endpoint. Other status updates and management work as usual
    1. Deleting items from the AdapterStorage initiates the same workflow.
    2. A policy (or a command) is sent immediately if the downstream connection to the endpoint's port 8194 exists. Otherwise it's enqueued and sent in response to a message from the endpoint. Enqueued messages have a TTL and are discarded when it expires (e.g. because the endpoint is turned off). This has no effect on the policy status, so Awaiting could persist "forever".
    3. I haven't lately tested the behaviour when the downstream connection is not available. Enqueued messages are definitely sent in response to a status message (when the Agent starts, when an update is applied every few hours, when settings are changed locally, when a scan completes) or to an Alert or Event. Endpoints used to poll for messages regularly (every 15 minutes, if GetterInterval in the registry is what controls it). That still seems to be the case, though sometimes it takes significantly longer (but then there might be minor issues on the endpoints).
    4. The aggregated status reflects the individual status with the highest priority, in the order Locally configured, Comparison failure, Differs from policy, Awaiting policy from console, Awaiting policy transfer, Same as policy.
    5. A target ID can change because
      • a policy is modified
      • a different policy is assigned to a group
      • an endpoint is moved to a group with a different policy
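The priority order in footnote 4 can be expressed as an ordered enum where the aggregated status is simply the maximum of the individual ones. A sketch of that rule as described here, not SEC's actual code; the policy names in the example are illustrative only:

```python
from enum import IntEnum
from typing import Mapping


class PolicyStatus(IntEnum):
    """Higher value wins when statuses are aggregated (order from footnote 4)."""
    SAME_AS_POLICY = 0
    AWAITING_POLICY_TRANSFER = 1
    AWAITING_POLICY_FROM_CONSOLE = 2
    DIFFERS_FROM_POLICY = 3
    COMPARISON_FAILURE = 4
    LOCALLY_CONFIGURED = 5


def aggregate_status(per_policy: Mapping[str, PolicyStatus]) -> PolicyStatus:
    """Aggregated value = the individual status with the highest priority."""
    if not per_policy:
        raise ValueError("no policy statuses to aggregate")
    return max(per_policy.values())


# Illustrative endpoint: one policy still waiting after a group move.
endpoint = {
    "Updating": PolicyStatus.SAME_AS_POLICY,
    "Anti-virus and HIPS": PolicyStatus.AWAITING_POLICY_TRANSFER,
    "Firewall": PolicyStatus.SAME_AS_POLICY,
}
print(aggregate_status(endpoint).name)  # prints AWAITING_POLICY_TRANSFER
```

This is why a single policy stuck in transfer is enough for the whole endpoint to show "Awaiting policy transfer" in the Status tab.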

    most are showing "awaiting policy transfer"
    so you moved them to a group. As said above, if the downstream connection exists the policies should be applied immediately; otherwise it normally takes up to 15 minutes. I might mention that a) the Connected (green icon) status is not always correct (it can persist even when there is no longer a connection) and b) in the case of a reappearing endpoint (one that has been deleted from the Console but later pops up again in the same group) the policies might not get sent. I just tested with 50 Connected endpoints in a group: all but 5 applied the policy within 15 minutes (with the expected average of about 7:30), 4 took slightly longer, and 1 was apparently incorrectly shown as Connected.
    I'd request Comply with ... from the console to make sure that the policies have been sent.
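The "expected average of about 7:30" follows from the 15-minute polling interval if the policy is assumed to be enqueued at a moment uniformly distributed relative to the endpoint's next poll (an assumption; the thread doesn't state the distribution). With poll interval T, the wait W is then uniform on [0, T):

```latex
W \sim \mathcal{U}[0, T), \qquad
\mathbb{E}[W] = \int_0^T \frac{t}{T}\,dt = \frac{T}{2}
\quad\Rightarrow\quad
T = 15\ \text{min} \;\Rightarrow\; \mathbb{E}[W] = 7.5\ \text{min} = 7{:}30
```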

    Christian
