
mcs-push-server redirection - failing LiveQueryScheduled json upload - MCS client service running on high CPU load

On some servers behind a Sophos UTM firewall, which does not support wildcard DNS hosts, we noticed the CPU load increasing over the last few days. It reached 100% today and the server became sluggish.

The CPU load has been rising since March 28th, when we rebooted the server because Sophos Endpoint reported a pending reboot after a component update.

Core Agent        2024.1.0.51
Sophos Intercept X        2023.2.1.6
Managed Detection and Response        2023.2.0.3
XDR        2024.1.0.51

Rising CPU load:

CPU load caused by MCS client:

We could see from the logs that the MCS client was trying to push JSON files to the data lake. There were over 100k files in

C:\ProgramData\Sophos\Management Communications System\Endpoint\Channels\LiveQueryScheduled\Incoming\

Filenames like scheduled-20240322091806838.json

The files could not be pushed to the push servers and were aging out. The huge number of files that MCS was trying to upload caused the increasing CPU load.

2024-04-05T09:57:22.286Z [ 3032: 4840] W Feed channel scheduled_query: too old file C:\ProgramData\Sophos\Management Communications System\Endpoint\Channels\LiveQueryScheduled\Incoming\scheduled-20240322095704005.json
2024-04-05T09:57:52.496Z [ 3032: 4840] W Feed channel scheduled_query: too old file C:\ProgramData\Sophos\Management Communications System\Endpoint\Channels\LiveQueryScheduled\Incoming\scheduled-20240322095735944.json
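
To see how large the backlog is and which days the stuck files date from, the file names can be grouped by the date embedded in them. This is only a minimal sketch, assuming the `scheduled-<17-digit timestamp>.json` naming shown above; the helper names `backlog_by_day` and `scan` are made up for illustration and are not part of any Sophos tooling.

```python
import re
from collections import Counter
from pathlib import Path

# Folder from the post above; adjust if your install differs.
INCOMING = Path(
    r"C:\ProgramData\Sophos\Management Communications System"
    r"\Endpoint\Channels\LiveQueryScheduled\Incoming"
)

# scheduled-YYYYMMDDhhmmssmmm.json: 8 date digits, then 9 time digits.
NAME_RE = re.compile(r"^scheduled-(\d{8})\d{9}\.json$")


def backlog_by_day(names):
    """Count file names per embedded date; anything else goes to 'other'."""
    counts = Counter()
    for name in names:
        match = NAME_RE.match(name)
        counts[match.group(1) if match else "other"] += 1
    return counts


def scan(folder=INCOMING):
    """Read the folder once and return the per-day counts."""
    return backlog_by_day(p.name for p in folder.iterdir() if p.is_file())


# Example use on the affected machine:
#   for day, n in sorted(scan().items()):
#       print(day, n)
```

A count that keeps growing for old dates suggests uploads are still failing; a shrinking count suggests the client is draining the queue.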

We can see from the MCS logs it is trying to reach

mcs-push-server-eu-central-1.prod.hydra.sophos.com

but is then redirected to su-7f291a12c603, one of the hundreds of servers used for load balancing.

2024-04-05T09:35:12.403Z [ 4160: 7960] I [push]: [connect] using server mcs-push-server-eu-central-1.prod.hydra.sophos.com/ps without a proxy (peer address 18.184.143.209)
2024-04-05T09:35:12.406Z [ 4160: 7960] I (async) GET mcs-push-server-eu-central-1.prod.hydra.sophos.com:443/.../xxxxxxxxx-c179-4ec3-bcea-7b7dd619fc5e
2024-04-05T09:35:12.429Z [ 4160: 6880] I (async) Redirected to su-7f291a12c603.mcs-push-server-eu-central-1.prod.hydra.sophos.com/.../xxxxxxxxx-9b47-4f95-a250-bfef4036f015
2024-04-05T09:35:33.448Z [ 4160: 7960] W (async) connection failed
2024-04-05T09:35:33.448Z [ 4160: 7960] I [push]: Dropping connection after error
2024-04-05T09:35:33.450Z [ 4160: 7960] I Poll loop: Failed
2024-04-05T09:35:33.452Z [ 4160: 7960] I Connection reset

On the firewall we allow port 443 to the DNS group mcs-push-server-eu-central-1.prod.hydra.sophos.com. That is the only thing we can do on a UTM.

When the endpoint then tries to contact the Sophos servers directly by their su-******* name, it fails, of course.
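
The mismatch can be illustrated with a small sketch that pulls the redirect target out of an MCS log line and compares it against an exact-name rule and a wildcard (suffix) rule. This is a conceptual model only, not how the UTM or XG actually evaluate their rules; the function names are hypothetical.

```python
import re

ALLOWED = "mcs-push-server-eu-central-1.prod.hydra.sophos.com"

# Matches the host part of a "Redirected to <host>/..." log line.
REDIRECT_RE = re.compile(r"Redirected to ([^/\s]+)")


def redirect_host(log_line):
    """Return the redirect target host from a log line, or None."""
    match = REDIRECT_RE.search(log_line)
    return match.group(1) if match else None


def exact_match(host, allowed=ALLOWED):
    """Model of a rule that only knows the single allowed name."""
    return host.lower().rstrip(".") == allowed


def wildcard_match(host, allowed=ALLOWED):
    """Model of a rule that also accepts any subdomain of the name."""
    name = host.lower().rstrip(".")
    return name == allowed or name.endswith("." + allowed)


line = (
    "2024-04-05T09:35:12.429Z [ 4160: 6880] I (async) Redirected to "
    "su-7f291a12c603.mcs-push-server-eu-central-1.prod.hydra.sophos.com/..."
)
host = redirect_host(line)
print(host, exact_match(host), wildcard_match(host))
```

Under this model the redirect target passes only the wildcard rule, which matches the behaviour described here: the first connection is allowed, the redirected one is not.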

Is there anything we could do other than allowing 443 from the Sophos endpoints to any destination on the UTM?

I can see that the agent has a proxy setting available. How can we use and configure it?



  • Sophos Endpoint is doing DoS here. CPU load is rising again after the last agent re-install (which cleaned out the Management Communications System\Endpoint\Channels\LiveQueryScheduled\Incoming folder).

    Is there a solution other than allowing HTTPS to any for Sophos Endpoint behind a Sophos UTM?

  • Hi LHerzog,

    I suggest opening a support case related to your issue with our team. 

    I was able to locate a few support cases also related to this behavior. Currently I am not aware of a simple solution or workaround to this beyond what you have suggested in allowing HTTPS.

    Opening a support case will allow our team to better gauge the impact this has on our customer base so that more permanent solutions can be implemented. 

    An alternative would be to upgrade the firewall, however, I understand that is not so easily done. 

    Kushal Lakhan
    Team Lead, Global Community Support
  • OK, I will consider opening a support case. Perhaps we could work around it with hosts file entries?

  • I played with the proxy settings and it now seems to work. The test client now goes through an XG firewall in a different network segment, which supports wildcard domains, while its gateway firewall is still the UTM.

    https://support.sophos.com/support/s/article/KB-000034818?language=en_US#ProxyEndpoint

    The test machine has been working through the thousands of files in the folder for a while, and the number is now going down rather than up. Slowly.

  • The MCS client has now processed ~90k files, and the folder has been empty since 13:45.

    The log looks like this:

    2024-04-12T11:44:42.893Z [ 6316: 8300] W Feed channel scheduled_query: discarded file (purge requested): C:\ProgramData\Sophos\Management Communications System\Endpoint\Channels\LiveQueryScheduled\Incoming\Network-0000000001c1043b-0000000001c109d7-133573958296279662-133573958431930970.json
    2024-04-12T11:44:42.893Z [ 6316: 8300] W Feed channel scheduled_query: discarded file (purge requested): C:\ProgramData\Sophos\Management Communications System\Endpoint\Channels\LiveQueryScheduled\Incoming\Network-0000000001c109f8-0000000001c10c9b-133573958433843996-133573958580714307.json
    2024-04-12T11:44:42.894Z [ 6316: 8300] W Feed channel scheduled_query: discarded file (purge requested): C:\ProgramData\Sophos\Management Communications System\Endpoint\Channels\LiveQueryScheduled\Incoming\WinSec-0000000001c10bcf-0000000001c10c05-133573957950000000-133573958510000000.json
    2024-04-12T11:44:42.894Z [ 6316: 8300] W Feed channel scheduled_query: discarded file (purge requested): C:\ProgramData\Sophos\Management Communications System\Endpoint\Channels\LiveQueryScheduled\Incoming\Network-0000000001c10cf3-0000000001c10f77-133573958588777154-133573958731350733.json
    2024-04-12T11:44:45.477Z [ 6316: 3120] W Feed channel scheduled_query: discarded file (backoff): C:\ProgramData\Sophos\Management Communications System\Endpoint\Channels\LiveQueryScheduled\Incoming\Dns-0000000000109e68-0000000000109e69-133568039053775600-133568039053775698.json
    2024-04-12T11:44:45.477Z [ 6316: 3120] W Feed channel scheduled_query: discarded file (backoff): C:\ProgramData\Sophos\Management Communications System\Endpoint\Channels\LiveQueryScheduled\Incoming\Dns-000000000010a179-000000000010a193-133568039159709257-133568039161489628.json
    2024-04-12T11:44:45.478Z [ 6316: 3120] W Feed channel scheduled_query: discarded file (backoff): C:\ProgramData\Sophos\Management Communications System\Endpoint\Channels\LiveQueryScheduled\Incoming\Dns-000000000010a56f-000000000010a662-133568039347835380-133568039363630003.json
    ...
    thousands of similar lines
    ...
    2024-04-12T12:20:08.951Z [ 6316: 3120] W Feed channel scheduled_query: discarded file (backoff): C:\ProgramData\Sophos\Management Communications System\Endpoint\Channels\LiveQueryScheduled\Incoming\Process-0000000001c33213-0000000001c3335c-133573979896875241-133573979919688896.json
    2024-04-12T12:20:19.815Z [ 6316: 8300] I GET https://mcs2-cloudstation-eu-central-1.prod.hydra.sophos.com:443/sophos/management/ep/commands/applications/AGENT;ALC;APPSPROXY;CORC;CORE;EFW;FIM;HBT;HMPA;LiveQuery;LiveTerminal;MCS;NTP;SAV;SDU;SHS;SWC;UI/endpoint/c7040dd6-xxxx-xxxx-xxxx-xxxxxxxcfe5
    2024-04-12T12:20:19.826Z [ 6316: 8300] I 200 : sent=0 rcvd=140 elapsed=11ms
    2024-04-12T12:20:19.827Z [ 6316: 8300] I Next command poll requested in 120s
    2024-04-12T12:20:19.834Z [ 6316: 8300] W Feed channel scheduled_query: discarded file (backoff): C:\ProgramData\Sophos\Management Communications System\Endpoint\Channels\LiveQueryScheduled\Incoming\Network-0000000001c33547-0000000001c338e8-133573977157069876-133573980180578061.json
    2024-04-12T12:20:19.834Z [ 6316: 8300] W Feed channel scheduled_query: discarded file (backoff): C:\ProgramData\Sophos\Management Communications System\Endpoint\Channels\LiveQueryScheduled\Incoming\WinSec-0000000001c33767-0000000001c337b9-133573979550000000-133573980120000000.json
    2024-04-12T12:20:38.967Z [ 6316: 3120] W Feed channel scheduled_query: discarded file (backoff): C:\ProgramData\Sophos\Management Communications System\Endpoint\Channels\LiveQueryScheduled\Incoming\Dns-0000000001c33b66-0000000001c33b81-133573980304393586-133573980306550768.json
    2024-04-12T12:20:38.967Z [ 6316: 3120] W Feed channel scheduled_query: discarded file (backoff): C:\ProgramData\Sophos\Management Communications System\Endpoint\Channels\LiveQueryScheduled\Incoming\Network-0000000001c3390e-0000000001c33bc9-133573980181356263-133573980330031289.json
    2024-04-12T12:20:46.244Z [ 6316: 5892] I (async) 200 : chunk=97 rcvd=7 conntime=5100113ms
    2024-04-12T12:20:53.982Z [ 6316: 3120] W Feed channel scheduled_query: discarded file (backoff): C:\ProgramData\Sophos\Management Communications System\Endpoint\Channels\LiveQueryScheduled\Incoming\Dns-0000000001c33c0c-0000000001c33c0d-133573980350772314-133573980350772407.json
    2024-04-12T12:20:53.983Z [ 6316: 3120] W Feed channel scheduled_query: discarded file (backoff): C:\ProgramData\Sophos\Management Communications System\Endpoint\Channels\LiveQueryScheduled\Incoming\Network-0000000001c33bdf-0000000001c33dee-133573980330984992-133573980480719812.json
    

    CPU load has normalized.

  • Hi  , 

    We've made a change to the backend process so when a connection is made to 'mcs-push-server-eu-central-1.prod.hydra.sophos.com' it will no longer re-direct to a subscriber.

    You should therefore be able to add 'mcs-push-server-eu-central-1.prod.hydra.sophos.com' as an exclusion and it will work (no wildcard required).

  • That's great news.
    We've changed the settings on some servers, but some server services are a bit sensitive to a system proxy being added, so we could not change them all.

    On one of the unchanged servers, I can see the issue was fixed yesterday at about 2024-04-23T19:00Z.

    PS: the servers with the most files and CPU load were Domain Controllers
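
For anyone checking their own servers: the MCS client log lines quoted in this thread ("too old file", "discarded file (purge requested)", "discarded file (backoff)") can be tallied to see whether the backlog is aging out or being drained. This is a rough sketch that assumes the line format shown in the posts above; `discard_reasons` is a hypothetical helper, and the log file location is left to the reader.

```python
import re
from collections import Counter

# Matches the two kinds of warning quoted in this thread:
#   "... scheduled_query: too old file <path>"
#   "... scheduled_query: discarded file (<reason>): <path>"
FEED_RE = re.compile(
    r"Feed channel scheduled_query: "
    r"(?:discarded file \(([^)]+)\)|(too old file))"
)


def discard_reasons(lines):
    """Count feed-channel warnings by reason; other lines are ignored."""
    counts = Counter()
    for line in lines:
        match = FEED_RE.search(line)
        if match:
            counts[match.group(1) or match.group(2)] += 1
    return counts


# Example use, with the path to the MCS client log filled in by you:
#   with open(path_to_mcs_client_log, encoding="utf-8", errors="replace") as f:
#       print(discard_reasons(f))
```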
