
Intermittently disconnected servers in Enterprise Console

Hello All,

Our environment contains one SEC 5.5.1 installation on Windows Server 2016, used mainly to protect Windows Server machines (2008 R2, 2012 R2, 2016).

We are having trouble keeping all servers connected to our console: the number of connected servers goes up, suddenly drops, and then starts to climb again.

We have about 900 endpoints, of which at most about 300 show as connected at any time; all the rest are disconnected.

In the SEC router log file we have many entries like:

error code: 336462231 - error:140E0197:SSL routines:SSL_shutdown:shutdown while in init

I am assuming that all these events are bringing down the router service, and that this is why the numbers don't go up and stay stable.

On the client side I can find various errors:

19.05.2020 10:15:54 166C E ACE_SSL (5664|5740) error code: 336027804 - error:1407609C:SSL routines:SSL23_GET_CLIENT_HELLO:http request

and some with:

19.05.2020 15:41:38 AED8 E ParentLogon::RegisterParent: Caught CORBA system exception, ID 'IDL:omg.org/CORBA/TRANSIENT:1.0'
OMG minor code (2), described as '*unknown description*', completed = NO

Clients get connected, are then suddenly disconnected for some hours, and then come back online.

Any help and guidance on where to start looking will be appreciated.
I am not sure all servers are affected in the same way.

Thanks in advance
Carlos



This thread was automatically locked due to age.
  • Hi  

    Wireshark would be more helpful here. Please check this article and see if it helps you get started.

    Shweta

    Community Support Engineer | Sophos Technical Support
  • Hi, thanks for your answer.

    I already checked this with Wireshark, but unfortunately did not see any client with the filter from this article.

    Regards,

  • No Lightspeed agent installed

    It is basically a dedicated server for Sophos

    SQL and Sophos applications

    Regards,

    Carlos

  • Hello Carlos,

    no surprise that the filter shows no endpoints, as it looks for connection attempts using TLS v1.0. They'd have to run an outdated version of RMS (lower than 4.x, from mid-2017), or there'd have to be some intermediary device that tries to downgrade the protocol, or RMS on the endpoints would have to be configured (I think this is possible) to use TLS v1.0. All this is more than unlikely.

    From your initial post I see error 0x1407609C (336027804 decimal); the full message text is "HTTP spoken on HTTPS port". If you can't detect a pattern (that could help to narrow down the problem), I'd suggest you try to identify an endpoint that "reliably" fails to connect and trace the attempted handshake with Wireshark, both on server and endpoint (filtering on port 8194 and the endpoint or server IP respectively). As the connection is taken down early and there is no other chatter on this port, the trace comprises only a few packets.

    Christian 
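
Editorial note: the decimal and hex forms of the router and client error codes quoted in this thread can be related with a few lines of Python. This is only a sketch, and it assumes the packing used by OpenSSL versions before 3.0 (8-bit library, 12-bit function, 12-bit reason); the thread does not state which OpenSSL build RMS ships, and the function name `decode_openssl_error` is made up for illustration.

```python
def decode_openssl_error(code: int) -> dict:
    """Split a packed pre-3.0 OpenSSL error code into its fields.

    Assumed layout: bits 31-24 library, bits 23-12 function, bits 11-0 reason.
    """
    return {
        "hex": f"{code:08X}",          # form used in "error:XXXXXXXX:..." log text
        "lib": (code >> 24) & 0xFF,    # library number
        "func": (code >> 12) & 0xFFF,  # function number
        "reason": code & 0xFFF,        # reason number
    }


# The two decimal codes quoted in the original post
for decimal in (336027804, 336462231):
    print(decimal, decode_openssl_error(decimal))
```

Under that assumption, 336027804 maps to `1407609C` and 336462231 to `140E0197`, matching the hex strings in the log lines above.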

  • Hi Christian,

    Here is the trace from a client which is not showing as not managed

    I also did a Test-Connection on port 8194 and it fails, but somehow packets get to the destination, so I am getting a bit lost.
    I have checked our firewalls and there is no deny rule for this client on 8194.

     

    2643 16.536574 10.137.153.179 10.137.249.17 TCP 66 59393 → 8192 [SYN, ECN, CWR] Seq=0 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1
    2644 16.536625 10.137.249.17 10.137.153.179 TCP 66 8192 → 59393 [SYN, ACK, ECN] Seq=0 Ack=1 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1
    2645 16.537182 10.137.153.179 10.137.249.17 TCP 60 59393 → 8192 [ACK] Seq=1 Ack=1 Win=2102272 Len=0
    2646 16.537357 10.137.249.17 10.137.153.179 TCP 506 8192 → 59393 [PSH, ACK] Seq=1 Ack=1 Win=2102272 Len=452
    2647 16.537381 10.137.249.17 10.137.153.179 TCP 54 8192 → 59393 [FIN, ACK] Seq=453 Ack=1 Win=2102272 Len=0
    2648 16.537879 10.137.153.179 10.137.249.17 TCP 60 59393 → 8192 [ACK] Seq=1 Ack=454 Win=2101760 Len=0
    2649 16.537928 10.137.153.179 10.137.249.17 TCP 60 59393 → 8192 [FIN, ACK] Seq=1 Ack=454 Win=2101760 Len=0
    2650 16.537947 10.137.249.17 10.137.153.179 TCP 54 8192 → 59393 [ACK] Seq=454 Ack=2 Win=2102272 Len=0
    2651 16.539514 10.137.153.179 10.137.249.17 TCP 66 59394 → 8194 [SYN, ECN, CWR] Seq=0 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1
    2652 16.539525 10.137.249.17 10.137.153.179 TCP 54 8194 → 59394 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
    2719 17.042503 10.137.153.179 10.137.249.17 TCP 66 [TCP Retransmission] 59394 → 8194 [SYN] Seq=0 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1
    2720 17.042520 10.137.249.17 10.137.153.179 TCP 54 8194 → 59394 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
    2804 17.558558 10.137.153.179 10.137.249.17 TCP 62 [TCP Retransmission] 59394 → 8194 [SYN] Seq=0 Win=8192 Len=0 MSS=1460 SACK_PERM=1
    2805 17.558575 10.137.249.17 10.137.153.179 TCP 54 8194 → 59394 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
    3151 19.573850 10.137.153.179 10.137.249.17 TCP 66 [TCP Port numbers reused] 59394 → 8194 [SYN, ECN, CWR] Seq=0 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1
    3152 19.573861 10.137.249.17 10.137.153.179 TCP 54 8194 → 59394 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
    3264 20.074268 10.137.153.179 10.137.249.17 TCP 66 [TCP Retransmission] 59394 → 8194 [SYN] Seq=0 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1
    3265 20.074285 10.137.249.17 10.137.153.179 TCP 54 8194 → 59394 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
    3363 20.589868 10.137.153.179 10.137.249.17 TCP 62 [TCP Retransmission] 59394 → 8194 [SYN] Seq=0 Win=8192 Len=0 MSS=1460 SACK_PERM=1
    3364 20.589883 10.137.249.17 10.137.153.179 TCP 54 8194 → 59394 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
    3710 22.605699 10.137.153.179 10.137.249.17 TCP 66 [TCP Port numbers reused] 59394 → 8194 [SYN, ECN, CWR] Seq=0 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1
    3711 22.605714 10.137.249.17 10.137.153.179 TCP 54 8194 → 59394 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
    3798 23.120726 10.137.153.179 10.137.249.17 TCP 66 [TCP Retransmission] 59394 → 8194 [SYN] Seq=0 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1
    3799 23.120743 10.137.249.17 10.137.153.179 TCP 54 8194 → 59394 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
    3862 23.636860 10.137.153.179 10.137.249.17 TCP 62 [TCP Retransmission] 59394 → 8194 [SYN] Seq=0 Win=8192 Len=0 MSS=1460 SACK_PERM=1
    3863 23.636879 10.137.249.17 10.137.153.179 TCP 54 8194 → 59394 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
    4228 25.652559 10.137.153.179 10.137.249.17 TCP 66 [TCP Port numbers reused] 59394 → 8194 [SYN, ECN, CWR] Seq=0 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1
    4229 25.652577 10.137.249.17 10.137.153.179 TCP 54 8194 → 59394 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
    4302 26.167659 10.137.153.179 10.137.249.17 TCP 66 [TCP Retransmission] 59394 → 8194 [SYN] Seq=0 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1
    4303 26.167675 10.137.249.17 10.137.153.179 TCP 54 8194 → 59394 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
    4386 26.683665 10.137.153.179 10.137.249.17 TCP 62 [TCP Retransmission] 59394 → 8194 [SYN] Seq=0 Win=8192 Len=0 MSS=1460 SACK_PERM=1
    4387 26.683684 10.137.249.17 10.137.153.179 TCP 54 8194 → 59394 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
    4734 28.699650 10.137.153.179 10.137.249.17 TCP 66 [TCP Port numbers reused] 59394 → 8194 [SYN, ECN, CWR] Seq=0 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1
    4735 28.699661 10.137.249.17 10.137.153.179 TCP 54 8194 → 59394 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
    4805 29.214618 10.137.153.179 10.137.249.17 TCP 66 [TCP Retransmission] 59394 → 8194 [SYN] Seq=0 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1
    4806 29.214634 10.137.249.17 10.137.153.179 TCP 54 8194 → 59394 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
    4872 29.730252 10.137.153.179 10.137.249.17 TCP 62 [TCP Retransmission] 59394 → 8194 [SYN] Seq=0 Win=8192 Len=0 MSS=1460 SACK_PERM=1
    4873 29.730273 10.137.249.17 10.137.153.179 TCP 54 8194 → 59394 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
    5217 31.746510 10.137.153.179 10.137.249.17 TCP 66 [TCP Port numbers reused] 59394 → 8194 [SYN, ECN, CWR] Seq=0 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1
    5218 31.746527 10.137.249.17 10.137.153.179 TCP 54 8194 → 59394 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
    5296 32.261717 10.137.153.179 10.137.249.17 TCP 66 [TCP Retransmission] 59394 → 8194 [SYN] Seq=0 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1
    5297 32.261735 10.137.249.17 10.137.153.179 TCP 54 8194 → 59394 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
    5373 32.777185 10.137.153.179 10.137.249.17 TCP 62 [TCP Retransmission] 59394 → 8194 [SYN] Seq=0 Win=8192 Len=0 MSS=1460 SACK_PERM=1
    5374 32.777203 10.137.249.17 10.137.153.179 TCP 54 8194 → 59394 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
    5755 34.793541 10.137.153.179 10.137.249.17 TCP 66 [TCP Port numbers reused] 59394 → 8194 [SYN, ECN, CWR] Seq=0 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1
    5756 34.793553 10.137.249.17 10.137.153.179 TCP 54 8194 → 59394 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
    5833 35.309012 10.137.153.179 10.137.249.17 TCP 66 [TCP Retransmission] 59394 → 8194 [SYN] Seq=0 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1
    5834 35.309025 10.137.249.17 10.137.153.179 TCP 54 8194 → 59394 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
    5896 35.824161 10.137.153.179 10.137.249.17 TCP 62 [TCP Retransmission] 59394 → 8194 [SYN] Seq=0 Win=8192 Len=0 MSS=1460 SACK_PERM=1
    5897 35.824179 10.137.249.17 10.137.153.179 TCP 54 8194 → 59394 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
    6335 37.841091 10.137.153.179 10.137.249.17 TCP 66 [TCP Port numbers reused] 59394 → 8194 [SYN, ECN, CWR] Seq=0 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1
    6336 37.841105 10.137.249.17 10.137.153.179 TCP 54 8194 → 59394 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
    6398 38.341343 10.137.153.179 10.137.249.17 TCP 66 [TCP Retransmission] 59394 → 8194 [SYN] Seq=0 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1
    6399 38.341359 10.137.249.17 10.137.153.179 TCP 54 8194 → 59394 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
    6467 38.841285 10.137.153.179 10.137.249.17 TCP 62 [TCP Retransmission] 59394 → 8194 [SYN] Seq=0 Win=8192 Len=0 MSS=1460 SACK_PERM=1
    6468 38.841310 10.137.249.17 10.137.153.179 TCP 54 8194 → 59394 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
    6795 40.857545 10.137.153.179 10.137.249.17 TCP 66 [TCP Port numbers reused] 59394 → 8194 [SYN, ECN, CWR] Seq=0 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1
    6796 40.857560 10.137.249.17 10.137.153.179 TCP 54 8194 → 59394 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
    6875 41.372544 10.137.153.179 10.137.249.17 TCP 66 [TCP Retransmission] 59394 → 8194 [SYN] Seq=0 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1
    6876 41.372561 10.137.249.17 10.137.153.179 TCP 54 8194 → 59394 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
    6941 41.888771 10.137.153.179 10.137.249.17 TCP 62 [TCP Retransmission] 59394 → 8194 [SYN] Seq=0 Win=8192 Len=0 MSS=1460 SACK_PERM=1
    6942 41.888794 10.137.249.17 10.137.153.179 TCP 54 8194 → 59394 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
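
Editorial note: a trace like the one above is easier to read when tallied per server port. The following is a rough sketch that assumes the text export format shown in this post (Wireshark summary lines containing `src → dst [FLAGS]`); the function name `summarize` and the category labels are invented for illustration.

```python
import re
from collections import Counter

# Matches e.g. "59394 → 8194 [SYN, ECN, CWR]" inside a summary line
PACKET = re.compile(r"(\d+) → (\d+) \[([A-Z, ]+)\]")


def summarize(trace: str, server_ports=(8192, 8194)) -> Counter:
    """Count client SYNs and server SYN-ACK/RST replies per server port."""
    counts = Counter()
    for line in trace.splitlines():
        match = PACKET.search(line)
        if not match:
            continue
        src, dst = int(match.group(1)), int(match.group(2))
        flags = {flag.strip() for flag in match.group(3).split(",")}
        if dst in server_ports and "SYN" in flags and "ACK" not in flags:
            counts[(dst, "SYN from client")] += 1
        elif src in server_ports and "RST" in flags:
            counts[(src, "RST from server")] += 1
        elif src in server_ports and {"SYN", "ACK"} <= flags:
            counts[(src, "SYN-ACK from server")] += 1
    return counts


sample = """\
2643 16.536574 10.137.153.179 10.137.249.17 TCP 66 59393 → 8192 [SYN, ECN, CWR] Seq=0 Win=8192 Len=0
2644 16.536625 10.137.249.17 10.137.153.179 TCP 66 8192 → 59393 [SYN, ACK, ECN] Seq=0 Ack=1 Win=8192 Len=0
2651 16.539514 10.137.153.179 10.137.249.17 TCP 66 59394 → 8194 [SYN, ECN, CWR] Seq=0 Win=8192 Len=0
2652 16.539525 10.137.249.17 10.137.153.179 TCP 54 8194 → 59394 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
"""
for key, value in sorted(summarize(sample).items()):
    print(key, value)
```

Run over the full trace, this kind of tally would show SYNs to 8192 being answered with SYN-ACK while every SYN to 8194 is answered with RST.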

  • client_hello.txt

    I have attached the capture from a client which is showing as not connected in the console.

    There is an interesting line with the server hello, where it says "Malformed Packet: TLS".

    Regards,

    Carlos

  • Hello Carlos,

    obtaining the IOR from port 8192 succeeds. But this client doesn't get as far as establishing a TCP connection on 8194, let alone initiating the handshake. But on to your next post.

    Christian
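
Editorial note: the difference between "port 8192 answers" and "port 8194 is refused" can also be checked without a packet capture. Below is a minimal sketch of a TCP connect probe using only the Python standard library; the function name `probe` and its return strings are illustrative choices, not part of any Sophos tooling.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a plain TCP connect and report how it ended.

    "refused" means the peer (or a device in between) answered with RST;
    "timeout" means no answer arrived at all, e.g. packets silently dropped.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

For example, calling `probe("10.137.249.17", 8192)` and `probe("10.137.249.17", 8194)` from an affected endpoint (address taken from the trace above) would be expected to give "open" and "refused" respectively if the behaviour in the capture is reproducible.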

  • Hello Carlos,

    the capture doesn't show the details of the TLS traffic, so it's not clear what happens and when (and whether this session was abnormally taken down, or rather not initiated).

    There are some interesting points. First of all, there seem to be hot standby routers between endpoint and server, as the packets from the server are addressed to an HSRPv1 virtual MAC. Most of the time the endpoint advertises a rather small TCP receive window (256 bytes or less). Later in the trace there are SACKs that suggest lost packets. The Malformed Packet is a red herring; I see it as well and think it's Wireshark's dissector.

    If possible please run Wireshark on both ends at the same time. With a specific capture filter (host 10.xxx.xxx.xxx and port 8194) it shouldn't have significant impact. Sometimes it helps to see both sides.

    Christian

  • Christian,

    Thanks for your answer

    Attached are the two captures, one from each side (I had to use a different server than the one above, but this one has the same symptoms and is not PROD, so I could reboot it and test different things).

    sec_to_client_8194.txt

    client_to_SEC_8194.txt

     

    Regards,

  • Hello Carlos,

    this one looks like a complete fail as the server responds to every SYN with an RST. You said that there's no port exhaustion, didn't you? Anything in the Event logs on the server? Both servers (the server and the client) are VMs, aren't they?

    There seem to be three scenarios: the server refusing the connection with an RST, the TLS handshake failing, and some connections working (with maybe some that are stable and some that intermittently fail). If I were to encounter this I'd blame it on HSRP, Cisco, and the network guys. No, seriously, if there is an infrastructure problem it should surface with other TLS connections as well, not only RMS.

    I might be wrong (had a hard week - hold it, it's just half a week but tomorrow's a holiday and it's a false Friday today) but there seems to be no problem with RMS (or its use of TLS). I'd suspect some arcane network problem and at my site I'd collect Wireshark captures from the server and several select endpoints, maybe some Sniffer captures on the network devices as well, and try to find a pattern. In addition I'd try to find out when it started (if it wasn't there from the beginning) and what has been changed around this time.
    Troubleshooting was already much too easy with physical machines and plain network devices, almost any idiot could do it. So all this fancy virtualization stuff and nifty network equipment have been invented [;)].
    The bright side is, such problems can be solved and the solution is often rather simple. It just requires some targeted effort.

    Christian

  • Hi Christian

    Thanks for all the investigation you have done so far

    Both servers are VMs on VMware infrastructure, and most of the endpoints are in the same infrastructure.

    The full TCP dynamic port range is open (about 16,000+ ports), so I do not believe it is port exhaustion.

    In the Event log, I do have a lot of Schannel errors:

    The certificate received from the remote server has not validated correctly. The error code is 0x80092013. The TLS connection request has failed. The attached data contains the server certificate.

    I am not sure it is related; I have this error on endpoints as well.

    After reading your post and some thinking, I checked the servers that are stable (about 40 VMs) and noticed that they are always green and connected.

    The funny thing is that the stable VMs are on the same VLAN as the SEC, so there is no firewall in between.

    I am starting to suspect the firewalls. What do you think?

    I will need to investigate with our Network team...

    I wish you a nice long weekend, and talk to you soon

    Regards,

    Carlos

  • Hi Carlos,

    Thanks for adding the router key exports.  I really just wanted to check the connection cache and thread values were correct for the server.

    I've seen configurations where the router on a relay or management server only had a connection cache of 10, the same as a client.  As a result, the connections have to be recycled between all the managed endpoints and could create a similar scenario.  You are OK there.

    You mention VMware for the management server. Does it have vMotion set up? I've heard of weird connection issues if the management server is "moved".

    Regards,
    Jak 
