Temporary TLS handshake timeouts

We're experiencing intermittent TLS handshake timeouts that prevent websites from loading. For example, when we go to google.com, we sometimes see the browser attempting the handshake, and after the configured timeout the page fails to load. Usually it takes 3 or 4 reloads before the page loads. It then works for a while before it happens again. We see this with several websites: no specific time, no specific websites, just the TLS handshake suddenly failing for a period.
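To put a number on "3 or 4 reloads", a small probe could repeat the handshake against one site and count failures, independent of any browser. The following is a minimal sketch, assuming Python 3 on a client behind the firewall; `tls_handshake_time` and `failure_rate` are hypothetical helper names, not part of any Sophos tool.

```python
import socket
import ssl
import time


def tls_handshake_time(host: str, port: int = 443, timeout: float = 5.0):
    """Return the TLS handshake duration in seconds, or None if it failed.

    None covers TCP connect failures, timeouts, and certificate errors alike.
    If the firewall re-signs HTTPS traffic, its CA must be trusted by this
    client or every probe will be counted as a failure.
    """
    context = ssl.create_default_context()
    try:
        # Connect TCP first so the measured time covers only the TLS handshake
        with socket.create_connection((host, port), timeout=timeout) as sock:
            start = time.monotonic()
            # wrap_socket runs the handshake immediately on a connected socket
            with context.wrap_socket(sock, server_hostname=host):
                return time.monotonic() - start
    except OSError:  # includes ssl.SSLError, socket timeouts and DNS errors
        return None


def failure_rate(results) -> float:
    """Fraction of probe results that are None (failed handshakes)."""
    if not results:
        return 0.0
    return sum(r is None for r in results) / len(results)
```

For example, `failure_rate([tls_handshake_time("google.com") for _ in range(20)])` run once a minute would show whether failures cluster in the bad periods described above.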

We already had this issue with 17.0.5 and it did not change with 17.1.1.

A few days ago I checked the MTU and MSS and changed the MTU from 1500 to 1492 to match our WAN connections. I thought it got better, but now it seems the issue persists.
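For reference, the MSS that goes with a given MTU follows from the header sizes: an IPv4 header and a TCP header of 20 bytes each, without options. The sketch below assumes IPv4 and that the 1492 figure comes from PPPoE (1500 minus 8 bytes of PPPoE/PPP overhead), which the post does not state; the helper names are hypothetical.

```python
IPV4_HEADER = 20    # bytes, IPv4 header without options
TCP_HEADER = 20     # bytes, TCP header without options
ICMP_HEADER = 8     # bytes, ICMP echo header
PPPOE_OVERHEAD = 8  # bytes, 6-byte PPPoE header + 2-byte PPP protocol field


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload (MSS) that fits in one IPv4 packet of this MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def ping_payload_for_mtu(mtu: int) -> int:
    """ICMP payload size that fills the MTU exactly.

    Use it with "ping -M do -s <size> host" on Linux or
    "ping -f -l <size> host" on Windows (fragmentation prohibited):
    if <size> gets through but <size> + 1 does not, the path MTU is mtu.
    """
    return mtu - IPV4_HEADER - ICMP_HEADER
```

With these numbers, `mss_for_mtu(1500)` is 1460, `mss_for_mtu(1500 - PPPOE_OVERHEAD)` is 1452, and `ping_payload_for_mtu(1492)` is 1464. Certificate chains sent during the handshake usually span several full-size segments, which is why an MTU/MSS mismatch is a common suspect when handshakes stall while small requests still work.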

Has anybody seen this before?

  • Hi Jelle,

    I think this is the same issue. I thought it was the IMAP proxy, but I do suffer from it occasionally on some sites.

    https://community.sophos.com/products/xg-firewall/f/email-protection/105385/there-is-a-bug-in-the-email-imap-proxy

     

    Ian

  • I noticed this issue today with WhatsApp Web:

    "TLS handshake timeouts"

    Can anyone confirm this?

  • In reply to rfcat_vk:

    This morning the issue became a major one, as many requests were failing at least on the first attempt. So I rebooted the primary appliance and the auxiliary appliance took over. Guess what? The issue is not happening on the auxiliary (now primary) device, although, like the other appliance, it has been up since SFOS 17.1.1 was installed.

    So why do the two devices behave differently although their uptime was the same? It could be a hardware issue or, more likely, something is slowing down the appliance over time while it is primary in active-passive mode.

    I'm going to check this during the next days. Maybe the issue can be seen on the other appliance too after it has been primary for some time.

  • In reply to Jelle:

    Hi,

    were both boxes built using the same version of software? Have they both been upgraded in synch?

    Ian

  • In reply to rfcat_vk:

    Yes, both boxes were built with the same version and have been in HA cluster since then. Updates were always done in HA mode / in synch.

  • In reply to Jelle:

    This issue seems to be kinda odd. 

    Can you reproduce it by going back to the "faulty" appliance? 

    And HA only syncs information between the two appliances. So basically a module "can" break on one appliance and still work on the other.

    But do you see the same issue if you use only the firewall, without any protection modules?

    If so, I would take a deeper look at the interface this appliance uses. It could be some kind of issue between the ISP router and the other appliance's WAN interface. I have seen such cases quite often.

  • In reply to LuCar Toni:

    I will first check if the issue comes up on the currently active appliance after some time. If not I will investigate further on the faulty device.

  • In reply to LuCar Toni:

    This morning we had the same issue with the former auxiliary device, now primary. So after around 12 days, connecting to the internet again failed on almost every attempt. Over the last few days we had already been seeing TLS handshake problems again. I switched to the other HA device by rebooting the primary appliance. Now everything works fine, at least for the next couple of days.

    I'm on MR1, but updating to MR3 doesn't seem to be an option, as bugs from MR2 still aren't fixed.

  • In reply to Jelle:

    Most likely this issue is not caused by the XG but by something on the ISP side.

    Do you have the chance to capture a dump of this issue? I would like to see it live in a PCAP file.

  • In reply to LuCar Toni:

    Well, we didn't have this issue with 17.0.5 MR5. It started with 17.1.1 MR1. We also never had the issue with our SonicWall appliance over the years. Nothing has changed on the ISP side so far.

    So far I haven't taken a dump. Is there any information on how to do this?

  • In reply to Jelle:

    The point is, nothing changed up to MR5. It could be some kind of issue in the environment that appeared after that timeframe.

    When the failure occurs, take a dump before you reboot.

     

    Log in to the shell and go to the Advanced Shell (option 5, then option 3).

    Take a dump like this:

    tcpdump -ni any port 443 -b -s0 -w /tmp/dump.pcap

    Download it with pscp (see community.sophos.com/.../127647):
    pscp.exe -scp admin@IP:/tmp/dump.pcap .
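Once the capture is downloaded, the interesting question is which handshakes never got an answer. The following is a minimal sketch, assuming the file is classic pcap (what `tcpdump -w` writes, not pcapng) and IPv4 only; `unanswered_client_hellos` is a hypothetical name, and it only inspects the first TLS record at the start of each TCP segment, so it is a rough filter rather than a full dissector.

```python
import socket
import struct
from typing import Iterator, List, Optional, Tuple

# (client IP, client port, server IP, server port)
Flow = Tuple[str, int, str, int]

# Link-layer type -> (offset of the EtherType/protocol field, header length)
LINK_HEADERS = {
    1: (12, 14),    # Ethernet (no VLAN tag)
    113: (14, 16),  # LINUX_SLL, written by "tcpdump -i any" on many systems
    276: (0, 20),   # LINUX_SLL2, used for "-i any" by newer libpcap versions
}


def _frames(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (linktype, frame) for each record of a classic pcap file."""
    magic = data[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        order = "<"  # little-endian file, micro- or nanosecond timestamps
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        order = ">"  # big-endian file
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    (linktype,) = struct.unpack(order + "I", data[20:24])
    offset = 24  # size of the global header
    while offset + 16 <= len(data):  # 16-byte per-record header
        (incl_len,) = struct.unpack(order + "I", data[offset + 8:offset + 12])
        start = offset + 16
        yield linktype, data[start:start + incl_len]
        offset = start + incl_len


def _handshake_type(payload: bytes) -> Optional[int]:
    """Handshake type if the segment starts with a TLS handshake record."""
    # Record header: 0x16 (handshake), 0x03 0x0X (version), 2-byte length;
    # byte 5 is the message type (1 = ClientHello, 2 = ServerHello).
    if len(payload) >= 6 and payload[0] == 0x16 and payload[1] == 0x03:
        return payload[5]
    return None


def unanswered_client_hellos(data: bytes, port: int = 443) -> List[Flow]:
    """Flows with a ClientHello to `port` but no ServerHello in return."""
    hellos, answered = set(), set()
    for linktype, frame in _frames(data):
        if linktype not in LINK_HEADERS:
            continue
        proto_off, hdr_len = LINK_HEADERS[linktype]
        if frame[proto_off:proto_off + 2] != b"\x08\x00":  # IPv4 only
            continue
        ip = frame[hdr_len:]
        if len(ip) < 20 or ip[9] != 6:  # protocol 6 = TCP
            continue
        ihl = (ip[0] & 0x0F) * 4
        (total_len,) = struct.unpack("!H", ip[2:4])
        # total_len can read 0 for offloaded (TSO) packets; use captured bytes
        tcp = ip[ihl:total_len] if total_len >= ihl else ip[ihl:]
        if len(tcp) < 20:
            continue
        sport, dport = struct.unpack("!HH", tcp[:4])
        src, dst = socket.inet_ntoa(ip[12:16]), socket.inet_ntoa(ip[16:20])
        kind = _handshake_type(tcp[(tcp[12] >> 4) * 4:])
        if kind == 1 and dport == port:
            hellos.add((src, sport, dst, dport))
        elif kind == 2 and sport == port:
            answered.add((dst, dport, src, sport))  # key by client side
    return sorted(hellos - answered)
```

Usage would be along the lines of `unanswered_client_hellos(open("dump.pcap", "rb").read())`. A capture on `-i any` contains both the LAN and WAN legs of a NATed connection, so the same handshake can appear under two address pairs: roughly, a hello left unanswered on the WAN leg points upstream, while one answered on the WAN leg but not on the LAN leg points at the firewall. In Wireshark, the display filter `tls.handshake.type == 1` is a comparable starting point.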

  • In reply to Jelle:

    You should delete this file and contact Sophos Support for advice.

  • In reply to LuCar Toni:

    I'm still waiting for a reply from Sophos on the support ticket I opened 8 days ago. In the meantime I came across this thread https://community.sophos.com/products/xg-firewall/f/firewall-and-policies/92313/strange-drops, which led me to suspect that STAS (Sophos Transparent Authentication Suite) is causing the trouble. So I deactivated STAS on our DCs and the XG, and the issue disappeared immediately. OK, now I can't see which user belongs to a connection, etc., but the performance issues, timeouts and drops are gone.

    I'm keeping my fingers crossed that the Sophos Central integration in SFOS 17.5 brings STAS functionality without needing STAS, as described in the feature list.