
Client internet interruptions every 15 minutes

Our environment is about 140 Dell Windows 11 workstations ("clients") connected to Cisco 3850 switches, all reaching the internet through our Sophos SF01V (SFOS 20.0.0 GA-Build222) firewall. DHCP and DNS are handled by local Windows servers.

The symptom is that most clients behind our firewall get a short interruption of a few seconds (what I call a "blip") in their ability to establish new connections to hosts on the internet. Sometimes the interruption is not just a few seconds but lasts a full 2 minutes (I call those "outages"). Users complain that they can't open any new websites for a few minutes.

The clients don't all share the same schedule. It does not happen to all of them at the same time, but many of them follow the same schedule. The schedule is consistent on each client, even if the client reboots, its network card is disconnected, or I change its IP address. For example, some clients blip at 9, 24, 39 and 54 minutes after the hour, others blip at 0, 15, 30 and 45, and others have a different schedule, but it stays consistent even days later. When a client has a 2-minute outage, it always lines up with one of its normal blip times.

Another twist: if a client has an existing connection to an internet host open (for example, a continuous ping that repeats every few seconds), that connection is not affected by the blip or the outage. Yet another twist: 3 of the 12 or so clients we are monitoring don't have any blips or outages, and we can't figure out what is different about them.

(I know you probably think I am crazy at this point; I am beginning to wonder myself.)

To diagnose and log this madness, I made a little PowerShell script that works through a list of 24 internet hosts: it sends a single ping to one host, waits 5 seconds, moves on to the next host, and starts the list over once it reaches the end. The script logs every failed ping. It also logs the start of an "outage" (at least 3 failed pings in a row) and the end of the outage (the first successful ping to any host). We have this script running on 12 separate clients, and that's how we learned of their repeating schedule of blips and outages. In fact, we had not even noticed the blips until we started logging; we were just trying to log the outages.
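For anyone who wants to reproduce the logging, here is a minimal sketch of the blip/outage classification described above. It is written in Python rather than PowerShell and is not the actual script; the function name `find_outages`, the constant `OUTAGE_THRESHOLD`, and the `(timestamp, ok)` input format are assumptions made for illustration.

```python
from typing import Iterable, List, Optional, Tuple

# Assumption from the post: 3 or more failed pings in a row count as an "outage".
OUTAGE_THRESHOLD = 3


def find_outages(
    results: Iterable[Tuple[float, bool]],
) -> Tuple[List[float], List[Tuple[float, Optional[float]]]]:
    """Classify ping results into blips and outages.

    results: (timestamp_in_seconds, ping_succeeded) pairs in time order.
    Returns (blips, outages):
      blips   - timestamps of failed pings in runs shorter than the threshold
      outages - (start, end) pairs; start is the first failed ping of the run,
                end is the first successful ping after it (None if still down)
    """
    blips: List[float] = []
    outages: List[Tuple[float, Optional[float]]] = []
    run: List[float] = []  # timestamps of the current run of failures

    for ts, ok in results:
        if not ok:
            run.append(ts)
            continue
        # A success closes whatever failure run was in progress.
        if len(run) >= OUTAGE_THRESHOLD:
            outages.append((run[0], ts))
        else:
            blips.extend(run)
        run = []

    # Handle a failure run that has not yet been closed by a success.
    if len(run) >= OUTAGE_THRESHOLD:
        outages.append((run[0], None))
    else:
        blips.extend(run)

    return blips, outages
```

With one ping every 5 seconds, a single failed ping is recorded as a blip, while three or more consecutive failures are recorded as one outage that ends at the first successful ping.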

I have even done packet traces of a client when it has an outage and I can tell for certain the packets are going out to the firewall from the client but the firewall is not sending any responses.

This has been going on for months, but only recently did we start the detailed logging. Of the 12 clients we have been logging:

- 3 don't have any blips or any outages

- 1 has blips the majority of the time (like they will miss a ping at their scheduled time about 75% of the time) but never an outage.

- 8 have blips about 75% of the time on their 15-minute schedule, plus outages about 10 times per day. The outages always start at a normal blip time.

So it seems to me that some process on the firewall stops allowing new connections to be established, and that this repeats every 15 minutes. That process appears to serve only some of the clients, and sometimes it hangs for a full 2 minutes before it recovers.

Some other details:

- The interruptions are always a blip of less than 10 seconds or an outage of a full 2 minutes. I can tell by the logging it's never in between the two or longer than 2 minutes. 

- We have tried a different ISP; in fact, we have 3 separate ISPs connected to the firewall.

- The schedule of blips/outages is very consistent on each client. If an interruption is going to happen, it happens at that client's scheduled minutes after the hour; otherwise it does not happen at all.

- The fact that continuous pings to an internet host are never interrupted when a client has a blip or an outage really makes me think it has to be something with the firewall.
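To check the 15-minute schedule from the logs, one option is to reduce each blip timestamp to its minute offset within a 15-minute window and count how often each offset occurs per client. The sketch below is Python and purely illustrative; `blip_phase`, `phase_histogram`, and `PERIOD_MINUTES` are hypothetical names, and it assumes the log timestamps have already been parsed into `datetime` objects.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable

# Assumption from the post: interruptions repeat every 15 minutes.
PERIOD_MINUTES = 15


def blip_phase(ts: datetime, period: int = PERIOD_MINUTES) -> int:
    """Return the minute offset of a timestamp within the repeating period."""
    return ts.minute % period


def phase_histogram(timestamps: Iterable[datetime], period: int = PERIOD_MINUTES) -> Counter:
    """Count how many blips fall on each minute offset within the period."""
    return Counter(blip_phase(ts, period) for ts in timestamps)


# Example: a client that blips at 9, 24, 39 and 54 minutes after the hour
# has every blip at offset 9 within the 15-minute window.
example = [datetime(2024, 3, 30, 10, m) for m in (9, 24, 39, 54)]
histogram = phase_histogram(example)
```

A client whose blips all land on one offset (9 for the 9/24/39/54 schedule, 0 for the 0/15/30/45 schedule) fits the fixed-schedule pattern described above. The offset is only minute-granular, so a blip that starts in the last seconds of a minute may show up under the neighboring offset.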



Added TAGs
[edited by: Erick Jan at 11:30 PM (GMT -7) on 31 Mar 2024]