
Client internet interruptions every 15 minutes

Our environment consists of Dell Windows 11 workstations ("clients") connected to Cisco 3850 switches, all of which reach the internet through our Sophos SF01V firewall (SFOS 20.0.0 GA-Build222). DHCP and DNS are handled by local Windows servers. We have about 140 clients.

The symptom is that most clients behind our firewall get a short interruption of a few seconds (what I call a "blip") in their ability to establish new connections to hosts on the internet. Sometimes the interruption is not just a few seconds but lasts a full 2 minutes (I call those "outages"). Users complain that they can't open any new websites for a few minutes.

The clients don't all follow the same schedule, and it does not happen to all of them at the same time, but many of them share a schedule. The schedule is consistent on each client, even if the client reboots, its network card is disconnected, or I change its IP address. For example, some clients blip at 9, 24, 39 and 54 minutes after the hour, others blip at 0, 15, 30 and 45, and others have a different schedule, but it stays consistent even days later. When a client has a 2-minute outage, it always lines up with one of its normal blip times.

Another twist: if a client has an existing connection to an internet host open (for example, a continuous ping that repeats every few seconds), that connection is not affected by the blip or the outage. Yet another twist: 3 of the 12 or so clients we are monitoring don't have any blips or outages, and we can't figure out what is different about them.

(I know you probably think I am crazy at this point; I am beginning to wonder myself.)

To diagnose and log this madness, I made a little PowerShell script that sends a single ping to each of 24 internet hosts in turn, waiting 5 seconds between hosts, and starts over once it reaches the end of the list. The script logs every failed ping, the start of an "outage" (at least 3 failed pings in a row), and the end of the outage (the first successful ping to any host). We have this script running on 12 separate clients, and that's how we learned of the repeating schedule of blips and outages. In fact, we had not even noticed the blips until we started logging; we were just trying to log the outages.
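
The logging logic described above can be sketched roughly as follows. This is a Python illustration, not the actual PowerShell script; the names `OutageTracker`, `ping_once` and `monitor` are made up for the example, and treating the `ping` exit code as the success signal is an assumption.

```python
import platform
import subprocess
import time
from dataclasses import dataclass, field
from datetime import datetime

OUTAGE_THRESHOLD = 3   # consecutive failed pings that count as an "outage"
DELAY_SECONDS = 5      # pause between hosts, as described in the post


@dataclass
class OutageTracker:
    """Turns a stream of ping results into blip/outage log events."""
    consecutive_failures: int = 0
    in_outage: bool = False
    events: list = field(default_factory=list)

    def record(self, timestamp, host, success):
        if success:
            # The first successful ping to any host ends an outage.
            if self.in_outage:
                self.events.append((timestamp, "OUTAGE_END", host))
            self.consecutive_failures = 0
            self.in_outage = False
            return
        self.consecutive_failures += 1
        self.events.append((timestamp, "PING_FAILED", host))
        if self.consecutive_failures == OUTAGE_THRESHOLD and not self.in_outage:
            self.in_outage = True
            self.events.append((timestamp, "OUTAGE_START", host))


def ping_once(host):
    """Send one ping; assumes exit code 0 means a reply was received."""
    if platform.system() == "Windows":
        cmd = ["ping", "-n", "1", "-w", "2000", host]
    else:
        cmd = ["ping", "-c", "1", "-W", "2", host]
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0


def monitor(hosts):
    """Loop over the host list forever, printing each new event."""
    tracker = OutageTracker()
    while True:
        for host in hosts:
            seen = len(tracker.events)
            tracker.record(datetime.now(), host, ping_once(host))
            for event in tracker.events[seen:]:
                print(*event)
            time.sleep(DELAY_SECONDS)
```

Calling `monitor([...])` with a list of internet hosts would start the loop; `OutageTracker` can also be fed recorded results on its own to replay a log.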

I have even captured packet traces on a client during an outage, and I can tell for certain that the packets are going out from the client to the firewall but the firewall is not sending any responses.

This has been going on for months, but only recently did we start the advanced logging. Of the 12 clients we have been logging in detail:

- 3 don't have any blips or any outages

- 1 has blips the majority of the time (it misses a ping at its scheduled time about 75% of the time) but never an outage.

- 8 have blips about 75% of the time on their 15-minute schedule and outages about 10 times per day. The outages always start at a normal blip time.

So it seems to me that some process on the firewall stops allowing new connections to be established, that this repeats every 15 minutes, that the process only serves some clients, and that sometimes it hangs for a full 2 minutes before it recovers.

Some other details:

- The interruptions are always either a blip of less than 10 seconds or an outage of a full 2 minutes. I can tell from the logging that it's never in between the two or longer than 2 minutes.

- We have tried a different ISP; in fact, we have 3 separate ones hooked up to the firewall.

- The schedule of blips/outages is very consistent on each client. If it's going to happen, it happens at that time after the hour or it does not happen at all.

- The fact that continuous pings to an internet host are never interrupted when a client has a blip or an outage really makes me think it has to be something with the firewall.
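
One way to check how consistent a client's blip schedule is would be to reduce each failure time to its offset within the 15-minute period. The Python sketch below is only an illustration of that idea; the function names are hypothetical, and the 15-minute period is taken from the observations above.

```python
from collections import Counter
from datetime import datetime


def blip_phases(failure_times, period_minutes=15):
    """Count failures by their minute offset within the period."""
    return Counter(t.minute % period_minutes for t in failure_times)


def dominant_phase(failure_times, period_minutes=15):
    """Return the most common offset, or None if there are no failures."""
    counts = blip_phases(failure_times, period_minutes)
    if not counts:
        return None
    phase, _count = counts.most_common(1)[0]
    return phase


# A client that blips at 9, 24, 39 and 54 minutes after the hour
# has all of its failures at offset 9 within the 15-minute period.
sample = [datetime(2024, 4, 1, 8, m) for m in (9, 24, 39, 54)]
print(dominant_phase(sample))
```

If most of a client's failures land on a single offset, that client is on a fixed 15-minute schedule; failures spread across many offsets would point to something else.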



Added TAGs
[edited by: Erick Jan at 11:30 PM (GMT -7) on 31 Mar 2024]
  • Do you use STAS? Maybe the STAS Quarantine is enabled? Try disabling it.

  • We do have this on, and it does look promising. We have the inactivity timer set to 15 minutes, the identity probe timeout set to 120 seconds, and "Restrict client traffic during identity probe" set to Yes. Those values match the 15-minute schedule and the 2-minute outages. But I am confused, because STAS seems to be working: when I look at the logs, all the traffic has a user identified. I am also confused about why clients would be hitting the inactivity timeout, as they definitely have traffic.

  • My authentication log looks like this, which doesn't look right. First we get a "good" message:

    2024-04-01 08:14:32 Authentication messageid="17701" log_type="Event" log_component="Firewall Authentication" log_subtype="Authentication" status="Successful" user="USERNAME@ptraders.local" user_group="ALL" client_used="CTA" auth_mechanism="AD" reason="" src_ip="192.168.149.31" message="User USERNAME@ptraders.local of group ALL logged in successfully to Firewall through AD authentication mechanism from 192.168.149.31" name="USERNAME@ptraders.local" src_mac=""

    But then immediately afterwards we get a "bad" message:

    2024-04-01 08:14:32 Authentication messageid="17702" log_type="Event" log_component="Firewall Authentication" log_subtype="Authentication" status="Failed" user="USERNAME@ptraders.local" user_group="" client_used="CTA" auth_mechanism="AD" reason="Login failed" src_ip="192.168.149.31" message="User USERNAME@ptraders.local failed to login to Firewall through AD authentication mechanism from 192.168.149.31 because of Login failed" name="" src_mac=""

    It does this for pretty much all users at 15-minute intervals.
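
To see how many users and IPs show this pattern, an exported authentication log could be scanned for a "Successful" event immediately followed by a "Failed" event for the same user, source IP and timestamp. The Python sketch below assumes the log is available as plain text lines in the `key="value"` format shown above; the function names are hypothetical and this is not a Sophos tool.

```python
import re

# key="value" pairs as they appear in the log lines above
FIELD_RE = re.compile(r'(\w+)="([^"]*)"')
# leading "YYYY-MM-DD HH:MM:SS" timestamp
TIMESTAMP_RE = re.compile(r'^\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})')


def parse_auth_line(line):
    """Parse one log line into a dict of its fields plus a timestamp."""
    fields = dict(FIELD_RE.findall(line))
    match = TIMESTAMP_RE.match(line)
    fields["timestamp"] = match.group(1) if match else None
    return fields


def success_then_failure(lines):
    """Find Successful events immediately followed by a Failed event
    with the same timestamp, user and source IP."""
    events = [parse_auth_line(line) for line in lines
              if 'log_subtype="Authentication"' in line]
    hits = []
    for prev, cur in zip(events, events[1:]):
        same_session = all(prev.get(key) == cur.get(key)
                           for key in ("timestamp", "user", "src_ip"))
        if (same_session and prev.get("status") == "Successful"
                and cur.get("status") == "Failed"):
            hits.append((cur["timestamp"], cur["user"], cur["src_ip"]))
    return hits
```

Grouping the results by source IP and comparing the timestamps against each client's blip times would show whether the failed logins line up with the interruptions.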

  • Try to adjust your logoff detection in SFOS. Do you use it? Can you show us your STAS config in SFOS? 
