Internet becomes unresponsive after several days?

This is the second time this has happened since I started using v18 EAP. I also had this issue a couple of times on v17, but not as often. With v18 EAP, after Sophos XG has been running for several days (over a week), the internet sometimes becomes unresponsive, meaning I can't access anything. For example, if I try to access a website, it just keeps loading and eventually times out. At first I thought it was an ISP issue, so I reset my cable modem, but that didn't fix it. I can still access devices on my local network just fine, including the Sophos XG web UI. What I did notice in the web UI is that the "Sessions" count under System in the Control Center shows a very high number while I'm having these issues. It fluctuates from ~800 up to ~2.5k. I have about 30-40 devices on my network (one computer, mobile devices, smart home devices, etc.), and my Sessions count is typically somewhere around 20-50. After restarting Sophos XG, the count goes back down to what I normally see and everything works fine.

Anyone else experiencing similar issues? Is there any specific log I can save when this issue occurs? Unfortunately, I'm running this on my home network so I can't just leave it in an unusable state.

  • Hi Shred,

    have you tried editing the WAN interface and saving it without making any changes?

    Ian

  • In reply to rfcat_vk:

    Issues like this, with a large number of sessions, could still be caused by the ISP on the WAN side.

    If a client cannot connect properly, it will keep retrying over and over, and XG will hold all of those sessions.

    Without a packet dump, it is hard to tell what is going on.

    When the issue appears again, could you take a look at a tcpdump capture?

    Which provider / ISP box do you use? Is anything unusual showing on that box?

    I once found a similar issue with my Unitymedia box in Germany. It answered every DNS request with its own IP, so Google resolved to 192.168.1.1, etc., and all my clients started connecting to the Unitymedia box.

    This came up a couple of times and stopped after a few weeks.

  • In reply to LuCar Toni:

    Ah, makes sense. I’ll try to get a tcpdump next time. I’m not too familiar with tcpdump; are there any specific parameters I should run it with to capture what is needed to troubleshoot this particular issue?

    My ISP is Cox (U.S.), which is a cable internet service (1 Gbps down / 35 Mbps up). My Sophos XG device is connected directly to a Motorola cable modem I own.

  • Hi  

    Thanks for the feedback.

    Please see the KB article below, which explains how to run tcpdump and monitor packets.

    Please get back to us while you are facing the issue so we can check it at the same time.

    Thanks,

    Rana Sharma

  • In reply to Rana Sharma:

    I'm experiencing the issue again, but I'm not sure what you want me to capture with tcpdump. When I run it in the Sophos console, it's just a continuous stream of:

    13:09:10.513375 Port1, OUT: IP 172.16.16.16.22 > 172.16.16.34.52058: Flags [P.], seq 59003864:59004036, ack 33013, win 317, length 172
    13:09:10.513398 Port1, OUT: IP 172.16.16.16.22 > 172.16.16.34.52058: Flags [P.], seq 59004036:59004208, ack 33013, win 317, length 172
    13:09:10.513417 Port1, OUT: IP 172.16.16.16.22 > 172.16.16.34.52058: Flags [P.], seq 59004208:59004380, ack 33013, win 317, length 172
    13:09:10.513440 Port1, OUT: IP 172.16.16.16.22 > 172.16.16.34.52058: Flags [P.], seq 59004380:59004552, ack 33013, win 317, length 172
    13:09:10.513459 Port1, OUT: IP 172.16.16.16.22 > 172.16.16.34.52058: Flags [P.], seq 59004552:59004724, ack 33013, win 317, length 172
    13:09:10.513482 Port1, OUT: IP 172.16.16.16.22 > 172.16.16.34.52058: Flags [P.], seq 59004724:59004896, ack 33013, win 317, length 172
    13:09:10.513500 Port1, OUT: IP 172.16.16.16.22 > 172.16.16.34.52058: Flags [P.], seq 59004896:59005068, ack 33013, win 317, length 172
    13:09:10.513523 Port1, OUT: IP 172.16.16.16.22 > 172.16.16.34.52058: Flags [P.], seq 59005068:59005240, ack 33013, win 317, length 172
    13:09:10.513541 Port1, OUT: IP 172.16.16.16.22 > 172.16.16.34.52058: Flags [P.], seq 59005240:59005412, ack 33013, win 317, length 172
    13:09:10.513565 Port1, OUT: IP 172.16.16.16.22 > 172.16.16.34.52058: Flags [P.], seq 59005412:59005584, ack 33013, win 317, length 172
    13:09:10.513583 Port1, OUT: IP 172.16.16.16.22 > 172.16.16.34.52058: Flags [P.], seq 59005584:59005756, ack 33013, win 317, length 172
    13:09:10.513606 Port1, OUT: IP 172.16.16.16.22 > 172.16.16.34.52058: Flags [P.], seq 59005756:59005928, ack 33013, win 317, length 172
    13:09:10.513623 Port1, OUT: IP 172.16.16.16.22 > 172.16.16.34.52058: Flags [P.], seq 59005928:59006100, ack 33013, win 317, length 172
    13:09:10.513647 Port1, OUT: IP 172.16.16.16.22 > 172.16.16.34.52058: Flags [P.], seq 59006100:59006272, ack 33013, win 317, length 172
    13:09:10.513665 Port1, OUT: IP 172.16.16.16.22 > 172.16.16.34.52058: Flags [P.], seq 59006272:59006444, ack 33013, win 317, length 172
    13:09:10.513688 Port1, OUT: IP 172.16.16.16.22 > 172.16.16.34.52058: Flags [P.], seq 59006444:59006616, ack 33013, win 317, length 172

    172.16.16.16 is my Sophos XG device, and 172.16.16.34 is the computer I'm running tcpdump from. When I try it from another computer, it's the same thing except the destination address is that device's IP address. I'm guessing this is not what you're looking for, but I'm not sure which tcpdump parameters to use to collect data for troubleshooting.
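    Grouping the repeated lines by flow makes the pattern clearer: every packet goes from 172.16.16.16 port 22 (SSH) to the computer running the console, so the capture appears to be recording the SSH session that carries the tcpdump output itself. The minimal Python sketch below assumes the standard tcpdump text format for IPv4 TCP lines ("IP src.port > dst.port:"); the helper names summarize_flows and ssh_share are hypothetical, for illustration only. In standard tcpdump filter syntax an expression such as `not port 22` excludes that traffic; whether the XG console's tcpdump accepts the expression unchanged is an assumption.

    ```python
    import re
    from collections import Counter

    # Matches the "src.port > dst.port:" part of an IPv4 tcpdump text line,
    # e.g. "IP 172.16.16.16.22 > 172.16.16.34.52058:".
    FLOW_RE = re.compile(
        r"IP (\d{1,3}(?:\.\d{1,3}){3})\.(\d+) > (\d{1,3}(?:\.\d{1,3}){3})\.(\d+):"
    )

    def summarize_flows(lines):
        """Count captured packets per (src, sport, dst, dport) flow."""
        flows = Counter()
        for line in lines:
            match = FLOW_RE.search(line)
            if match:
                src, sport, dst, dport = match.groups()
                flows[(src, int(sport), dst, int(dport))] += 1
        return flows

    def ssh_share(flows):
        """Fraction of captured packets that use TCP port 22 (SSH) on either side."""
        total = sum(flows.values())
        ssh = sum(n for (_, sport, _, dport), n in flows.items() if 22 in (sport, dport))
        return ssh / total if total else 0.0

    if __name__ == "__main__":
        # Two of the lines quoted in the post.
        capture = [
            "13:09:10.513375 Port1, OUT: IP 172.16.16.16.22 > 172.16.16.34.52058: "
            "Flags [P.], seq 59003864:59004036, ack 33013, win 317, length 172",
            "13:09:10.513398 Port1, OUT: IP 172.16.16.16.22 > 172.16.16.34.52058: "
            "Flags [P.], seq 59004036:59004208, ack 33013, win 317, length 172",
        ]
        flows = summarize_flows(capture)
        for (src, sport, dst, dport), count in flows.items():
            print(f"{src}:{sport} -> {dst}:{dport}  {count} packets")
        print(f"SSH share: {ssh_share(flows):.0%}")
    ```

    If a capture taken with SSH excluded still shows mostly one flow, that flow is the more interesting one to look at.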

  • In reply to shred:

    Hi  

    Thanks for passing on the information. Please contact me via PM with your support access ID so we can check it live.

    One thing to try in the meantime: please stop the IPS service, try browsing the internet, and share the result with us.

    Thanks,

    Rana Sharma

  • In reply to Rana Sharma:

    I ended up leaving everything alone and it appears to have resolved itself. When the issue occurs, this is what I notice:

    - Internet becomes mostly unresponsive. What I mean is if I try to access a website, it will just sit there and never resolve. However, if I keep trying, at some point it will work for a very short period of time then go back to being unresponsive again.

    - When the internet is unresponsive, I notice the “Sessions” count is very high in the Sophos XG UI. I typically see a value around 50-200 when everything is working normally. When this issue occurs, I see the count somewhere around 500-1000+, and it’s during those high-session-count periods that the internet is unresponsive. The count drops back down to a “normal” value and the internet seems to work fine, but shortly afterwards the session count increases again and the internet becomes unresponsive.

    I’m not sure if this is an ISP issue or not, but I’ve had this issue in the past, restarted the modem, and the issue persisted. After restarting Sophos XG, the problem seemed to be resolved, but that could also have been a coincidence.

    Unfortunately I’m running Sophos XG on my home network, so I typically can’t leave it in an unusable state for very long. If you could give me specific instructions on which tcpdump command to run to record data, I’d be more than happy to do it when the issue occurs again.

  • In reply to shred:

    So I just had the issue occur again. This time I immediately restarted my modem to see if it's maybe an ISP issue that just required a modem reset. The issue still remained. I then restarted Sophos XG, and now the issue seems to be resolved. Again, it could all just be coincidence with an ISP issue that resolves itself exactly when I restart Sophos XG, but this is the third time I've restarted Sophos XG and the issue was resolved.

    Edit: Forgot to add, I don't think IPS is the cause, because devices covered by firewall rules without an IPS policy were having the same issue.

  • In reply to shred:

    PM'd you for more info.

  • Hi there

    What country are you in?

    What is the model of the cable modem?

    How is the WAN interface in the Sophos configured?

  • In reply to GavinDaniels:

    I’m located in the USA. It’s a Motorola MB8600 cable modem, and the WAN interface uses the default configuration. The only change I made was adding an IPv6 interface.

  • In reply to shred:

    Hi,

    In Australia we use HFC cable and Arris cable modems. Several ISPs use PPPoE authentication for their connections.

    I found that the interface would lock up for a while when the WAN port was configured with a 1500 MTU. Dropping it to 1492 to allow for the 8-byte PPPoE header solved that.

    When I moved to another carrier that also uses PPPoE, over a VLAN connection, I needed to reduce the MTU to 1460.

    That is also why I added the forum request to make the MTU of an unconfigured WAN port adjustable, as configuring the hardware port as WAN or DMZ with a static IP was a waste.
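    The 1492 figure is simple header arithmetic: 1500 bytes of Ethernet payload minus the 8 bytes PPPoE adds (a 6-byte PPPoE header plus the 2-byte PPP protocol field). The sketch below shows that subtraction and, for context, the matching TCP MSS, assuming IPv4 and TCP headers of 20 bytes each with no options. The 1460 value from the second carrier is only plugged in here, not derived; the function names are illustrative.

    ```python
    # Header sizes in bytes (standard values, no IP or TCP options assumed).
    ETHERNET_MTU = 1500   # default Ethernet payload size
    PPPOE_OVERHEAD = 8    # 6-byte PPPoE header + 2-byte PPP protocol field
    IPV4_HEADER = 20      # IPv4 header without options
    TCP_HEADER = 20       # TCP header without options

    def pppoe_mtu(link_mtu: int = ETHERNET_MTU) -> int:
        """Largest IP packet that fits in one PPPoE frame on this link."""
        return link_mtu - PPPOE_OVERHEAD

    def tcp_mss(ip_mtu: int) -> int:
        """TCP maximum segment size for a given IPv4 MTU."""
        return ip_mtu - IPV4_HEADER - TCP_HEADER

    if __name__ == "__main__":
        print(pppoe_mtu())           # 1492
        print(tcp_mss(pppoe_mtu()))  # 1452
        print(tcp_mss(1460))         # 1420
    ```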

  • In reply to GavinDaniels:

    Ah, I see. My ISP is Cox, which does not use PPPoE.

    The issue started occurring again last night, with the same symptoms. I left everything alone to see if the problem would still be there in the morning, and sure enough, it was. I left everything as is for about an hour while I dug around for anything abnormal, but nothing looked out of the ordinary apart from the rising and falling session counts that I also see with this issue. Instead of restarting the entire appliance, I tried restarting just the IPS service from the advanced shell, and after doing so the issue appears to be resolved. Session counts are back down to “normal” levels, and memory usage dropped from ~50% to ~38%. I did take a Consolidated Troubleshooting Report while the issue was occurring.

    So, as far as I can tell, the issue appears to be caused by the IPS service and is resolved by restarting it. Here are my IPS settings:

    Sophos Firmware Version SFOS 18.0.0 EAP3-Refresh1

    console> show ips-settings
    -------------IPS Settings-------------
    stream on
    lowmem off
    maxsesbytes 0
    maxpkts 8
    enable_appsignatures on
    http_response_scan_limit 65535
    search_method hyperscan
    sip_preproc enabled
    sip_ignore_call_channel enabled
    inspect untrusted-content

    -------------IPS Instances------------
    IPS CPU
    1 0
    2 1
    3 2
    4 3

  • In reply to shred:

    Well, restarting the IPS service seemed to fix it for about an hour. Unfortunately, the issue is back.

    Edit: I restarted the IPS service again, and everything has been working fine for the past two hours.

    I’m wondering if the issue is with ATP. Shortly before the issue started last night, I had enabled ATP. I notice that when I enable ATP, memory usage jumps up quite a bit. Even after disabling ATP, the issue remained, and so did the higher memory usage. This morning, after restarting the IPS service to see if it fixed the issue, I enabled ATP again, and shortly afterwards the issue started occurring. Disabling ATP gave the same result as before (memory usage stayed high and the issue persisted). This last time I restarted the IPS service but left ATP enabled, and so far everything seems to be working fine.

  • In reply to shred:

    It seems I experienced exactly the same issue after activating ATP on EAP3, and I have left ATP off since then. I'm going to try it again now to see if the problem persists. I notice it when my internet-connected IoT devices, like my Tado heating, suddenly go offline. Turning off ATP fixes this. If I don't turn it off, sooner or later normal browsing is affected as well.