Internet becomes unresponsive after several days?

This is the second time this has happened since I started using v18 EAP. It also happened a couple of times on v17, but less frequently. With v18 EAP, after Sophos XG has been running for several days (over a week), the internet sometimes becomes unresponsive: I can't reach anything outside my network. For example, if I try to open a website, it keeps trying to load and eventually times out. At first I thought it was an ISP issue, so I reset my cable modem, but that didn't fix it. I can still access devices on my local network just fine, including the Sophos XG web UI. What I did notice in the web UI is that the "Sessions" count under System in the Control Center is very high while this is happening; it fluctuates from ~800 up to 2.5k. I have about 30-40 devices on my network (one computer, mobile devices, smart home devices, etc.), and my Sessions count is typically around 20-50. After restarting Sophos XG, the count drops back to what I normally see and everything works fine.

Anyone else experiencing similar issues? Is there any specific log I can save when this issue occurs? Unfortunately, I'm running this on my home network so I can't just leave it in an unusable state.

  • In reply to Akilae:

    Hi,

    I am advised that there is a bug in ATP which will be fixed in v18 GA. If you disable ATP and then re-enable it, you will need to restart your XG.

    Ian

  • In reply to Akilae:

    There is a known ATP issue, fixed in GA, that may be related. I think shred is experiencing it, but I don't know whether it is the cause of the unresponsive connection.

    Take a look at /log/ips.log.  If you see a lot of "failed to get sessiontbl data for session id" then you may be experiencing the issue.

    Workaround:

    Advanced Threat > Enable Advanced Threat protection : Off

    System Services >  Services > IPS :  Stop and then Start
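The log check described above can be scripted. Here is a minimal Python sketch, assuming the ips.log lines look like the examples posted in this thread. The function name `count_signature_by_day` is made up for illustration, and this is not a Sophos tool.

```python
from collections import Counter
import re

# Message text quoted in this thread.
SIGNATURE = "failed to get sessiontbl data for session id"

# Assumed line prefix, based on the samples in this thread:
# [Mon DD HH:MM:SS :pid]:message
PREFIX_RE = re.compile(r"^\[(\w{3} \d{2}) \d{2}:\d{2}:\d{2} :\d+\]:")


def count_signature_by_day(lines):
    """Count lines containing SIGNATURE, grouped by their 'Mon DD' date."""
    counts = Counter()
    for line in lines:
        if SIGNATURE not in line:
            continue
        match = PREFIX_RE.match(line.strip())
        day = match.group(1) if match else "unknown"
        counts[day] += 1
    return counts


# Sample lines copied from posts in this thread.
sample = [
    "[Feb 03 16:47:37 :29704]:failed to get sessiontbl data for session id 528 rev 64837,dropping packet",
    "[Feb 03 17:23:12 :29703]:failed to get sessiontbl data for session id 574 rev 36762,dropping packet",
    "[Feb 19 14:31:03 :7535]:Error reading session data,status -1",
]
print(count_signature_by_day(sample))

# On a copy of the real log, something like:
#   with open("ips.log", errors="replace") as f:
#       print(count_signature_by_day(f))
```

A steadily growing count per day would suggest the issue is present. What counts as "a lot" is a judgment call and is not defined anywhere in this thread.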

  • In reply to Michael Dunn:

    I think I may be having some other issues as well based on some information from Michael Dunn via PM but it also looks like I have the "failed to get sessiontbl data for session id" in my ips logs. Example:

    [Feb 03 16:47:37 :29704]:failed to get sessiontbl data for session id 528 rev 64837,dropping packet

    [Feb 03 16:47:37 :29704]:failed to get sessiontbl data for session id 528 rev 64837,dropping packet

    [Feb 03 17:23:12 :29703]:failed to get sessiontbl data for session id 574 rev 36762,dropping packet

    [Feb 03 17:23:12 :29703]:failed to get sessiontbl data for session id 574 rev 36762,dropping packet

    [Feb 03 17:23:12 :29703]:failed to get sessiontbl data for session id 574 rev 36762,dropping packet

    [Feb 03 17:23:42 :29704]:failed to get sessiontbl data for session id 350 rev 3155,dropping packet

    [Feb 03 17:25:49 :29701]:failed to get sessiontbl data for session id 146 rev 36239,dropping packet

    [Feb 03 17:25:49 :29701]:failed to get sessiontbl data for session id 462 rev 38120,dropping packet

    However, I've had ATP off for the past couple days (and restarted the IPS service after turning it off). I haven't had any issues with the internet being unresponsive since then.

  • In reply to shred:

    Assuming that shred does not get this again with ATP off, we can assume that is the cause.  Tracked in NC-55333 and already fixed in GA.

  • Hi all,

    Thanks to this thread I noticed that switching off ATP significantly improves the loading speed of websites. Generally I'm more interested in improving latencies and response times than in the much-discussed bandwidth tests, which are always fine with XG anyway :-).

    Is this noticeable performance drop expected behavior when using ATP, or is it related to the known bug? (I have "sessiontbl" log entries in ips.log as well.)

    And a second question: what exactly is "untrusted content" (it seems to be a new ATP setting), and when is it relevant?

    Thanks and Best Regards

    Dom

  • In reply to Dom Nik:

    Both ATP and the DPI mode of web are implemented within snort.  ATP also has new functionality for v18 - sorry I don't know any details.

    Right now there are several issues with ATP and how that affects DPI mode.  Some of these are fixed in GA and some are targeted for MR1.

    My personal recommendation right now is that unless you need it, turn off ATP.  After MR1 (or whatever the first major fix release is) then you can turn it on again.

  • In reply to Michael Dunn:

    Hi Michael,

    thanks for your reply.

    How do I determine if I need it? :)

    (In fact, I have had no IPS or ATP alerts since I started using XG, except for 2-3 false positives.)

    Regards

    Dom

  • In reply to Dom Nik:


    Advanced Threat Protection is enforced by three systems: DNS (e.g. domain names), Web (e.g. URLs), and snort (e.g. signatures).
    It focuses on one type of malware: malware that has already infected a system and is now trying to contact a controlling server.

    The most common time it comes into play is if people have a laptop without an AV scanner which gets infected while not behind the XG, and then is added to your network.

    For example, a corporate laptop that does not have AV on it. Someone brings it home and gets it infected, then brings it back to the office. Once in the office, it tries to contact its Command and Control server, which ATP blocks.

    Another example would be a coffee shop with a guest wifi network. Someone connects an infected laptop.

    Home networks where the number of new devices connecting is low, or where all computers have AV software installed are at lower risk and don't need ATP as much.

    Most of what ATP does (the FQDNs and URLs) is also blocked by web categories, mostly "Spyware and Malware" but also "Spam URLs" and "Phishing and Fraud". If you are using a Web Policy that blocks those categories, you are almost as protected as with ATP.

    One of the benefits of ATP is that it works even if the Web Policy is None or Allow All. However that is also one of the drawbacks of ATP - you cannot create a rule that turns it off, it is always enforced. That is one of the problems we are facing with this release.

    Because the new DPI mode does port-agnostic HTTP detection, ATP can now enforce in a new port-agnostic way. But that in turn causes us to enforce strict HTTP specification compliance on non-standard ports, which causes problems when apps use HTTP-like connections that do not conform to the spec. When a company writes both the client and the server and runs them on non-standard ports (not 80/443), it sometimes does things that go against the spec. By default, DPI mode then blocks the connection because it is unscannable.
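To illustrate why "HTTP-like but not spec-compliant" traffic is awkward for port-agnostic detection, here is a toy Python sketch. It is not how snort or XG implement detection. The function name, the three labels, and the simplified request-line pattern are all assumptions made for this example. The idea is only that the decision is based on the first bytes of the payload, not on the port number.

```python
import re

# Simplified HTTP/1.x request line: METHOD SP target SP HTTP/1.0 or HTTP/1.1
# (an illustrative pattern, not a full implementation of the HTTP grammar).
REQUEST_LINE = re.compile(rb"[A-Z]+ \S+ HTTP/1\.[01]")

KNOWN_METHODS = {
    b"GET", b"POST", b"PUT", b"HEAD", b"DELETE",
    b"OPTIONS", b"PATCH", b"CONNECT", b"TRACE",
}


def classify_payload(payload: bytes) -> str:
    """Classify the start of a TCP payload without looking at the port.

    Returns one of three hypothetical labels:
    "not-http", "http-compliant", or "http-like".
    """
    first_token = payload.split(b" ", 1)[0]
    if first_token not in KNOWN_METHODS:
        return "not-http"
    # A compliant request line ends with CRLF.
    line, crlf, _ = payload.partition(b"\r\n")
    if crlf and REQUEST_LINE.fullmatch(line):
        return "http-compliant"
    # Starts like HTTP but breaks the expected shape.
    return "http-like"


print(classify_payload(b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"))
print(classify_payload(b"GET /status MYPROTO/2\n"))
print(classify_payload(b"\x16\x03\x01\x02\x00"))
```

The middle case is the problematic one: a custom client and server that only talk to each other can get away with it, but a scanner that expects the spec has to decide whether to pass or block traffic it cannot parse.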

  • In reply to Michael Dunn:

    I upgraded to v18 GA yesterday and enabled ATP. Everything seems to be working okay so far but checking my ips logs tonight, I see a bunch of the messages below. No internet unresponsiveness issues yet.

    [Feb 19 14:31:03 :7535]:Error reading session data,status -1

    [Feb 19 14:31:03 :7535]:failed to get sessiontbl data for session id 340 rev 59828 pkt_len 0 datalink_type 228 direction 0 daq_source 2 is_tcp 0 nseid 0 is_ssl_non_app_appdata 0, dropping packet

    [Feb 19 14:31:36 :7534]:Error reading session data,status -1

    [Feb 19 14:31:36 :7534]:failed to get sessiontbl data for session id 92 rev 57781 pkt_len 0 datalink_type 228 direction 0 daq_source 2 is_tcp 0 nseid 0 is_ssl_non_app_appdata 0, dropping packet

    [Feb 19 14:31:36 :7535]:Error reading session data,status -1

    [Feb 19 14:31:36 :7535]:failed to get sessiontbl data for session id 1487 rev 12236 pkt_len 0 datalink_type 228 direction 0 daq_source 2 is_tcp 0 nseid 0 is_ssl_non_app_appdata 0, dropping packet

    [Feb 19 14:39:14 :7537]:Error reading session data,status -1

    [Feb 19 14:39:14 :7537]:failed to get sessiontbl data for session id 1488 rev 1869 pkt_len 0 datalink_type 228 direction 0 daq_source 2 is_tcp 0 nseid 0 is_ssl_non_app_appdata 0, dropping packet

    [Feb 19 14:39:31 :7535]:Error reading session data,status -1

    [Feb 19 14:39:31 :7535]:failed to get sessiontbl data for session id 121 rev 2682 pkt_len 0 datalink_type 228 direction 0 daq_source 2 is_tcp 0 nseid 0 is_ssl_non_app_appdata 0, dropping packet

    [Feb 19 14:40:07 :7535]:Error reading session data,status -1

    [Feb 19 14:40:07 :7535]:failed to get sessiontbl data for session id 377 rev 50558 pkt_len 0 datalink_type 228 direction 0 daq_source 2 is_tcp 0 nseid 0 is_ssl_non_app_appdata 0, dropping packet

    [Feb 19 14:40:07 :7534]:Error reading session data,status -1

    [Feb 19 14:40:07 :7534]:failed to get sessiontbl data for session id 376 rev 50558 pkt_len 0 datalink_type 228 direction 0 daq_source 2 is_tcp 0 nseid 0 is_ssl_non_app_appdata 0, dropping packet

    [Feb 19 14:40:07 :7537]:Error reading session data,status -1

    [Feb 19 14:40:07 :7537]:failed to get sessiontbl data for session id 863 rev 2481 pkt_len 0 datalink_type 228 direction 0 daq_source 2 is_tcp 0 nseid 0 is_ssl_non_app_appdata 0, dropping packet

    [Feb 19 14:41:49 :7536]:Error reading session data,status -1

    [Feb 19 14:41:49 :7536]:failed to get sessiontbl data for session id 233 rev 37009 pkt_len 0 datalink_type 228 direction 0 daq_source 2 is_tcp 0 nseid 0 is_ssl_non_app_appdata 0, dropping packet

    1582153705.901504922 [ 7537/0x0] [nsg_nse_policy.c:1312:__nsg_error] 172.16.16.25:53859 to 75.2.53.94:443: Error from nse: NSE:Internal [0xb0000582;code:130;sub:5] Flow timeout

    [Feb 19 15:11:24 :7534]:Error reading session data,status -1

    [Feb 19 15:11:24 :7534]:failed to get sessiontbl data for session id 509 rev 19812 pkt_len 0 datalink_type 229 direction 0 daq_source 2 is_tcp 0 nseid 0 is_ssl_non_app_appdata 0, dropping packet

    [Feb 19 15:11:24 :7534]:Error reading session data,status -1

    [Feb 19 15:11:24 :7534]:failed to get sessiontbl data for session id 854 rev 2453 pkt_len 0 datalink_type 229 direction 0 daq_source 2 is_tcp 0 nseid 0 is_ssl_non_app_appdata 0, dropping packet

    [Feb 19 15:11:24 :7534]:Error reading session data,status -1

    [Feb 19 15:11:24 :7534]:failed to get sessiontbl data for session id 497 rev 20104 pkt_len 0 datalink_type 229 direction 0 daq_source 2 is_tcp 0 nseid 0 is_ssl_non_app_appdata 0, dropping packet

    [Feb 19 15:11:24 :7535]:Error reading session data,status -1

    [Feb 19 15:11:24 :7535]:failed to get sessiontbl data for session id 499 rev 20110 pkt_len 0 datalink_type 229 direction 0 daq_source 2 is_tcp 0 nseid 0 is_ssl_non_app_appdata 0, dropping packet

    [Feb 19 15:11:24 :7537]:Error reading session data,status -1

    [Feb 19 15:11:24 :7537]:failed to get sessiontbl data for session id 505 rev 20042 pkt_len 0 datalink_type 229 direction 0 daq_source 2 is_tcp 0 nseid 0 is_ssl_non_app_appdata 0, dropping packet

    1582156094.971507455 [ 7537/0x0] [nsg_nse_policy.c:1312:__nsg_error] 2600:8801:7f06:1:f04c:135:d6da:3b53:50053 to 2607:fb90:c13f:fff6::2:443: Error from nse: NSE:Internal [0xb0000582;code:130;sub:5] Flow timeout

    [Feb 19 15:48:14 :7537]:Error reading session data,status -1

    [Feb 19 15:48:14 :7537]:failed to get sessiontbl data for session id 305 rev 38341 pkt_len 0 datalink_type 229 direction 0 daq_source 2 is_tcp 0 nseid 0 is_ssl_non_app_appdata 0, dropping packet

    [Feb 19 15:48:14 :7537]:Error reading session data,status -1

    [Feb 19 15:48:14 :7537]:failed to get sessiontbl data for session id 305 rev 38341 pkt_len 0 datalink_type 229 direction 0 daq_source 2 is_tcp 0 nseid 0 is_ssl_non_app_appdata 0, dropping packet

    [Feb 19 16:08:59 :7534]:Error reading session data,status -1

    [Feb 19 16:08:59 :7534]:failed to get sessiontbl data for session id 558 rev 33231 pkt_len 0 datalink_type 228 direction 0 daq_source 2 is_tcp 0 nseid 0 is_ssl_non_app_appdata 0, dropping packet

    [Feb 19 16:09:30 :7536]:Error reading session data,status -1

    [Feb 19 16:09:30 :7536]:failed to get sessiontbl data for session id 501 rev 20258 pkt_len 0 datalink_type 228 direction 0 daq_source 2 is_tcp 0 nseid 0 is_ssl_non_app_appdata 0, dropping packet

    [Feb 19 17:14:46 :7536]:Error reading session data,status -1

    [Feb 19 17:14:46 :7536]:failed to get sessiontbl data for session id 381 rev 40659 pkt_len 0 datalink_type 228 direction 0 daq_source 2 is_tcp 0 nseid 0 is_ssl_non_app_appdata 0, dropping packet

    1582163910.312337972 [ 7534/0xe1f900000057] [nsg_tcphold.c:314:process_event] Could not find session for key and unique_id.

  • In reply to shred:

    Hi shred. We have seen that internally as well. Assuming that it is the same cause, you can ignore that error. It can occur when both sides of a connection close simultaneously; 16 seconds later something times out and prints that message.
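Some lines in the log above start with a bare number instead of a bracketed date. That number appears to be a Unix epoch timestamp in seconds, although this thread does not confirm it. Under that assumption, here is a small Python sketch for converting it so those lines can be lined up with the bracketed entries. The function name and the `utc_offset_hours` parameter are made up for this example.

```python
from datetime import datetime, timedelta, timezone


def epoch_prefix_to_datetime(line: str, utc_offset_hours: int = 0) -> datetime:
    """Convert the leading epoch-seconds field of a log line to a datetime.

    utc_offset_hours is whatever offset the firewall's local clock uses;
    it is an input you have to supply, not something found in the log.
    """
    epoch_text = line.split(" ", 1)[0]
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(float(epoch_text), tz=tz)


# Line copied from the log posted above.
line = (
    "1582163910.312337972 [ 7534/0xe1f900000057] "
    "[nsg_tcphold.c:314:process_event] Could not find session for key and unique_id."
)
stamp = epoch_prefix_to_datetime(line)  # UTC by default
print(stamp.strftime("%Y-%m-%d %H:%M:%S"))
```

The bracketed entries are presumably in the firewall's local time, so pass the matching offset before comparing the two formats.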