SFOS 16.01.0 known IPS issue - Workarounds?

Hey all,

Does anyone have another workaround for the known IPS issue (NC-8238   [IPS] IPS Service drops legitimate traffic in very high load average conditions)? From what I can see, the IPS service constantly fails to start, and that causes the problem (CPU and memory usage spike all over the place). As my workaround, I set the IPS service to Stop, and performance and traffic return to normal. Obviously this isn't a great solution. Does anyone have anything better?

I'd also like to know when this will be resolved; it seems to me to be a rather big problem. I may just roll back to 15 if this is going to be a thing for a while.

Thanks !!



This thread was automatically locked due to age.
  • Hi Darrian,

    To get a broader view of this, SSH to the XG, choose option 4 (Device Console), and run the command show ips-settings. Post the output.

    Which XG hardware model do you use, and how many concurrent active connections are on the XG when the issue occurs? If legitimate traffic is being dropped by IPS, check the Log Viewer > IPS page and allow the signature in the IPS policy.

    Thanks

    Sachin Gurung
    Team Lead | Sophos Technical Support

  • Hi, 

    As requested: 

     

    -------------IPS Settings-------------
            stream on
            lowmem on
            maxsesbytes 0
            maxpkts 80
            mmap on
            enable_appsignatures on
            http_response_scan_limit  65535

    -------------IPS Instances------------
    IPS CPU
     1  0

    The issue occurs mainly on a PC (3GB RAM, dual core CPU @ 2.0GHz) running Sophos XG 16.1.01. It starts as soon as the IPS service starts, even with only 1 connection.

                                    
  • Thank you for the suggestion. As far as I know, this being a home license, I am not able to report the issue to support. Is that correct? 

     

    Thank you,

    Darrian

  • Hello,

     

    I'm in the same boat as Darrian.  I see the same warning messages as in his screenshot.  It seems to be cyclical, though.  Sometimes it will show that all is normal and IPS started fine, then within a few minutes I'll see the warning pop up again.  Unlike him, I'm not seeing high CPU or memory utilization; mine has not risen above 30%.

     

    I'm running XG 16.01.2 on a dedicated PC.  It's a new install that has very little config on it, as I've been trying to get this working before I complicated things.  I've tried reboots, stopping and starting the IPS service, and now I've re-installed the entire firewall software.  I've also tried what was suggested in this post with no results: https://community.sophos.com/products/xg-firewall/f/intrusion-prevention/10897/ips-engine-dead

     

    Has anyone figured out how to get this working?

     

    Thanks,

    Jared

  • Hi All,

    Could whoever is experiencing this issue open a ticket with Support and report the ticket number to ?

    The IPS problem should be fixed ASAP. I am sure that this is a bug and it occurs only on some systems.

    Thanks

  • Thanks Iferrara,  as soon as the Contact Support link on Sophos's website works again, I'll submit a ticket.  I take it then that no one has figured this out?

    I did discover one additional piece of information.  My CPU and Memory utilization would spike between 15-30% continually.  Once I just turned off the IPS service, it settled down to a constant value with no spiking at all.  Can't remember the exact percentage, I think it's somewhere between 5-10%.

    Thanks,

    Jared

    Ok, I made it in, and I got the message below.  So I can't open a ticket myself. Is there someone else out there with an enterprise account who is willing to ask on my behalf?

    "Home User

    Important: We currently do not provide phone or email support for our free products and tools.

    You can find online documentation and videos as well as support through Sophos Community"

     

    Would it help if I provided the exact hardware I am using?

     

    Thanks,

    Jared

     

    Here's another question: is there a way to get into the actual Linux system running Sophos, not just the limited Cisco-esque commands?  That way I can see where the IPS service is failing.

     

    Thanks,

    Jared

  • Using ssh, option 5 and then 3.

  • Yeah as I stated earlier in the thread, it doesn't help that these home user licenses are guided to use the forum, and in the forum we are told to open a ticket when we cannot. 

    I'm still having this issue on the latest release of the SFOS fw. It's the pits, really: multiple revisions and it's still a problem. I still have the same CPU spiking and RAM usage as before, with the service starting and failing in a loop. It's a pity, because other than that I love the firewall. My Cyberoam CR300iNG-XP at the office doesn't exhibit this behavior at all.

  • Here's some of the IPS.log and CSC.log in case anyone can look at it and tell me what's wrong.  Understand that I had IPS service turned off for a few days.  I just turned it on, and grabbed the logs in the IPS.log file from today to make it smaller.

    Any sleuths out there who can tell me what's wrong?

    Thanks,

    Jared

    8372.SophosLogs.zip

  • Ok, I see that the Snort process says it finishes starting successfully. However, if you look at the files I posted, you can see that it is constantly being reloaded.  I'm hoping someone on this forum is able to help.

    [Dec 07 01:18:22 :30756]:readfd cdata.cpipe[1] for pid 30854 set
    INFO[30753]:Dec 07 01:18:22:s_worker.c:943:log_master_starttime:Snort start time(Success): 60 sec 220993 usec
    [Dec 07 01:18:22 :30854]:Total preallocated memory : 2036.1094KB
    [Dec 07 01:18:22 :30854]:Max memory alloc by webcat: 3732.1094KB
    [Dec 07 01:18:22 :30756]:load module: '/sbin/modprobe nf_conntrack_pktq queuenum=100 qnum=0 maxsesbytes=0 mode=0' done
    [Dec 07 01:18:22 :30756]:IPS now running
    [Dec 07 01:18:22 :30756]:Time for change master(traffic bypass): 0 sec 38 msec
    [Dec 07 01:18:22 :30756]:child 30854 dead
    [Dec 07 01:18:22 :30756]:cdata[0].lstatus for pid 30854 set
    [Dec 07 01:18:22 :30756]:IPS: child pid 30854 exited with signal 4
    [Dec 07 01:18:22 :30859]:pktq DAQ configured to inline.
    INFO[30859]:Dec 07 01:18:22:daq_pktq.c:459:InitPKTQ:msize 40000
    INFO[30859]:Dec 07 01:18:22:daq_pktq.c:464:InitPKTQ:pkt_queue with nl_fd 7 device /dev/pktq0
    [Dec 07 01:18:22 :30859]:Reload thread starting...
    [Dec 07 01:18:22 :30859]:Reload thread started, thread 0x7f206caee700 (30860)
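
    For anyone reading the excerpt above, the line "child pid 30854 exited with signal 4" is worth decoding. Here is a minimal Python sketch that pulls the signal number out of such lines and maps it to a name. It assumes the log prints the raw Linux signal number, and the helper name decode_child_exits is made up for illustration. On Linux, signal 4 is SIGILL (illegal instruction). One possible explanation is a worker binary using CPU instructions the host processor lacks, but that is only a hypothesis and nothing in this thread confirms it.

```python
import re
import signal

# Sample lines copied from the IPS.log excerpt in this post.
LOG = """\
[Dec 07 01:18:22 :30756]:child 30854 dead
[Dec 07 01:18:22 :30756]:IPS: child pid 30854 exited with signal 4
"""

# Matches the "exited with signal N" message format seen in the excerpt.
EXIT_RE = re.compile(r"child pid (\d+) exited with signal (\d+)")


def decode_child_exits(text):
    """Return (pid, signal_number, signal_name) for each child-exit line.

    Hypothetical helper: signal names come from the platform running
    this script, which is assumed to use Linux signal numbering.
    """
    results = []
    for match in EXIT_RE.finditer(text):
        pid, signum = int(match.group(1)), int(match.group(2))
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "UNKNOWN"  # number not defined on this platform
        results.append((pid, signum, name))
    return results


for pid, signum, name in decode_child_exits(LOG):
    print(f"pid {pid} was killed by signal {signum} ({name})")
```

    Running the same function over a full IPS.log would show whether every worker exit carries the same signal, which would help tell a consistent crash from a resource problem.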

     

    When I check the CSC log, I find these errors:

    MESSAGE Dec 07 01:18:27 [ips:3131]: do_waitpid: pid 30753 exited with status 0
    ERROR Dec 07 01:18:27 [ips:3131]: close(7) failed: Bad file descriptor
    MESSAGE Dec 07 01:18:28 [ips:3131]: Child exited with status 1
    ERROR Dec 07 01:18:28 [ips:3131]: do_stop: after_stop failed. not aborting!
    ERROR Dec 07 01:18:30 [u2d_pt_installer:3111]: nvram_get(is_eula): failed with -12
    WARNING Dec 07 01:18:30 [u2d_pt_installer:3111]: action with nofail failed
    ERROR Dec 07 01:18:30 [u2d_dr_installer:3102]: nvram_get(is_eula): failed with -12
    WARNING Dec 07 01:18:30 [u2d_dr_installer:3102]: action with nofail failed
    ERROR Dec 07 01:19:30 [u2d_pt_installer:3111]: nvram_get(is_eula): failed with -12
    WARNING Dec 07 01:19:30 [u2d_pt_installer:3111]: action with nofail failed
    ERROR Dec 07 01:19:30 [u2d_dr_installer:3102]: nvram_get(is_eula): failed with -12
    WARNING Dec 07 01:19:30 [u2d_dr_installer:3102]: action with nofail failed
    MESSAGE Dec 07 01:19:49 [ips:3131]: do_stop(): status = EXITING
    MESSAGE Dec 07 01:19:49 [ips:3131]: do_waitpid: pid 30986 exited with status 0
    ERROR Dec 07 01:19:49 [ips:3131]: close(7) failed: Bad file descriptor
    MESSAGE Dec 07 01:19:50 [ips:3131]: Child exited with status 1
    ERROR Dec 07 01:19:50 [ips:3131]: do_stop: after_stop failed. not aborting!
    ERROR Dec 07 01:20:30 [u2d_dr_installer:3104]: nvram_get(is_eula): failed with -12
    WARNING Dec 07 01:20:30 [u2d_dr_installer:3104]: action with nofail failed
    ERROR Dec 07 01:20:30 [u2d_pt_installer:3102]: nvram_get(is_eula): failed with -12
    WARNING Dec 07 01:20:30 [u2d_pt_installer:3102]: action with nofail failed
    MESSAGE Dec 07 01:21:11 [ips:3131]: do_stop(): status = EXITING
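
    The restart loop in the CSC log can be measured rather than eyeballed. The sketch below is a hedged example in Python: it collects the timestamps of the "do_stop(): status = EXITING" messages from the ips service and prints the gap between consecutive ones. The function name stop_intervals and the placeholder year are assumptions (the log lines carry no year), and month-name parsing assumes an English locale.

```python
import re
from datetime import datetime

# A few lines copied from the CSC log excerpt in this post.
CSC_LOG = """\
MESSAGE Dec 07 01:18:27 [ips:3131]: do_waitpid: pid 30753 exited with status 0
ERROR Dec 07 01:18:27 [ips:3131]: close(7) failed: Bad file descriptor
MESSAGE Dec 07 01:19:49 [ips:3131]: do_stop(): status = EXITING
MESSAGE Dec 07 01:19:49 [ips:3131]: do_waitpid: pid 30986 exited with status 0
MESSAGE Dec 07 01:21:11 [ips:3131]: do_stop(): status = EXITING
"""

# LEVEL, timestamp, "[ips:PID]:", then the message text.
STAMP_RE = re.compile(r"^\w+ (\w{3} \d{2} \d{2}:\d{2}:\d{2}) \[ips:\d+\]: (.*)$")


def stop_intervals(text, marker="do_stop(): status = EXITING", year=2000):
    """Return seconds between consecutive ips lines that start with marker.

    Hypothetical helper: `year` is a placeholder because the log has no
    year; it only matters if the excerpt spans a year boundary.
    """
    stamps = []
    for line in text.splitlines():
        match = STAMP_RE.match(line.strip())
        if match and match.group(2).startswith(marker):
            stamp = datetime.strptime(
                f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S"
            )
            stamps.append(stamp)
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


print(stop_intervals(CSC_LOG))
```

    For the lines quoted here the two stop events are 82 seconds apart (01:19:49 and 01:21:11). A steady interval like that across the whole log would support the idea that the service is cycling on a fixed start, fail, restart pattern.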

    Thanks,

    Jared

  • Hi guys,

    how much RAM do you have in your boxes? I am a home user with 8 GB in my XG and do not see these types of issues, though I do have some IPS features disabled because they block or throttle download traffic.

     

    XG115W - v20.0.2 MR-2 - Home

    XG on VM 8 - v21 GA

  • Hey rfcat,

    I have 8gb of RAM, quad-core (I think), and SSD.  When I get home I can give you the exact hardware specs.  Do the logs make you think this is a RAM issue?

    Sorry to hear about your IPS.  Did you have to disable the service, or just not enable it on your firewall rules?  I bought an SSD specifically so that it could do IPS quickly.

    I noticed that when I turned off the IPS service, the hardware utilization dropped tremendously and the continual spiking stopped.  CPU load averages around .3 or .4.  Presumably the spiking is due to the IPS service constantly restarting itself.  

    Wish someone from Sophos would help us out.  I work in IT and I've had Sophos try to sell me on this product for enterprise environments.  I want to like it because it is available for home use, and that I can train myself on it.  More companies should do that.

    Thanks,

    Jared

  • I have only 3GB of RAM and a dual core CPU. v15 ran perfectly; v16 introduced this issue. I also get the CPU and RAM spiking if I try to start the service, and it "flaps" up and down (started and stopped) constantly. If I disable the service (which is how I run it at the moment), I have a perfectly smooth and happy experience. I considered the hardware being a bottleneck, but it ran v15 perfectly, so I can't see that being the problem. I also only have 2 users and at most 10 devices on the network, connected to a 10 Mbps DSL connection, so it's not like I'm pounding many users through it.

    I've fiddled with it and I just can't get to the bottom of it. As I've said, my production CR300iNG-XP at work handles SFOS like a champ, with no issues at all.


  • Darrian,

    I have at home a box with 4 GB of RAM and 4 cores and IPS is working with no issue. IPS is applied to LAN to WAN traffic and between VLANs (the devices on the other side are not always turned on).

    XG however should run even with 2 GB of RAM because UTM 100/110/120 have only 2 GB, so something is not working on snort. :-(

    Waiting for a reply from or that should investigate this and collect logs, even from home users, because there are several threads where snort is not performing as expected.

     

  • Hi Luk,

    you can no longer buy a new UTM 120, and according to many posts in the UTM forums they don't perform very well and need to be upgraded to 4 GB, which is physically impossible. With 2 GB and any number of configured features on a reasonably fast link, the UTM 120 will swap like mad, which affects performance.

    The minimum recommended for UTM is 4 GB. Disk speed doesn't have much bearing; if your box starts swapping, it is time to increase memory. On the UTM forum there are many threads on performance and box sizing. Search for some of William's old threads.

  • Thank you Rfcat for your reply.

    XG runs even on XG85 and XG105 appliances, which both have only 2 GB of RAM. As you said, it could be the RAM frequency, but the amount of RAM the IPS engine currently takes brings down even bigger appliances (some users have problems with clustered SG310s where IPS is taking 100% of CPU). So it could be a bug, or they still have to improve the code to manage the IPS engine properly.

    Unfortunately, Snort 3.0 (with multi-thread support) is still in alpha testing. XG manages IPS better, because it can be enabled only between specific VLANs or network objects, while UTM does not apply IPS that selectively. Still, UTM runs on the UTM120 with no issues, even with other features enabled.

    I know that UTM120 is not on the official HW eligible list but it is better than XG85.

    https://community.sophos.com/kb/en-us/122869

    I would like to try and see how XG runs on XG85 with IPS enabled.

  • Hi Jared,

    Is there any HA deployment in this scenario? This kind of issue is seen with appliances deployed in HA. There is a fix for it, but it has to come from Support because it needs a back-end tweak. We are improving the IPS section in our upcoming release. Meanwhile, if the appliances are in HA, can you rebuild HA and let us know whether that fixes the issue?

    Thanks

    Sachin Gurung
    Team Lead | Sophos Technical Support

  • Hi Sachingurung, 

    I can't speak for Jared, but my home config is definitely not configured for HA.

    I assume the best course for us is to wait for the next release and see whether the issue is resolved, as I have done for the last 2 releases.

    Thank you,

    Darrian

  • Sachingurung,

    No, just a simple stand-alone home firewall, freshly installed last week with little config on it, except what was needed to get to the internet.  I re-installed the firewall thinking that the upgrade between 15 and 16 messed something up.  However, upon booting up from the install, I still had the same problem.

    Another release?  How often are updates released?

    Thanks,

    Jared

  • Users here are saying that IPS was working fine on v15 and no longer works well on v16.

    I think you should check whether, under some circumstances, the IPS is consuming all the RAM resources.

    In my case IPS is working fine at home and at some customers (for example on an XG 210), but on some systems here it is causing a problem.

    You could contact users here and get logs to improve the code or its compatibility with other systems that are not XG appliances.

    Thanks.