Slow Throughput after installing v18 EAP

Hi,

I upgraded from v17.5.8 to v18 EAP about a week ago and noticed a drop in performance and increased RAM usage.

I have an XG115 rev2 appliance running the software image with a Home Use license.

My Internet connection is 100/40 Mbit/s.

With v17.5.8 I was able to reach about 80 to 90 Mbit/s download (I had already expected more from the hardware).

After the upgrade I only reach about 50 to 60 Mbit/s download. DPI and web filtering are not enabled, and enabling or disabling IPS makes no difference.

SSL/TLS inspection is turned on, but there are no inspection rules.

Are there any tuning options for the software image of Sophos XG running on a hardware appliance?

Thank you!

  • In reply to SaschaParis:

    Hi, thank you so much for this answer.

     

    First, please correct me if I'm wrong.

    I also understand that, as the name says, this is an Early Access Program, so it can have bugs and performance issues.

     

    But the problem I've encountered is a bit different: I'm not using IPS, the web proxy, or SSL/TLS decryption, and somehow Snort is using 100% of all four of my cores.

    To start with:

    I've created a rule that allows traffic to pass from LAN to LAN, and on this rule I've used .

    Also, no SSL/TLS inspection rules are applied to it.

     

    Testing with iperf3 from 10.0.0.200 to 10.0.1.11 (VLAN 20) using "iperf3 -c 10.0.1.11 -P 5", I get a maximum throughput of 430 Mbit/s while Snort uses 100% of all my cores.
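    One way to make runs like this easier to compare across firmware versions is to let iperf3 write a JSON report (iperf3 -c 10.0.1.11 -P 5 -J > run.json) and read the aggregate rate from it. A minimal Python sketch, assuming the usual iperf3 TCP report layout where the total over all -P streams sits under end.sum_received; the helper name iperf3_throughput_mbps is hypothetical, and the sample string is hand-written in the same shape, not a real measurement:

```python
import json


def iperf3_throughput_mbps(report_json: str) -> float:
    """Return receiver-side throughput in Mbit/s from an `iperf3 -J` TCP report.

    Assumes the standard iperf3 JSON layout, where the sum over all
    parallel streams (-P) is stored under end.sum_received.
    """
    report = json.loads(report_json)
    bits_per_second = report["end"]["sum_received"]["bits_per_second"]
    return bits_per_second / 1_000_000


# Hand-written sample shaped like an iperf3 -J report (illustrative only).
sample = '{"end": {"sum_received": {"bits_per_second": 430000000.0}}}'
print(f"{iperf3_throughput_mbps(sample):.0f} Mbit/s")
```

    With a real run you would pass the file contents, e.g. iperf3_throughput_mbps(open("run.json").read()). The receiver-side sum is used because it counts what actually arrived at the far end, which is usually the more conservative figure.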

     

    My question is: what is Snort doing? I've disabled all features on the rule, checked whether any other rule affects this traffic, and confirmed that no SSL/TLS inspection applies to it, yet my throughput is still limited by Snort.

     

    The problem is that on v17.5.8 I used to get line-rate throughput with this test.

     

    Thanks,

  • In reply to Prism:

    What is the output of

    system application_classification show

    in your device console? If it's on, you could test turning it off with the same command, replacing show with off.

    I'm not sure about the implications of disabling global application classification in v18. This command makes traffic bypass Snort if no explicit IPS or application control policy is configured in the matching firewall rule, and it was a good workaround to get line speed up to v17.5. However, I haven't tested whether it also bypasses the new FastPath offloading in v18 (I will test it if I find some time). Since FastPath offloading doesn't seem to behave as intended in this specific release, this might temporarily speed things up until FastPath behaves as expected.

    As you mentioned, it's early access. There is time until GA to get things into shape, so I'm not too worried at this early stage.

  • In reply to SaschaParis:

    That is not fair, Sascha. You are asking him to disable application classification, which breaks other things such as QoS, while he is clearly stating that he didn't have such problems with v17.5.x.

    Turning off the IPS services would probably do the same thing as turning off classification, and it would also break the dashboard that shows the classification of different websites, as well as other reports. You might as well run a simple iptables router at that point. Wire speed achieved.

    Regards

    Edit: I see you edited your post considerably. We do agree that it is early, so hopefully things can only improve from here. But I think over-reliance on Snort in a UTM-type device is never a good thing, since what they are asking Snort to do usually needs dedicated appliances due to its heavy CPU/RAM requirements.

  • In reply to SaschaParis:

    Hi,

    It showed as ON. After turning it off I was able to achieve full gigabit on the LAN, but...

     


    The test I ran isn't fully accurate, since I have neither the knowledge nor the equipment to do it properly.

    But it gives a perspective on the performance difference.

     

    Since I think there's something wrong with v18 performance, I decided to create two VMs, one with v17.5.8 and another with v18 EAP 1 Refresh 1.

    - Both VMs had 4 cores and 8 GB RAM (6 GB usable).

    - KVM, managed with virt-manager.

    - Host OS: CentOS.

    - Host: Ryzen 1700 / 32 GB RAM.

    - Fedora 31 as the LAN VM for testing, with 4 cores and 8 GB RAM.

    - VirtIO drivers were used on all VMs.

     

    An overview of the test topology:

    HOST - WAN - XG - Isolated/LAN – VM/FEDORA

     

    ---

    Edit: Redid some tests in a better environment and added pictures. I also used iperf3.

    ---

     

    v18 EAP 1 Refresh 1,

     

    IPS – GeneralPolicy – SingleThread: 320 Mbit/s

    IPS – GeneralPolicy – MultiThread: 1.28 Gbit/s

     

    ----------

    v17.5.8,

     

    IPS – GeneralPolicy – SingleThread: 926 Mbit/s

    IPS – GeneralPolicy – MultiThread: 2.78 Gbit/s

     

    I'm struck by how much single-core IPS speed has been penalized from v17.5.8 to v18. I've tested multiple times, but the difference is still too high.
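    To put the single- and multi-thread figures above side by side, here is a small Python calculation of the relative drop from v17.5.8 to v18 EAP 1 Refresh 1 (Gbit/s converted to Mbit/s). It only reworks the numbers already posted; the dictionary layout and the percent_drop helper are illustrative, not anything from XG itself:

```python
# IPS GeneralPolicy results from the post above, in Mbit/s
# (1.28 Gbit/s and 2.78 Gbit/s converted to Mbit/s).
results = {
    "single-thread": {"v17.5.8": 926.0, "v18 EAP 1 Refresh 1": 320.0},
    "multi-thread": {"v17.5.8": 2780.0, "v18 EAP 1 Refresh 1": 1280.0},
}


def percent_drop(old: float, new: float) -> float:
    """Relative throughput loss going from `old` to `new`, in percent."""
    return (old - new) / old * 100.0


for mode, r in results.items():
    drop = percent_drop(r["v17.5.8"], r["v18 EAP 1 Refresh 1"])
    print(f"{mode}: {drop:.1f}% lower on v18")
# prints:
# single-thread: 65.4% lower on v18
# multi-thread: 54.0% lower on v18
```

    In other words, in this test v18 kept roughly a third of the single-thread IPS throughput and a bit under half of the multi-thread throughput.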

     

    And as I said before, this is just simple testing to give a perspective on the performance difference.

     

    Thanks,

  • In reply to Prism:

    Could you show us your network configuration on this appliance?

  • In reply to LuCar Toni:

    Which network configuration exactly do you want me to show, and for which appliance: the VM I made to test this, or the bare-metal one? Also, which version?

    I've also redone my earlier test in a better environment; the results are in my last post.

     

    Thanks,

  • In reply to LuCar Toni:

     

    shared all the information needed to investigate. Your question is not very helpful!

  • In reply to Prism:

    Very insightful, thanks.

    I too noticed slowness with v18.

    I hope to find time this weekend to revert to v17.5.8.

    Because trying to keep both versions has become a "trap", I will no longer try to upgrade an inactive v18 firmware.

    What frustrates me utterly is that I lost my v17.5.8 for absolutely nothing, because v18 EAP Refresh 1 is a fix for a single insignificant bug.

    Again, that waste of time could have been avoided with professional communication from Sophos.

    Paul Jr

  • In reply to Prism:

    That is some extensive testing, and tests like these take a lot of time. You shouldn't have to show Sophos hard proof; they should already have these numbers. While the rest of us go by the feel of our internet connection and run simple speed tests, you are taking testing to the next level. BRAVO and well done!

    On a side note, I revived my VM and turned off IPS completely, and XG is fairly livable without IPS. Of course, none of the application category graphs work and all other app detection is deactivated, but as a simple web-filtering firewall/AV it is fairly snappy compared to the old v17 I used about a year ago.

    Regards

  • In reply to Prism:

    Regarding the single-thread result: I have read somewhere that some new functions will not work on "imaginary" Hyper-Threading cores, only on REAL cores.

    That may partially explain it...

    Paul Jr

  • In reply to Prism:

    Hi,

     

    I know this is an old thread, but I don't want to create a new one.

    Are we going to see any improvements in EAP 3? Or will we get new throughput numbers for the XG appliances when v18 comes out?

    In the meantime, I've upgraded my J1900 to an Intel G5400. I've been using the Intel 82576 NIC and also bought 2x Intel X520 10Gb NICs for some testing. The throughput you can get with XG is impressive, even on a limited Home license. Here's some fun I've had with v17.5.9:

    I'm impressed with the throughput you can get. This is with the 10Gb NIC, using LibreSpeed for HTML speed tests through the web proxy on v17.5.9. iperf3 reached line rate without IPS, and 3.2 Gbit/s with IPS.

    One thing that impresses me is that XG is much faster than its competitors. I've tested Check Point (same hardware, using Open Server). While on XG I was able to push line rate on speed tests (without IPS; with IPS the limit was 3.2 Gbit/s) and 5-6 Gbit/s on speed tests through the web proxy, with Check Point R80.30 I could barely push 280 Mbit/s over a single connection, and my CPU was struggling to push even a gigabit.

    Those tests were just to see how much throughput I could push with XG.

     

    Now here's real-world traffic; sadly, my WAN connection is limited to 250 Mbit/s.

    Using v17.5.9: first we see a peak while XG is booting, and then CPU usage drops below 10%. Right after it booted, I started some throughput tests.

    Here are the rule options I used:

     

    Now going back to v18 EAP 2...

    Same hardware, same NIC, but this time I'm using v18. On the graph you can see the CPU almost fully utilized at 18:05, which is when I started the tests.

     

    I hope we are getting this throughput because we're on EAP, and that it has nothing to do with the real throughput we will get at GA. I'm not a dev, but I expect it's because of all the debug code in it, or something else.

    In the end, if any Sophos dev is sure I'm making mistakes on v18 and my throughput shouldn't be like this, I'd love to know the answer.

     

    I'll be waiting for EAP 3.

     

    Thanks,

  • In reply to Prism:

    The quick answer is that we know there are throughput issues. It is not about bugs or debug code; it is about tuning. I know that there are also differences in hardware, and I overhear conversations about how different appliances behave under different tuning.

    I cannot say anything specific about EAP3 or any other release. I know there are a bunch of changes, but I don't know when they'll make it into a public release. I would expect the numbers to be significantly better by the time we get to GA.

    While nothing stops people from doing performance testing right now, any measurements people take are not reflective of the real world. I don't believe any measurements people take will be used by the devs doing the tuning.

  • In reply to Michael Dunn:

    I would strongly disagree with your assessment. Throughput numbers should be the top priority for Sophos. Bandwidth is getting cheaper and requirements are increasing significantly. It's great that you know behind the scenes that the performance is not what it should be. What is concerning is that these performance levels would be expected during an alpha stage, but not during the second public beta (one release away from GA?). Nobody is publicly acknowledging the problems other than you (thanks for that, by the way), apart from a few comments that it will only get better from here. They did fix the load averages after EAP1, and that is great; however, the throughput numbers are still the same as in EAP1. We are testing our systems mostly with very light loads and still bringing them to their knees. With regular loads of hundreds of employees, the system needs more than minor tuning at this point.

    Part of the reason there are so many threads on performance is that everyone has a decent pipe to the internet, and most of us use that bandwidth, even if it's just for watching cat videos on YouTube. So to have a beta, completely ignore the throughput numbers, and then say that some people behind the scenes are aware and someone is going to tune it is not very reassuring.

    All the bells and whistles are worthless if I can't saturate my WAN pipe easily even with a modest recent processor.

    Thanks again for your comments and this post is in no way directed at you personally.

    Regards

    Bill

  • In reply to Billybob:

    He clearly stated that he expects the performance issues to be worked out before GA. Why do people always feel so entitled about beta releases? They are beta for a reason. You're making baseless assumptions with a crystal ball and then portraying them as a "concerning" reality. Go watch some more cat videos on YouTube and relax.

    [Moderated to remove profanity]