Slow Throughput after installing v18 EAP

Hi,

I upgraded from v17.5.8 to v18 EAP about a week ago and noticed a drop in performance and increased RAM usage.

I have an XG115 rev2 appliance running the software image with a Home Use license.

My Internet connection is 100/40.

With version 17.5.8 I was able to reach about 80 to 90 Mbit/s download (I had already expected more from the hardware).

After the upgrade I only reach about 50 to 60 Mbit/s download. There is no DPI or web filtering activated, and it doesn't matter whether I activate IPS or not.

SSL/TLS Inspection is turned on, but no rules are defined.

Are there any tweaking options for the software version of Sophos XG running on a HW Appliance?

Thank you!

  • As posted in the initial Announcement: 

    https://community.sophos.com/products/xg-firewall/sfos-eap/sfos-v18-early-access-program/b/blog/posts/sophos-xg-firewall-v18-fire-eap-firmware-is-here

    • The firmware has yet to be tuned for performance. Expect to see faster speeds in future builds.

     

    Do you use a hardware Bridge? 

    Do you use IPS?

    Do you use SSLx (even one rule with "Do not Decrypt")? 

    __________________________________________________________________________________________________________________

  • Hello,

    Is there any news on this question?

    I’ve been using v18 EAP 1 since launch, and the performance difference between v17.5.8 and v18 is weird. v18 was supposed to be faster, but it’s slower.

     

    I’m currently running an Intel J1900 with 8 GB RAM and an Intel 82576 NIC.

    I did a clean installation and used the IPS GeneralPolicy, ATP (Log and Drop), the Default Policy for Web, and no HTTPS Decrypt for the testing.

    On v17.5.8, I could get 260 Mbit/s, which is my WAN download limit, while using less than 45% CPU. With HTTPS Decrypt on, I was still able to get 260 Mbit/s.

    On v18, I can barely get 120 Mbit/s, and that’s without TLS/SSL Inspection or HTTPS Decrypt via Web Proxy. If I use HTTPS Decrypt via Web Proxy, I get the same speeds on any HTML5 speed test. With TLS/SSL Inspection the throughput drops even lower, to 80 Mbit/s.
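
To put the numbers above in perspective, here is a minimal sketch that turns them into relative drops. The helper name `percent_drop` and the labels are made up for illustration; the only inputs are the figures quoted in this post, with the 260 Mbit/s WAN limit taken as the v17.5.8 baseline.

```python
def percent_drop(before_mbit: float, after_mbit: float) -> float:
    """Relative throughput loss in percent, going from `before_mbit` to `after_mbit`."""
    if before_mbit <= 0:
        raise ValueError("baseline throughput must be positive")
    return (before_mbit - after_mbit) / before_mbit * 100.0


# Figures quoted in the post (Mbit/s). The baseline is the WAN limit reached on v17.5.8.
baseline = 260.0
v18_results = {
    "v18, no TLS/SSL Inspection": 120.0,
    "v18, TLS/SSL Inspection on": 80.0,
}

for label, mbit in v18_results.items():
    drop = percent_drop(baseline, mbit)
    print(f"{label}: {mbit:.0f} Mbit/s, {drop:.1f}% below v17.5.8")
```

Since v17.5.8 was capped by the WAN link rather than by the firewall, the real gap between the two versions on this hardware could be larger than these percentages suggest.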

    Here’s what top looks like on v18. Snort is always using 100% of the CPU.

     

    Is there anything I can do to achieve better speeds? Or is it an issue on my end?

     

    Thanks,


    If a post solves your question use the 'Verify Answer' button.

    Ryzen 5600U + I226-V (KVM) v21 GA @ Home

    Sophos ZTNA (KVM) @ Home

  • Could you show us your Network configuration on this appliance? 

    __________________________________________________________________________________________________________________

    Which network configuration exactly do you want me to show, and for which appliance: the VM I made to test this, or the bare-metal box? Also, which version?

    I've also redone my earlier test in a better environment; the results are in my last post.

     

    Thanks,



  •  

    shared all the information to investigate. Your question is not very helpful!

  • Very insightful, Thanks.

    I too noticed slowness with v18.

    I hope to find time this weekend to revert to v17.5.8.

    Because trying to keep both versions has become a "trap", I will not try to upgrade an inactive v18 firmware anymore.

    What frustrates me utterly is that I lost my v17.5.8 for absolutely nothing, because v18 EAP refresh 1 is a correction for a single insignificant bug.

    Again, that waste of time could have been avoided with professional communications from Sophos.

    Paul Jr


  • That is some extensive testing, and it takes a lot of time to do tests like these. You shouldn't have to show Sophos hard proof; they should already have these numbers. While the rest of us go by the simple feel of our internet connection and then run simple speed tests, you are taking testing to the next level. BRAVO and well done!

    On a side note, I did revive my VM and turned off IPS completely, and XG is fairly livable without IPS. Of course none of the graphs in application categories work and all the other app detection is deactivated, but for a simple web-filtering firewall/AV it is fairly snappy compared to the old v17 I used about a year ago.

    Regards

  • As for the single thread: I have read somewhere that some new functions will not work on the "imaginary" cores provided by Hyper-Threading, only on REAL cores.

    That may answer partially ...

    Paul Jr

  • Hi,

     

    I know this is an old thread, but I don't want to create a new one.

    Are we going to see any improvements in EAP 3? Or will we get new throughput numbers for the XG appliances when v18 comes out?

    In the meantime, I've upgraded my J1900 to an Intel G5400. I've been using the Intel 82576 NIC and also bought 2x 10Gb Intel X520 NICs for some testing. It's impressive how much throughput you can get with XG, even on a limited Home license. Here's some fun I've had with v17.5.9:

    I'm impressed with the throughput you can get. This is with the 10Gb NIC, using librespeed for HTML speed tests and Web Proxy on v17.5.9. iperf3 reached line rate without IPS; with IPS it reached 3.2 Gbit/s.

    One thing that impresses me is that XG is much faster than its competitors. I tested Check Point on the same hardware, using Open Server. On XG I was able to push line rate on speed tests (without IPS; with IPS the limit was 3.2 Gbit/s) and 5-6 Gbit/s on speed tests with Web Proxy. Using Check Point R80.30, I could barely push 280 Mbit/s over a single connection, and my CPU was struggling to push even a gigabit.

    Those tests were just to find out how much throughput I could push with XG.

     

    Now here's real-world traffic; sadly my WAN connection is limited to 250 Mbit/s.

    Using v17.5.9: first we see the peak, which happens while XG is booting, and then CPU usage drops below 10%. Right after it booted I started some throughput tests.

    Here are the rule options I used:

     

    Now going back to v18 EAP 2...

    Same hardware, same NIC, but this time I'm using v18. On the graph you can see the CPU being almost fully utilized (18:05); that's when I started the tests.

     

    I hope we are getting this throughput only because we're on EAP, and that it has nothing to do with the real throughput we will get on GA. I'm not a dev, but I expect it's because of all the debug code in it, or something else.

    In the end, if any Sophos dev is sure I'm making a mistake on v18 and my throughput shouldn't be like this, I'd love to know the answer.

     

    I'll be waiting for EAP 3.

     

    Thanks,



  • The quick answer is that we know there are throughput issues.  It is not about bugs, or about debug code, it is about tuning.  I know that there are also differences in hardware, and I overhear conversations about how different appliances are behaving under different tuning.

    I cannot say anything specific about EAP3 or any other release.  I know there are a bunch of changes, I don't know when they'll make it to public.  I would expect that the numbers will be significantly better by the time we get to GA.

    While there is nothing stopping people from doing performance testing right now, any measurements people take are not reflective of the real world.  I don't believe any measurements people take will be used by the Devs doing the tuning.

  • I would strongly disagree with your assessment. Throughput numbers should be the top priority for Sophos. Bandwidth is getting cheaper and the requirements are increasing significantly. It's great that behind the scenes you guys know that the performance is not what it should be. What is concerning is that these performance levels would be expected during an alpha stage, but not during the second public beta (one release away from GA?). Nobody is publicly acknowledging the problems other than you (thanks for that, by the way), apart from a few comments that it will only get better from here. They did fix the load averages after EAP1, and that is great; however, the throughput numbers are still the same as in EAP1. We are testing our systems mostly with very light loads and still bringing them to their knees. With regular loads of hundreds of employees, the system needs more than minor tuning at this point.

    Part of the reason that there are so many threads on performance is that everyone has a decent pipe to the internet and most of us utilize that bandwidth, even if it's just for watching cat videos on YouTube. So to have a beta, completely ignore the throughput numbers, and then say that behind the scenes some people are aware and someone is going to tune it is not very reassuring.

    All the bells and whistles are worthless if I can't saturate my WAN pipe easily even with a modest recent processor.

    Thanks again for your comments and this post is in no way directed at you personally.

    Regards

    Bill

  • He clearly stated that he expects the performance issues to get worked out before GA. Why do people always feel so entitled about beta releases? They are beta for a reason. You're making baseless assumptions using a crystal ball and then portraying them as a "concerning" reality. Go, watch some more cat videos on Youtube and relax.

    [Moderated to remove profanity]

  • cryptochrome said:
     Go, watch some more cat videos on Youtube and relax.  

    I am trying to relax, but unfortunately my CPU isn't as chill as I am. Besides, it's a public beta and the only time I'm entitled to tell Sophos how I feel. If you work for Sophos, then sorry; if not, then why so defensive?

    Regards

  • Billybob said:

    I would strongly disagree with your assessment. Throughput numbers should be the top priority for sophos.

    Throughput numbers are a top priority.

    My point is that we have internal testbeds with 1GB and 10GB pipes hooked to a performance harness pushing through traffic on a dozen different hardware boxes of different sizes and producing our own detailed analysis with a per process breakdown that our team is acting on.

    Getting a report from the forum saying that on their own custom hardware a CPU graph looks bad isn't something that is a useful input to our performance team. It can be useful for different people in the community to talk about, but it is not a report that our developers are going to be using. It can also help managers prioritize performance over other existing issues and make sure that they have a minimum threshold before we release.

    Don't stop reporting performance issues.  But realize that all the threads on performance go to the forum readers like myself and do not go to the developers. They are using their own test infrastructure to find the same thing but in much more actionable detail.

    I agree that we should have good throughput numbers in the beta before GA. The problem is like one of cooking a meal and the turkey still needs more time in the oven while the rest of the meal is ready. We decided to go to EAP because we had features ready that we wanted people to try even if the main course was still not ready.

     

     

    Part of the reason that there are so many threads on performance is that everyone has a decent pipe to the internet and most of us utilize that bandwidth, even if it's just for watching cat videos on YouTube. So to have a beta, completely ignore the throughput numbers, and then say that behind the scenes some people are aware and someone is going to tune it is not very reassuring.

    There is a balance between never replying and appearing to ignore the issue, replying with enough details to let people know we are on it, and replying with huge amounts of data that give too much away about our development process.  Most employees are cautious about replying or don't feel they know enough of the details.  But they haven't locked my account yet so I'm still here to give away secrets.  :)  

    I'm trying to be reassuring, but more than that I want to set an expectation: when EAP3 comes out, the performance numbers will not be good enough to go to GA.

    There are significant performance code changes that are not merged into EAP3.  The builds that we are currently testing internally are significantly faster than the EAP3 that you will be testing.  The performance tests you are running on EAP2 are on code that is over a month old.  Even when EAP3 comes out, the performance tests you will be running are effectively going to be on old code.  There are reasons that the new performance code is not yet integrated: we are waiting for a few more changes and we want to run it in internal dogfoods before it goes out to you guys.  The schedule is not what we want it to be.   Either we delay all of EAP3 waiting for the performance improvements, or we release EAP3 now and do another release later.  I know this isn't any more than "trust us" but we are aware of the issue.

    Thanks again for your comments and this post is in no way directed at you personally.

    Your mother was a hamster and your father smelled of elderberries.  :)
     
    Truthfully though, I value the people who are active in the forum more than the people who are silent.  I want to thank you for being one of the active, polite people.
     
     
  • Because these constant complaints about a beta version are irrelevant and don't help anyone. I get tired of people who use Sophos for their home networks and complain about BS like Youtube cat videos or their shiny 400 Mbit internet uplinks not being saturated. Use the stable version if you are concerned about throughput. You simply cannot expect a beta to have production quality. And thus, your complaining is just moot. And annoying. 

  • Hi Michael,

     

    I'm very grateful for your answer. This is exactly the kind of answer I've been waiting for from a Sophos employee. Don't get me wrong, but if we had had this exact answer when EAP 1 came out, there would be no thread discussing v18 performance.

    Michael Dunn said:
    There are significant performance code changes that are not merged into EAP3.  The builds that we are currently testing internally are significantly faster than the EAP3 that you will be testing.  The performance tests you are running on EAP2 are on code that is over a month old.  Even when EAP3 comes out, the performance tests you will be running are effectively going to be on old code.  There are reasons that the new performance code is not yet integrated: we are waiting for a few more changes and we want to run it in internal dogfoods before it goes out to you guys.  The schedule is not what we want it to be.   Either we delay all of EAP3 waiting for the performance improvements, or we release EAP3 now and do another release later.  I know this isn't any more than "trust us" but we are aware of the issue.

    Thank you again, this answer is perfect.

     

    I'll be waiting for future releases.



  • Thanks again for your concise and helpful response. I always enjoy your engagement with the community and the glimpse you offer on the inner workings and the behind the scenes thinking at sophos headquarters.

    Michael Dunn said:

    I agree that we should have good throughput numbers in the beta before GA. 

    Thanks for this. We shouldn't have to wait till GA, nor should Sophos wait till GA to gauge how their products are behaving in the field. The reason I was mostly concerned was that there was some regression in the throughput in some tests even without deploying the DPI engine. At this point in the age of XG, some baseline performance is expected even during the beta phase, so when the throughput dropped significantly without any clear explanation, it was concerning to some of us. 

    Thanks again for taking the time, as I always appreciate your detailed answers. I'm also glad that you didn't take my post as a dig at you or Sophos, which is how lots of people perceive such posts. We don't use Sophos products at work any longer, and I don't use Sophos at home other than UTM for protecting my mail/Outlook server in the lab. However, it is always Christmas time when new software is released, and my criticism originates from my long appreciation for Astaro and then for Sophos, and the higher standards that you and I both expect from them.

    Regards

    Bill

  • Please review your language; it is totally inappropriate and not called for.

    Ian

    XG115W - v20.0.2 MR-2 - Home

    XG on VM 8 - v21 GA

    If a post solves your question please use the 'Verify Answer' button.

  • : we need more input from Sophos Staff like does. The community really enjoys this kind of "active participation" instead of SILENCE.

    I have known Michael Dunn on the forum for many years, and I always stop to read his responses, as they are more informative than several official Sophos KB articles.

    We would like to see at least one active dev for each XG's Unit on the community. Hope you understand our point, Flo.

    Thanks Michael. We see the passion you have in your job.

  • How can we help debug the issue? I'm running a gigabit connection on Comcast and all my speed test shows is 60 Mbps. I know my Comcast connection can be saturated, but I would expect higher speeds. My CPU usage is 7%.

  • Hi Jack,

    please try again after disabling or removing IPS from your firewall rule.

    Ian
