Slow Throughput after installing v18 EAP

Hi,

I upgraded from v17.5.8 to v18 EAP about a week ago and noticed a drop in performance and increased RAM usage.

I have an XG115 rev2 appliance running the software image with a Home Use license.

My Internet connection is 100/40.

With version 17.5.8 I was able to reach about 80 to 90 Mbit/s download (I had already expected more from the hardware).

After the upgrade I only reach about 50 to 60 Mbit/s download. No DPI or web filtering is activated, and it doesn't matter whether I activate IPS or not.

SSL/TLS Inspection is turned on, but no rules are configured.

Are there any tweaking options for the software version of Sophos XG running on a HW Appliance?

Thank you!

  • As posted in the initial Announcement: 

    https://community.sophos.com/products/xg-firewall/sfos-eap/sfos-v18-early-access-program/b/blog/posts/sophos-xg-firewall-v18-fire-eap-firmware-is-here

    • The firmware has yet to be tuned for performance. Expect to see faster speeds in future builds.

     

    Do you use a hardware Bridge? 

    Do you use IPS?

    Do you use SSLx (even one rule with "Do not Decrypt")? 


  • Hello,

    Is there any news on this question?

    I’ve been using v18 EAP 1 since launch, and the performance difference between v17.5.8 and v18 is weird. v18 was supposed to be faster, but it’s slower.

     

    I’m currently running an Intel J1900 with 8 GB RAM and an Intel 82576 NIC.

    I did a clean installation and used the IPS GeneralPolicy, ATP (Log and Drop), the Default Policy for Web, and no HTTPS Decrypt for the testing.

    On v17.5.8, I was able to get 260 Mbit/s, which is my WAN download limit, while using less than 45% CPU. With HTTPS Decrypt on, I was still able to get 260 Mbit/s.

    On v18, I can barely get 120 Mbit/s, and that’s without TLS/SSL Inspection or HTTPS Decrypt via Web Proxy. If I use HTTPS Decrypt via Web Proxy, I get the same speeds on any HTML5 speedtest. With TLS/SSL Inspection, the throughput drops even lower, to 80 Mbit/s.
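
    To put those figures in proportion, here is a rough sketch of the arithmetic. It only uses the numbers quoted in this post; the helper name `relative_drop` and the labels are made up for illustration and are not part of any Sophos tooling.

```python
def relative_drop(before_mbit: float, after_mbit: float) -> float:
    """Return the throughput loss as a percentage of the 'before' value."""
    if before_mbit <= 0:
        raise ValueError("before_mbit must be positive")
    return (before_mbit - after_mbit) / before_mbit * 100.0


# Figures quoted in this post, in Mbit/s (260 is the WAN download limit).
V17_BASELINE = 260.0
V18_RESULTS = {
    "v18, no TLS/SSL Inspection": 120.0,
    "v18, TLS/SSL Inspection on": 80.0,
}

for label, mbit in V18_RESULTS.items():
    drop = relative_drop(V17_BASELINE, mbit)
    print(f"{label}: {mbit:.0f} Mbit/s, {drop:.1f}% below v17.5.8")
```

    Since 260 Mbit/s is the WAN cap, v17.5.8 may well have had headroom beyond it, so the real gap between the two versions could be larger than this suggests.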

    Here’s what top looks like on v18. Snort is always using 100% of the CPU.

     

    Is there anything I can do to achieve better speeds? Or is it an issue on my end?

     

    Thanks,


    If a post solves your question use the 'Verify Answer' button.

    Ryzen 5600U + I226-V (KVM) v21 GA @ Home

    Sophos ZTNA (KVM) @ Home

  • That is some extensive testing, and tests like these take a lot of time. You shouldn't have to show Sophos hard proof; they should already have these numbers. While the rest of us go by the simple feel of our internet connection and run simple speed tests, you are taking testing to the next level. BRAVO and well done!

    On a side note, I revived my VM and turned off IPS completely, and XG is fairly livable without IPS. Of course, none of the graphs in application categories work and all the other app detection is deactivated, but for a simple web-filtering firewall/AV it is fairly snappy compared to the old v17 I used about a year ago.

    Regards

  • As for the single thread... I have read somewhere that some new functions will not work on the "imaginary" cores provided by Hyper-Threading, only on REAL cores.

    That may partially answer it...

    Paul Jr

  • Hi,

     

    I know this is an old thread, but I don't want to create a new one.

    Are we going to see any improvements in EAP 3? Or will we get new throughput numbers for the XG appliances when v18 comes out?

    In the meantime, I've upgraded my J1900 to an Intel G5400. I've been using the Intel 82576 NIC and also bought 2x 10Gb Intel X520 NICs for some testing. It's impressive how much throughput you can get with XG, even on a limited Home license. Here's some fun I've had with v17.5.9:

    I'm impressed with the throughput you can get. This is with the 10Gb NIC, using librespeed for HTML speed tests through the Web Proxy on v17.5.9. iperf3 reached line rate without IPS; with IPS, it reached 3.2 Gbit/s.

    One thing that impresses me is that XG is much faster than its competitors. I've tested Checkpoint (same hardware, using Open-Server). On XG I was able to push line rate without IPS (with IPS, the limit was 3.2 Gbit/s) on speed tests, and 5-6 Gbit/s on speed tests through the Web Proxy. Using Checkpoint R80.30, I could barely push 280 Mbit/s over a single connection; my CPU was crying just trying to push a gigabit.

    Those tests were just to find out how much throughput I could push with XG.

     

    Now here's real-world traffic; sadly, my WAN connection is limited to 250 Mbit/s.

    Using v17.5.9: we first see the peak, which happens while XG is booting, and then the CPU drops below 10% usage. Right after it booted, I started some throughput tests.

    Here are the rule options I used:

     

    Now going back to v18 EAP 2...

    Same hardware, same NIC, but this time I'm using v18. On the graph you can see the CPU being almost fully utilized (18:05); that's when I started the tests.

     

    I hope the reason we are getting this throughput is that we're on EAP, and that it has nothing to do with the real throughput we will get on GA. Well, I'm not a dev, but I expect it's because of all the debug code in it, or something else.

    In the end, if any Sophos dev is sure I'm making mistakes on v18 and my throughput shouldn't be like this, I'd love to know the answer.

     

    I'll be waiting for EAP 3.

     

    Thanks,



  • The quick answer is that we know there are throughput issues.  It is not about bugs, or about debug code, it is about tuning.  I know that there are also differences in hardware, and I overhear conversations about how different appliances are behaving under different tuning.

    I cannot say anything specific about EAP3 or any other release.  I know there are a bunch of changes; I don't know when they'll make it to a public release.  I would expect that the numbers will be significantly better by the time we get to GA.

    While there is nothing stopping people from doing performance testing right now, any measurements people take are not reflective of the real world.  I don't believe any measurements people take will be used by the Devs doing the tuning.

  • I would strongly disagree with your assessment. Throughput numbers should be the top priority for Sophos. Bandwidth is getting cheaper and the requirements are increasing significantly. It's great that behind the scenes you guys know that the performance is not what it should be. What is concerning is that these performance levels would be expected during an alpha stage, but not during the second public beta (one release away from GA?). Nobody is publicly acknowledging the problems other than you (thanks for that, by the way) and a few comments that it will only get better from here. They did fix the load averages after EAP1, and that is great; however, the throughput numbers are still the same as in EAP1. We are testing our systems mostly with very light loads and still bringing them to their knees. With regular loads of hundreds of employees, the system needs more than minor tuning at this point.

    Part of the reason that there are so many threads on performance is that everyone has a decent pipe to the internet, and most of us utilize that bandwidth even if it's just for watching cat videos on YouTube. So to have a beta, completely ignore the throughput numbers, and then say that behind the scenes some people are aware and someone is going to tune it is not very reassuring.

    All the bells and whistles are worthless if I can't saturate my WAN pipe easily even with a modest recent processor.

    Thanks again for your comments and this post is in no way directed at you personally.

    Regards

    Bill

  • He clearly stated that he expects the performance issues to get worked out before GA. Why do people always feel so entitled about beta releases? They are beta for a reason. You're making baseless assumptions using a crystal ball and then portraying them as a "concerning" reality. Go, watch some more cat videos on YouTube and relax.

    [Moderated to remove profanity]

  • cryptochrome said:
     Go, watch some more cat videos on YouTube and relax.  

    I am trying to relax, but unfortunately my CPU isn't as chill as I am. Besides, it's a public beta and the only time I'm entitled to tell Sophos how I feel. If you work for Sophos, then sorry; if not, then why so defensive?

    Regards

  • Billybob said:

    I would strongly disagree with your assessment. Throughput numbers should be the top priority for Sophos.

    Throughput numbers are a top priority.

    My point is that we have internal testbeds with 1GB and 10GB pipes hooked to a performance harness, pushing traffic through a dozen hardware boxes of different sizes and producing our own detailed analysis with a per-process breakdown that our team is acting on.

    Getting a report from the forum saying that a CPU graph looks bad on someone's own custom hardware isn't a useful input to our performance team. It can be useful for different people in the community to talk about, but it is not a report that our developers are going to be using. It can also help managers prioritize performance over other existing issues and make sure that they have a minimum threshold before we release.

    Don't stop reporting performance issues.  But realize that all the threads on performance go to forum readers like myself and do not go to the developers. They are using their own test infrastructure to find the same thing, but in much more actionable detail.

    I agree that we should have good throughput numbers in the beta before GA. The problem is like cooking a meal where the turkey still needs more time in the oven while the rest of the meal is ready. We decided to go to EAP because we had features ready that we wanted people to try, even if the main course was still not ready.

     

     

    Part of the reason that there are so many threads on performance is that everyone has a decent pipe to the internet, and most of us utilize that bandwidth even if it's just for watching cat videos on YouTube. So to have a beta, completely ignore the throughput numbers, and then say that behind the scenes some people are aware and someone is going to tune it is not very reassuring.

    There is a balance between never replying and appearing to ignore the issue, replying with enough detail to let people know we are on it, and replying with huge amounts of data that give too much away about our development process.  Most employees are cautious about replying or don't feel they know enough of the details.  But they haven't locked my account yet so I'm still here to give away secrets.  :)  

    I'm trying to be reassuring, but more than that I want to set an expectation: when EAP3 comes out, the performance numbers will not be good enough to go to GA.

    There are significant performance code changes that are not merged into EAP3.  The builds that we are currently testing internally are significantly faster than the EAP3 that you will be testing.  The performance tests you are running on EAP2 are on code that is over a month old.  Even when EAP3 comes out, the performance tests you will be running are effectively going to be on old code.  There are reasons that the new performance code is not yet integrated: we are waiting for a few more changes and we want to run it through internal dogfooding before it goes out to you guys.  The schedule is not what we want it to be.   Either we delay all of EAP3 waiting for the performance improvements, or we release EAP3 now and do another release later.  I know this isn't any more than "trust us" but we are aware of the issue.

    Thanks again for your comments and this post is in no way directed at you personally.

    Your mother was a hamster and your father smelled of elderberries.  :)
     
    Truthfully though, I value the people who are active in the forum more than the people who are silent.  I want to thank you for being one of the active, polite people.
     
     
  • Because these constant complaints about a beta version are irrelevant and don't help anyone. I get tired of people who use Sophos for their home networks and complain about BS like YouTube cat videos or their shiny 400 Mbit internet uplinks not being saturated. Use the stable version if you are concerned about throughput. You simply cannot expect a beta to have production quality. And thus, your complaining is just moot. And annoying.

  • Hi Michael,

     

    I'm very grateful for your answer; this is exactly the kind of answer I've been waiting for from a Sophos employee. Don't get me wrong, but if we had gotten this answer when EAP 1 came out, there would be no thread discussing v18 performance.

    Michael Dunn said:
    There are significant performance code changes that are not merged into EAP3.  The builds that we are currently testing internally are significantly faster than the EAP3 that you will be testing.  The performance tests you are running on EAP2 are on code that is over a month old.  Even when EAP3 comes out, the performance tests you will be running are effectively going to be on old code.  There are reasons that the new performance code is not yet integrated: we are waiting for a few more changes and we want to run it through internal dogfooding before it goes out to you guys.  The schedule is not what we want it to be.   Either we delay all of EAP3 waiting for the performance improvements, or we release EAP3 now and do another release later.  I know this isn't any more than "trust us" but we are aware of the issue.

    Thank you again, this answer is perfect.

     

    I'll be waiting for future releases.


