Questions about the new DPI Engine.

First of all, I'm just a home user, so I feel like I shouldn't be complaining that much in here, or even making this post. ¯\_(ツ)_/¯

---

First Question:

 

v18 introduces the brand-new DPI Engine, which is described as:

"Single high-performance streaming DPI engine with proxyless scanning of all traffic for AV, IPS, and web threats as well as providing Application Control and SSL Inspection."

The part in question is "Single high-performance streaming DPI engine with proxyless scanning of all traffic for AV". The AV is the problem.

 

(I'm not a professional, so if there are any mistakes, I'm sorry. Please point them out if there are.)

On v17.5, and also on v18, if you use the legacy Web Proxy, the "avd" service is used for AV.

My understanding was that the new DPI Engine, which uses Snort as its service, would also use it for AV. But while using the DPI Engine, you can still see the "avd" service being spawned and used by XG for AV scanning.

This is not an issue as such; I'm not a Sophos dev, so in my opinion it's just weird. Again, it's not an issue.

 

Playing with my home setup, the main noticeable difference between the Web Proxy and the DPI Engine (with HTTP(S) scanning on) is throughput: the DPI Engine is much faster than the Web Proxy.

Now the "issue": CPS. Both the Web Proxy and "Xstream SSL Inspection" can handle the same amount of CPS with AV, which in my setup is 750~ (wrong, check the edits). So the main "issue" here is the AV.

CPS = connections per second; here it's HTTPS only.
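(For anyone unfamiliar with the metric: the sketch below is a hypothetical loopback illustration in Python of what a CPS benchmark does, open a connection, close it, repeat, and count. It is not the tool I used, and it skips TLS entirely.)

```python
import socket
import threading
import time

def run_server(host="127.0.0.1"):
    """Tiny TCP acceptor: accepts connections and closes them immediately."""
    srv = socket.socket()
    srv.bind((host, 0))          # port 0 = let the OS pick a free port
    srv.listen(128)
    port = srv.getsockname()[1]

    def loop():
        while True:
            try:
                conn, _ = srv.accept()
                conn.close()
            except OSError:      # server socket closed, stop accepting
                break

    threading.Thread(target=loop, daemon=True).start()
    return srv, port

def measure_cps(port, duration=1.0):
    """Open-and-close loopback connections as fast as possible."""
    count = 0
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        s = socket.create_connection(("127.0.0.1", port))
        s.close()
        count += 1
    return count / duration

if __name__ == "__main__":
    srv, port = run_server()
    print(f"~{measure_cps(port, 0.5):.0f} connections/second on loopback")
    srv.close()
```

A real test would of course go through the firewall with HTTPS, where each new connection also costs a TLS handshake, which is exactly the part being measured here.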

Edit: I shouldn't have done this testing at midnight; I was way too sleepy for it. The DPI Engine numbers are correct, but the Web Proxy is slower than I wrote before. Now it makes more sense.

Edit: Also, good job by the Sophos devs; the difference between the legacy Web Proxy and the new DPI Engine is impressive.

Edit 2: Some additional information that makes the difference between the Web Proxy and the DPI Engine even more impressive: I used TLS v1.3 with the DPI Engine, while the Web Proxy used TLS v1.2.

Edit 3: The Web Proxy was using this cipher/auth combination: TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384, 256-bit keys, TLS v1.2;

Edit 3: The DPI Engine was using this cipher/auth combination: TLS_AES_256_GCM_SHA384, 256-bit keys, TLS v1.3;

Edit 4: I will try to force TLS v1.2 on the DPI Engine to see if the difference is even higher.

| XG v18 EAP3 Refresh-1 | Web Proxy TLS v1.2 | DPI Engine TLS v1.2 | DPI Engine TLS v1.3 |
| --- | --- | --- | --- |
| CPS with AV | 275~ | 980~ | 750~ |
| CPS without AV | 430~ | 4700~ | 4100~ |
| Latency, >90% of connections | 0.070/sec | 0.014/sec | 0.101/sec |

Disable HTTP(S) scanning and there you go: more than 5x the CPS when using the DPI Engine. It feels like the current XG AV is holding the DPI Engine back and not letting it reach the performance it's capable of.

After all this *** writing, my question is: is this expected? Will XG keep using "avd" as the AV for HTTP(S), IMAP, and so on? I'm not saying it's wrong or bad; again, it's just weird. And if so, then why say "Single high-performance streaming DPI engine with proxyless scanning of all traffic for AV"? From my understanding of that statement, shouldn't Snort also be taking care of AV?

---

Second Question:

 

Why the hell does "avd" use 99.9% of a single core while scanning .txt files?

This becomes an issue when you're running any Linux distro, since the package manager downloads .txt files to check whether there are packages to upgrade. The usage of a single core goes all the way up to 100%.

A single "pacman -Syu", which at first downloads 4 .txt files totaling at most ~6 MB, can take up to 45 seconds. (I'm on a 400/200 Mbit/s WAN, and the package manager mirror can push my link to its limits.)
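To put that in perspective, a quick back-of-the-envelope on those numbers (~6 MB total over a 400 Mbit/s link, both figures from above):

```python
# Back-of-the-envelope for the pacman case above.
total_megabytes = 6      # rough total size of the downloaded .txt files
link_mbit_per_s = 400    # downstream WAN speed
observed_seconds = 45    # worst case actually seen

# Raw wire time if the link were the only limit:
expected_seconds = total_megabytes * 8 / link_mbit_per_s   # 0.12 s
slowdown = observed_seconds / expected_seconds             # 375x

print(f"expected ~{expected_seconds:.2f}s, observed {observed_seconds}s "
      f"(~{slowdown:.0f}x slower)")
```

So the download is hundreds of times slower than the link alone would explain, which is what points the finger at scanning rather than bandwidth.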

This doesn't happen with any other file format; hell, .exe scanning feels instant compared to .txt.

---

Third Question:

 

Why is "avd" a single-core service?

---

 

That's It.

Again, it feels like all of this is expected, but I'm just a home user, so I feel like I shouldn't be complaining this much here.

Also, 750~ CPS is more than sufficient for a home network.

 

Thanks.

  • I believe I understand why I will probably never get answers to those questions. It can be one of two things:

    • I'm completely wrong about everything I wrote. // (I'm almost sure it's this one.)
    • Or this is already known.

     

    Anyways, this picture will haunt my dreams tonight.

    A full-blown 8C/16T with 12 GB of DDR4 RAM, limited by avd, while all the Snort processes basically idle at 10% usage on each core.

    At least it's fast when it doesn't use "avd". (Nice touch changing to GB/s instead of showing xxxx MB/s :D)

     

    Also, sorry for whining so much here; it's just a bit frustrating seeing all this. If necessary, I'll delete this thread.

    Thanks!


    If a post solves your question use the 'Verify Answer' button.

    Ryzen 5600U + I226-V (KVM) v21 GA @ Home

    Sophos ZTNA (KVM) @ Home

  • Quick answers:

    There is one instance of the AV engine running in a process called avd. The web proxy, FTP proxy, mail proxy, and DPI mode all call into that one AV instance, so that we don't need to run four copies on the box. Also remember there is configuration such as Dual scan or Single scan, and which AV engine you use (Sophos or Avira). Yes, that is how it will continue to work. There is very little performance impact in whether the AV thread is part of the snort process or part of the avd process (or rather, other things have a bigger impact).

    When you enable HTTPS scanning, the system needs to do a lot of SSL decryption, which takes CPU cycles (and lowers CPS). In addition, it means that files will be AV scanned, which also takes CPU (and lowers CPS). However, I suspect the decryption part is the bigger factor. Do not enable/disable HTTPS scanning and then claim you are measuring with and without AV. If you want to measure the impact of AV, leave decryption on and toggle "Scan HTTP and decrypted HTTPS" to turn AV on/off. Make sure you are using a web policy (eg not set to None).

    As for avd using 99% CPU, that might be an artifact of "top". Can you give me a real-world example/impact? eg a specific curl for a text file that took a long time to scan.

  • As an aside,

    For any connection handled by snort, there are several different processes that are called out to. 

     

    Starting at the connection (eg the first packet from the client) we check authentication, which could potentially call out to two different processes. Note that if the client re-uses the connection, this cost does not occur again.

    Starting at the request (eg the first packet of the request from the client) we do web categorization, a call out to a different process. This could potentially even involve a request to a cloud server.

    At the end of the request (eg the last packet from the server) we do AV scanning, a call out to a different process.

    There might be other processes that snort calls out to, but only once per request.

    On the other hand decryption is something that happens on every single packet.

     

    For decryption, whether you are downloading 100 1 MB files or one 100 MB file, I think the CPU cost of decryption is roughly the same. But for the other costs, the traffic shape changes things.

     

  • Michael Dunn said:
    For decryption, if you are downloading 100 1MB files or 1 100MB file I think the CPU cost of decryption is roughly the same.

    That is interesting to know, because usually it doesn't work like that: one large data stream is never equal to multiple small data streams taking the same bandwidth.

    Thanks for the further testing. I agree that XG v18 is really snappy for regular surfing, and they have made great improvements in surfing performance. DPI can only get better from here, and if they can keep the memory footprint and CPU under control, v18 should be a good release.

    Regards

  • Billybob said:

     

    That is interesting to know, because usually it doesn't work like that: one large data stream is never equal to multiple small data streams taking the same bandwidth.

    I'm only talking generally, and only about the CPU cost of decrypting traffic. Yes, there is overhead per request. But at the low level, if it needs to decrypt 1 million packets, it doesn't care how many TCP connections are involved.

    For the "overhead per request", such as categorization and AV scanning, the number of requests/data streams matters. For the "overhead per packet", such as SSL decryption, the number of requests/streams does not.
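    That split can be made concrete with a toy cost model (all constants here are invented for illustration; they are not real XG numbers):

```python
# Toy cost model: per-request costs scale with the number of streams,
# per-packet costs scale only with total bytes. All constants are invented.
PACKET_BYTES = 1500            # roughly one MTU-sized packet
COST_PER_PACKET = 1.0          # e.g. decryption work, arbitrary units
COST_PER_REQUEST = 5000.0      # e.g. categorization + AV hand-off, arbitrary units

def total_cost(n_streams: int, bytes_per_stream: int) -> float:
    packets = n_streams * bytes_per_stream / PACKET_BYTES
    return packets * COST_PER_PACKET + n_streams * COST_PER_REQUEST

one_big    = total_cost(1, 100 * 1024 * 1024)    # 1 x 100 MB
many_small = total_cost(100, 1 * 1024 * 1024)    # 100 x 1 MB

# The per-packet (decryption-like) term is identical in both cases;
# only the per-request term grows with the stream count.
print(f"1 x 100 MB: {one_big:.0f}  vs  100 x 1 MB: {many_small:.0f}")
```

    Both downloads move the same total bytes, so the per-packet term is identical; the hundred small files only pay extra on the per-request side.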

     

    When using the system as a user or admin, you don't care about any of that. When you are doing performance testing, understanding how the shape of your test traffic affects the results can make a difference.

     

    I am glad that while a lot of people were complaining about performance in EAP1/2/3, with EAP3-Refresh people are finding the performance good. My understanding is that they are still doing some tuning so that it is good on both high-end and low-end appliances, as some models are seeing better benefits than others.

  • Michael, seriously, thanks for all the answers. In reality, I'm just a user without networking knowledge, so the chances that I did those tests wrong are really high.

    Just one more thing; again, I could be completely wrong about what I'm about to say.

    This is just from a user perspective.

    * Picture from the latest webinar, also present on YouTube.

    When I ran those tests, I decided to use 4 KB, 16 KB, 1 MB, and 100 MB files to see what it was capable of. In that sheet I used 4 KB as the example. Of course, as file size increases, the CPS goes down.

    So XG starts with the firewall: it checks whether any rule (from top to bottom) allows the communication between the user IP and the destination on the desired port. Then it checks the SSL/TLS rules; the rule I made decrypts all traffic LAN => WAN.

    (Web Policy, AV, App Control, and IPS were all ON.)

    Then it applies the Web Policies, checking where the packet is going or coming from, which can be done through SNI or a number of other ways. (What I think also happens:) Snort also uses the "nSXLd" service for cloud web categorization; I believe it's only used when the domain isn't found in the internal database.

    Everything here, from the SSL/TLS inspection to the web categorization in XG, is FAST; there's no doubt about that.

    Since those tests were run on a local network, everything communicated by IP. The web categorization, as shown in the logs, was always "IPAddress", so I believe the categorization overhead in those tests was minimal.

     

    The "issue" appeared when XG scanned the traffic: it crippled and limited the bandwidth far more than I expected. I know this isn't an easy task, but is it true that it's single-threaded right now? Or is that just an artifact of "top"?

    I don't know much about how this stuff works, but isn't there any possibility of doing the same thing XG does with Snort: spawn an "avd" service on each core and balance the load between them?

    Or am I making stuff up, and that's completely wrong?

     

    After this, it does App Control, also with Snort, and then IPS, which in reality is pretty fast compared to what I was used to.

     

    In the end, is this correct? Or completely wrong?

     

    Thanks!



  • Hi Prism, I think there is some flaw in your testing methodology; I'm not sure why the AV daemon is choking. From what I understand, the only difference between DPI and the proxy is the frontend that decrypts your traffic. The proxy is limited to ports 80/443, whereas snort will look into any packet. The rest of the system has not changed from previous versions. So when you pass large amounts of traffic, snort CPU usage should go up until it maxes out your CPU, depending on the load. The proxy will have some similar limit, but usually a much higher one, since it is only looking at ports 80/443. If you turn on IPS and application detection, it will put extra load on your CPU, since snort is matching that traffic against different signatures in addition to doing the initial packet inspection.

    Theoretically, the fast-path optimization should bypass AV scanning after the initial inspection. I think if you look at the actual firewall logs, they tell you whether fast path is being used. Most of the time, vendors use raw throughput numbers (UDP throughput) for their performance specs, and some use pps (packets per second).

    Since you are doing connections-per-second testing, I think fast path is not being utilized at all, since all your connections have to be scanned initially (that's why large numbers of connections can DoS servers). If you do packets-per-second testing, you will probably get much better results that simulate real-world conditions.

    As always this is my understanding... I don't claim to be an expert on firewalls or sophos products so take it with a grain of salt.

    Regards

     

    EDIT: Is the AV daemon choking if you turn off DPI and use the proxy in your test? The proxy numbers seem way too low compared to DPI in your testing; I would have thought the proxy's performance would be similar to, if not better than, snort's.

  • Billybob said:
    Not sure why the av daemon is choking.

    I believe it's too much traffic. Since I also believe it's single-threaded, you can only push a certain amount of traffic before it hits the limit of that single core.

    Billybob said:
    From what I understand the only difference between DPI and proxy is the frontend that decrypts your traffic.


    The proxy needs to terminate the connection, establish one connection with the client and another with the server, and then relay data between them.

    With the new DPI, as the devs said, it's proxy-less TCP-layer inspection: it probably intercepts the SSL/TLS handshake, puts its own certificate in the middle, and lets the client communicate with the server without needing to proxy the traffic.

    If that's exactly how it works (I'm not a dev, so it's better for a dev to answer this), the CPS difference between the proxy and DPI makes sense.

    Also, "awarrenhttp" (the Web Proxy) is single-threaded as well; that's another of its limitations.

     

    Billybob said:
    Since you are doing connections per second testing, I think fast path is not being utilized at all since all your connections have to be scanned initially (thats why large number of connections can cause DOS on servers). If you do packets per second testing, you will probably get much better results that simulate real world conditions.

    The problem with fast path, I believe, is that the client is pushing way too many connections, creating a new one to the web server every time, and I don't know if it's possible to offload something that keeps generating new connections instead of transmitting everything through a single stream. Again, I'm not sure about this, but fast path must work with traffic signatures and SNI for SSL/TLS.

     

    Billybob said:
    As always this is my understanding... I don't claim to be an expert on firewalls or sophos products so take it with a grain of salt.

    I'm also just a user, so in the end I'm probably wrong about >95% of what I say here.

     

    Thanks for the feedback!



  • I wrote a lengthier reply, most of it informational about packet processing and ultimately not important.

     

    Rest assured that the architecture people know their stuff and are running lots of testing and optimization. Remember also that XG is meant for customers with 5000 clients all simultaneously downloading things and 100 files being AV scanned at the same time. How things look in a one-client test may max things out in a scenario that is not very real-world.

    IIRC there are several parts of XG that look at the number of CPUs and cores and change behavior. The number of threads on customer hardware may differ from similar XG hardware, and it is certainly different between an XG 110 and an XG 750.

    Though it is interesting to speculate, at some point you have to trust that we know what we are doing.  :)
