
Hardware acceleration (AES-NI) isn't being used on the software version of XG v18

Hi everyone,


Currently, the software version of XG v18 (and v17.5.x) isn't using AES-NI hardware acceleration in OpenSSL and OpenVPN. Comparing two machines with the same CPU and RAM, one running XG v18 MR-1 and the other Arch Linux, shows a huge difference in throughput for anything related to encryption.

Both machines were using an Intel G5400.


I've verified that XG has all the AES kernel modules loaded:


    SFVH_SO01_SFOS 18.0.1 MR-1# lsmod | grep "aes"
    aesni_intel           163840  1
    glue_helper            16384  1 aesni_intel
    aes_x86_64             20480  1 aesni_intel
    crypto_simd            16384  1 aesni_intel
    cryptd                 20480  2 aesni_intel,crypto_simd

    SFVH_SO01_SFOS 18.0.1 MR-1# cat /proc/crypto | grep -m1 -o "aes"
    aes

    SFVH_SO01_SFOS 18.0.1 MR-1# cat /proc/crypto | grep -m1 -o "aesni"
    aesni

    SFVH_SO01_SFOS 18.0.1 MR-1# cat /proc/cpuinfo | grep -m1 -o "aes"
    aes
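
The grep checks above match any substring containing "aes". A minimal Python sketch of a stricter per-flag check, assuming a Linux-style /proc/cpuinfo with a "flags" line; the helper name cpu_flags is made up for illustration. Like the grep, it only shows that the CPU advertises AES-NI, not that OpenSSL actually uses it:

```python
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # The first "flags" entry is enough; all cores normally report the same set.
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        # True when the exact flag "aes" is present (not just a substring match).
        print("aes" in cpu_flags(path.read_text()))
```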




Now let's look at the difference in OpenSSL throughput. We can use two commands: one that tests with AES-NI hardware acceleration, and another that explicitly disables AES-NI for the performance test.

AES-NI Enabled: "openssl speed -evp aes-128-cbc"

AES-NI Disabled: "OPENSSL_ia32cap="~0x200000200000000" openssl speed -evp aes-128-cbc"
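
For context on the second command: as far as I understand the OpenSSL documentation for OPENSSL_ia32cap, a leading ~ clears the listed bits from the detected CPU capability vector, and bits 57 and 33 stand for AES-NI and PCLMULQDQ. The sketch below only decodes the mask to confirm which bits it names; the helper set_bits and the constant names are illustrative, not part of OpenSSL:

```python
AESNI_BIT = 57       # CPUID.1:ECX bit 25, stored at position 32 + 25
PCLMULQDQ_BIT = 33   # CPUID.1:ECX bit 1, stored at position 32 + 1

# Mask from the command above, without the leading "~".
MASK = 0x200000200000000


def set_bits(mask):
    """Return the positions of all bits set in mask, lowest first."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


print(set_bits(MASK))  # prints [33, 57]
```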


First on XG:



    SFVH_SO01_SFOS 18.0.1 MR-1# openssl speed -evp aes-128-cbc
    Doing aes-128-cbc for 3s on 16 size blocks: 24786330 aes-128-cbc's in 2.92s
    Doing aes-128-cbc for 3s on 64 size blocks: 7583445 aes-128-cbc's in 2.94s
    Doing aes-128-cbc for 3s on 256 size blocks: 2013992 aes-128-cbc's in 2.92s
    Doing aes-128-cbc for 3s on 1024 size blocks: 509478 aes-128-cbc's in 2.96s
    Doing aes-128-cbc for 3s on 8192 size blocks: 64313 aes-128-cbc's in 2.96s
    OpenSSL 1.0.2r-fips  26 Feb 2019
    built on: reproducible build, date unspecified
    options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) idea(int) blowfish(ptr)  
    compiler: ccache_cc -m32 -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -I/srv/jenkins/workspace/OmC/CI_64/staging_dir/target-x86_64_glibc/usr/include -I/srv/jenkins/workspace/OmC/CI_64/staging_dir/target-x86_64_glibc/include -I/srv/jenkins/workspace/OmC/CI_64/staging_dir/toolchain-x86_64_gcc-7.3.0_glibc/usr/include -I/srv/jenkins/workspace/OmC/CI_64/staging_dir/toolchain-x86_64_gcc-7.3.0_glibc/include -znow -zrelro -DOPENSSL_NO_HEARTBEATS -DTERMIOS -fpic -Wa,--noexecstack -O3 -fomit-frame-pointer -Wall -fomit-frame-pointer -Wall -I/srv/jenkins/workspace/OmC/CI_64/staging_dir/target-x86_64_glibc/usr/lib/fips-i386/include
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128-cbc     135815.51k   165081.80k   176569.16k   176251.85k   177990.57k


    SFVH_SO01_SFOS 18.0.1 MR-1# OPENSSL_ia32cap="~0x200000200000000" openssl speed -evp aes-128-cbc
    Doing aes-128-cbc for 3s on 16 size blocks: 26476585 aes-128-cbc's in 2.96s
    Doing aes-128-cbc for 3s on 64 size blocks: 7672939 aes-128-cbc's in 2.92s
    Doing aes-128-cbc for 3s on 256 size blocks: 1936728 aes-128-cbc's in 2.92s
    Doing aes-128-cbc for 3s on 1024 size blocks: 511127 aes-128-cbc's in 2.94s
    Doing aes-128-cbc for 3s on 8192 size blocks: 64474 aes-128-cbc's in 2.92s
    OpenSSL 1.0.2r-fips  26 Feb 2019
    built on: reproducible build, date unspecified
    options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) idea(int) blowfish(ptr)
    compiler: ccache_cc -m32 -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -I/srv/jenkins/workspace/OmC/CI_64/staging_dir/target-x86_64_glibc/usr/include -I/srv/jenkins/workspace/OmC/CI_64/staging_dir/target-x86_64_glibc/include -I/srv/jenkins/workspace/OmC/CI_64/staging_dir/toolchain-x86_64_gcc-7.3.0_glibc/usr/include -I/srv/jenkins/workspace/OmC/CI_64/staging_dir/toolchain-x86_64_gcc-7.3.0_glibc/include -znow -zrelro -DOPENSSL_NO_HEARTBEATS -DTERMIOS -fpic -Wa,--noexecstack -O3 -fomit-frame-pointer -Wall -fomit-frame-pointer -Wall -I/srv/jenkins/workspace/OmC/CI_64/staging_dir/target-x86_64_glibc/usr/lib/fips-i386/include
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128-cbc     143116.68k   168174.01k   169795.33k   178025.19k   180880.48k




Now on Arch Linux:



    $ openssl speed -evp aes-128-cbc
    Doing aes-128-cbc for 3s on 16 size blocks: 145734524 aes-128-cbc's in 2.78s
    Doing aes-128-cbc for 3s on 64 size blocks: 59272123 aes-128-cbc's in 2.74s
    Doing aes-128-cbc for 3s on 256 size blocks: 15320112 aes-128-cbc's in 2.74s
    Doing aes-128-cbc for 3s on 1024 size blocks: 3853467 aes-128-cbc's in 2.74s
    Doing aes-128-cbc for 3s on 8192 size blocks: 485119 aes-128-cbc's in 2.76s
    Doing aes-128-cbc for 3s on 16384 size blocks: 244213 aes-128-cbc's in 2.82s
    OpenSSL 1.1.1f  31 Mar 2020
    built on: Tue Mar 31 17:04:42 2020 UTC
    options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
    compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -Wa,--noexecstack -D_FORTIFY_SOURCE=2 -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG -D_FORTIFY_SOURCE=2
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
    aes-128-cbc     838759.85k  1384458.35k  1431368.13k  1440127.81k  1439889.44k  1418860.21k


    $ OPENSSL_ia32cap="~0x200000200000000" openssl speed -evp aes-128-cbc
    Doing aes-128-cbc for 3s on 16 size blocks: 57147307 aes-128-cbc's in 2.98s
    Doing aes-128-cbc for 3s on 64 size blocks: 16715714 aes-128-cbc's in 2.99s
    Doing aes-128-cbc for 3s on 256 size blocks: 4296403 aes-128-cbc's in 2.98s
    Doing aes-128-cbc for 3s on 1024 size blocks: 1100749 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 8192 size blocks: 138102 aes-128-cbc's in 2.98s
    Doing aes-128-cbc for 3s on 16384 size blocks: 68775 aes-128-cbc's in 3.00s
    OpenSSL 1.1.1f  31 Mar 2020
    built on: Tue Mar 31 17:04:42 2020 UTC
    options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
    compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -Wa,--noexecstack -D_FORTIFY_SOURCE=2 -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG -D_FORTIFY_SOURCE=2
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
    aes-128-cbc     306831.18k   357794.55k   369086.97k   375722.33k   379641.47k   375603.20k




On Arch Linux we can see a huge difference in speed with and without AES-NI, but on XG we're stuck at the same speed either way.

As stated on the OpenSSL and OpenVPN mailing lists, "If the results of the two above commands are equal, then your openssl library does NOT use hardware crypto."
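
To put a number on "equal", here is a rough Python sketch that parses the summary lines from the runs above and compares the 8192-byte column with and without AES-NI. The function names and the 1.5x cutoff are my own assumptions for illustration, not anything defined by OpenSSL:

```python
def parse_speed_row(header, row):
    """Map block size (bytes) to throughput (kB/s) from the two summary lines."""
    sizes = [int(tok) for tok in header.split()[1::2]]
    rates = [float(tok.rstrip("k")) for tok in row.split()[1:]]
    return dict(zip(sizes, rates))


def aesni_speedup(enabled, disabled, size=8192):
    """Ratio of throughput with AES-NI to throughput with it masked off."""
    return enabled[size] / disabled[size]


XG_HEADER = "type  16 bytes  64 bytes  256 bytes  1024 bytes  8192 bytes"
ARCH_HEADER = XG_HEADER + "  16384 bytes"

xg_on = parse_speed_row(XG_HEADER, "aes-128-cbc 135815.51k 165081.80k 176569.16k 176251.85k 177990.57k")
xg_off = parse_speed_row(XG_HEADER, "aes-128-cbc 143116.68k 168174.01k 169795.33k 178025.19k 180880.48k")
arch_on = parse_speed_row(ARCH_HEADER, "aes-128-cbc 838759.85k 1384458.35k 1431368.13k 1440127.81k 1439889.44k 1418860.21k")
arch_off = parse_speed_row(ARCH_HEADER, "aes-128-cbc 306831.18k 357794.55k 369086.97k 375722.33k 379641.47k 375603.20k")

THRESHOLD = 1.5  # hypothetical cutoff for "clearly accelerated"

for name, on, off in (("XG", xg_on, xg_off), ("Arch", arch_on, arch_off)):
    ratio = aesni_speedup(on, off)
    print(f"{name}: {ratio:.2f}x, accelerated={ratio > THRESHOLD}")
```

With the figures posted here, the XG ratio comes out just under 1 and the Arch ratio a bit under 4.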


With OpenVPN, comparing --test-crypto between the two machines gives even "worse" results:

Arch Linux:


    $ time openvpn --test-crypto --secret /tmp/secret --verb 0 --tun-mtu 20000 --cipher aes-128-cbc
    Thu Apr 16 00:48:16 2020 disabling NCP mode (--ncp-disable) because not in P2MP client or server mode
    openvpn --test-crypto --secret /tmp/secret --verb 0 --tun-mtu 20000 --cipher   1.16s user 0.02s system 98% cpu 1.194 total

    3200/1.16 = 2,758 Mbit/s // Theoretical throughput | Real-world throughput is >860 Mbit/s (tested with iperf3, but physically limited to 1G)



XG v18:



    SFVH_SO01_SFOS 18.0.1 MR-1# time openvpn --test-crypto --secret /tmp/secret --verb 0 --tun-mtu 20000 --cipher aes-128-cbc
    real    0m 13.09s
    user    0m 12.90s
    sys     0m 0.04s


    3200 / 13.09 = 244 Mbit/s // Theoretical throughput | Real-world throughput is 220 Mbit/s (tested with iperf3)
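
The arithmetic for both machines, as a small Python sketch. The 3200 Mbit constant is taken as given from the calculations above for --tun-mtu 20000, and the function name is just for illustration. Note that the Arch figure uses the "user" time while the XG figure uses the "real" time, exactly as in the two calculations above:

```python
TEST_CRYPTO_MBIT = 3200  # constant used in the calculations above


def theoretical_mbit_per_s(seconds, total_mbit=TEST_CRYPTO_MBIT):
    """Theoretical throughput for one --test-crypto run of the given duration."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return total_mbit / seconds


arch = theoretical_mbit_per_s(1.16)   # Arch Linux, user time
xg = theoretical_mbit_per_s(13.09)    # XG v18, real time

print(f"Arch: {arch:.0f} Mbit/s, XG: {xg:.0f} Mbit/s, ratio: {arch / xg:.1f}x")
```

That works out to roughly an 11x gap between the two machines on the same CPU.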




I had a talk about this with another user last month; he got the same results on his machine, as AES-NI isn't being utilized there either.

Is there any reason for the difference in results between the two machines? Or is this a local issue with my installation?


Thanks!



  • For testing purposes, I decided to copy an OpenSSL 1.1.1f binary to XG to find out whether something is wrong with the AES-NI modules on it.

    Instead, it turns out that XG's OpenSSL library isn't using AES-NI for hardware acceleration, so everything that uses OpenSSL for crypto, such as OpenVPN, doesn't use AES-NI either.

    Here are the results.

     

    SFVH_SO01_SFOS 18.0.1 MR-1# pwd
    /tmp/openssltest


    SFVH_SO01_SFOS 18.0.1 MR-1# /bin/openssl version
    OpenSSL 1.0.2r-fips  26 Feb 2019


    SFVH_SO01_SFOS 18.0.1 MR-1# ./openssl version
    OpenSSL 1.1.1f  31 Mar 2020


    SFVH_SO01_SFOS 18.0.1 MR-1# ./openssl speed -evp aes-128-cbc
    Doing aes-128-cbc for 3s on 16 size blocks: 120926436 aes-128-cbc's in 2.88s
    Doing aes-128-cbc for 3s on 64 size blocks: 48265045 aes-128-cbc's in 2.83s
    Doing aes-128-cbc for 3s on 256 size blocks: 14459737 aes-128-cbc's in 2.86s
    Doing aes-128-cbc for 3s on 1024 size blocks: 3809277 aes-128-cbc's in 2.84s
    Doing aes-128-cbc for 3s on 8192 size blocks: 477253 aes-128-cbc's in 2.82s
    Doing aes-128-cbc for 3s on 16384 size blocks: 249299 aes-128-cbc's in 2.93s
    OpenSSL 1.1.1f  31 Mar 2020
    built on: Tue Mar 31 17:04:42 2020 UTC
    options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
    compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -Wa,--noexecstack -D_FORTIFY_SOURCE=2 -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG -D_FORTIFY_SOURCE=2
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
    aes-128-cbc     671813.53k  1091506.32k  1294298.14k  1373485.79k  1386403.04k  1394032.36k


    SFVH_SO01_SFOS 18.0.1 MR-1# OPENSSL_ia32cap="~0x200000200000000" ./openssl speed -evp aes-128-cbc
    Doing aes-128-cbc for 3s on 16 size blocks: 58141542 aes-128-cbc's in 2.93s
    Doing aes-128-cbc for 3s on 64 size blocks: 16563509 aes-128-cbc's in 2.92s
    Doing aes-128-cbc for 3s on 256 size blocks: 4196913 aes-128-cbc's in 2.92s
    Doing aes-128-cbc for 3s on 1024 size blocks: 1087198 aes-128-cbc's in 2.94s
    Doing aes-128-cbc for 3s on 8192 size blocks: 133815 aes-128-cbc's in 2.88s
    Doing aes-128-cbc for 3s on 16384 size blocks: 65116 aes-128-cbc's in 2.92s
    OpenSSL 1.1.1f  31 Mar 2020
    built on: Tue Mar 31 17:04:42 2020 UTC
    options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
    compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -Wa,--noexecstack -D_FORTIFY_SOURCE=2 -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG -D_FORTIFY_SOURCE=2
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
    aes-128-cbc     317496.48k   363035.81k   367948.54k   378670.32k   380629.33k   365363.20k

     

    Now we can clearly see a difference between not using AES-NI and using it.

    Since the GPL code of XG v18 hasn't been released yet, I can't compile OpenSSL with XG patches on it for further testing. I also believe there's no FIPS module for OpenSSL 1.1.1.

     

    Thanks!



  • Hi folks,

    Further thoughts on the subject of performance in this thread: while the main players so far are home users, the same issue affects any business that downloads the software version to run on a VM. A business large enough to consider a VM would also expect performance similar to the high-end hardware offered by Sophos.

    My thoughts for today.

    Ian

