Hi everyone,
Currently the software version of XG v18 (and v17.5.x) isn't using AES-NI hardware acceleration in OpenSSL and OpenVPN. Comparing two machines with the same CPU and RAM, one running XG v18 MR-1 and the other Arch Linux, we can see a huge difference in throughput for anything related to encryption.
Both machines were using an Intel G5400.
I've verified that XG has all the AES kernel modules loaded:
SFVH_SO01_SFOS 18.0.1 MR-1# lsmod | grep "aes"
aesni_intel 163840 1
glue_helper 16384 1 aesni_intel
aes_x86_64 20480 1 aesni_intel
crypto_simd 16384 1 aesni_intel
cryptd 20480 2 aesni_intel,crypto_simd
SFVH_SO01_SFOS 18.0.1 MR-1# cat /proc/crypto | grep -m1 -o "aes"
aes
SFVH_SO01_SFOS 18.0.1 MR-1# cat /proc/crypto | grep -m1 -o "aesni"
aesni
SFVH_SO01_SFOS 18.0.1 MR-1# cat /proc/cpuinfo | grep -m1 -o "aes"
aes
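The CPU-flag check can also be scripted. This is a minimal sketch, assuming a Linux host with a readable /proc/cpuinfo; the function name cpu_has_flag is made up for illustration. Unlike grep -o "aes", it matches whole flag tokens, so a flag such as "vaes" does not count as "aes".

```python
def cpu_has_flag(cpuinfo_text: str, flag: str) -> bool:
    """Return True if `flag` appears as a whole token on any 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and flag in value.split():
            return True
    return False


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            print("aes flag present:", cpu_has_flag(fh.read(), "aes"))
    except OSError as exc:
        # Not a Linux host, or /proc is not mounted.
        print("could not read /proc/cpuinfo:", exc)
```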
Let's look at the difference in OpenSSL throughput now. We can use two commands: one to test with AES-NI hardware acceleration, and another that explicitly disables AES-NI for the performance test.
AES-NI Enabled: "openssl speed -evp aes-128-cbc"
AES-NI Disabled: "OPENSSL_ia32cap="~0x200000200000000" openssl speed -evp aes-128-cbc"
First on XG:
SFVH_SO01_SFOS 18.0.1 MR-1# openssl speed -evp aes-128-cbc
Doing aes-128-cbc for 3s on 16 size blocks: 24786330 aes-128-cbc's in 2.92s
Doing aes-128-cbc for 3s on 64 size blocks: 7583445 aes-128-cbc's in 2.94s
Doing aes-128-cbc for 3s on 256 size blocks: 2013992 aes-128-cbc's in 2.92s
Doing aes-128-cbc for 3s on 1024 size blocks: 509478 aes-128-cbc's in 2.96s
Doing aes-128-cbc for 3s on 8192 size blocks: 64313 aes-128-cbc's in 2.96s
OpenSSL 1.0.2r-fips 26 Feb 2019
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) idea(int) blowfish(ptr)
compiler: ccache_cc -m32 -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -I/srv/jenkins/workspace/OmC/CI_64/staging_dir/target-x86_64_glibc/usr/include -I/srv/jenkins/workspace/OmC/CI_64/staging_dir/target-x86_64_glibc/include -I/srv/jenkins/workspace/OmC/CI_64/staging_dir/toolchain-x86_64_gcc-7.3.0_glibc/usr/include -I/srv/jenkins/workspace/OmC/CI_64/staging_dir/toolchain-x86_64_gcc-7.3.0_glibc/include -znow -zrelro -DOPENSSL_NO_HEARTBEATS -DTERMIOS -fpic -Wa,--noexecstack -O3 -fomit-frame-pointer -Wall -fomit-frame-pointer -Wall -I/srv/jenkins/workspace/OmC/CI_64/staging_dir/target-x86_64_glibc/usr/lib/fips-i386/include
The 'numbers' are in 1000s of bytes per second processed.
type              16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
aes-128-cbc     135815.51k    165081.80k    176569.16k    176251.85k    177990.57k
SFVH_SO01_SFOS 18.0.1 MR-1# OPENSSL_ia32cap="~0x200000200000000" openssl speed -evp aes-128-cbc
Doing aes-128-cbc for 3s on 16 size blocks: 26476585 aes-128-cbc's in 2.96s
Doing aes-128-cbc for 3s on 64 size blocks: 7672939 aes-128-cbc's in 2.92s
Doing aes-128-cbc for 3s on 256 size blocks: 1936728 aes-128-cbc's in 2.92s
Doing aes-128-cbc for 3s on 1024 size blocks: 511127 aes-128-cbc's in 2.94s
Doing aes-128-cbc for 3s on 8192 size blocks: 64474 aes-128-cbc's in 2.92s
OpenSSL 1.0.2r-fips 26 Feb 2019
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) idea(int) blowfish(ptr)
compiler: ccache_cc -m32 -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -I/srv/jenkins/workspace/OmC/CI_64/staging_dir/target-x86_64_glibc/usr/include -I/srv/jenkins/workspace/OmC/CI_64/staging_dir/target-x86_64_glibc/include -I/srv/jenkins/workspace/OmC/CI_64/staging_dir/toolchain-x86_64_gcc-7.3.0_glibc/usr/include -I/srv/jenkins/workspace/OmC/CI_64/staging_dir/toolchain-x86_64_gcc-7.3.0_glibc/include -znow -zrelro -DOPENSSL_NO_HEARTBEATS -DTERMIOS -fpic -Wa,--noexecstack -O3 -fomit-frame-pointer -Wall -fomit-frame-pointer -Wall -I/srv/jenkins/workspace/OmC/CI_64/staging_dir/target-x86_64_glibc/usr/lib/fips-i386/include
The 'numbers' are in 1000s of bytes per second processed.
type              16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
aes-128-cbc     143116.68k    168174.01k    169795.33k    178025.19k    180880.48k
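For anyone wanting to sanity-check the summary rows, the "1000s of bytes per second" figures can be recomputed from the raw "Doing ..." lines. A small sketch, assuming the figure is simply operations times block size divided by elapsed seconds; the function name is illustrative, and the inputs are copied from the first XG run above.

```python
def kbytes_per_second(operations: int, block_size: int, elapsed_s: float) -> float:
    """Throughput in thousands of bytes per second, as `openssl speed` reports it."""
    return operations * block_size / elapsed_s / 1000


# (operations, block size in bytes, elapsed seconds) from the first XG run.
xg_run = [
    (24786330, 16, 2.92),
    (7583445, 64, 2.94),
    (2013992, 256, 2.92),
    (509478, 1024, 2.96),
    (64313, 8192, 2.96),
]

for ops, size, secs in xg_run:
    print(f"{size:>5} bytes: {kbytes_per_second(ops, size, secs):.2f}k")
```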
Now on Arch Linux:
$ openssl speed -evp aes-128-cbc
Doing aes-128-cbc for 3s on 16 size blocks: 145734524 aes-128-cbc's in 2.78s
Doing aes-128-cbc for 3s on 64 size blocks: 59272123 aes-128-cbc's in 2.74s
Doing aes-128-cbc for 3s on 256 size blocks: 15320112 aes-128-cbc's in 2.74s
Doing aes-128-cbc for 3s on 1024 size blocks: 3853467 aes-128-cbc's in 2.74s
Doing aes-128-cbc for 3s on 8192 size blocks: 485119 aes-128-cbc's in 2.76s
Doing aes-128-cbc for 3s on 16384 size blocks: 244213 aes-128-cbc's in 2.82s
OpenSSL 1.1.1f 31 Mar 2020
built on: Tue Mar 31 17:04:42 2020 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -Wa,--noexecstack -D_FORTIFY_SOURCE=2 -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG -D_FORTIFY_SOURCE=2
The 'numbers' are in 1000s of bytes per second processed.
type              16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes   16384 bytes
aes-128-cbc     838759.85k   1384458.35k   1431368.13k   1440127.81k   1439889.44k   1418860.21k
$ OPENSSL_ia32cap="~0x200000200000000" openssl speed -evp aes-128-cbc
Doing aes-128-cbc for 3s on 16 size blocks: 57147307 aes-128-cbc's in 2.98s
Doing aes-128-cbc for 3s on 64 size blocks: 16715714 aes-128-cbc's in 2.99s
Doing aes-128-cbc for 3s on 256 size blocks: 4296403 aes-128-cbc's in 2.98s
Doing aes-128-cbc for 3s on 1024 size blocks: 1100749 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 138102 aes-128-cbc's in 2.98s
Doing aes-128-cbc for 3s on 16384 size blocks: 68775 aes-128-cbc's in 3.00s
OpenSSL 1.1.1f 31 Mar 2020
built on: Tue Mar 31 17:04:42 2020 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -Wa,--noexecstack -D_FORTIFY_SOURCE=2 -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG -D_FORTIFY_SOURCE=2
The 'numbers' are in 1000s of bytes per second processed.
type              16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes   16384 bytes
aes-128-cbc     306831.18k    357794.55k    369086.97k    375722.33k    379641.47k    375603.20k
On Arch Linux there is a huge difference between the speeds with and without AES-NI: at 8192-byte blocks, about 1.44 GB/s with it versus about 0.38 GB/s without. On XG we're stuck at the same speeds either way (about 178 MB/s versus 181 MB/s at the same block size).
As stated on the OpenSSL and OpenVPN mailing lists: "If the results of the two above commands are equal, then your openssl library does NOT use hardware crypto."
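To make that "are the results equal?" check concrete, here is a rough sketch that divides the AES-NI-enabled row by the AES-NI-disabled row for each block size. The helper names are mine, and the input strings are the summary rows copied from the runs above.

```python
def parse_speed_line(line: str) -> list[float]:
    """Parse an `openssl speed` summary row into thousands of bytes per second."""
    fields = line.split()[1:]  # drop the cipher name
    return [float(field.rstrip("k")) for field in fields]


def aesni_speedup(enabled: str, disabled: str) -> list[float]:
    """Per-block-size ratio of the AES-NI-enabled run to the disabled run."""
    on, off = parse_speed_line(enabled), parse_speed_line(disabled)
    return [round(a / b, 2) for a, b in zip(on, off)]


# Summary rows copied from the runs above.
xg_on = "aes-128-cbc 135815.51k 165081.80k 176569.16k 176251.85k 177990.57k"
xg_off = "aes-128-cbc 143116.68k 168174.01k 169795.33k 178025.19k 180880.48k"
arch_on = "aes-128-cbc 838759.85k 1384458.35k 1431368.13k 1440127.81k 1439889.44k 1418860.21k"
arch_off = "aes-128-cbc 306831.18k 357794.55k 369086.97k 375722.33k 379641.47k 375603.20k"

print("XG  :", aesni_speedup(xg_on, xg_off))
print("Arch:", aesni_speedup(arch_on, arch_off))
```

With these numbers, every XG ratio stays within roughly 5% of 1.0, while the Arch ratios range from about 2.7x to about 3.9x.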
With OpenVPN, comparing --test-crypto between the two machines gives even "worse" results:
Arch Linux:
$ time openvpn --test-crypto --secret /tmp/secret --verb 0 --tun-mtu 20000 --cipher aes-128-cbc
Thu Apr 16 00:48:16 2020 disabling NCP mode (--ncp-disable) because not in P2MP client or server mode
openvpn --test-crypto --secret /tmp/secret --verb 0 --tun-mtu 20000 --cipher 1.16s user 0.02s system 98% cpu 1.194 total
3200/1.16 = 2,758 Mbit/s // Theoretical throughput | Real-world throughput is >860 Mbit/s (tested with iperf3, but physically limited to 1G.)
XG v18:
SFVH_SO01_SFOS 18.0.1 MR-1# time openvpn --test-crypto --secret /tmp/secret --verb 0 --tun-mtu 20000 --cipher aes-128-cbc
real 0m 13.09s
user 0m 12.90s
sys 0m 0.04s
3200 / 13.09 = 244 Mbit/s // Theoretical throughput | Real-world throughput is 220 Mbit/s (tested with iperf3.)
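The two divisions above can be reproduced with a short sketch. It takes the 3200 Mbit test volume used in this post as given (it is not derived here), and the function name is illustrative. The elapsed times are the ones used above: user time on Arch, real time on XG.

```python
TEST_VOLUME_MBIT = 3200  # figure used in this post, taken as given


def theoretical_mbit_per_s(elapsed_s: float, volume_mbit: float = TEST_VOLUME_MBIT) -> float:
    """Theoretical throughput: test volume divided by the --test-crypto run time."""
    return volume_mbit / elapsed_s


arch = theoretical_mbit_per_s(1.16)   # Arch Linux, user time
xg = theoretical_mbit_per_s(13.09)    # XG v18, real time

print(f"Arch Linux: {arch:.1f} Mbit/s")
print(f"XG v18:     {xg:.1f} Mbit/s")
print(f"Ratio:      {arch / xg:.1f}x")
```

On these timings the Arch machine comes out a little over 11 times faster than XG in the same test.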
I talked with rfcat_vk about this last month; on his machine he got the same results, with AES-NI not being utilized.
Is there any reason for the difference in results between the two machines? Or is this a local issue with my installation?
Thanks!