[Bug 1674399] Re: OpenSSL CPU detection for AMD Ryzen CPUs

Eric Desrochers eric.desrochers at canonical.com
Thu May 4 22:19:37 UTC 2017


[Verification YAKKETY]

# i386
- Significant performance increase using the yakkety-proposed/i386 package inside a 32-bit LXD container running on a Ryzen CPU with Intel SHA Extension capability.
- Same performance (as expected) using the yakkety-proposed/i386 package on an Intel CPU without the SHA Extension (i7-6770HQ).

# amd64
- Significant performance increase using the yakkety-proposed/amd64 package on a Ryzen CPU with Intel SHA Extension capability.
- Same performance (as expected) using the yakkety-proposed/amd64 package on an Intel CPU without the SHA Extension (i7-6770HQ).

Note : Unfortunately, neither I nor my colleagues have access to an Intel CPU with SHA Extension capability. Ideally, if someone has access to one, it would be good to test on it.
Otherwise, I think it is safe to rely on the upstream author of the patch, who confirmed it was working as expected on an Intel CPU with SHA Extension capability.

Reference : https://github.com/openssl/openssl/issues/2848
"...Myself I tested on Intel processors, yes, with/without...."

==
* Test yakkety-proposed/i386 on a 32-bit LXD container using a non SHA Extension Intel CPU (Version before -proposed pkg):
--
ii  libssl1.0.0:i386          1.0.2g-1ubuntu9.1                          i386         Secure Sockets Layer toolkit - shared libraries
ii  openssl                   1.0.2g-1ubuntu9.1                          i386         Secure Sockets Layer toolkit - cryptographic utility

# openssl speed sha1
Doing sha1 for 3s on 16 size blocks: 12441833 sha1's in 3.00s
Doing sha1 for 3s on 64 size blocks: 8997589 sha1's in 3.00s
Doing sha1 for 3s on 256 size blocks: 5074636 sha1's in 3.00s
Doing sha1 for 3s on 1024 size blocks: 1904828 sha1's in 3.00s
Doing sha1 for 3s on 8192 size blocks: 304739 sha1's in 3.00s
OpenSSL 1.0.2g  1 Mar 2016
built on: reproducible build, date unspecified
options:bn(64,32) rc4(8x,mmx) des(ptr,risc1,16,long) aes(partial) blowfish(idx) 
compiler: cc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -g -O2 -fdebug-prefix-map=/build/openssl-OIx07U/openssl-1.0.2g=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
sha1             66356.44k   191948.57k   433035.61k   650181.29k   832140.63k

# time openssl dgst -sha256 /var/tmp/5Gfile 
SHA256(/var/tmp/5Gfile)= 7f06c62352aebd8125b2a1841e2b9e1ffcbed602f381c3dcb3200200e383d1d5

real	0m15.429s
user	0m14.372s
sys	0m1.052s
==
* Test yakkety-proposed/i386 on a 32-bit LXD container using a non SHA Extension Intel CPU (With -proposed pkg):
--
ii  libssl1.0.0:i386          1.0.2g-1ubuntu9.2                          i386         Secure Sockets Layer toolkit - shared libraries
ii  openssl                   1.0.2g-1ubuntu9.2                          i386         Secure Sockets Layer toolkit - cryptographic utility

# openssl speed sha1
Doing sha1 for 3s on 16 size blocks: 12414183 sha1's in 3.00s
Doing sha1 for 3s on 64 size blocks: 8947717 sha1's in 3.00s
Doing sha1 for 3s on 256 size blocks: 5057099 sha1's in 3.00s
Doing sha1 for 3s on 1024 size blocks: 1905356 sha1's in 3.00s
Doing sha1 for 3s on 8192 size blocks: 304628 sha1's in 3.00s
OpenSSL 1.0.2g  1 Mar 2016
built on: reproducible build, date unspecified
options:bn(64,32) rc4(8x,mmx) des(ptr,risc1,16,long) aes(partial) blowfish(idx) 
compiler: cc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -g -O2 -fdebug-prefix-map=/build/openssl-h4cyBe/openssl-1.0.2g=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
sha1             66208.98k   190884.63k   431539.11k   650361.51k   831837.53k

# time openssl dgst -sha256 /var/tmp/5Gfile 
SHA256(/var/tmp/5Gfile)= 7f06c62352aebd8125b2a1841e2b9e1ffcbed602f381c3dcb3200200e383d1d5

real	0m15.047s
user	0m14.352s
sys	0m0.692s
==
* Test yakkety-proposed/i386 on a 32-bit LXD container using a Ryzen CPU (Version before -proposed pkg):
--
ii  libssl1.0.0:i386                           1.0.2g-1ubuntu9.1                                i386         Secure Sockets Layer toolkit - shared libraries
ii  openssl                                    1.0.2g-1ubuntu9.1                                i386         Secure Sockets Layer toolkit - cryptographic utility

# openssl speed sha1
Doing sha1 for 3s on 16 size blocks: 12179205 sha1's in 3.00s
Doing sha1 for 3s on 64 size blocks: 9286258 sha1's in 3.00s
Doing sha1 for 3s on 256 size blocks: 5721265 sha1's in 3.00s
Doing sha1 for 3s on 1024 size blocks: 2272855 sha1's in 3.00s
Doing sha1 for 3s on 8192 size blocks: 343371 sha1's in 3.00s
OpenSSL 1.0.2g  1 Mar 2016
built on: reproducible build, date unspecified
options:bn(64,32) rc4(8x,mmx) des(ptr,risc1,16,long) aes(partial) blowfish(idx) 
compiler: cc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -g -O2 -fdebug-prefix-map=/build/openssl-OIx07U/openssl-1.0.2g=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
sha1             64955.76k   198106.84k   488214.61k   775801.17k   937631.74k

# time openssl dgst -sha256 /var/tmp/5Gfile 
SHA256(/var/tmp/5Gfile)= 7f06c62352aebd8125b2a1841e2b9e1ffcbed602f381c3dcb3200200e383d1d5

real	0m15.167s
user	0m14.560s
sys	0m0.600s
==
* Test yakkety-proposed/i386 on a 32-bit LXD container using a Ryzen CPU (With -proposed pkg):
--
ii  libssl1.0.0:i386                           1.0.2g-1ubuntu9.2                                i386         Secure Sockets Layer toolkit - shared libraries
ii  openssl                                    1.0.2g-1ubuntu9.2                                i386         Secure Sockets Layer toolkit - cryptographic utility

# openssl speed sha1
Doing sha1 for 3s on 16 size blocks: 15283062 sha1's in 3.00s
Doing sha1 for 3s on 64 size blocks: 13102409 sha1's in 3.00s
Doing sha1 for 3s on 256 size blocks: 9183632 sha1's in 3.00s
Doing sha1 for 3s on 1024 size blocks: 4163208 sha1's in 3.00s
Doing sha1 for 3s on 8192 size blocks: 682639 sha1's in 3.00s
OpenSSL 1.0.2g  1 Mar 2016
built on: reproducible build, date unspecified
options:bn(64,32) rc4(8x,mmx) des(ptr,risc1,16,long) aes(partial) blowfish(idx) 
compiler: cc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -g -O2 -fdebug-prefix-map=/build/openssl-h4cyBe/openssl-1.0.2g=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
sha1             81509.66k   279518.06k   783669.93k  1421041.66k  1864059.56k

# time openssl dgst -sha256 /var/tmp/5Gfile
SHA256(/var/tmp/5Gfile)= 7f06c62352aebd8125b2a1841e2b9e1ffcbed602f381c3dcb3200200e383d1d5

real	0m3.560s
user	0m3.048s
sys	0m0.508s
==
* Test yakkety-proposed/amd64 on a 64-bit Intel CPU without the Intel SHA Extension (Version before -proposed pkg):
--
ii  libssl1.0.0:amd64                          1.0.2g-1ubuntu9.1                                amd64        Secure Sockets Layer toolkit - shared libraries
ii  openssl                                    1.0.2g-1ubuntu9.1                                amd64        Secure Sockets Layer toolkit - cryptographic utility

# openssl speed sha1
Doing sha1 for 3s on 16 size blocks: 16195704 sha1's in 3.00s
Doing sha1 for 3s on 64 size blocks: 11405919 sha1's in 3.00s
Doing sha1 for 3s on 256 size blocks: 6562453 sha1's in 3.00s
Doing sha1 for 3s on 1024 size blocks: 2449558 sha1's in 3.00s
Doing sha1 for 3s on 8192 size blocks: 357312 sha1's in 3.00s
OpenSSL 1.0.2g  1 Mar 2016
built on: reproducible build, date unspecified
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx) 
compiler: cc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -g -O2 -fdebug-prefix-map=/build/openssl-tWbsaJ/openssl-1.0.2g=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DMD32_REG_T=int -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
sha1             86377.09k   243326.27k   559995.99k   836115.80k   975699.97k

# time openssl dgst -sha256 /var/tmp/5Gfile
SHA256(/var/tmp/5Gfile)= 7f06c62352aebd8125b2a1841e2b9e1ffcbed602f381c3dcb3200200e383d1d5

real	0m12.811s
user	0m11.748s
sys	0m1.060s
==
* Test yakkety-proposed/amd64 on a 64-bit Intel CPU without the Intel SHA Extension (With -proposed pkg):
--
ii  libssl1.0.0:amd64                          1.0.2g-1ubuntu9.2                                amd64        Secure Sockets Layer toolkit - shared libraries
ii  openssl                                    1.0.2g-1ubuntu9.2                                amd64        Secure Sockets Layer toolkit - cryptographic utility

# openssl speed sha1
Doing sha1 for 3s on 16 size blocks: 16029840 sha1's in 3.00s
Doing sha1 for 3s on 64 size blocks: 11289948 sha1's in 3.00s
Doing sha1 for 3s on 256 size blocks: 6512044 sha1's in 3.00s
Doing sha1 for 3s on 1024 size blocks: 2424904 sha1's in 3.00s
Doing sha1 for 3s on 8192 size blocks: 354302 sha1's in 3.00s
OpenSSL 1.0.2g  1 Mar 2016
built on: reproducible build, date unspecified
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx) 
compiler: cc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -g -O2 -fdebug-prefix-map=/build/openssl-pewLMz/openssl-1.0.2g=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DMD32_REG_T=int -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
sha1             85492.48k   240852.22k   555694.42k   827700.57k   967480.66k


# time openssl dgst -sha256 /var/tmp/5Gfile
SHA256(/var/tmp/5Gfile)= 7f06c62352aebd8125b2a1841e2b9e1ffcbed602f381c3dcb3200200e383d1d5

real	0m12.448s
user	0m11.696s
sys	0m0.748s
==
* Test yakkety-proposed/amd64 on a Ryzen CPU (Version before -proposed pkg):
--
ii  libssl1.0.0:amd64                          1.0.2g-1ubuntu9.1                                amd64        Secure Sockets Layer toolkit - shared libraries
ii  openssl                                    1.0.2g-1ubuntu9.1                                amd64        Secure Sockets Layer toolkit - cryptographic utility

# openssl speed sha1
Doing sha1 for 3s on 16 size blocks: 17361181 sha1's in 3.00s
Doing sha1 for 3s on 64 size blocks: 12246010 sha1's in 3.00s
Doing sha1 for 3s on 256 size blocks: 6780969 sha1's in 3.00s
Doing sha1 for 3s on 1024 size blocks: 2449489 sha1's in 3.00s
Doing sha1 for 3s on 8192 size blocks: 352468 sha1's in 3.00s
OpenSSL 1.0.2g  1 Mar 2016
built on: reproducible build, date unspecified
options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx) 
compiler: cc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -g -O2 -fdebug-prefix-map=/build/openssl-tWbsaJ/openssl-1.0.2g=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DMD32_REG_T=int -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
sha1             92592.97k   261248.21k   578642.69k   836092.25k   962472.62k

# time openssl dgst -sha256 /var/tmp/5Gfile
SHA256(/var/tmp/5Gfile)= 7f06c62352aebd8125b2a1841e2b9e1ffcbed602f381c3dcb3200200e383d1d5

real	0m13.330s
user	0m12.400s
sys	0m0.924s
==
* Test yakkety-proposed/amd64 on a Ryzen CPU (With -proposed pkg):
--
ii  libssl1.0.0:amd64                          1.0.2g-1ubuntu9.2                                amd64        Secure Sockets Layer toolkit - shared libraries
ii  openssl                                    1.0.2g-1ubuntu9.2                                amd64        Secure Sockets Layer toolkit - cryptographic utility

# openssl speed sha1
Doing sha1 for 3s on 16 size blocks: 25034315 sha1's in 3.00s
Doing sha1 for 3s on 64 size blocks: 19533533 sha1's in 3.00s
Doing sha1 for 3s on 256 size blocks: 11903290 sha1's in 3.00s
Doing sha1 for 3s on 1024 size blocks: 4640937 sha1's in 3.00s
Doing sha1 for 3s on 8192 size blocks: 694256 sha1's in 3.00s
OpenSSL 1.0.2g  1 Mar 2016
built on: reproducible build, date unspecified
options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx) 
compiler: cc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -g -O2 -fdebug-prefix-map=/build/openssl-pewLMz/openssl-1.0.2g=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DMD32_REG_T=int -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
sha1            133516.35k   416715.37k  1015747.41k  1584106.50k  1895781.72k

# time openssl dgst -sha256 /var/tmp/5Gfile 
SHA256(/var/tmp/5Gfile)= 7f06c62352aebd8125b2a1841e2b9e1ffcbed602f381c3dcb3200200e383d1d5

real	0m3.484s
user	0m2.952s
sys	0m0.528s
==
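
The summary row printed by `openssl speed` follows from the per-block-size counts above it: operations times block size, divided by elapsed seconds, expressed in 1000s of bytes per second. Below is a minimal sketch of that arithmetic in C, which can be used to cross-check the logs and to compare runs. The function names are hypothetical and not part of OpenSSL.

```c
/* Throughput in 1000s of bytes per second, matching the unit used in the
 * `openssl speed` summary row. Hypothetical helper, not an OpenSSL API. */
double throughput_kbytes_per_sec(long count, int block_size, double seconds)
{
    return (double)count * (double)block_size / seconds / 1000.0;
}

/* Ratio of the patched throughput to the unpatched throughput. */
double speedup(double before, double after)
{
    return after / before;
}
```

For the amd64 Ryzen run with the -proposed package, `throughput_kbytes_per_sec(694256, 8192, 3.00)` reproduces the reported 1895781.72k, and `speedup(962472.62, 1895781.72)` comes out close to 2x for 8192-byte blocks.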


** Bug watch added: github.com/openssl/openssl/issues #2848
   https://github.com/openssl/openssl/issues/2848

** Tags removed: verification-needed
** Tags added: verification-done-yakkety

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to openssl in Ubuntu.
https://bugs.launchpad.net/bugs/1674399

Title:
  OpenSSL CPU detection for AMD Ryzen CPUs

Status in openssl package in Ubuntu:
  Fix Released
Status in openssl source package in Xenial:
  In Progress
Status in openssl source package in Yakkety:
  Fix Committed
Status in openssl source package in Zesty:
  In Progress
Status in openssl source package in Artful:
  Fix Released

Bug description:
  [Impact]

  * Context:

  AMD added support for SHA Extensions[1] (CPU flag: sha_ni[2]) to its
  processors starting with the Ryzen[3] CPU. Note that Ryzen CPUs come
  in 64-bit only (confirmed with an AMD representative). The current
  OpenSSL version still calls the SSSE3 routine for SHA on Ryzen: a
  number of extensions are effectively masked on Ryzen, so it shows no
  improvement.

  [1] /proc/cpuinfo
  processor : 0
  vendor_id : AuthenticAMD
  cpu family : 23
  model : 1
  model name : AMD Ryzen 5 1600 Six-Core Processor
  flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold

  [2] - sha_ni: SHA1/SHA256 Instruction Extensions

  [3] - https://en.wikipedia.org/wiki/Ryzen
  ...
  All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5]
  ...
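
  Whether the kernel reports the capability can be checked by looking
  for the whole token `sha_ni` in the `flags` line of /proc/cpuinfo. A
  minimal sketch in C follows; `has_cpu_flag` is a hypothetical helper
  that takes the flags text as a string and matches whole
  whitespace-separated tokens only, so that `sse` does not match
  `ssse3` or `sse4_1`.

```c
#include <string.h>

/* Return 1 if `flag` appears as a whole whitespace-separated token in
 * `flags` (e.g. the text of the "flags" line from /proc/cpuinfo), else 0.
 * Hypothetical helper for illustration. */
int has_cpu_flag(const char *flags, const char *flag)
{
    size_t n = strlen(flag);
    const char *p = flags;

    if (n == 0)
        return 0;

    while ((p = strstr(p, flag)) != NULL) {
        /* The match must start at the beginning or right after whitespace. */
        int starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        /* The match must end at the end of the string or before whitespace. */
        int ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\t' || p[n] == '\n';

        if (starts && ends)
            return 1;
        p += 1; /* keep searching after a partial-token match */
    }
    return 0;
}
```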

  * Program that performs the CPUID check:

  Reference :
  https://software.intel.com/en-us/articles/intel-sha-extensions

  ... Availability of the Intel® SHA Extensions on a particular
  processor can be determined by checking the SHA CPUID bit in
  CPUID.(EAX=07H, ECX=0):EBX.SHA [bit 29]. The following C function,
  using inline assembly, performs the CPUID check:

  --
  int CheckForIntelShaExtensions() {
     int a, b, c, d;

     // Look for CPUID.7.0.EBX[29]
     // EAX = 7, ECX = 0
     a = 7;
     c = 0;

     asm volatile ("cpuid"
          :"=a"(a), "=b"(b), "=c"(c), "=d"(d)
          :"a"(a), "c"(c)
         );

     // Intel® SHA Extensions feature bit is EBX[29]
     return ((b >> 29) & 1);
  }
  --

  On a CPU with sha_ni the function returns "1". Otherwise it returns "0".
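
  With a reasonably recent GCC or Clang, the same check can be sketched
  without inline assembly by using the `__get_cpuid_count` helper from
  `<cpuid.h>`, which also reports when CPUID leaf 7 is not supported.
  This is an illustration under those assumptions, not the code used by
  OpenSSL; the function names are hypothetical, and on non-x86 builds
  the check simply returns 0.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Extract the SHA feature bit: EBX[29] of CPUID.(EAX=07H, ECX=0). */
int sha_bit_from_ebx(unsigned int ebx)
{
    return (int)((ebx >> 29) & 1u);
}

/* Return 1 if the CPU advertises the SHA extensions, 0 otherwise. */
int cpu_has_sha_extensions(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    /* __get_cpuid_count returns 0 when the requested leaf is unsupported. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return sha_bit_from_ebx(ebx);
#else
    return 0; /* assumption: no SHA-NI outside x86 */
#endif
}
```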

  [Test Case]

   * Reproducible with Xenial/Zesty/Artful release.

   * Generated a checksum of a big file (e.g. 5GB file) with openssl
   $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile
  SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8

  real	0m12.835s
  user	0m12.344s
  sys	0m0.484s

  * Openssl speed
  $ openssl speed sha1
  Doing sha1 for 3s on 16 size blocks: 9969152 sha1's in 3.00s
  Doing sha1 for 3s on 64 size blocks: 8019164 sha1's in 3.00s
  Doing sha1 for 3s on 256 size blocks: 5254219 sha1's in 2.99s
  Doing sha1 for 3s on 1024 size blocks: 2217067 sha1's in 3.00s
  Doing sha1 for 3s on 8192 size blocks: 347842 sha1's in 3.00s
  OpenSSL 1.0.2g 1 Mar 2016
  built on: reproducible build, date unspecified
  options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
  compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM
  The 'numbers' are in 1000s of bytes per second processed.
  type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
  sha1             53168.81k   171075.50k   449859.55k   756758.87k   949840.55k

  Performance is clearly better with the patch, which takes advantage
  of the SHA extension. (See the Regression Potential section for
  results with the patch.)

  [Regression Potential]

   * Note : IRC discussion with infinity :
  https://bugs.launchpad.net/ubuntu/xenial/+source/openssl/+bug/1674399/comments/8

   * Note from irc discussion with apw and rbasak :
  https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1674399/comments/2

   * It basically allows openssl to take advantage of the SHA extension
  (mostly performance-wise) now that new AMD CPUs are starting to have
  the capability.

  * The code checks the CPUID bit to determine whether the SHA
  instructions are available or not.

  * The maintainer's comment confirms that he tested successfully on
  Intel with/without the SHA extension

  Reference: https://github.com/openssl/openssl/issues/2848
  "I don't have access to Ryzen system, so I didn't test it explicitly on Ryzen. Reporter did confirm it tough. Myself I tested on Intel processors, yes, with/without."

  * LP reporter comment :
  I, slashd, have tested on a Ryzen system, on a non-Ryzen AMD system, and on a non-SHA Intel CPU. The tests show a significant performance increase on Ryzen due to the SHA extension:
  (Note that performance remains the same on CPUs without the SHA extension (AMD/Intel), as expected, since they cannot benefit from it.)

  [Tested on a Ryzen CPU]
  # Generated a checksum of a big file (e.g. 5GB file) with openssl
   $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile
  SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8

  real	0m3.471s
  user	0m2.956s
  sys	0m0.516s

  # Openssl speed
  $ openssl speed sha1
  Doing sha1 for 3s on 16 size blocks: 12081890 sha1's in 3.00s
  Doing sha1 for 3s on 64 size blocks: 11563950 sha1's in 3.00s
  Doing sha1 for 3s on 256 size blocks: 8375101 sha1's in 3.00s
  Doing sha1 for 3s on 1024 size blocks: 3987643 sha1's in 3.00s
  Doing sha1 for 3s on 8192 size blocks: 678036 sha1's in 3.00s
  OpenSSL 1.0.2g 1 Mar 2016
  built on: reproducible build, date unspecified
  options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
  compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM
  The 'numbers' are in 1000s of bytes per second processed.
  type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
  sha1             64436.75k   246697.60k   714675.29k  1361115.48k  1851490.30k

  [Other Info]

  * Debian Bug :
  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=861145

  * Upstream issue :
  https://github.com/openssl/openssl/issues/2848

  * Upstream Repository :
  https://github.com/openssl/openssl.git

  * Upstream Commits :
  1aed5e1 crypto/x86*cpuid.pl: move extended feature detection.
  ## This fix moves extended feature detection past basic feature detection where it belongs.

  f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards.
  ## This commit for x86_64cpuid.pl addressed the problem, but messed up processor vendor detection.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1674399/+subscriptions
