[Bug 1742941] Re: zlib: improve crc32 performance on P8

Fri Jan 12 13:29:55 UTC 2018

------- Comment From brenohl at br.ibm.com 2018-01-12 08:22 EDT-------
> I've created the following, which hopefully contains all the features
> required to be upstream friendly.
>
> https://github.com/grooverdan/zlib/commits/power_crc32_c_version
>
> Most importantly compared to the previous PR, it contains the right crc32
> constants.
>
> If you see anything that upstream/distros or anyone else might nit about
> then please tell me (on github or here). The upstream isn't very active so
> presenting something ideal on day one is my goal.
>
> Note 1: no clang currently due to...
>
> Note 1a: crc32_vpmsum: __builtin_pack_vector_int128 and
> __builtin_crypto_vpmsumw/__builtin_crypto_vpmsumd - need to review
> https://github.com/racardoso/crc32-vpmsum/commit/97210b9188916eb46377b5eb927ae337948bf016
> properly (sorry Rogerio, from memory it needs to fail to compile for
> clang BE rather than generate wrong results).
>
> Note 1b: clang - no __builtin_cpu_supports (
> https://bugs.llvm.org/show_bug.cgi?id=35898 ) so I might fall back to
> getauxval
>
>
> Is there a better define than __powerpc__ to check in configure to detect
> Power8+ only?
>
> Other suggestions welcome.

** Bug watch added: bugs.llvm.org/ #35898
   https://bugs.llvm.org/show_bug.cgi?id=35898

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to zlib in Ubuntu.
https://bugs.launchpad.net/bugs/1742941

Title:
  zlib: improve crc32 performance on P8

Status in The Ubuntu-power-systems project:
  New
Status in zlib package in Ubuntu:
  New

Bug description:
  Calculate the checksum of data that is 16-byte aligned and a multiple
  of 16 bytes.

  The first step is to reduce the data to 1024 bits. We do this in 8
  parallel chunks in order to mask the latency of the vpmsum
  instructions. If we have more than 32 kB of data to checksum, we repeat
  this step multiple times, passing in the previous 1024 bits.
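
  The folding idea behind this first step can be sketched in Python. This
  is only an illustration under stated assumptions, not the POWER8 code:
  it uses the non-reflected CRC-32 polynomial, a software carry-less
  multiply in place of vpmsum, input that is a multiple of 128 bytes, and
  it does not model the 32 kB repeat. The names fold_lane and
  reduce_to_1024_bits are hypothetical.

```python
POLY = 0x104C11DB7          # CRC-32 generator polynomial, degree 32 (non-reflected)
LANE_BITS = 128             # one vector register
LANES = 8                   # 8 parallel chunks
STRIDE = LANE_BITS * LANES  # 1024 bits between successive data for one lane
MASK64 = (1 << 64) - 1


def clmul(a, b):
    """Carry-less (GF(2) polynomial) multiply; stands in for vpmsum."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def polymod(a, n):
    """Remainder of a divided by n, both polynomials over GF(2)."""
    dn = n.bit_length()
    while a.bit_length() >= dn:
        a ^= n << (a.bit_length() - dn)
    return a


# Constants that move each 64-bit half of a lane 1024 bits forward, mod POLY.
K_HI = polymod(1 << (STRIDE + 64), POLY)
K_LO = polymod(1 << STRIDE, POLY)


def fold_lane(lane, incoming):
    """Replace lane * x^1024 + incoming by a congruent value below 2^128."""
    hi, lo = lane >> 64, lane & MASK64
    return clmul(hi, K_HI) ^ clmul(lo, K_LO) ^ incoming


def reduce_to_1024_bits(data):
    """Fold the whole buffer into 8 lanes of 128 bits (assumed layout)."""
    if not data or len(data) % 128:
        raise ValueError("sketch assumes a non-empty multiple of 128 bytes")
    lanes = [int.from_bytes(data[i:i + 16], "big")
             for i in range(0, len(data), 16)]
    acc = lanes[:LANES]
    for start in range(LANES, len(lanes), LANES):
        block = lanes[start:start + LANES]
        # The 8 folds are independent of each other, which is what lets
        # real hardware overlap the multiply latency.
        acc = [fold_lane(a, n) for a, n in zip(acc, block)]
    folded = 0
    for lane in acc:
        folded = (folded << LANE_BITS) | lane
    return folded
```

  The 1024-bit result has the same remainder modulo the CRC polynomial as
  the full input, so later steps only need to handle a fixed-size value.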

  The next step is to reduce the 1024 bits to 64 bits. This step adds
  32 bits of 0s to the end - this matches what a CRC does. We just
  calculate constants that land the data in these 32 bits.
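
  Why appending 32 zero bits "matches what a CRC does" can be checked with
  a small Python sketch. It assumes a simplified CRC (non-reflected
  polynomial, zero initial value, no final XOR), which is not zlib's exact
  crc32 convention; crc_bitwise and crc_by_zero_padding are hypothetical
  names.

```python
POLY = 0x104C11DB7  # CRC-32 generator polynomial, degree 32 (non-reflected)


def polymod(a, n):
    """Remainder of a divided by n, both polynomials over GF(2)."""
    dn = n.bit_length()
    while a.bit_length() >= dn:
        a ^= n << (a.bit_length() - dn)
    return a


def crc_bitwise(data):
    """MSB-first shift-register CRC with zero init and no final XOR."""
    reg = 0
    for byte in data:
        reg ^= byte << 24
        for _ in range(8):
            reg <<= 1
            if reg & (1 << 32):
                reg ^= POLY  # also clears bit 32
    return reg


def crc_by_zero_padding(data):
    """Same value computed as (message with 32 zero bits appended) mod POLY."""
    message = int.from_bytes(data, "big")
    return polymod(message << 32, POLY)
```

  Both functions return the same value for any input, which is the
  property the constants in this step rely on.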

  We then use fixed point Barrett reduction to compute a mod n over GF(2)
  for n = CRC using POWER8 instructions. We use x = 32.

  http://en.wikipedia.org/wiki/Barrett_reduction
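
  A minimal Python sketch of Barrett reduction over GF(2) for a 64-bit
  input follows. It assumes the non-reflected CRC-32 polynomial and
  replaces the POWER8 vpmsum instructions with a software carry-less
  multiply; barrett_reduce, clmul and polydivmod are hypothetical names.
  The shift by 32 is taken here to correspond to the "x = 32" above.

```python
POLY = 0x104C11DB7  # n: CRC-32 generator polynomial, degree 32 (non-reflected)


def clmul(a, b):
    """Carry-less (GF(2) polynomial) multiply; stands in for vpmsum."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def polydivmod(a, n):
    """Long division over GF(2): return (quotient, remainder)."""
    quotient = 0
    dn = n.bit_length()
    while a.bit_length() >= dn:
        shift = a.bit_length() - dn
        quotient ^= 1 << shift
        a ^= n << shift
    return quotient, a


# Precomputed once: floor(x^64 / n), a 33-bit constant.
MU = polydivmod(1 << 64, POLY)[0]


def barrett_reduce(a):
    """Compute a mod n for a below 2^64 using two multiplies, no division."""
    if a >> 64:
        raise ValueError("sketch handles inputs below 2^64 only")
    # Quotient estimate: floor(floor(a / x^32) * MU / x^32).
    q = clmul(a >> 32, MU) >> 32
    # Over GF(2) there are no carries, so the estimate is exact and no
    # correction step is needed.
    return (a ^ clmul(q, POLY)) & 0xFFFFFFFF
```

  Comparing barrett_reduce(a) with polydivmod(a, POLY)[1] for a few values
  is a quick way to check the precomputed constant.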

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1742941/+subscriptions