[Bug 1742941] Re: zlib: improve crc32 performance on P8
bugproxy
bugproxy at us.ibm.com
Fri Jan 12 13:29:55 UTC 2018
------- Comment From brenohl at br.ibm.com 2018-01-12 08:22 EDT-------
> I've created the following which hopefully contains all the features
> required to be upstream friendly.
>
> https://github.com/grooverdan/zlib/commits/power_crc32_c_version
>
> Most importantly compared to the previous PR, it contains the right crc32
> constants.
>
> If you see anything that upstream/distros or anyone else might nit about
> then please tell me (on github or here). The upstream isn't very active so
> presenting something ideal on day one is my goal.
>
> Note 1: no clang currently due to...
>
> Note 1a: crc32_vpmsum: __builtin_pack_vector_int128 and
> __builtin_crypto_vpmsumw/__builtin_crypto_vpmsumd - need to review
> https://github.com/racardoso/crc32-vpmsum/commit/97210b9188916eb46377b5eb927ae337948bf016
> properly (sorry Rogerio, from memory it needs to fail to compile for
> clang BE rather than generate wrong results).
>
> Note 1b: clang - no __builtin_cpu_supports (
> https://bugs.llvm.org/show_bug.cgi?id=35898 ) so I might fall back to
> getauxval
>
>
> Is there a better define than __powerpc__ to check in configure to detect
> Power8+ only?
>
> Other suggestions welcome.
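For Note 1b, a runtime check through getauxval could look roughly like the sketch below. It assumes Linux with glibc. The helper names are hypothetical, and the fallback values for AT_HWCAP2 and the PPC_FEATURE2_* bits are assumed to match the kernel's powerpc headers; they are only used when the system headers do not already define them.

```c
#include <stdbool.h>

#if defined(__linux__)
#include <sys/auxv.h>            /* getauxval(), AT_HWCAP2 */
#endif

/* Fallback values, assumed to match the kernel's powerpc asm/cputable.h. */
#ifndef AT_HWCAP2
#define AT_HWCAP2 26
#endif
#ifndef PPC_FEATURE2_ARCH_2_07
#define PPC_FEATURE2_ARCH_2_07 0x80000000UL   /* ISA 2.07, i.e. POWER8 */
#endif
#ifndef PPC_FEATURE2_VEC_CRYPTO
#define PPC_FEATURE2_VEC_CRYPTO 0x02000000UL  /* vector crypto, incl. vpmsum */
#endif

/* Hypothetical helper: decide from a raw AT_HWCAP2 word. */
bool hwcap2_allows_vpmsum(unsigned long hwcap2)
{
    return (hwcap2 & PPC_FEATURE2_ARCH_2_07) != 0 &&
           (hwcap2 & PPC_FEATURE2_VEC_CRYPTO) != 0;
}

/* Hypothetical helper: query the running CPU; false where unsupported. */
bool cpu_allows_vpmsum(void)
{
#if defined(__linux__) && defined(__powerpc__)
    return hwcap2_allows_vpmsum(getauxval(AT_HWCAP2));
#else
    return false;
#endif
}
```

Splitting the bit test from the getauxval call keeps the logic testable on non-POWER build hosts. For the configure question, GCC predefines _ARCH_PWR8 when targeting -mcpu=power8 or later; that is a compile-time property of the target and does not replace a runtime check like the one above.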
** Bug watch added: bugs.llvm.org/ #35898
https://bugs.llvm.org/show_bug.cgi?id=35898
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to zlib in Ubuntu.
https://bugs.launchpad.net/bugs/1742941
Title:
  zlib: improve crc32 performance on P8

Status in The Ubuntu-power-systems project:
  New
Status in zlib package in Ubuntu:
  New

Bug description:
Calculate the checksum of data that is 16-byte aligned and a multiple
of 16 bytes in length.
The first step is to reduce it to 1024 bits. We do this in 8 parallel
chunks in order to mask the latency of the vpmsum instructions. If we
have more than 32 kB of data to checksum we repeat this step multiple
times, passing in the previous 1024 bits.
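Independent chunks can be reduced in parallel and merged afterwards because a CRC is linear over GF(2): the CRC of A followed by B is the CRC of A multiplied by a constant (x^n mod P, with n the bit length of B) XORed with the CRC of B. The scalar sketch below illustrates only that property. It assumes a simplified CRC (non-reflected, zero initial value, no final XOR), so it is neither zlib's crc32 nor the vpmsum code, and all function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define POLY 0x04C11DB7u   /* CRC-32 polynomial, x^32 term implicit */

/* Multiply a residue by x modulo the polynomial. */
uint32_t gf2_times_x(uint32_t r)
{
    return (r & 0x80000000u) ? (r << 1) ^ POLY : r << 1;
}

/* Simplified CRC: M(x) * x^32 mod P, zero initial value, MSB first. */
uint32_t crc0(const unsigned char *p, size_t len)
{
    uint32_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint32_t)p[i] << 24;
        for (int k = 0; k < 8; k++)
            crc = gf2_times_x(crc);
    }
    return crc;
}

/* x^n mod P: the kind of constant a folding implementation precomputes. */
uint32_t xpow_mod(size_t n)
{
    uint32_t r = 1;
    while (n--)
        r = gf2_times_x(r);
    return r;
}

/* a * b mod P over GF(2), using Horner's rule on the bits of b. */
uint32_t gf2_mulmod(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int i = 31; i >= 0; i--) {
        r = gf2_times_x(r);
        if ((b >> i) & 1u)
            r ^= a;
    }
    return r;
}

/* CRC of A||B from the CRCs of A and B, where B is len_b bytes long. */
uint32_t crc0_combine(uint32_t crc_a, uint32_t crc_b, size_t len_b)
{
    return gf2_mulmod(crc_a, xpow_mod(8 * len_b)) ^ crc_b;
}
```

For example, crc0_combine(crc0(msg, 4), crc0(msg + 4, 5), 5) gives the same value as crc0(msg, 9). A real implementation would precompute the x^n constants for its fixed chunk distances instead of calling something like xpow_mod at run time.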
The next step is to reduce the 1024 bits to 64 bits. This step adds
32 bits of 0s to the end, which matches what a CRC does. We just
calculate constants that land the data in these 32 bits.
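The remark about the 32 zero bits can be checked directly: dividing the message followed by 32 zero bits by the polynomial leaves the same remainder as the usual shift-register CRC loop. The sketch below assumes a simplified, non-reflected CRC with a zero initial value and no final XOR (zlib's crc32 additionally reflects bits and inverts the value); both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC32_POLY 0x04C11DB7u   /* x^32 term implicit */

/* Long division of (message followed by 32 zero bits) by the polynomial. */
uint32_t crc_long_division(const unsigned char *msg, size_t len)
{
    uint32_t rem = 0;
    size_t total_bits = len * 8 + 32;    /* message bits plus appended zeros */

    for (size_t i = 0; i < total_bits; i++) {
        /* Next dividend bit: message MSB first, then 0 for the padding. */
        uint32_t bit = 0;
        if (i < len * 8)
            bit = (msg[i / 8] >> (7 - i % 8)) & 1u;

        uint32_t carry = rem >> 31;      /* coefficient about to reach x^32 */
        rem = (rem << 1) | bit;
        if (carry)
            rem ^= CRC32_POLY;           /* subtract (XOR) the divisor */
    }
    return rem;
}

/* Conventional bitwise CRC loop; the zero padding is implicit here. */
uint32_t crc_shift_register(const unsigned char *msg, size_t len)
{
    uint32_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint32_t)msg[i] << 24;
        for (int k = 0; k < 8; k++)
            crc = (crc & 0x80000000u) ? (crc << 1) ^ CRC32_POLY : crc << 1;
    }
    return crc;
}
```

For the single byte 0x01 both functions return 0x04C11DB7, that is, x^32 mod P.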
We then use fixed point Barrett reduction to compute a mod n over GF(2)
for n = CRC using POWER8 instructions. We use x = 32.
http://en.wikipedia.org/wiki/Barrett_reduction
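As a rough scalar illustration of the Barrett step, the sketch below reduces a 64-bit value modulo the CRC polynomial using two carry-less multiplications and a precomputed mu = floor(x^64 / P). It assumes the non-reflected polynomial 0x104C11DB7 and uses a software carry-less multiply where POWER8 code would use vpmsumd; the function names are hypothetical, and the actual patch may use different constants and bit ordering.

```c
#include <stdint.h>

/* CRC-32 generator polynomial including the x^32 term (non-reflected). */
#define POLY33 0x104C11DB7ULL

/* Carry-less (GF(2)) multiply; the caller keeps the product within 64 bits. */
uint64_t clmul64(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    while (b) {
        if (b & 1)
            r ^= a;
        a <<= 1;
        b >>= 1;
    }
    return r;
}

/* mu = floor(x^64 / POLY33), computed by bitwise long division. */
uint64_t barrett_mu(void)
{
    uint64_t quot = 0;
    uint32_t rem = 0;
    for (int i = 0; i < 65; i++) {       /* x^64 is a 1 followed by 64 zeros */
        uint32_t carry = rem >> 31;
        rem = (rem << 1) | (uint32_t)(i == 0);
        quot = (quot << 1) | carry;
        if (carry)
            rem ^= (uint32_t)POLY33;
    }
    return quot;
}

/* a mod POLY33 via Barrett: q = floor(floor(a / x^32) * mu / x^32). */
uint32_t barrett_reduce(uint64_t a, uint64_t mu)
{
    uint64_t q = clmul64(a >> 32, mu) >> 32;    /* equals floor(a / P) */
    return (uint32_t)(a ^ clmul64(q, POLY33));  /* a - q*P, low 32 bits */
}

/* Reference: plain bitwise polynomial remainder, for comparison. */
uint32_t mod_bitwise(uint64_t a)
{
    for (int bit = 63; bit >= 32; bit--)
        if (a & (1ULL << bit))
            a ^= POLY33 << (bit - 32);
    return (uint32_t)a;
}
```

With mu computed once, barrett_reduce(0x100000000ULL, mu) yields 0x04C11DB7, and for any 64-bit input it agrees with mod_bitwise. Over GF(2) the quotient estimate is exact, so no correction step is needed after the subtraction.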