[Bug 1742941] Comment bridged from LTC Bugzilla

Fri Feb 7 03:20:37 UTC 2020

------- Comment From danielgb at au1.ibm.com 2020-02-06 22:14 EDT-------
Current status: there has been a small amount of progress on the zlib-devel list (https://zlib.net/mailman/listinfo/zlib-devel_madler.net); however, nothing explicit has been said, on or off list, about merging arch-specific patches (from any architecture vendor).

At the request of the community and other vendors, we have added explicit crc32 tests to zlib, and these have been pushed to the upstream PR:
https://github.com/madler/zlib/pull/335

Since the patch on this bug report, the code has been changed from ASM
to C (to increase portability) and brought closer to upstream style as
far as location and interfaces are concerned, and the test suite has
been improved as mentioned.

The zlib1g-dev package in focal
(https://packages.ubuntu.com/focal/zlib1g-dev) still uses configure &&
make, which the upstream PR patches correctly (notably, I haven't
patched the CMake files as of today). Running
`make crc32_test && ./crc32_test` as a test can be done quickly in the
package build process to validate the accuracy of the crc32 results.
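
As a rough illustration of the kind of check such a test performs (this
is not the crc32_test program from the PR), a known-answer comparison
can be sketched in Python. It assumes that Python's zlib module is
linked against the zlib build under test, which depends on how Python
was built. The names crc32_reference and check_lengths are hypothetical
and introduced only for this sketch.

```python
import zlib


def crc32_reference(data, crc=0):
    """Bit-at-a-time CRC-32 (reflected polynomial 0xEDB88320): slow but simple."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0xEDB88320 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def check_lengths(max_len=256):
    """Compare zlib.crc32 with the reference for every length 0..max_len.

    Sweeping the length exercises inputs that are and are not a multiple
    of 16 bytes.
    """
    payload = bytes((i * 37 + 11) & 0xFF for i in range(max_len))
    for length in range(max_len + 1):
        chunk = payload[:length]
        if zlib.crc32(chunk) != crc32_reference(chunk):
            return False
    return True


if __name__ == "__main__":
    # 0xCBF43926 is the standard CRC-32 check value for b"123456789".
    assert zlib.crc32(b"123456789") == 0xCBF43926
    print("all lengths match:", check_lengths())
```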

We hope that, given this prudence and validation, the upstream PR
https://github.com/madler/zlib/pull/335 can be accepted for the
ubuntu-20.04 LTS release of the zlib package while we continue to work
with upstream on maintenance.

Thanks for your consideration.

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to zlib in Ubuntu.
https://bugs.launchpad.net/bugs/1742941

Title:
  zlib: improve crc32 performance on P8

Status in The Ubuntu-power-systems project:
  Incomplete
Status in zlib package in Ubuntu:
  Incomplete

Bug description:
  Calculate the checksum of data that is 16-byte aligned and a multiple
  of 16 bytes in length.

  The first step is to reduce it to 1024 bits. We do this in 8 parallel
  chunks in order to mask the latency of the vpmsum instructions. If we
  have more than 32 kB of data to checksum, we repeat this step multiple
  times, passing in the previous 1024 bits.
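
The reason independent chunks can be processed in parallel is that
reduction modulo the CRC polynomial is linear over GF(2). The sketch
below illustrates only that principle with plain Python integers. It is
not the vpmsum implementation, which folds 128-bit vectors with
precomputed constants. The names poly_mod, clmul and mod_by_chunks are
hypothetical, and the init/reflect/final-XOR details of CRC-32 are
deliberately left out.

```python
POLY = 0x104C11DB7  # CRC-32 generator polynomial, degree 32, MSB-first form


def poly_mod(a, n=POLY):
    """Remainder of polynomial a modulo n over GF(2); bits are coefficients."""
    deg_n = n.bit_length() - 1
    while a.bit_length() - 1 >= deg_n:
        a ^= n << (a.bit_length() - 1 - deg_n)
    return a


def clmul(a, b):
    """Carry-less multiply: polynomial product over GF(2)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def mod_by_chunks(message, total_bits, lanes=8):
    """Reduce message by splitting it into `lanes` equal, independent chunks.

    Assumes total_bits is a multiple of lanes and message < 2**total_bits.
    """
    chunk_bits = total_bits // lanes
    mask = (1 << chunk_bits) - 1
    acc = 0
    for lane in range(lanes):
        shift = lane * chunk_bits
        chunk = (message >> shift) & mask
        weight = poly_mod(1 << shift)          # x^shift mod n, precomputable
        acc ^= clmul(poly_mod(chunk), weight)  # each lane is independent
    return poly_mod(acc)


if __name__ == "__main__":
    msg = int.from_bytes(bytes(range(256)), "big")  # 2048-bit test message
    assert mod_by_chunks(msg, 2048) == poly_mod(msg)
    print("chunked reduction matches direct reduction")
```

Because each lane depends only on its own chunk and a constant, the
per-lane work has no data dependency on the other lanes, which is what
lets the latency of one multiply be hidden behind the others.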

  The next step is to reduce the 1024 bits to 64 bits. This step adds
  32 bits of 0s to the end - this matches what a CRC does. We just
  calculate constants that land the data in these 32 bits.

  We then use fixed-point Barrett reduction to compute a mod n over GF(2)
  for n = the CRC polynomial, using POWER8 instructions. We use x = 32.

  http://en.wikipedia.org/wiki/Barrett_reduction
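
The Barrett step can be sketched in Python as follows. This is a
minimal model under stated assumptions, not the POWER8 code: it uses
the MSB-first form of the CRC-32 polynomial, emulates carry-less
multiplication in software, and feeds the reduction one byte at a time
instead of reducing a folded 64-bit value. The names clmul,
poly_divmod, barrett_reduce, reflect and crc32_via_barrett are
hypothetical. The constant m = floor(x^64 / n) is computed once by
ordinary polynomial division, after which each reduction needs only two
multiplies and an XOR.

```python
import zlib

POLY = 0x104C11DB7  # n(x): CRC-32 generator polynomial, degree 32


def clmul(a, b):
    """Carry-less multiply: polynomial product over GF(2)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def poly_divmod(a, n):
    """Schoolbook polynomial division over GF(2): (quotient, remainder)."""
    q = 0
    deg_n = n.bit_length() - 1
    while a.bit_length() - 1 >= deg_n:
        shift = a.bit_length() - 1 - deg_n
        q ^= 1 << shift
        a ^= n << shift
    return q, a


# Barrett constant m = floor(x^64 / n), computed once up front.
BARRETT_M, _ = poly_divmod(1 << 64, POLY)


def barrett_reduce(a):
    """Return a mod n for a polynomial a of degree < 64, without division."""
    q = clmul(a >> 32, BARRETT_M) >> 32       # q = floor(floor(a / x^32) * m / x^32)
    return (a ^ clmul(q, POLY)) & 0xFFFFFFFF  # a - q*n; subtraction is XOR in GF(2)


def reflect(value, width):
    """Reverse the order of the low `width` bits of value."""
    return int(format(value, "0{}b".format(width))[::-1], 2)


def crc32_via_barrett(data):
    """CRC-32 as used by zlib, built on barrett_reduce, one byte per step."""
    r = 0xFFFFFFFF
    for byte in data:
        # r <- (r * x^8 + B * x^32) mod n; the x^32 factor is the 32 appended zero bits.
        r = barrett_reduce((r << 8) ^ (reflect(byte, 8) << 32))
    return reflect(r, 32) ^ 0xFFFFFFFF


if __name__ == "__main__":
    print(hex(crc32_via_barrett(b"123456789")))  # 0xcbf43926
    assert crc32_via_barrett(b"123456789") == zlib.crc32(b"123456789")
```

Over GF(2) the quotient estimate q is exact, so no final correction
step is needed, unlike integer Barrett reduction.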
