[Bug 2101084] Re: GCC produces wrong code for arm64+sve in some cases

Fri Dec 19 22:15:46 UTC 2025

** Merge proposal linked:
   https://code.launchpad.net/~vpa1977/ubuntu/+source/gcc-12/+git/gcc-12/+merge/497865

** Merge proposal linked:
   https://code.launchpad.net/~vpa1977/ubuntu/+source/gcc-12/+git/gcc-12/+merge/497866

** Merge proposal linked:
   https://code.launchpad.net/~vpa1977/ubuntu/+source/gcc-12/+git/gcc-12/+merge/497867

https://bugs.launchpad.net/bugs/2101084

Title:
  GCC produces wrong code for arm64+sve in some cases

Status in gcc:
  Fix Released
Status in Ubuntu Pro:
  In Progress
Status in Ubuntu Pro 20.04 series:
  In Progress
Status in gcc-10 package in Ubuntu:
  New
Status in gcc-11 package in Ubuntu:
  In Progress
Status in gcc-13 package in Ubuntu:
  Invalid
Status in gcc-14 package in Ubuntu:
  Invalid
Status in gcc-8 package in Ubuntu:
  Won't Fix
Status in gcc-9 package in Ubuntu:
  New
Status in gcc-10 source package in Focal:
  Won't Fix
Status in gcc-8 source package in Focal:
  Won't Fix
Status in gcc-9 source package in Focal:
  Won't Fix
Status in gcc-10 source package in Jammy:
  New
Status in gcc-11 source package in Jammy:
  In Progress
Status in gcc-9 source package in Jammy:
  New
Status in gcc-10 source package in Noble:
  New
Status in gcc-11 source package in Noble:
  New
Status in gcc-13 source package in Noble:
  In Progress
Status in gcc-14 source package in Noble:
  New
Status in gcc-9 source package in Noble:
  New
Status in gcc-11 source package in Oracular:
  Won't Fix
Status in gcc-13 source package in Oracular:
  Won't Fix
Status in gcc-14 source package in Oracular:
  Won't Fix
Status in gcc-11 source package in Plucky:
  New
Status in gcc-13 source package in Plucky:
  Invalid
Status in gcc-14 source package in Plucky:
  Invalid
Status in gcc-11 source package in Questing:
  New
Status in gcc-13 source package in Questing:
  Invalid
Status in gcc-14 source package in Questing:
  Invalid
Status in gcc-11 source package in Resolute:
  In Progress
Status in gcc-13 source package in Resolute:
  Invalid
Status in gcc-14 source package in Resolute:
  Invalid
Status in gcc-11 package in Debian:
  New

Bug description:
  [Impact]

  This bug causes data corruption in ARM64 code that is compiled with the Scalable Vector Extension (SVE) enabled for a 256-bit SVE processor but executed on a 128-bit SVE processor.
  An example is an AWS workload built for Graviton3 but executed on Graviton4.

  When compiling a ~ConstA (bitwise NOT of ConstA) expression to compute
  an index into the vector, the compiler actually computed -ConstA
  (minus ConstA); e.g. for ~4 it produced -4 instead of -5.

  Graviton4 processes a 256-bit vector in two passes. For the second
  pass it runs into this bug when computing indices into the second half
  of the vector and ends up with {-4, -5, -6, -7}, processing the last
  element of the first half twice and never touching the last element of
  the vector.

  This data corruption may cause data loss, failing checksums, and
  potentially security issues.

  [Test Plan]

  I was using a Raspberry Pi 5 for testing, but any other ARM64 platform
  or virtual machine will be sufficient.

  Install QEMU on Noble:

  apt install qemu-user-static

  Launch an LXD instance for the affected release, e.g.:

  lxc launch ubuntu-daily:jammy tester
  lxc file push test.c tester/home/ubuntu/

  Install affected gcc:
  lxc exec tester -- /bin/sh -c "apt-get update && apt-get install -y gcc-9"

  Compile the reproducer[1]:
  lxc exec tester -- /bin/sh -c "gcc-9 -fno-inline -O3 -Wall -fno-strict-aliasing -march=armv8.4-a+sve -o /home/ubuntu/final /home/ubuntu/test.c"

  Fetch the compiled reproducer binary:
  lxc file pull tester/home/ubuntu/final final

  Execute the testcase:
  qemu-aarch64-static -cpu neoverse-n2 ./final

  The testcase will output
  PASS: got 0x00bbbbbb 0x00aaaaaa as expected
  if the bug is fixed, and
  ERROR: expected 0x00bbbbbb 0x00aaaaaa but got 0x00bbbbbb 0xaaaaaa00
  otherwise.

  [Where problems could occur]

  The issue is a typo in the code that calculates the offset into the
  vector.

  Data already corrupted by the affected code (e.g. checksums) will not
  match the values produced after the fix. End users may therefore need
  to rebuild any indices that rely on the calculated hash values once
  their workloads are recompiled with the fixed gcc.

  [Other info]

  Focal fixes will be done through the -pro updates.

  I have run the test case and set Invalid for the versions that are not
  affected by this issue.

  Affected:
  All gcc-8[2]
  All gcc-9[2]
  All gcc-11[2]
  gcc-12 in Noble and earlier
  gcc-13 in Noble and earlier
  gcc-14 in Noble and earlier
  gcc-15 is not affected

  The fixed packages will be uploaded to the stable PPA[3] created for this SRU. 
  The PPA depends on -security only. The packages will need to be binary-copied to -updates and -security. 

  [1] https://bugs.launchpad.net/ubuntu/plucky/+source/gcc-14/+bug/2101084/comments/39
  [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118976#c21
  [3] https://launchpad.net/~ubuntu-toolchain-r/+archive/ubuntu/lp-2101084

  Original Description:

  [Impact]
  This issue affects SVE vectorization on arm64 platforms, specifically in cases where bitwise-not operations are applied during optimization.

  [Fix]
  This issue has been resolved by an upstream patch.

  commit 78380fd7f743e23dfdf013d68a2f0347e1511550
  Author: Richard Sandiford <richard.sandiford at arm.com>
  Date: Tue Mar 4 10:44:35 2025 +0000

      Fix folding of BIT_NOT_EXPR for POLY_INT_CST [PR118976]

      There was an embarrassing typo in the folding of BIT_NOT_EXPR for
      POLY_INT_CSTs: it used - rather than ~ on the poly_int.  Not sure
      how that happened, but it might have been due to the way that
      ~x is implemented as -1 - x internally.

      gcc/
              PR tree-optimization/118976
              * fold-const.cc (const_unop): Use ~ rather than - for BIT_NOT_EXPR.
              * config/aarch64/aarch64.cc (aarch64_test_sve_folding): New function.
              (aarch64_run_selftests): Run it.

  [Test Plan]
  1. Launch an instance using the latest generation of Graviton processors (Graviton4).
  2. Compile the following code using the command `gcc -O3 -march=armv8.1-a+sve`:

  #include <stdint.h>
  #include <stdio.h>

  #ifndef NCOUNTS
  #define NCOUNTS 2
  #endif
  typedef struct {
     uint32_t state[5];
     uint32_t count[NCOUNTS];
     unsigned char buffer[64];
  } SHA1_CTX;

  void finalcount_av(SHA1_CTX *restrict ctx, unsigned char *restrict finalcount) {
     // ctx->count is:  uint32_t count[2];
     int count_idx;
     for (int i = 0; i < 4*NCOUNTS; i++) {
         count_idx = (4*NCOUNTS - i - 1)/4; // generic but equivalent for NCOUNTS==2.
         finalcount[i] = (unsigned char)((ctx->count[count_idx] >> ((3-(i & 3)) * 8) ) & 255);
     }
  }

  void finalcount_bv(SHA1_CTX *restrict ctx, unsigned char *restrict finalcount) {
     for (int i=0; i < 4*NCOUNTS; i += 4) {
         int ci = (4*NCOUNTS - i - 1)/4;
         finalcount[i+0] = (unsigned char)((ctx->count[ci] >> (3 * 8) ) & 255);
         finalcount[i+1] = (unsigned char)((ctx->count[ci] >> (2 * 8) ) & 255);
         finalcount[i+2] = (unsigned char)((ctx->count[ci] >> (1 * 8) ) & 255);
         finalcount[i+3] = (unsigned char)((ctx->count[ci] >> (0 * 8) ) & 255);
     }
  }

  int main() {
     unsigned char fa[NCOUNTS*4];
     unsigned char fb[NCOUNTS*4];
     uint32_t *for_print;
     int i;

     SHA1_CTX ctx;
     ctx.count[0] = 0xaaaaaa00;
     ctx.count[1] = 0xbbbbbb00;
     if (NCOUNTS >2 ) ctx.count[2] = 0xcccccc00;
     if (NCOUNTS >3 ) ctx.count[3] = 0xdddddd00;
     finalcount_av(&ctx, fa);
     finalcount_bv(&ctx, fb);

     int ok = 1;
     for (i=0; i<NCOUNTS*4; i++) {
         ok &= fa[i] == fb[i];
     }
     if (!ok) {
         for_print = (uint32_t*)fb;
         printf("ERROR: expected ");
         for (i=0; i<NCOUNTS; i++) {
             printf("0x%08x ",for_print[i]);
         }
         for_print = (uint32_t*)fa;
         printf("but got ");
         for (i=0; i<NCOUNTS; i++) {
             printf("0x%08x ",for_print[i]);
         }
         printf("\n");
         return 1;
     } else {
         for_print = (uint32_t*)fa;
         printf("PASS: got ");
         for (i=0; i<NCOUNTS; i++) {
             printf("0x%08x ",for_print[i]);
         }
         printf("as expected\n");
         return 0;
     }
  }

  3. Verify that the execution output does not contain the string
  "ERROR".

  [Where problems could occur]
  The issue is caused by a typo. If any regressions occur, they are expected to impact only specific partial instructions under certain scenarios, rather than disrupting the overall functionality.
