[Bug 2024207] Re: s390x autopkgtest regression of libflame vs glibc in Jammy
Simon Chopin
2024207 at bugs.launchpad.net
Wed Jun 21 13:16:58 UTC 2023
TL;DR: Now the tests pass, but I didn't do a thing.
Long follow-up on this: I was investigating this on a fairly beefy VM (8
cores, 16G RAM), and managed to reproduce the issue quickly with a ~60%
hit rate.
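For illustration, a hit rate like the one above could be estimated by
running a reproducer repeatedly under a timeout and counting the runs
that hang. This is only a minimal sketch, not the procedure actually
used here: the helper name timeout_rate, the number of runs and the
timeout value are all assumptions.

```python
import subprocess
import sys


def timeout_rate(cmd, runs, timeout):
    """Return the fraction of `runs` invocations of `cmd` that time out.

    Hypothetical helper: `cmd` is an argument list for subprocess.run,
    `timeout` is in seconds.
    """
    timeouts = 0
    for _ in range(runs):
        try:
            # Output is discarded; only whether the run finishes matters.
            subprocess.run(
                cmd,
                timeout=timeout,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the direct child before raising.
            timeouts += 1
    return timeouts / runs
```

It could be called as timeout_rate([sys.executable, "reproducer.py"],
runs=20, timeout=300), where reproducer.py is a placeholder name for
the script from the bug description. Note that only the direct child is
killed on timeout, so stray grandchildren may need separate cleanup.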
The test that times out is basically a thin wrapper around a subprocess
invocation (via subprocess.run) of a Python interpreter, which itself
uses the Python multiprocessing system to execute the Fortran compiler.
When the issue occurs, the entire pool of the mp subprocess is waiting
for new tasks, except for a single thread that waits on a kernel
semaphore. Since the Python stack for that thread is entirely in the
CPython codebase and is in a finalizer, I would guess there's a race
condition on freeing up a lock on a shared resource, which I'd wager is
stdout or similar.
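The layering described above can be sketched roughly as follows. This
is not the numpy test code: the child script, the pool size and the
fake_compile worker (which runs a no-op Python instead of the Fortran
compiler) are stand-ins chosen for illustration, and the use of a
process pool rather than a thread pool is an assumption.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Hypothetical inner script: a multiprocessing pool whose workers each
# launch an external command (a no-op here, standing in for the compiler).
CHILD_SOURCE = textwrap.dedent("""
    import multiprocessing
    import subprocess
    import sys

    def fake_compile(i):
        # Placeholder for the Fortran compiler invocation.
        return subprocess.run([sys.executable, "-c", "pass"]).returncode

    if __name__ == "__main__":
        with multiprocessing.Pool(2) as pool:
            codes = pool.map(fake_compile, range(4))
        print(sum(codes))
""")


def run_wrapped(timeout=60):
    """Outer layer: run the inner interpreter via subprocess.run."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(CHILD_SOURCE)
        path = f.name
    try:
        # Raises subprocess.TimeoutExpired if the inner process hangs.
        return subprocess.run(
            [sys.executable, path],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    finally:
        os.unlink(path)
```

A hang anywhere in the inner pool, including during interpreter
shutdown, would surface in the outer layer only as a timeout.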
Removing the pthread-related patch from the glibc SRU didn't improve the
situation, even though that patch was the most likely culprit (bug
2007796), so I figured I'd try to reproduce on a VM with capabilities
similar to the ones on the autopkgtest infra (4c/8G, as libflame is
marked as big) before trying anything else.
Lo and behold, on that new VM I was unable to reproduce the issue.
Puzzled, I asked the nice folks in the QA team if by any chance the doc
for the VM sizing was out-of-date. It's not, and they even kindly gave
me access to a VM directly on the infra. I still was unable to
reproduce.
Finally I just re-ran the tests, and now they pass. Comparing the logs,
the only difference I could spot is the upgrade linux-libc-dev
5.15.0-73.80 -> 5.15.0-75.82.
Also of note, it turns out those tests have been disabled in subsequent
versions in Debian as they're flaky and don't provide much value since
numpy isn't compiled with libflame support, so, if the issue comes back,
I'll probably ask for them to be hinted.
** Changed in: glibc (Ubuntu)
Status: Fix Released => Invalid
** Changed in: glibc (Ubuntu Jammy)
Status: Triaged => Invalid
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to glibc in Ubuntu.
https://bugs.launchpad.net/bugs/2024207
Title:
s390x autopkgtest regression of libflame vs glibc in Jammy
Status in glibc package in Ubuntu:
Invalid
Status in glibc source package in Jammy:
Invalid
Bug description:
The libflame autopkgtests on Jammy are now failing on s390x against
glibc 2.35-0ubuntu3.2.
It's triggering a timeout in the numpy-with-libflame test suite. To
reproduce, you need python3-numpy, libflame1 and libflame-dev installed.
The issue seems to be in
numpy/f2py/tests/test_compile_function.py::test_f2py_init_compile. To
be able to investigate this, I had to change
/usr/lib/python3/dist-packages/numpy/_pytesttester.py, line 183:
- pytest_args += ["-m", label]
+ pytest_args += ["-k", label]
and then I used the following Python script to reproduce:
#!/usr/bin/python3
import numpy as np
np.test("test_f2py_init_compile", verbose=3)
I haven't managed to go further yet, except that I know that the bug
doesn't seem to trigger if running under strace.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/2024207/+subscriptions