[Bug 1706780] Re: pthread_mutex_lock robust hangs
Austin Hendrix
1706780 at bugs.launchpad.net
Wed Sep 12 21:26:05 UTC 2018
I'm using robust mutexes in a similar way, and I've found that if I set
the PTHREAD_PRIO_INHERIT protocol attribute on my mutexes, I can no
longer reproduce this bug.
This looks similar to Red Hat bug 1401665:
https://bugzilla.redhat.com/show_bug.cgi?id=1401665
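For readers who want to try the same workaround, here is a minimal sketch of how a robust, process-shared mutex might be initialised with the priority-inheritance protocol. The helper name init_robust_pi_mutex is hypothetical and is not taken from the attached tr.c; only the pthread_mutexattr_* calls are standard POSIX.

```c
#define _GNU_SOURCE          /* glibc feature-test macro; exposes the robust/PI APIs */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper (not from the attachment): initialise *m as a
   process-shared, robust mutex using the PTHREAD_PRIO_INHERIT protocol.
   Returns 0 on success or an errno-style error code. */
static int init_robust_pi_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    if (rc == 0)
        rc = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    if (rc == 0)
        rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);

    /* The attribute object is no longer needed once the mutex exists. */
    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

In real use the pthread_mutex_t would live in shared memory, as in the original report. Whether this avoids the hang on a given glibc and kernel combination is something to verify locally; I have only observed that it does so in my setup.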
** Bug watch added: Red Hat Bugzilla #1401665
https://bugzilla.redhat.com/show_bug.cgi?id=1401665
** Attachment added: "Fixed test"
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1706780/+attachment/5188152/+files/tr.c
Title:
pthread_mutex_lock robust hangs
Status in glibc package in Ubuntu:
New
Bug description:
I'm using an interprocess (process-shared, robust) pthread_mutex
located in shared memory to synchronize access to a data structure. It
has caused a hang on several occasions when the process whose thread
holds the lock crashed. 99.9% of the time I do not experience the
issue. If one of my processes goes down, the other receives EOWNERDEAD
from the pthread_mutex_lock call as expected, and uses
pthread_mutex_consistent to recover the lock. Once in a while when a
process crashes, the pthread_mutex_lock simply never completes.
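The expected recovery path described above can be sketched roughly as follows. This is an illustration only, assuming a process-shared robust mutex in anonymous shared memory; the names lock_robust and owner_death_demo are hypothetical and do not come from the reporter's code.

```c
#define _GNU_SOURCE          /* for MAP_ANONYMOUS and the robust mutex APIs */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: lock a robust mutex and, if the previous owner
   died, mark the mutex consistent again. Returns 0 when the lock is
   held; *recovered is set to 1 if EOWNERDEAD was seen and handled. */
static int lock_robust(pthread_mutex_t *m, int *recovered)
{
    int rc = pthread_mutex_lock(m);
    *recovered = 0;
    if (rc == EOWNERDEAD) {
        /* We hold the lock now, but the protected data may be in an
           inconsistent state; application-specific repair belongs here. */
        rc = pthread_mutex_consistent(m);
        if (rc == 0)
            *recovered = 1;
    }
    return rc;
}

/* Hypothetical demo: a child process takes the lock and exits without
   unlocking. Returns 1 if the parent observed and recovered from
   EOWNERDEAD, 0 if the lock was acquired normally, -1 on any failure. */
static int owner_death_demo(void)
{
    pthread_mutex_t *m = mmap(NULL, sizeof *m, PROT_READ | PROT_WRITE,
                              MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return -1;

    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    int rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        pthread_mutex_lock(m);
        _exit(0);            /* owner goes away while holding the lock */
    }
    waitpid(pid, NULL, 0);

    int recovered = 0;
    rc = lock_robust(m, &recovered);
    if (rc == 0)
        pthread_mutex_unlock(m);
    munmap(m, sizeof *m);
    return rc == 0 ? recovered : -1;
}
```

This covers only the uncontended case, where nobody else is waiting when the owner dies. That is the case that works for me essentially every time; the hang needs the extra waiters described below.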
After much experimentation, I've managed to create a test case that
reproduces the problem more than 90% of the time. Unfortunately,
running it under strace apparently changes something about it, so I
cannot tell exactly what is going wrong at the syscall level (not sure
I would be able to decode that anyway).
My best guess about the conditions necessary is:
1. Process 1, thread 1 acquires the lock
2. Process 1, thread 2 attempts to acquire the lock (hence waiting in __lll_robust_lock_wait)
3. Process 2 attempts to acquire the lock
4. Process 1 crashes.
5. Process 2 is left waiting in __lll_robust_lock_wait forever
I believe the sequence of locking threads is important to reproducing
it.
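The suspected sequence can be approximated with the sketch below. It is not the attached test case: the function names are hypothetical, the ordering relies on sleeps rather than real synchronisation, SIGKILL stands in for the crash, and the surviving process uses pthread_mutex_timedlock (instead of pthread_mutex_lock) purely so that the sketch cannot hang forever.

```c
#define _GNU_SOURCE          /* for MAP_ANONYMOUS, usleep and the robust APIs */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Thread 2 of "process 1": blocks on the mutex thread 1 already holds. */
static void *second_locker(void *arg)
{
    pthread_mutex_lock((pthread_mutex_t *)arg);
    return NULL;
}

/* Hypothetical sketch of the suspected sequence. Returns the result of
   the surviving process's lock attempt: EOWNERDEAD if the owner's death
   was reported, ETIMEDOUT if the waiter was never woken within two
   seconds, or -1 if setup failed. */
static int try_sequence(void)
{
    pthread_mutex_t *m = mmap(NULL, sizeof *m, PROT_READ | PROT_WRITE,
                              MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return -1;

    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    int rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* "Process 1" */
        pthread_t t2;
        pthread_mutex_lock(m);                        /* thread 1 takes the lock */
        pthread_create(&t2, NULL, second_locker, m);  /* thread 2 starts waiting */
        usleep(300 * 1000);                           /* let process 2 queue up */
        raise(SIGKILL);                               /* simulated crash */
        _exit(1);
    }

    /* "Process 2" */
    usleep(100 * 1000);      /* give process 1 time to take the lock */
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += 2;
    rc = pthread_mutex_timedlock(m, &deadline);
    if (rc == EOWNERDEAD)
        pthread_mutex_consistent(m);
    if (rc == 0 || rc == EOWNERDEAD)
        pthread_mutex_unlock(m);

    kill(pid, SIGKILL);      /* harmless if the child is already dead */
    waitpid(pid, NULL, 0);
    munmap(m, sizeof *m);
    return rc;
}
```

On an unaffected system I would expect EOWNERDEAD; an ETIMEDOUT result would be consistent with the hang described here, though the timed lock path may not behave identically to pthread_mutex_lock.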
Once in this state, any other caller attempting to lock the mutex also
hangs. The __owner field of the mutex data structure still shows
process 1, thread 1 as the owning thread.
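For anyone wanting to check the same field, a small sketch follows. It reads glibc's internal __data.__owner member directly, which is an implementation detail rather than a supported API, so treat it as a debugging aid only; the helper names are hypothetical.

```c
#define _GNU_SOURCE          /* for syscall() */
#include <assert.h>
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical debugging helper: return the kernel thread ID that glibc
   has recorded as the owner of *m. Relies on glibc's private layout of
   pthread_mutex_t, so it is not portable to other C libraries. */
static pid_t mutex_owner_tid(const pthread_mutex_t *m)
{
    return (pid_t)m->__data.__owner;
}

/* Kernel thread ID of the calling thread, for comparison. */
static pid_t current_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}
```

Comparing mutex_owner_tid() against the thread IDs of the crashed process is how one can confirm that the recorded owner is a thread that no longer exists.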
I don't have the glibc or futex background to go further with
debugging.
$ lsb_release -rd
Description: Ubuntu 16.04.2 LTS
Release: 16.04
$ apt-cache policy libc6
libc6:
  Installed: 2.23-0ubuntu9
  Candidate: 2.23-0ubuntu9
  Version table:
 *** 2.23-0ubuntu9 500
        500 http://us.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu xenial-security/main amd64 Packages
        100 /var/lib/dpkg/status
     2.23-0ubuntu3 500
        500 http://us.archive.ubuntu.com/ubuntu xenial/main amd64 Packages
$ uname -a
Linux tirion 4.4.0-72-generic #93-Ubuntu SMP Fri Mar 31 14:07:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux