[Bug 1706780] [NEW] pthread_mutex_lock robust hangs

Jeff Barber 1706780 at bugs.launchpad.net
Wed Jul 26 21:25:55 UTC 2017


Public bug reported:

I'm using a process-shared, robust pthread mutex located in shared memory
to synchronize access to a data structure. On several occasions it has
hung after the process whose thread held the lock crashed. 99.9% of the
time there is no problem: if one of my processes goes down, the other
receives EOWNERDEAD from pthread_mutex_lock as expected and calls
pthread_mutex_consistent to recover the lock. Once in a while, though,
when a process crashes, the pthread_mutex_lock call simply never returns.
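
For readers unfamiliar with the normal (working) path, here is a minimal sketch of the setup described above. It is not the code from the affected application: the helper names (create_shared_robust_mutex, lock_with_recovery, demo_owner_death) are hypothetical, and it assumes Linux with glibc and anonymous shared memory inherited across fork.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create a process-shared, robust mutex in anonymous shared memory.
   Returns NULL on failure. (Hypothetical helper.) */
pthread_mutex_t *create_shared_robust_mutex(void)
{
    pthread_mutex_t *m = mmap(NULL, sizeof *m, PROT_READ | PROT_WRITE,
                              MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return NULL;

    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) {
        munmap(m, sizeof *m);
        return NULL;
    }
    rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    if (rc == 0)
        rc = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0) {
        munmap(m, sizeof *m);
        return NULL;
    }
    return m;
}

/* Lock m. Returns 0 on a normal acquisition, EOWNERDEAD if the previous
   owner died and the mutex was marked consistent again (the caller should
   then repair the protected data), or another error code. */
int lock_with_recovery(pthread_mutex_t *m)
{
    assert(m != NULL);
    int rc = pthread_mutex_lock(m);
    if (rc == EOWNERDEAD) {
        int crc = pthread_mutex_consistent(m);
        if (crc != 0) {
            pthread_mutex_unlock(m);
            return crc;
        }
    }
    return rc;
}

/* A child takes the lock and exits without unlocking; the parent then
   locks. Returns the parent's lock result, or -1 on setup failure. */
int demo_owner_death(void)
{
    pthread_mutex_t *m = create_shared_robust_mutex();
    if (m == NULL)
        return -1;

    pid_t pid = fork();
    if (pid == 0) {
        pthread_mutex_lock(m);
        _exit(0);               /* dies while holding the lock */
    }
    int rc = -1;
    if (pid > 0 && waitpid(pid, NULL, 0) == pid) {
        rc = lock_with_recovery(m);
        if (rc == 0 || rc == EOWNERDEAD)
            pthread_mutex_unlock(m);
    }
    pthread_mutex_destroy(m);
    munmap(m, sizeof *m);
    return rc;
}
```

In this single-waiter case the parent's lock call should report EOWNERDEAD. Note that unlocking after EOWNERDEAD without first calling pthread_mutex_consistent leaves the mutex permanently unusable (later lock attempts fail with ENOTRECOVERABLE). On older glibc versions the program needs to be linked with -pthread.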

After much experimentation, I've managed to create a test case that
reproduces the problem more than 90% of the time. Unfortunately, running
it under strace apparently changes its behavior, so I cannot tell
exactly what is going wrong at the syscall level (and I'm not sure I
would be able to decode the trace anyway).

My best guess at the necessary conditions is the following sequence:
  Process 1, thread 1 acquires the lock.
  Process 1, thread 2 attempts to acquire the lock (and so waits in __lll_robust_lock_wait).
  Process 2 attempts to acquire the lock.
  Process 1 crashes.
  Process 2 is left waiting in __lll_robust_lock_wait forever.

I believe the order in which the threads attempt the lock is important
for reproducing it.
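
The sequence above could be exercised with something like the following sketch. This is not the test case mentioned in the report. All names (shared_state, run_process1, run_process2, try_reproduce) are hypothetical, the ordering relies on crude sleeps, SIGKILL stands in for the crash, and a 2-second timeout is assumed to be long enough to tell a hang from a slow recovery.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Shared between the processes via MAP_SHARED memory. */
struct shared_state {
    pthread_mutex_t mutex;
    atomic_int held;        /* set once process 1, thread 1 owns the mutex */
};
static struct shared_state *shm;

/* Process 1, thread 2: queues up behind thread 1. */
static void *second_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&shm->mutex);
    return NULL;
}

/* Process 1: thread 1 locks, thread 2 blocks, then the process is killed. */
static void run_process1(void)
{
    pthread_t t;
    int rc = pthread_mutex_lock(&shm->mutex);
    assert(rc == 0);        /* fresh mutex: must succeed */
    atomic_store(&shm->held, 1);
    pthread_create(&t, NULL, second_thread, NULL);
    usleep(300 * 1000);     /* crude: let thread 2 and process 2 block */
    raise(SIGKILL);         /* stand-in for the crash */
    _exit(1);               /* not reached */
}

/* Process 2: waits until the lock is held, then blocks on it. */
static void run_process2(void)
{
    while (!atomic_load(&shm->held))
        usleep(1000);
    int rc = pthread_mutex_lock(&shm->mutex);
    if (rc == EOWNERDEAD && pthread_mutex_consistent(&shm->mutex) == 0) {
        pthread_mutex_unlock(&shm->mutex);
        _exit(0);           /* recovered as expected */
    }
    _exit(1);
}

/* Returns 0 if process 2 recovered the lock, 1 if it was still blocked
   after the timeout (the reported hang), -1 on setup or other failure. */
int try_reproduce(void)
{
    pthread_mutexattr_t attr;
    shm = mmap(NULL, sizeof *shm, PROT_READ | PROT_WRITE,
               MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shm == MAP_FAILED)
        return -1;
    atomic_init(&shm->held, 0);
    int rc = pthread_mutexattr_init(&attr);
    if (rc == 0) {
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
        rc = pthread_mutex_init(&shm->mutex, &attr);
        pthread_mutexattr_destroy(&attr);
    }
    if (rc != 0) {
        munmap(shm, sizeof *shm);
        return -1;
    }

    pid_t p1 = fork();
    if (p1 == 0)
        run_process1();
    pid_t p2 = (p1 > 0) ? fork() : -1;
    if (p2 == 0)
        run_process2();
    if (p1 > 0)
        waitpid(p1, NULL, 0);           /* process 1 has now died */

    int result = (p2 > 0) ? 1 : -1;
    for (int i = 0; p2 > 0 && i < 20; i++) {    /* poll for about 2 s */
        int status;
        if (waitpid(p2, &status, WNOHANG) == p2) {
            result = (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
            break;
        }
        usleep(100 * 1000);
    }
    if (result == 1) {                  /* still blocked: treat as the hang */
        kill(p2, SIGKILL);
        waitpid(p2, NULL, 0);
    }
    munmap(shm, sizeof *shm);
    return result;
}
```

A return value of 1 would correspond to the hang described here. Whether it shows up on a given run likely depends on timing and on the glibc and kernel versions in use, so a single run proves little either way.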

Once in this state, any other caller attempting to lock the mutex also
hangs. The __owner field of the mutex data structure still shows
process 1, thread 1 as the owning thread.
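
For anyone wanting to look at the same state, a debugging sketch along these lines can decode it. It assumes glibc on Linux: __data.__owner and __data.__lock are glibc-internal fields rather than public API, and the FUTEX_* bit masks come from the kernel's robust futex ABI in <linux/futex.h>. The robust_state struct and inspect_robust_mutex name are hypothetical. The read is an unsynchronized snapshot, suitable for diagnostics only.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stddef.h>

/* Snapshot of a robust mutex's internal state (hypothetical struct). */
struct robust_state {
    int owner_tid;      /* glibc's __owner field */
    int lock_tid;       /* TID stored in the futex word */
    int owner_died;     /* kernel set FUTEX_OWNER_DIED */
    int has_waiters;    /* FUTEX_WAITERS bit is set */
};

/* Decode the glibc-internal fields of a robust mutex.
   Unsynchronized read: for debugging only. */
struct robust_state inspect_robust_mutex(const pthread_mutex_t *m)
{
    assert(m != NULL);
    unsigned int word = (unsigned int)m->__data.__lock;
    struct robust_state s = {
        .owner_tid   = m->__data.__owner,
        .lock_tid    = (int)(word & FUTEX_TID_MASK),
        .owner_died  = (word & FUTEX_OWNER_DIED) != 0,
        .has_waiters = (word & FUTEX_WAITERS) != 0,
    };
    return s;
}
```

In the hung state, comparing lock_tid with the TID of the dead thread and checking whether owner_died is set might help show whether the kernel ever marked the owner as dead.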

I don't have the glibc or futex background to go further with debugging.


$ lsb_release -rd
Description:	Ubuntu 16.04.2 LTS
Release:	16.04

$ apt-cache policy libc6
libc6:
  Installed: 2.23-0ubuntu9
  Candidate: 2.23-0ubuntu9
  Version table:
 *** 2.23-0ubuntu9 500
        500 http://us.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu xenial-security/main amd64 Packages
        100 /var/lib/dpkg/status
     2.23-0ubuntu3 500
        500 http://us.archive.ubuntu.com/ubuntu xenial/main amd64 Packages

$ uname -a
Linux tirion 4.4.0-72-generic #93-Ubuntu SMP Fri Mar 31 14:07:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

** Affects: glibc (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to glibc in Ubuntu.
https://bugs.launchpad.net/bugs/1706780
