[Bug 1863162] Re: Inconsistency detected by ld.so: dl-tls.c: 493: _dl_allocate_tls_init: Assertion `listp->slotinfo[cnt].gen <= GL(dl_tls_generation)' failed!

Bug Watch Updater 1863162 at bugs.launchpad.net
Thu Mar 4 18:16:08 UTC 2021


Launchpad has imported 30 comments from the remote bug at
https://sourceware.org/bugzilla/show_bug.cgi?id=19329.

If you reply to an imported comment from within Launchpad, your comment
will be sent to the remote bug automatically. Read more about
Launchpad's inter-bugtracker facilities at
https://help.launchpad.net/InterBugTracking.

------------------------------------------------------------------------
On 2015-12-04T12:37:03+00:00 nszabolcs wrote:

(this is a continuation of bug 17918, but it turns out to be a different
issue than the one originally reported there.)

failure:

Inconsistency detected by ld.so: dl-tls.c: 493: _dl_allocate_tls_init:
Assertion `listp->slotinfo[cnt].gen <= _rtld_local._dl_tls_generation'
failed!

caused by dlopen (in _dl_add_to_slotinfo and in dl_open_worker) doing

  listp->slotinfo[idx].gen = GL(dl_tls_generation) + 1;
  //...
  if (any_tls && __builtin_expect (++GL(dl_tls_generation) == 0, 0))

while pthread_create (in _dl_allocate_tls_init) concurrently doing

  assert (listp->slotinfo[cnt].gen <= GL(dl_tls_generation));

so

T1:
  y = x + 1;
  ++x;

T2:
  assert(y <= x);

this is hard to trigger as the race window is short compared to the time
dlopen and pthread_create take, however if i add a usleep(1000) between
the two operations in T1, it is triggered all the time.

the slotinfo and tls generation update lack any sort of synchronization
or atomics in _dl_allocate_tls_init (dlopen holds GL(dl_load_lock)).

on x86_64 with added usleep:

(gdb) p _rtld_local._dl_tls_dtv_slotinfo_list->slotinfo[0]@64
$11 = {{gen = 0, map = 0x7ffff7ff94e8}, {gen = 1, map = 0x7ffff7ff94e8}, {gen = 2, map = 0x7ffff0000910}, {gen = 0, map = 0x0} <repeats 61 times>}
(gdb) p _rtld_local._dl_tls_generation
$12 = 1

T1:
#0  0x00007ffff7df2097 in nanosleep () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007ffff7df1f74 in usleep (useconds=<optimised out>) at ../sysdeps/posix/usleep.c:32
#2  0x00007ffff7decc6b in dl_open_worker (a=a@entry=0x7ffff7611c80) at dl-open.c:527
#3  0x00007ffff7de8314 in _dl_catch_error (objname=objname@entry=0x7ffff7611c70, errstring=errstring@entry=0x7ffff7611c78, mallocedp=mallocedp@entry=0x7ffff7611c6f,
    operate=operate@entry=0x7ffff7dec720 <dl_open_worker>, args=args@entry=0x7ffff7611c80) at dl-error.c:187
#4  0x00007ffff7dec2a9 in _dl_open (file=0x7ffff7611ee0 "mod-0.so", mode=-2147483646, caller_dlopen=0x4007e2 <start+34>, nsid=-2, argc=<optimised out>,
    argv=<optimised out>, env=0x7fffffffe378) at dl-open.c:652
#5  0x00007ffff7bd5ee9 in dlopen_doit (a=a@entry=0x7ffff7611eb0) at dlopen.c:66
#6  0x00007ffff7de8314 in _dl_catch_error (objname=0x7ffff00008d0, errstring=0x7ffff00008d8, mallocedp=0x7ffff00008c8, operate=0x7ffff7bd5e90 <dlopen_doit>,
    args=0x7ffff7611eb0) at dl-error.c:187
#7  0x00007ffff7bd6521 in _dlerror_run (operate=operate@entry=0x7ffff7bd5e90 <dlopen_doit>, args=args@entry=0x7ffff7611eb0) at dlerror.c:163
#8  0x00007ffff7bd5f82 in __dlopen (file=file@entry=0x7ffff7611ee0 "mod-0.so", mode=mode@entry=2) at dlopen.c:87
#9  0x00000000004007e2 in start (a=<optimised out>) at a.c:19
#10 0x00007ffff79bf3d4 in start_thread (arg=0x7ffff7612700) at pthread_create.c:333
#11 0x00007ffff76feedd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

T2:
#0  __GI___assert_fail (assertion=0x7ffff7df8840 "listp->slotinfo[cnt].gen <= GL(dl_tls_generation)", file=0x7ffff7df68e6 "dl-tls.c", line=493, 
    function=0x7ffff7df9020 <__PRETTY_FUNCTION__.9528> "_dl_allocate_tls_init") at dl-minimal.c:220
#1  0x00007ffff7deb492 in __GI__dl_allocate_tls_init (result=0x7fffb7fff700) at dl-tls.c:493
#2  0x00007ffff79bff67 in allocate_stack (stack=<synthetic pointer>, pdp=<synthetic pointer>, attr=0x7fffffffdf90) at allocatestack.c:579
#3  __pthread_create_2_1 (newthread=newthread@entry=0x7fffffffe078, attr=attr@entry=0x0, start_routine=start_routine@entry=0x4007c0 <start>, arg=arg@entry=0xd)
    at pthread_create.c:526
#4  0x000000000040062a in main () at a.c:34


i think
  GL(dl_tls_generation)
  GL(dl_tls_dtv_slotinfo_list)
  listp->slotinfo[i].map
  listp->slotinfo[i].gen
  listp->next
  
may all be accessed concurrently by pthread_create and dlopen without
any synchronization.

this can also cause wrong maxgen computation into dtv[0].counter

Reply at:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1863162/comments/0

------------------------------------------------------------------------
On 2015-12-29T10:51:48+00:00 I-palachev wrote:

Hi, I've suggested a patch for this bug:
https://sourceware.org/ml/libc-alpha/2015-12/msg00570.html

Reply at:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1863162/comments/1

------------------------------------------------------------------------
On 2016-01-08T18:19:09+00:00 nszabolcs wrote:

Created attachment 8893
test case (main module)

Reply at:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1863162/comments/2

------------------------------------------------------------------------
On 2016-01-08T18:20:10+00:00 nszabolcs wrote:

Created attachment 8894
test case (build script)

Reply at:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1863162/comments/3

------------------------------------------------------------------------
On 2016-02-02T11:14:58+00:00 nszabolcs wrote:

assigned this to myself, will work on it for 2.24, the current latest patch is
https://sourceware.org/ml/libc-alpha/2016-01/msg00480.html

Reply at:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1863162/comments/4

------------------------------------------------------------------------
On 2016-08-05T20:40:10+00:00 Mavant-f wrote:

Is this patch still being reviewed? The last update I see is
https://sourceware.org/ml/libc-alpha/2016-01/msg00620.html, but I'm not
familiar with how issue tracking works for this project so I could
easily have missed something...

Reply at:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1863162/comments/5

------------------------------------------------------------------------
On 2017-06-15T05:45:43+00:00 Markus-0fjh3 wrote:

I sometimes see the same failure during make check:

env GCONV_PATH=/var/tmp/glibc-build/iconvdata LOCPATH=/var/tmp/glibc-build/localedata LC_ALL=C   /var/tmp/glibc-build/elf/ld-linux-x86-64.so.2 --library-path /var/tmp/glibc-build:/var/tmp/glibc-build/math:/var/tmp/glibc-build/elf:/var/tmp/glibc-build/dlfcn:/var/tmp/glibc-build/nss:/var/tmp/glibc-build/nis:/var/tmp/glibc-build/rt:/var/tmp/glibc-build/resolv:/var/tmp/glibc-build/crypt:/var/tmp/glibc-build/mathvec:/var/tmp/glibc-build/support:/var/tmp/glibc-build/nptl /var/tmp/glibc-build/nptl/tst-stack4  > /var/tmp/glibc-build/nptl/tst-stack4.out; \
../scripts/evaluate-test.sh nptl/tst-stack4 $? false false > /var/tmp/glibc-build/nptl/tst-stack4.test-result
Inconsistency detected by ld.so: dl-tls.c: 488: _dl_allocate_tls_init: Assertion `listp->slotinfo[cnt].gen <= GL(dl_tls_generation)' failed!

This is unfortunately not reproducible.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1863162/comments/6

------------------------------------------------------------------------
On 2017-08-28T21:24:52+00:00 Pádraig Brady wrote:

We were often hitting this issue with some multithreaded binaries with many shared libs. The patches referenced here address the issue. Specifically:
  https://patches.linaro.org/patch/85007/
  https://patches.linaro.org/patch/85008/

These have been _extensively_ tested here with glibc-2.23 with many
binaries.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1863162/comments/7

------------------------------------------------------------------------
On 2017-08-29T20:30:16+00:00 Carlos-0 wrote:

(In reply to Pádraig Brady from comment #7)
> We were often hitting this issue with some multithreaded binaries with many
> shared libs. These patches referenced here, address the issue. Specifically:
>   https://patches.linaro.org/patch/85007/
>   https://patches.linaro.org/patch/85008/
> 
> These have been _extensively_ tested here with glibc-2.23 with many binaries

Please repost those to libc-alpha so we can review.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1863162/comments/8

------------------------------------------------------------------------
On 2017-09-29T23:54:54+00:00 Pádraig Brady wrote:

We found an off by one issue with this (with ASAN + certain number of
shared libs). When the last vector in the _dl_allocate_tls_init list of
vectors was of size one it would have been skipped. The fix is:

diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index 073321c..2c9ad2a 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -571,7 +571,7 @@ _dl_allocate_tls_init (void *result)
        }

       total += cnt;
-      if (total >= dtv_slots)
+      if (total > dtv_slots)
        break;

       /* Synchronize with dl_add_to_slotinfo.  */

Reply at:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1863162/comments/9

------------------------------------------------------------------------
On 2018-01-16T18:20:03+00:00 Mmezeul wrote:

Has there been any activity on this one lately? Does anyone know if a
fix will be coming anytime soon?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1863162/comments/10

------------------------------------------------------------------------
On 2018-01-17T15:41:50+00:00 Pádraig Brady wrote:

This has been _very_ well tested at Facebook.
Note the additional fix in comment #9.
It would be great to merge this. Thanks!

Reply at:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1863162/comments/11

------------------------------------------------------------------------
On 2018-01-17T16:40:58+00:00 nszabolcs wrote:

(In reply to Pádraig Brady from comment #11)
> This has been _very_ well tested at facebook
> Note the additional fix in comment #9
> It would be great to merge this. thanks!

sorry i didn't have time to work on this in this release cycle, i'll try
to look at it in the next one if others don't beat me to it (the
comments can be improved, dtv_slots should be fixed so it has consistent
meaning and one should reason about the consequences of removing the
asserts, they might catch valid corruption that is still present via
dlclose races that are not fixed).

Reply at:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1863162/comments/12

------------------------------------------------------------------------
On 2019-02-27T18:43:08+00:00 lukas wrote:

(In reply to Szabolcs Nagy from comment #12)

> sorry i didnt have time to work on this in this release cycle, i'll try to
> look at it in the next one if others don't beat me to it (the comments can
> be improved, dtv_slots should be fixed so it has consistent meaning and one
> should reason about the consequences of removing the asserts, they might
> catch valid corruption that is still present via dlclose races that are not
> fixed).

Any update on this? It has been over a year since the last comment.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1863162/comments/13

------------------------------------------------------------------------
On 2019-05-24T19:14:35+00:00 Mike Gulick wrote:

Just want to add that the two patches posted here (and the off-by-one
fix in the comments) have been running at my employer (MathWorks) on at
least 1000 Debian 9 systems for the past 6 months without issue.  It
would be great if these patches could be accepted into glibc itself.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1863162/comments/14

------------------------------------------------------------------------
On 2019-05-24T19:46:49+00:00 Carlos-0 wrote:

(In reply to Mike Gulick from comment #14)
> Just want to add that the two patches posted here (and the off-by-one fix in
> the comments) have been running by my employer (MathWorks) on at least 1000
> Debian 9 systems for the past 6 months without issue.  It would be great if
> these patches could be accepted into glibc itself.

None of these changes are easy to integrate because they fail to explain
in detailed notes why they are correct. We take the conservative
approach of not applying incomplete solutions. Someone seeing this problem
has to take the position to champion the broader solution as positioned
by Szabolcs from Arm. Alternatively someone needs to explain why the
partial solution is better than no solution and champion that on
libc-alpha.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1863162/comments/15

------------------------------------------------------------------------
On 2019-06-26T01:04:45+00:00 Roberl wrote:

This testcase seems to reproduce the bug pretty reliably.

https://github.com/jrmuizel/dl-open-race

Reply at:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1863162/comments/16

------------------------------------------------------------------------
On 2019-06-26T03:03:44+00:00 Roberl wrote:

Sorry, it actually does not. And I see there was already a testcase
posted here. https://sourceware.org/ml/libc-alpha/2016-11/msg00917.html

Reply at:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1863162/comments/17

------------------------------------------------------------------------
On 2020-05-03T10:56:07+00:00 Sergei Trofimovich wrote:

In https://bugs.gentoo.org/719674#c12 gentoo sees nptl/tst-stack4
crashes somewhat reliably on arm64:

# while :; do date; env GCONV_PATH=/var/tmp/portage/sys-libs/glibc-2.30-r8/work/build-arm64-aarch64-unknown-linux-gnu-nptl/iconvdata LOCPATH=/var/tmp/portage/sys-libs/glibc-2.30-r8/work/build-arm64-aarch64-unknown-linux-gnu-nptl/localedata LC_ALL=C /var/tmp/portage/sys-libs/glibc-2.30-r8/work/build-arm64-aarch64-unknown-linux-gnu-nptl/elf/ld-linux-aarch64.so.1 --library-path /var/tmp/portage/sys-libs/glibc-2.30-r8/work/build-arm64-aarch64-unknown-linux-gnu-nptl:/var/tmp/portage/sys-libs/glibc-2.30-r8/work/build-arm64-aarch64-unknown-linux-gnu-nptl/math:/var/tmp/portage/sys-libs/glibc-2.30-r8/work/build-arm64-aarch64-unknown-linux-gnu-nptl/elf:/var/tmp/portage/sys-libs/glibc-2.30-r8/work/build-arm64-aarch64-unknown-linux-gnu-nptl/dlfcn:/var/tmp/portage/sys-libs/glibc-2.30-r8/work/build-arm64-aarch64-unknown-linux-gnu-nptl/nss:/var/tmp/portage/sys-libs/glibc-2.30-r8/work/build-arm64-aarch64-unknown-linux-gnu-nptl/nis:/var/tmp/portage/sys-libs/glibc-2.30-r8/work/build-arm64-aarch64-unknown-linux-gnu-nptl/rt:/var/tmp/portage/sys-libs/glibc-2.30-r8/work/build-arm64-aarch64-unknown-linux-gnu-nptl/resolv:/var/tmp/portage/sys-libs/glibc-2.30-r8/work/build-arm64-aarch64-unknown-linux-gnu-nptl/mathvec:/var/tmp/portage/sys-libs/glibc-2.30-r8/work/build-arm64-aarch64-unknown-linux-gnu-nptl/support:/var/tmp/portage/sys-libs/glibc-2.30-r8/work/build-arm64-aarch64-unknown-linux-gnu-nptl/crypt:/var/tmp/portage/sys-libs/glibc-2.30-r8/work/build-arm64-aarch64-unknown-linux-gnu-nptl/nptl::/var/tmp/portage/sys-libs/glibc-2.30-r8/work/build-arm64-aarch64-unknown-linux-gnu-nptl//dlfcn /var/tmp/portage/sys-libs/glibc-2.30-r8/work/build-arm64-aarch64-unknown-linux-gnu-nptl/nptl/tst-stack4; done

Sun 03 May 2020 10:42:08 AM UTC
Sun 03 May 2020 10:42:21 AM UTC
Sun 03 May 2020 10:42:34 AM UTC
Didn't expect signal from child: got `Segmentation fault'
...
Sun 03 May 2020 10:42:56 AM UTC
malloc(): invalid size (unsorted)
Didn't expect signal from child: got `Aborted'
..
Sun 03 May 2020 10:46:21 AM UTC
free(): corrupted unsorted chunks
Didn't expect signal from child: got `Aborted'
...
Sun 03 May 2020 10:46:55 AM UTC
Didn't expect signal from child: got `Segmentation fault'
Sun 03 May 2020 10:47:04 AM UTC
double free or corruption (!prev)
Didn't expect signal from child: got `Aborted'
...
Sun 03 May 2020 10:50:54 AM UTC
free(): invalid pointer
Didn't expect signal from child: got `Aborted'
...
Sun 03 May 2020 10:52:12 AM UTC
tst-stack4: malloc.c:2379: sysmalloc: Assertion `(old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)' failed.
Didn't expect signal from child: got `Aborted'

Does it look like the same issue described here?

# lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              96
On-line CPU(s) list: 0-95
Thread(s) per core:  1
Core(s) per socket:  48
Socket(s):           2
Vendor ID:           Cavium
Model:               1
Model name:          ThunderX 88XX
Stepping:            0x1
BogoMIPS:            200.00
L1d cache:           32K
L1i cache:           78K
L2 cache:            16384K
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32

# gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/aarch64-unknown-linux-gnu/9.3.0/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: /var/tmp/portage/sys-devel/gcc-9.3.0/work/gcc-9.3.0/configure --host=aarch64-unknown-linux-gnu --build=aarch64-unknown-linux-gnu --prefix=/usr --bindir=/usr/aarch64-unknown-linux-gnu/gcc-bin/9.3.0 --includedir=/usr/lib/gcc/aarch64-unknown-linux-gnu/9.3.0/include --datadir=/usr/share/gcc-data/aarch64-unknown-linux-gnu/9.3.0 --mandir=/usr/share/gcc-data/aarch64-unknown-linux-gnu/9.3.0/man --infodir=/usr/share/gcc-data/aarch64-unknown-linux-gnu/9.3.0/info --with-gxx-include-dir=/usr/lib/gcc/aarch64-unknown-linux-gnu/9.3.0/include/g++-v9 --with-python-dir=/share/gcc-data/aarch64-unknown-linux-gnu/9.3.0/python --enable-languages=c,c++,fortran --enable-obsolete --enable-secureplt --disable-werror --with-system-zlib --enable-nls --without-included-gettext --enable-checking=release --with-bugurl=https://bugs.gentoo.org/ --with-pkgversion='Gentoo 9.3.0 p2' --disable-esp --enable-libstdcxx-time --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --disable-multilib --disable-altivec --disable-fixed-point --enable-libgomp --disable-libmudflap --disable-libssp --disable-libada --disable-systemtap --enable-vtable-verify --enable-lto --without-isl --enable-default-pie --enable-default-ssp
Thread model: posix
gcc version 9.3.0 (Gentoo 9.3.0 p2)

# uname -r
4.9.0-4-arm64

Reply at:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1863162/comments/19

------------------------------------------------------------------------
On 2020-05-04T13:14:28+00:00 Nsz-j wrote:

(In reply to Sergei Trofimovich from comment #18)
> ...
> Sun 03 May 2020 10:46:55 AM UTC
> Didn't expect signal from child: got `Segmentation fault'
> Sun 03 May 2020 10:47:04 AM UTC
> double free or corruption (!prev)
> Didn't expect signal from child: got `Aborted'
> ...
> Sun 03 May 2020 10:50:54 AM UTC
> free(): invalid pointer
> Didn't expect signal from child: got `Aborted'
> ...
> Sun 03 May 2020 10:52:12 AM UTC
> tst-stack4: malloc.c:2379: sysmalloc: Assertion `(old_top == initial_top
> (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE &&
> prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)'
> failed.
> Didn't expect signal from child: got `Aborted'
> 
> Does it look like the same issue described here?

it can be related, hard to tell.
(your failures are consistently heap corruptions
detected in malloc/free, instead of dynamic tls
related state corruption)

if you can rebuild glibc, try the patches from
comment 7. if they don't help then your issue
is different. (if the issue disappears we don't
know if the new barriers just masked your issue
or fixed it).

Reply at:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1863162/comments/20

------------------------------------------------------------------------
On 2020-05-04T18:25:15+00:00 Sergei Trofimovich wrote:

(In reply to Szabolcs Nagy from comment #19)
> (In reply to Sergei Trofimovich from comment #18)
> > ...
> > Sun 03 May 2020 10:46:55 AM UTC
> > Didn't expect signal from child: got `Segmentation fault'
> > Sun 03 May 2020 10:47:04 AM UTC
> > double free or corruption (!prev)
> > Didn't expect signal from child: got `Aborted'
> > ...
> > Sun 03 May 2020 10:50:54 AM UTC
> > free(): invalid pointer
> > Didn't expect signal from child: got `Aborted'
> > ...
> > Sun 03 May 2020 10:52:12 AM UTC
> > tst-stack4: malloc.c:2379: sysmalloc: Assertion `(old_top == initial_top
> > (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE &&
> > prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)'
> > failed.
> > Didn't expect signal from child: got `Aborted'
> > 
> > Does it look like the same issue described here?
> 
> it can be related, hard to tell.
> (your failures are consistently heap corruptions
> detected in malloc/free, instead of dynamic tls
> related state corruption)
> 
> if you can rebuild glibc try the patches from
> comment 7 if they don't help then your issue
> is different. (if the issue disappears we don't
> know if the new barriers just masked your issue
> or fixed them).

Tried patches from #comment7 on glibc-2.30. No failures after 100 test
runs. Usually fails after 3-4 runs.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1863162/comments/21

------------------------------------------------------------------------
On 2020-09-09T22:12:15+00:00 Jg-jguk wrote:

running latest Ubuntu 20.04.1 LTS
$ ldd --version
ldd (Ubuntu GLIBC 2.31-0ubuntu9) 2.31

Just got this when launching google-chrome from the command line


Inconsistency detected by ld.so: ../elf/dl-tls.c: 481: _dl_allocate_tls_init: Assertion `listp->slotinfo[cnt].gen <= GL(dl_tls_generation)' failed!
Command exited with non-zero status 127


Could that assert be updated to give more information, if you have another assert_str() macro that could be used?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1863162/comments/22

------------------------------------------------------------------------
On 2020-09-28T15:27:36+00:00 Jg-jguk wrote:

It's a bit surprising there is an assert() in a release build in Ubuntu.
Usually assert() would be for a debug build.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1863162/comments/23

------------------------------------------------------------------------
On 2020-10-09T10:51:09+00:00 Jg-jguk wrote:

On my computer any C program compiled with assert(0) dumps a core file,
but this glibc issue assert does not dump a core file. Is there an issue
with this assert macro in glibc? The message output on the terminal is
different from the standard macro


$ ./a
a: a.c:6: main: Assertion `0' failed.
Aborted (core dumped)


$ cat a.c
// gcc -Wall -o a a.c
#include <assert.h>

int main()
{
    assert(0);
}

Reply at:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1863162/comments/24

------------------------------------------------------------------------
On 2020-10-09T20:21:37+00:00 Carlos-0 wrote:

(In reply to Jonny Grant from comment #23)
> On my computer any C program compiled with assert(0) dumps a core file, but
> this glibc issue assert does not dump a core file. Is there an issue with
> this assert macro in glibc? The message output on the terminal is different
> from the standard macro

Please ask these questions on libc-help at sourceware.org where developers
can help you with any issues you have trying to put together a
reproducer.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1863162/comments/25

------------------------------------------------------------------------
On 2020-10-29T03:12:39+00:00 Lvying-system-thoughts wrote:

Hi, when I use the test case from Szabolcs Nagy's comments 2 and 3, I get a different assertion:
Inconsistency detected by ld.so: dl-tls.c: 517: _dl_allocate_tls_init: Assertion `listp != NULL' failed!
I also tried this test case patch: https://patchwork.ozlabs.org/project/glibc/patch/5836CC80.9070101@arm.com/
After I applied the patch to the glibc code and ran the nptl test cases, I got:
../scripts/evaluate-test.sh nptl/tst-tls7 $? false false > /root/build/build/glibc/nptl/tst-tls7.test-result
Inconsistency detected by ld.so: dl-tls.c: 517: _dl_allocate_tls_init: Assertion `listp != NULL' failed!

Both test cases hit a different assertion, not this one:
Inconsistency detected by ld.so: dl-tls.c: 493: _dl_allocate_tls_init: Assertion `listp->slotinfo[cnt].gen <= _rtld_local._dl_tls_generation' failed!

So, how can I reproduce this problem and get the same failure?
And is there a plan to solve this problem? It has been open for a long time.

Thanks!

Reply at:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1863162/comments/26

------------------------------------------------------------------------
On 2020-12-01T02:15:17+00:00 Lvying-system-thoughts wrote:

Updated test results:
I used the glibc 2.28 source code to try to reproduce this problem:
1. testcase: Szabolcs Nagy's comment 2 and comment 3
   result: Inconsistency detected by ld.so: dl-tls.c: 517: _dl_allocate_tls_init: Assertion `listp != NULL' failed!
   Not the same assertion
2. testcase: added Szabolcs Nagy's testcase v3 to the nptl tests:
https://patchwork.ozlabs.org/project/glibc/patch/5836CC80.9070101@arm.com/
    tst-tls7 result: same as No. 1
3. testcase: same as No. 2, but with usleep(1000) added between _dl_add_to_slotinfo and ++GL(dl_tls_generation) in elf/dl-open.c.
   tst-tls7 result: same as No. 1
   tst-stack4 can reliably reproduce this problem

Reply at:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1863162/comments/27

------------------------------------------------------------------------
On 2020-12-24T16:59:46+00:00 Nsz-j wrote:

i wrote down some more background before i resubmit my patches:
https://sourceware.org/pipermail/libc-alpha/2020-December/121090.html

Reply at:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1863162/comments/28

------------------------------------------------------------------------
On 2021-02-07T23:04:20+00:00 Carlos-0 wrote:

I still see this issue in 2.33 testing. I saw it recently for ppc64 BE
testing.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1863162/comments/30

------------------------------------------------------------------------
On 2021-02-17T16:00:15+00:00 Nsz-j wrote:

i have a new patch set that includes a different fix for this bug:
https://sourceware.org/pipermail/libc-alpha/2021-February/122626.html

the new fix takes the dlopen lock at thread creation time instead
of just using atomics (which cannot work for fixing the same race
with dlclose: bug 27111).

using atomics is still necessary for tls access.

it will likely take a few review iterations to get this in glibc.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1863162/comments/31


** Changed in: glibc
       Status: Unknown => Confirmed

** Changed in: glibc
   Importance: Unknown => Medium

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to glibc in Ubuntu.
https://bugs.launchpad.net/bugs/1863162

Title:
  Inconsistency detected by ld.so: dl-tls.c: 493: _dl_allocate_tls_init:
  Assertion `listp->slotinfo[cnt].gen <= GL(dl_tls_generation)' failed!

Status in GLibC:
  Confirmed
Status in glibc package in Ubuntu:
  Confirmed

Bug description:
  When using glibc as part of our NSX product, we are running into the
  above mentioned glibc assert case sometimes.

  Here's relevant revision information :

  Ubuntu - 16.04
  glibc version - 2.23.

  
  This is a known issue, and a resolution has been identified in the thread linked below:

   https://sourceware.org/ml/libc-alpha/2016-01/msg00480.html and in
  addition see Comment 9 in
  https://sourceware.org/bugzilla/show_bug.cgi?id=19329.

  We have applied this patch in our product and it seems to be working
  fine.

  Is there a way to upstream these changes and make those available in
  standard glibc upstream?

  Please let us know.

  Here are the two patches:

  PATCH1
  =============================
  diff --git a/elf/dl-open.c b/elf/dl-open.c
  index 6f178b3..2b97605 100644
  --- a/elf/dl-open.c
  +++ b/elf/dl-open.c
  @@ -524,9 +524,16 @@ dl_open_worker (void *a)
       }
   
     /* Bump the generation number if necessary.  */
  -  if (any_tls && __builtin_expect (++GL(dl_tls_generation) == 0, 0))
  -    _dl_fatal_printf (N_("\
  +  if (any_tls)
  +    {
  +      size_t newgen = GL(dl_tls_generation) + 1;
  +      if (__builtin_expect (newgen == 0, 0))
  +	_dl_fatal_printf (N_("\
   TLS generation counter wrapped!  Please report this."));
  +      /* Synchronize with the load acquire in _dl_allocate_tls_init.
  +	 See the CONCURRENCY NOTES there in dl-tls.c.  */
  +      atomic_store_release (&GL(dl_tls_generation), newgen);
  +    }
   
     /* We need a second pass for static tls data, because _dl_update_slotinfo
        must not be run while calls to _dl_add_to_slotinfo are still pending.  */
  diff --git a/elf/dl-tls.c b/elf/dl-tls.c
  index ed13fd9..7184a54 100644
  --- a/elf/dl-tls.c
  +++ b/elf/dl-tls.c
  @@ -443,6 +443,48 @@ _dl_resize_dtv (dtv_t *dtv)
   }
   
   
  +/* CONCURRENCY NOTES:
  +
  +   During dynamic TLS and DTV allocation and setup various objects may be
  +   accessed concurrently:
  +
  +     GL(dl_tls_max_dtv_idx)
  +     GL(dl_tls_generation)
  +     listp->slotinfo[i].map
  +     listp->slotinfo[i].gen
  +     listp->next
  +
  +   where listp is a node in the GL(dl_tls_dtv_slotinfo_list) list.  The public
  +   APIs that may access them are
  +
  +     Writers: dlopen, dlclose and dynamic linker start up code.
  +     Readers only: pthread_create and __tls_get_addr (TLS access).
  +
  +   The writers hold the GL(dl_load_lock), but the readers don't, so atomics
  +   should be used when accessing these globals.
  +
  +   dl_open_worker (called from dlopen) for each loaded module increases
  +   GL(dl_tls_max_dtv_idx), sets the link_map of the module up, adds a new
  +   slotinfo entry to GL(dl_tls_dtv_slotinfo_list) with the new link_map and
  +   the next generation number GL(dl_tls_generation)+1.  Then it increases
  +   GL(dl_tls_generation) which signals that the new slotinfo entries are ready.
  +   This last write is release mo so previous writes can be synchronized.
  +
  +   GL(dl_tls_max_dtv_idx) is always an upper bound of the modids of the ready
  +   entries.  The slotinfo list might be shorter than that during dlopen.
  +   Entries in the slotinfo list might have gen > GL(dl_tls_generation) and
  +   map == NULL.
  +
  +   _dl_allocate_tls_init is called from pthread_create and it looks through
  +   the slotinfo list to do the dynamic TLS and DTV setup for the new thread.
  +   It first loads the current GL(dl_tls_generation) with acquire mo and only
  +   considers modules up to that generation ignoring any later change to the
  +   slotinfo list.
  +
  +   TODO: Entries might get changed and freed in dlclose without sync.
  +   TODO: __tls_get_addr is not yet synchronized with dlopen and dlclose.
  +*/
  +
   void *
   internal_function
   _dl_allocate_tls_init (void *result)
  @@ -455,9 +497,18 @@ _dl_allocate_tls_init (void *result)
     struct dtv_slotinfo_list *listp;
     size_t total = 0;
     size_t maxgen = 0;
  -
  -  /* Check if the current dtv is big enough.   */
  -  if (dtv[-1].counter < GL(dl_tls_max_dtv_idx))
  +  size_t gen_count;
  +  size_t dtv_slots;
  +
  +  /* Synchronize with the release mo store in dl_open_worker, modules with
  +     larger generation number are ignored.  */
  +  gen_count = atomic_load_acquire (&GL(dl_tls_generation));
  +  /* Check if the current dtv is big enough.  GL(dl_tls_max_dtv_idx) is
  +     concurrently modified, but after the release mo store to
  +     GL(dl_tls_generation) it always remains a modid upper bound for
  +     previously loaded modules so relaxed access is enough.  */
  +  dtv_slots = atomic_load_relaxed (&GL(dl_tls_max_dtv_idx));
  +  if (dtv[-1].counter < dtv_slots)
       {
         /* Resize the dtv.  */
         dtv = _dl_resize_dtv (dtv);
  @@ -480,18 +531,25 @@ _dl_allocate_tls_init (void *result)
   	  void *dest;
   
   	  /* Check for the total number of used slots.  */
  -	  if (total + cnt > GL(dl_tls_max_dtv_idx))
  +	  if (total + cnt > dtv_slots)
   	    break;
   
  -	  map = listp->slotinfo[cnt].map;
  +	  /* Synchronize with the release mo store in _dl_add_to_slotinfo in
  +	     dlopen, so the generation number read below is for a valid entry.
  +	     TODO: remove_slotinfo in dlclose is not synchronized.  */
  +	  map = atomic_load_acquire (&listp->slotinfo[cnt].map);
   	  if (map == NULL)
   	    /* Unused entry.  */
   	    continue;
   
  +	  size_t gen = listp->slotinfo[cnt].gen;
  +	  if (gen > gen_count)
  +	    /* New, concurrently loaded entry.  */
  +	    continue;
  +
   	  /* Keep track of the maximum generation number.  This might
   	     not be the generation counter.  */
  -	  assert (listp->slotinfo[cnt].gen <= GL(dl_tls_generation));
  -	  maxgen = MAX (maxgen, listp->slotinfo[cnt].gen);
  +	  maxgen = MAX (maxgen, gen);
   
   	  dtv[map->l_tls_modid].pointer.val = TLS_DTV_UNALLOCATED;
   	  dtv[map->l_tls_modid].pointer.is_static = false;
  @@ -518,11 +576,14 @@ _dl_allocate_tls_init (void *result)
   	}
   
         total += cnt;
  -      if (total >= GL(dl_tls_max_dtv_idx))
  +      if (total >= dtv_slots)
   	break;
   
  -      listp = listp->next;
  -      assert (listp != NULL);
  +      /* Synchronize with the release mo store in _dl_add_to_slotinfo
  +	 so only initialized slotinfo nodes are looked at.  */
  +      listp = atomic_load_acquire (&listp->next);
  +      if (listp == NULL)
  +	break;
       }
   
     /* The DTV version is up-to-date now.  */
  @@ -916,7 +977,7 @@ _dl_add_to_slotinfo (struct link_map *l)
   	 the first slot.  */
         assert (idx == 0);
   
  -      listp = prevp->next = (struct dtv_slotinfo_list *)
  +      listp = (struct dtv_slotinfo_list *)
   	malloc (sizeof (struct dtv_slotinfo_list)
   		+ TLS_SLOTINFO_SURPLUS * sizeof (struct dtv_slotinfo));
         if (listp == NULL)
  @@ -939,9 +1000,15 @@ cannot create TLS data structures"));
         listp->next = NULL;
         memset (listp->slotinfo, '\0',
   	      TLS_SLOTINFO_SURPLUS * sizeof (struct dtv_slotinfo));
  +      /* _dl_allocate_tls_init concurrently walks this list at thread
  +	 creation, it must only observe initialized nodes in the list.
  +	 See the CONCURRENCY NOTES there.  */
  +      atomic_store_release (&prevp->next, listp);
       }
   
     /* Add the information into the slotinfo data structure.  */
  -  listp->slotinfo[idx].map = l;
     listp->slotinfo[idx].gen = GL(dl_tls_generation) + 1;
  +  /* Synchronize with the acquire load in _dl_allocate_tls_init.
  +     See the CONCURRENCY NOTES there.  */
  +  atomic_store_release (&listp->slotinfo[idx].map, l);
   }

  
  PATCH 2
  ========
  diff --git a/elf/dl-tls.c b/elf/dl-tls.c
  index 073321c..2c9ad2a 100644
  --- a/elf/dl-tls.c
  +++ b/elf/dl-tls.c
  @@ -571,7 +571,7 @@ _dl_allocate_tls_init (void *result)
          }

         total += cnt;
  -      if (total >= dtv_slots)
  +      if (total > dtv_slots)
          break;

         /* Synchronize with dl_add_to_slotinfo.  */

To manage notifications about this bug go to:
https://bugs.launchpad.net/glibc/+bug/1863162/+subscriptions


