[Bug 1983992] [NEW] nfs-ganesha server crashes regularly
Ponnuvel Palaniyappan
1983992 at bugs.launchpad.net
Mon Aug 8 16:58:04 UTC 2022
Public bug reported:
nfs-ganesha server crashes regularly.
It doesn't happen all the time and isn't easily reproducible. But when it does
crash, the backtrace looks like:
(gdb) bt
#0 atomic_postclear_uint16_t_bits (bits=<optimized out>, var=<optimized out>) at ./ntirpc/misc/abstract_atomic.h:1812
#1 svc_rqst_epoll_event (sr_rec=sr_rec@entry=0x564dc852efd8, ev=0x7f6134002900) at ./src/svc_rqst.c:1416
#2 0x00007f620fa76565 in svc_rqst_epoll_events (n_events=2, sr_rec=0x564dc852efd8) at ./src/svc_rqst.c:1466
#3 svc_rqst_epoll_loop (wpe=0x564dc852efd8) at ./src/svc_rqst.c:1566
#4 0x00007f620fa816d6 in work_pool_thread (arg=0x7f6064002280) at ./src/work_pool.c:184
#5 0x00007f621029a6db in start_thread (arg=0x7f6097cfa700) at pthread_create.c:463
#6 0x00007f620fdbb61f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
This was using 3.0.3 on Bionic (available via Ubuntu Cloud Archive
packages).
Upstream nfs-ganesha developers suggested that a "number of fixes"
related to libntirpc fixed what looks like a race condition.
libntirpc is a submodule used in nfs-ganesha and it's where the problem comes from:
https://github.com/nfs-ganesha/ntirpc
There were a number of commits that went in since 3.0 [0]. Given the
crash isn't reproducible easily, it's not straightforward to find the
commits that fixed the issue between 3.0.3 and 3.5 for a potential SRU.
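One way to narrow the candidate commits is to list only those that touch
the source files named in the backtrace. The following is a minimal
sketch, not a verified procedure: it assumes a local clone of the ntirpc
repository, and the helper names (build_log_command, candidate_commits)
and the refs passed to them are hypothetical and must be adjusted to
whatever tags or commits actually correspond to the two releases.

```python
import subprocess

# Source files named in the backtrace frames (paths relative to the ntirpc tree).
BACKTRACE_PATHS = [
    "ntirpc/misc/abstract_atomic.h",
    "src/svc_rqst.c",
    "src/work_pool.c",
]


def build_log_command(old_ref, new_ref, paths):
    """Build a `git log` command for commits in old_ref..new_ref touching paths."""
    return [
        "git", "log", "--oneline", "--no-merges",
        f"{old_ref}..{new_ref}",
        "--", *paths,
    ]


def candidate_commits(repo_dir, old_ref, new_ref, paths=BACKTRACE_PATHS):
    """Run git in repo_dir and return one 'hash subject' string per commit."""
    result = subprocess.run(
        build_log_command(old_ref, new_ref, paths),
        cwd=repo_dir,          # assumed: path to a local ntirpc clone
        check=True,            # raise if git reports an error (e.g. unknown ref)
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()
```

Calling candidate_commits() with the path to the clone and the two refs
returns a shorter list to review by hand. A commit that fixes the race
without touching these three files would be missed, so the result is a
starting point only.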
In a user environment where the problem occurred, they were able to test
nfs-ganesha 3.5 and confirmed that it didn't crash during a load test
lasting several days, whereas 3.0.3 crashed at least once a day under a
similar load/test environment.
[0] https://github.com/nfs-ganesha/ntirpc
[1] https://github.com/nfs-ganesha/ntirpc/commit/1da6533431a23af7406b5961d4b16ef61045b6af
** Affects: nfs-ganesha (Ubuntu)
Importance: Undecided
Status: New
** Tags: sts
** Tags added: sts
** Description changed:
nfs-ganesha server crashes regularly.
It doesn't happen all the time and isn't easily reproducible. But when it does
crash, the backtrace looks like:
(gdb) bt
#0 atomic_postclear_uint16_t_bits (bits=<optimized out>, var=<optimized out>) at ./ntirpc/misc/abstract_atomic.h:1812
#1 svc_rqst_epoll_event (sr_rec=sr_rec@entry=0x564dc852efd8, ev=0x7f6134002900) at ./src/svc_rqst.c:1416
#2 0x00007f620fa76565 in svc_rqst_epoll_events (n_events=2, sr_rec=0x564dc852efd8) at ./src/svc_rqst.c:1466
#3 svc_rqst_epoll_loop (wpe=0x564dc852efd8) at ./src/svc_rqst.c:1566
#4 0x00007f620fa816d6 in work_pool_thread (arg=0x7f6064002280) at ./src/work_pool.c:184
#5 0x00007f621029a6db in start_thread (arg=0x7f6097cfa700) at pthread_create.c:463
#6 0x00007f620fdbb61f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
-
- This was using 3.0.3 on Bionic (available via Ubuntu Cloud Archive packages).
+ This was using 3.0.3 on Bionic (available via Ubuntu Cloud Archive
+ packages).
Upstream nfs-ganesha developers suggested that a "number of fixes"
related to libntirpc fixed what looks like a race condition.
libntirpc is a submodule used in nfs-ganesha and it's where the problem comes from:
https://github.com/nfs-ganesha/ntirpc
There were a number of commits that went in since 3.0 [0]. Given the
crash isn't reproducible easily, it's not straightforward to find the
commits that fixed the issue between 3.0.3 and 3.5 for a potential SRU.
+ In a user environment where the problem occurred, they were able to test
+ nfs-ganesha 3.5 and confirmed that it didn't crash during a load test
+ lasting several days, whereas 3.0.3 crashed at least once a day under a
+ similar load/test environment.
[0] https://github.com/nfs-ganesha/ntirpc
[1] https://github.com/nfs-ganesha/ntirpc/commit/1da6533431a23af7406b5961d4b16ef61045b6af
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to nfs-ganesha in Ubuntu.
https://bugs.launchpad.net/bugs/1983992
Title:
nfs-ganesha server crashes regularly
Status in nfs-ganesha package in Ubuntu:
New