[Bug 1970460] Re: [SRU] Avoid premature onode release
Ponnuvel Palaniyappan
1970460 at bugs.launchpad.net
Thu Apr 28 18:32:33 UTC 2022
** Description changed:
- The upstream bug is https://tracker.ceph.com/issues/53002
+ [Impact]
- It's been backported to relevant releases upstream (Octopus, Pacific, and Quincy).
- Octopus 15.2.16 has the fix. So does Quincy 17.2.0. However, the latest Pacific release missed out this fix. So needed to be SRU'ed for Pacific only.
+ OSDs crash at randomly due to race condition that can occur
+ at times.
+
+ This was observed when onode's removal is followed by reading
+ and the latter causes object release before the removal is finalized.
+ The root cause is an improper 'pinned' state assessment in Onode::get().
- Master tracker: https://tracker.ceph.com/issues/53002
+ [Test Plan]
+
+ Deploy a ceph cluster and do write some data to the cluster.
+ While performing some reads again from the cluster, no crashes
+ are seen in any OSDs. The race condition can be mimicked
+ by holding one thread (under debugger) while the other one
+ continues to update 'nput' counter.
+
+ [Where problems could occur]
+
+ Despite the new atomic counter it might not be cover cases
+ and still introduce further data race and/or crashes continue
+ to happen.
+
+ [Other Info]
+
+ The upstream bug is https://tracker.ceph.com/issues/53002
+
+ It's been backported to relevant releases upstream (Octopus, Pacific, and
+ Quincy). Octopus 15.2.16 has the fix. So does Quincy 17.2.0. However,
+ the latest Pacific release missed out this fix. So SRU is needed for
+ Pacific (only).
Pacific tracker: https://tracker.ceph.com/issues/53608
Pacific PR: https://github.com/ceph/ceph/pull/44723
** Description changed:
[Impact]
- OSDs crash at randomly due to race condition that can occur
- at times.
-
- This was observed when onode's removal is followed by reading
- and the latter causes object release before the removal is finalized.
- The root cause is an improper 'pinned' state assessment in Onode::get().
+ OSDs crash at randomly due to race condition that can occur
+ at times.
+
+ This was observed when onode's removal is followed by reading
+ and the latter causes object release before the removal is finalized.
+ The root cause is an improper 'pinned' state assessment in Onode::get().
[Test Plan]
- Deploy a ceph cluster and do write some data to the cluster.
- While performing some reads again from the cluster, no crashes
- are seen in any OSDs. The race condition can be mimicked
- by holding one thread (under debugger) while the other one
- continues to update 'nput' counter.
+ Deploy a ceph cluster and do write some data to the cluster.
+ While performing some reads again from the cluster, no crashes
+ are seen in any OSDs. The race condition can be mimicked
+ by holding one thread (under debugger) while the other one
+ continues to update 'nput' counter.
[Where problems could occur]
- Despite the new atomic counter it might not be cover cases
- and still introduce further data race and/or crashes continue
- to happen.
-
+ Despite the new atomic counter it might not be cover cases
+ and still introduce further data race and/or crashes continue
+ to happen.
+
[Other Info]
-
- The upstream bug is https://tracker.ceph.com/issues/53002
It's been backported to relevant releases upstream (Octopus, Pacific, and
Quincy). Octopus 15.2.16 has the fix. So does Quincy 17.2.0. However,
the latest Pacific release missed out this fix. So SRU is needed for
Pacific (only).
+ Master tracker: https://tracker.ceph.com/issues/53002
+
Pacific tracker: https://tracker.ceph.com/issues/53608
Pacific PR: https://github.com/ceph/ceph/pull/44723
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to ceph in Ubuntu.
https://bugs.launchpad.net/bugs/1970460
Title:
[SRU] Avoid premature onode release
Status in ceph package in Ubuntu:
New
Bug description:
[Impact]
OSDs crash at random due to a race condition that occurs
intermittently.
This was observed when an onode's removal is followed by a read,
and the read causes the object to be released before the removal is finalized.
The root cause is an improper 'pinned' state assessment in Onode::get().
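As a rough illustration of the class of race described above, here is a
minimal sketch of a reference-counted object that must be released exactly
once while several threads take and drop references. This is not the Ceph
code and not the upstream fix: Node, get(), put(), destroyed and run_stress()
are hypothetical names used only for this example. The point it shows is
that the decision to release is taken from the value returned by the atomic
decrement itself, rather than from a separate read of shared state that
another thread may change in between.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for a ref-counted object; NOT Ceph's Onode.
static std::atomic<int> destroyed{0};

struct Node {
  std::atomic<int> nref{1};  // the creator holds the first reference

  // Caller must already hold a reference, so the object is alive here.
  void get() { nref.fetch_add(1, std::memory_order_relaxed); }

  void put() {
    // Decide from the value returned by the atomic decrement itself.
    // Re-reading nref afterwards would race with other threads.
    if (nref.fetch_sub(1, std::memory_order_acq_rel) == 1) {
      destroyed.fetch_add(1, std::memory_order_relaxed);
      delete this;
    }
  }
};

// Hammer get()/put() from several threads and report how many times
// the object was released (the only acceptable answer is 1).
int run_stress(int threads, int iters) {
  destroyed = 0;
  Node* n = new Node;
  std::vector<std::thread> pool;
  for (int t = 0; t < threads; ++t) {
    n->get();  // hand each worker its own reference up front
    pool.emplace_back([n, iters] {
      for (int i = 0; i < iters; ++i) {
        n->get();
        n->put();
      }
      n->put();  // drop the worker's own reference
    });
  }
  n->put();  // creator drops its reference while workers are still running
  for (auto& th : pool) th.join();
  return destroyed.load();
}
```

Calling run_stress(4, 10000) from a test program should return 1. A stress
run like this only raises the chance of hitting a bad interleaving; it does
not prove the absence of a race.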
[Test Plan]
Deploy a Ceph cluster and write some data to it.
Then read the data back from the cluster and verify that no
OSD crashes. The race condition can be mimicked
by holding one thread (under a debugger) while the other one
continues to update the 'nput' counter.
[Where problems could occur]
Despite the new atomic counter, the fix might not cover all cases:
it could introduce a further data race, and/or crashes could continue
to happen.
[Other Info]
The fix has been backported to the relevant upstream releases (Octopus,
Pacific, and Quincy). Octopus 15.2.16 has the fix, and so does Quincy 17.2.0.
However, the latest Pacific release missed this fix, so an SRU is needed for
Pacific only.
Master tracker: https://tracker.ceph.com/issues/53002
Pacific tracker: https://tracker.ceph.com/issues/53608
Pacific PR: https://github.com/ceph/ceph/pull/44723