[Bug 1955345] [NEW] Active ceph-mgr crashes on receiving report from a non-active mgr
Ponnuvel Palaniyappan
1955345 at bugs.launchpad.net
Sun Dec 19 15:27:52 UTC 2021
Public bug reported:
[Impact]
An active ceph-mgr crashes, and another ceph-mgr takes over as the active
mgr. The new active mgr can then hit the same issue and crash. Because
systemd restarts each crashed ceph-mgr, this cycle can continue indefinitely.
This can affect cluster stability and usability, because ceph-mgr handles a
number of essential operations (modules that control or change Ceph cluster
behaviour, metrics, etc.).
[Test Plan]
Deploy and operate a Ceph cluster normally.
Increase the mgr log level to 20.
Observe that MMgrReport messages sent by non-active mgrs are ignored and
the active mgr does not crash.
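The expected behaviour in the test plan amounts to a filtering rule on the
active mgr. The following is a minimal Python sketch of that rule only, not
Ceph's actual implementation (ceph-mgr is written in C++). `Report`,
`should_process` and `handle_reports` are hypothetical names, and the sketch
assumes a report is identified solely by its sender's daemon type and name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """Hypothetical stand-in for an MMgrReport: records only the sender."""
    daemon_type: str  # e.g. "osd", "mgr"
    daemon_name: str  # e.g. "0", "x"


def should_process(report: Report, active_mgr: str) -> bool:
    """Return True if the active mgr should process this report.

    Reports from non-mgr daemons are always processed. Reports from mgr
    daemons are processed only when they come from the active mgr itself,
    so reports from non-active (standby) mgrs are ignored.
    """
    if report.daemon_type != "mgr":
        return True
    return report.daemon_name == active_mgr


def handle_reports(reports, active_mgr):
    """Split reports into (processed, ignored) lists using should_process."""
    processed, ignored = [], []
    for r in reports:
        (processed if should_process(r, active_mgr) else ignored).append(r)
    return processed, ignored
```

For example, with active mgr "x", reports from osd.0 and mgr.x are
processed, while a report from mgr.y is ignored instead of being handled.
Keeping reports from the active mgr itself mirrors the regression concern
noted below (incorrectly ignoring reports from active mgrs).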
[Where problems could occur]
The fix might not resolve the issue, in which case the mgr would continue
to crash as before.
The fix might also incorrectly ignore reports from active mgrs.
[Other Info]
Upstream main bug: https://tracker.ceph.com/issues/48022
Octopus backport PR: https://github.com/ceph/ceph/pull/43861
Octopus backport bug: https://tracker.ceph.com/issues/53198
This has already been fixed and is available in Pacific, so it only needs
to be backported to Octopus.
** Affects: ceph (Ubuntu)
Importance: High
Assignee: Ponnuvel Palaniyappan (pponnuvel)
Status: In Progress
** Affects: ceph (Ubuntu Focal)
Importance: High
Assignee: Ponnuvel Palaniyappan (pponnuvel)
Status: In Progress
** Tags: sts
** Changed in: ceph (Ubuntu)
Assignee: (unassigned) => Ponnuvel Palaniyappan (pponnuvel)
** Changed in: ceph (Ubuntu)
Status: New => In Progress
** Also affects: ceph (Ubuntu Focal)
Importance: Undecided
Status: New
** Changed in: ceph (Ubuntu Focal)
Assignee: (unassigned) => Ponnuvel Palaniyappan (pponnuvel)
** Changed in: ceph (Ubuntu Focal)
Status: New => In Progress
** Changed in: ceph (Ubuntu)
Importance: Undecided => High
** Changed in: ceph (Ubuntu Focal)
Importance: Undecided => High
** Tags added: sts
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to ceph in Ubuntu.
https://bugs.launchpad.net/bugs/1955345
Title:
Active ceph-mgr crashes on receiving report from a non-active mgr
Status in ceph package in Ubuntu:
In Progress
Status in ceph source package in Focal:
In Progress