[Bug 1909162] Re: cluster log slow request spam
James Page
1909162 at bugs.launchpad.net
Thu Feb 4 15:05:39 UTC 2021
Hirsute is currently blocked while I work on an interim release in
preparation for Ceph Pacific. I'm working through a number of 32-bit
issues on armhf, which is taking some time because of the long build
durations on this architecture.
I have included the patch for this issue in this work:
https://code.launchpad.net/~ubuntu-server-dev/ubuntu/+source/ceph/+git/ceph/+ref/ubuntu/pacific-snapshot
commit:
https://git.launchpad.net/~ubuntu-server-dev/ubuntu/+source/ceph/commit/?id=295acf66219bcfd7a2059b5d47a6b5120d23db2e
It would be good if we could move forward with the SRUs for focal and
groovy before this lands in the development release.
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1909162
Title:
cluster log slow request spam
Status in Ubuntu Cloud Archive:
In Progress
Status in Ubuntu Cloud Archive train series:
In Progress
Status in Ubuntu Cloud Archive ussuri series:
In Progress
Status in ceph package in Ubuntu:
In Progress
Status in ceph source package in Focal:
In Progress
Status in ceph source package in Groovy:
In Progress
Status in ceph source package in Hirsute:
In Progress
Bug description:
[Impact]
A recent change (issue#43975 [0]) was made to slow request logging to
include detail on each operation in the cluster logs. With this
change, detail for every slow request is always sent to the monitors
and added to the cluster logs.
This does not scale. Large, high-throughput clusters can overwhelm
their monitors with spurious logs in the event of a performance issue.
Disrupting the monitors can then cause further instability in the
cluster.
This SRU reverts the cluster logging of every slow request the osd is
processing.
The slow request clog change was added in nautilus (14.2.10) and
octopus (15.2.0).
[Test Case]
Stress the cluster with a benchmarking tool to generate slow requests
and observe the cluster logs.
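One way to make the "observe the cluster logs" step measurable is to
count slow request entries in the cluster log. The following is a
minimal sketch and not part of the SRU: it assumes the cluster log is
readable as plain text (the default path below is an assumption about a
typical monitor host) and that the entries of interest contain the
phrase "slow request". The function names are illustrative.

```python
from typing import Iterable

# Assumption: entries of interest contain this phrase; adjust it to match
# what your cluster actually logs.
SLOW_REQUEST_MARKER = "slow request"


def count_slow_request_lines(lines: Iterable[str]) -> int:
    """Count log lines that mention a slow request (case-insensitive)."""
    return sum(1 for line in lines if SLOW_REQUEST_MARKER in line.lower())


def count_in_file(path: str = "/var/log/ceph/ceph.log") -> int:
    """Count slow request lines in a cluster log file.

    The default path is an assumption about a typical monitor host.
    """
    with open(path, encoding="utf-8", errors="replace") as log:
        return count_slow_request_lines(log)
```

Comparing the counts taken under the same benchmark load with and
without the patched packages gives a rough indication of how much
per-request detail reaches the cluster log.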
[Where problems could occur]
The cluster logs contain detailed debug information on slow requests
that is useful for smaller, low-throughput clusters. While these logs
are not used by ceph, they may be used by the cluster administrators
(for monitoring or alerts). Changing this logging behavior may be
unexpected.
[Other Info]
The intent is to re-enable this feature behind a configurable setting,
but the solution must be discussed upstream.
The same slow request detail can be enabled for each osd by raising
the "debug osd" log level to 20.
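As a sketch of how the log level could be raised at runtime, the helper
below builds a "ceph tell ... config set debug_osd" command for a single
osd. The helper name and its dry-run default are illustrative
assumptions; check the command against the Ceph release in use before
running it, and keep in mind that level 20 is very verbose.

```python
import subprocess
from typing import List


def set_debug_osd(osd_id: int, level: int = 20, dry_run: bool = True) -> List[str]:
    """Build (and optionally run) the command raising "debug osd" on one osd.

    Hypothetical helper: it only wraps the ceph CLI and returns the argv used.
    """
    argv = ["ceph", "tell", f"osd.{osd_id}", "config", "set", "debug_osd", str(level)]
    if not dry_run:
        # Requires the ceph CLI and admin credentials on this host.
        subprocess.run(argv, check=True)
    return argv
```

Calling set_debug_osd(0) only returns the command so it can be reviewed;
passing dry_run=False applies it. The level should be lowered again once
the slow requests have been captured.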
[0] https://tracker.ceph.com/issues/43975
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1909162/+subscriptions