[Bug 1724529] Re: ceph health output flips between OK and WARN all the time
Billy Olsen
billy.olsen at canonical.com
Thu Nov 30 22:41:14 UTC 2017
In response specifically to comment #4 and changing the
mon_pg_warn_max_per_osd setting...
I don't think that the mon_pg_warn_max_per_osd setting should be
something that is generally available as a config option on the ceph-mon
charm. The warning was added to Ceph as a direct response to real-world
experience from the Ceph devs, based on observed behavior during recovery
scenarios. The warning is there to indicate that you are exceeding
recommended thresholds for acceptable recovery scenarios (for a
real-world, albeit extreme, example, read
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-January/045780.html).
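To make the threshold concrete, here is a rough sketch of the kind of
arithmetic behind the warning. It is an approximation, not the monitor's
actual code: it assumes every PG replica is placed on an "in" OSD, and it
assumes a threshold of 300, which I believe is the default for
mon_pg_warn_max_per_osd on Jewel-era releases (check your own cluster's
value). The function names and the pool figures are made up for
illustration.

```python
def pgs_per_osd(pools, num_in_osds):
    """Estimate the average number of PG replicas per OSD.

    pools is an iterable of (pg_num, replica_size) tuples, one per pool.
    This is an approximation; the monitor derives its figure from the
    actual PG mappings rather than from pool settings.
    """
    if num_in_osds <= 0:
        raise ValueError("num_in_osds must be positive")
    total_replicas = sum(pg_num * size for pg_num, size in pools)
    return total_replicas // num_in_osds


def too_many_pgs(pools, num_in_osds, warn_max=300):
    """Return True if the estimate exceeds the warning threshold.

    warn_max=300 is an assumed default for mon_pg_warn_max_per_osd.
    """
    return pgs_per_osd(pools, num_in_osds) > warn_max


# Hypothetical cluster: two pools with 3 replicas each.
example_pools = [(2048, 3), (1024, 3)]
print(pgs_per_osd(example_pools, 30), too_many_pgs(example_pools, 30))
print(pgs_per_osd(example_pools, 31), too_many_pgs(example_pools, 31))
```

With these made-up numbers the estimate sits just above the threshold
with 30 OSDs in and just below it with 31, which shows how close to the
limit a cluster can be without the number of PGs changing at all.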
Granted, it's not at all trivial to fix the warning when it appears,
because the pg_num of a pool cannot be reduced. The resolution
inevitably involves creating a new pool and migrating data to it.
Unfortunately, the Ceph community provides no recommended way to do this,
and the various options that exist all have their drawbacks. The rados
cppool command doesn't copy user versions (e.g. user issued snapshots)
and doesn't work for EC pools. Cache tiering migration may work for most
use cases, but there would need to be windows where the clients would
need to reconnect to talk to the right pool (and possibly to free up an
in-use object). Some suggestions are available at
http://ceph.com/geen-categorie/ceph-pool-migration/.
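For illustration only, the rados cppool route boils down to roughly the
command sequence that the sketch below prints. It is a dry-run planner:
it builds the commands and does not execute anything. The function name
and the ".new"/".old" pool suffixes are hypothetical, it assumes a
replicated pool and no client writes during the copy, and the cppool
limitations mentioned above still apply.

```python
def plan_cppool_migration(pool, pg_num):
    """Return the commands for a copy-and-rename pool migration.

    Nothing is executed; the caller decides what to do with the plan.
    The ".new" and ".old" suffixes are illustrative choices only.
    """
    if pg_num <= 0:
        raise ValueError("pg_num must be positive")
    new_pool = pool + ".new"
    old_pool = pool + ".old"
    return [
        # Create the replacement pool with the desired (smaller) pg_num.
        ["ceph", "osd", "pool", "create", new_pool, str(pg_num)],
        # Copy every object from the existing pool into the new one.
        ["rados", "cppool", pool, new_pool],
        # Move the original out of the way, then swap the new pool in.
        ["ceph", "osd", "pool", "rename", pool, old_pool],
        ["ceph", "osd", "pool", "rename", new_pool, pool],
    ]


# Hypothetical pool name and pg_num, printed for review rather than run.
for command in plan_cppool_migration("testpool", 128):
    print(" ".join(command))
```

The old pool is kept under its ".old" name rather than deleted, so
removing it stays a separate, deliberate step once the copy has been
verified.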
Overall, when an admin is going to change this configuration setting, I
think it'd be best if they understood what the implications of the
change are and accepted the possible downstream ramifications. It may
not be the OSD that gets killed when it starts gobbling up memory on the
box; it could be an innocent bystander.
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to ceph in Ubuntu.
https://bugs.launchpad.net/bugs/1724529
Title:
ceph health output flips between OK and WARN all the time
Status in OpenStack ceph-mon charm:
Invalid
Status in ceph package in Ubuntu:
Incomplete
Bug description:
Below is the output from ceph health run at one-second intervals.
Although the number of PGs does not change, the output changes from
HEALTH_OK to HEALTH_WARN and back all the time.
https://pastebin.canonical.com/200887/
The command below generates the PG distribution per OSD
https://pastebin.canonical.com/200891/
and here is the output it produced
https://pastebin.canonical.com/200877/
Running on Ubuntu 14.04
charm version stable/17.02
ceph version 10.2.7-0ubuntu0.16.04.1~cloud0
To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-ceph-mon/+bug/1724529/+subscriptions