[Bug 1825843] Re: systemd issues with bionic-rocky causing nagios alert and can't restart daemon
Pete Vander Giessen
1825843 at bugs.launchpad.net
Tue Apr 23 14:41:29 UTC 2019
After IRC conversations and more testing, I think that I have a clean
reproduction of this bug, along with a root cause.
The root cause: the charm takes control of the radosgw service and
renames it, but doesn't remove the old nrpe check.
To reproduce:
1) juju deploy the following bundle: https://paste.ubuntu.com/p/wpVt447Vwz/
2) juju ssh into ceph-radosgw/0 and note that there is a "check_radosgw.cfg" in /etc/nagios/nrpe.d.
3) Trigger the config-changed hook on the ceph-radosgw charm. You might change the number of ceph replicas, for example.
4) Note that there is now a "check_ceph-radosgw@<hostname>.cfg" check, in addition to the check_radosgw.cfg check.
5) Run both checks (cat the files to get the command). Note that the new, hostname-based check succeeds, but the old check does not.
The original check will also fail if you run it during step 2,
suggesting that the service has been changed, but the nagios monitoring
is not updated until the config-changed hook runs.
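
For step 5, a small helper can pull the command lines out of the check
files instead of cat-ing each one by hand. This is only a sketch: it
assumes the files use the standard NRPE "command[<name>]=<command line>"
syntax, and the function names (parse_nrpe_commands, commands_in_dir)
are made up for illustration, not taken from the charm.

```python
import re
from pathlib import Path

# NRPE command definitions have the form: command[<name>]=<command line>
COMMAND_RE = re.compile(r"^\s*command\[(?P<name>[^\]]+)\]\s*=\s*(?P<cmd>.+?)\s*$")


def parse_nrpe_commands(cfg_text):
    """Return {check name: command line} for one NRPE config file's text."""
    commands = {}
    for line in cfg_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip comment lines
        match = COMMAND_RE.match(line)
        if match:
            commands[match.group("name")] = match.group("cmd")
    return commands


def commands_in_dir(nrpe_dir="/etc/nagios/nrpe.d"):
    """Collect the commands from every check_*.cfg file in nrpe_dir."""
    found = {}
    for path in sorted(Path(nrpe_dir).glob("check_*.cfg")):
        found[path.name] = parse_nrpe_commands(path.read_text())
    return found
```

Calling commands_in_dir() on the unit should list both the old and the
new check side by side, so each command can then be run by hand.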
This bug can be closed once the charm places checks in
/etc/nagios/nrpe.d that accurately reflect the running services, and
cleans up outdated checks as well.
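
As a rough illustration of the cleanup half of that criterion (not the
charm's actual code), the sketch below removes check files for service
names the charm used to manage but no longer runs. It assumes check
files are named check_<service>.cfg, as seen in the steps above; the
function names and the known/active lists are hypothetical. Files for
services outside the "known" list are left alone, since other charms
may own them.

```python
from pathlib import Path


def stale_check_files(nrpe_dir, known_services, active_services):
    """Return existing check files for known services that are no longer active."""
    stale_names = set(known_services) - set(active_services)
    candidates = (Path(nrpe_dir) / f"check_{name}.cfg" for name in sorted(stale_names))
    return [path for path in candidates if path.exists()]


def remove_stale_checks(nrpe_dir, known_services, active_services, dry_run=True):
    """Delete stale check files (unless dry_run) and return the affected paths."""
    stale = stale_check_files(nrpe_dir, known_services, active_services)
    if not dry_run:
        for path in stale:
            path.unlink()
    return stale
```

With known_services containing both "radosgw" and the hostname-based
name, and active_services containing only the latter, a dry run would
report check_radosgw.cfg as the file to remove.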
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to ceph in Ubuntu.
https://bugs.launchpad.net/bugs/1825843
Title:
systemd issues with bionic-rocky causing nagios alert and can't
restart daemon
Status in OpenStack ceph-radosgw charm:
Triaged
Status in ceph package in Ubuntu:
Invalid
Bug description:
During deployment of a bionic-rocky cloud on 19.04 charms, we are
seeing an issue with the ceph-radosgw units related to the systemd
service definition for radosgw.service.
If you look through this pastebin, you'll notice that there is a
running radosgw daemon and the local haproxy unit thinks all radosgw
backend services are available (via nagios check), but systemd can't
control radosgw properly. (Note that before a restart with systemd,
systemd just showed the unit as loaded inactive; it now shows
active exited, but that did not actually restart the radosgw service.)
https://pastebin.ubuntu.com/p/Pn3sQ3zHXx/
charm: cs:ceph-radosgw-266
cloud:bionic-rocky
*** 13.2.4+dfsg1-0ubuntu0.18.10.1~cloud0 500
500 http://ubuntu-cloud.archive.canonical.com/ubuntu bionic-updates/rocky/main amd64 Packages
ceph-radosgw/0         active  idle  18/lxd/2  10.20.175.60  80/tcp  Unit is ready
  hacluster-radosgw/2  active  idle            10.20.175.60          Unit is ready and clustered
ceph-radosgw/1         active  idle  19/lxd/2  10.20.175.48  80/tcp  Unit is ready
  hacluster-radosgw/1  active  idle            10.20.175.48          Unit is ready and clustered
ceph-radosgw/2*        active  idle  20/lxd/2  10.20.175.25  80/tcp  Unit is ready
  hacluster-radosgw/0* active  idle            10.20.175.25          Unit is ready and clustered
To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-ceph-radosgw/+bug/1825843/+subscriptions