[Bug 1825843] Re: systemd issues with bionic-rocky causing nagios alert and can't restart daemon
Angel Vargas
angelvargas at outlook.es
Tue Apr 23 00:50:28 UTC 2019
We upgraded the charms and radosgw broke (ceph-radosgw release 267).
After hours of debugging, we decided to deploy a fresh 4-node OpenStack base to investigate the problem and try to revert. On that fresh deployment, juju shows:
ceph-radosgw/0* blocked idle 0/lxd/0 10.100.0.61 80/tcp Services not running that should be: ceph-radosgw at rgw.juju-168b18-0-lxd-0
If we restart the LXD container and then run:
sudo service radosgw status
we get:
Apr 23 00:18:05 juju-168b18-0-lxd-0 radosgw[36885]: Starting client.rgw.juju-168b18-0-lxd-0...
Apr 23 00:18:05 juju-168b18-0-lxd-0 systemd[1]: Started LSB: radosgw RESTful rados gateway.
Apr 23 00:19:22 juju-168b18-0-lxd-0 systemd[1]: Stopping LSB: radosgw RESTful rados gateway...
Apr 23 00:19:22 juju-168b18-0-lxd-0 systemd[1]: Stopped LSB: radosgw RESTful rados gateway.
Apr 23 00:19:26 juju-168b18-0-lxd-0 systemd[1]: radosgw.service: Failed to reset devices.list: Operation not permitted
Apr 23 00:19:26 juju-168b18-0-lxd-0 systemd[1]: Starting LSB: radosgw RESTful rados gateway...
Apr 23 00:19:26 juju-168b18-0-lxd-0 radosgw[37618]: Starting client.rgw.juju-168b18-0-lxd-0...
Apr 23 00:19:26 juju-168b18-0-lxd-0 systemd[1]: Started LSB: radosgw RESTful rados gateway.
Apr 23 00:21:48 juju-168b18-0-lxd-0 systemd[1]: Stopping LSB: radosgw RESTful rados gateway...
Apr 23 00:21:49 juju-168b18-0-lxd-0 systemd[1]: Stopped LSB: radosgw RESTful rados gateway.
That is the output after a fresh boot. If we then run:
sudo service radosgw start
we get the service running:
● radosgw.service - LSB: radosgw RESTful rados gateway
   Loaded: loaded (/etc/init.d/radosgw; generated)
   Active: active (running) since Tue 2019-04-23 00:22:47 UTC; 17min ago
     Docs: man:systemd-sysv-generator(8)
  Process: 811 ExecStart=/etc/init.d/radosgw start (code=exited, status=0/SUCCESS)
    Tasks: 582 (limit: 7372)
   CGroup: /system.slice/radosgw.service
           └─850 /usr/bin/radosgw -n client.rgw.juju-168b18-0-lxd-0
Apr 23 00:22:46 juju-168b18-0-lxd-0 systemd[1]: Starting LSB: radosgw RESTful rados gateway...
Apr 23 00:22:46 juju-168b18-0-lxd-0 radosgw[811]: Starting client.rgw.juju-168b18-0-lxd-0...
Apr 23 00:22:47 juju-168b18-0-lxd-0 systemd[1]: Started LSB: radosgw RESTful rados gateway.
but juju still keeps showing the unit as blocked.
This is the juju log for ceph-radosgw:
https://paste.ubuntu.com/p/kb3g9XZ7nb/
We are getting the same behaviour in our production and test
environments. Even when the service is running, the unit does not seem
to work from the OpenStack perspective; for example, when we try to
create a bucket, the API does not accept connections.
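As a quick sanity check (a sketch, not part of our deployment; the URL is an assumption based on the unit address in the juju status above), something like this can tell whether the RGW endpoint answers HTTP at all, independently of what systemd or juju report:

```python
# Hedged sketch: probe the radosgw HTTP endpoint directly.
# The address/port below are assumed from the juju status output; adjust locally.
import urllib.request
import urllib.error

def probe_rgw(url, timeout=5):
    """Return (reachable, status_or_error) for a radosgw endpoint."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # Any HTTP response means the daemon answered.
            return True, resp.status
    except urllib.error.HTTPError as e:
        # An HTTP error code (e.g. 403 from RGW without credentials)
        # still means the daemon is up and answering.
        return True, e.code
    except (urllib.error.URLError, OSError) as e:
        # Connection refused / timeout: nothing is listening.
        return False, str(e)

if __name__ == "__main__":
    print(probe_rgw("http://10.100.0.61:80/"))
```

If this returns `(False, ...)` even while systemd shows the unit as active, that matches the symptom that the LSB-generated unit's state does not reflect the actual daemon.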
How can I help?
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to ceph in Ubuntu.
https://bugs.launchpad.net/bugs/1825843
Title:
systemd issues with bionic-rocky causing nagios alert and can't
restart daemon
Status in OpenStack ceph-radosgw charm:
Triaged
Status in ceph package in Ubuntu:
New
Bug description:
During deployment of a bionic-rocky cloud on 19.04 charms, we are
seeing an issue with the ceph-radosgw units related to the systemd
service definition for radosgw.service.
If you look through this pastebin, you'll notice that there is a
running radosgw daemon and the local haproxy unit thinks all radosgw
backend services are available (via the nagios check), but systemd
cannot control radosgw properly. (Note that before a restart via
systemd, systemd showed the unit as loaded/inactive; it now shows
active/exited, even though that restart did not actually restart the
radosgw daemon.)
https://pastebin.ubuntu.com/p/Pn3sQ3zHXx/
charm: cs:ceph-radosgw-266
cloud:bionic-rocky
*** 13.2.4+dfsg1-0ubuntu0.18.10.1~cloud0 500
500 http://ubuntu-cloud.archive.canonical.com/ubuntu bionic-updates/rocky/main amd64 Packages
ceph-radosgw/0 active idle 18/lxd/2 10.20.175.60 80/tcp Unit is ready
hacluster-radosgw/2 active idle 10.20.175.60 Unit is ready and clustered
ceph-radosgw/1 active idle 19/lxd/2 10.20.175.48 80/tcp Unit is ready
hacluster-radosgw/1 active idle 10.20.175.48 Unit is ready and clustered
ceph-radosgw/2* active idle 20/lxd/2 10.20.175.25 80/tcp Unit is ready
hacluster-radosgw/0* active idle 10.20.175.25 Unit is ready and clustered
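Since the unit systemd is managing here is generated from the LSB init script (note "/etc/init.d/radosgw; generated" in the status output), one avenue worth exploring is a native unit. A minimal sketch follows; this is untested, the client name is taken from the logs above, and the packaged ceph units should be preferred if they work on this release:

```ini
# /etc/systemd/system/radosgw.service -- hedged sketch, not the packaged unit.
# Verify the binary path and client name locally before use.
[Unit]
Description=radosgw (native unit sketch)
After=network-online.target

[Service]
ExecStart=/usr/bin/radosgw -f -n client.rgw.juju-168b18-0-lxd-0
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Running the daemon in the foreground (`-f`) lets systemd track the main process directly, instead of the active/exited state the LSB wrapper produces.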
To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-ceph-radosgw/+bug/1825843/+subscriptions