[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
Xav Paice
xav.paice at canonical.com
Wed May 22 23:45:30 UTC 2019
I'm seeing this in a slightly different form, on Bionic/Queens.
We have the LVs encrypted (thanks, Vault), and rebooting a host fairly
consistently results in at least one OSD not coming back. The LVs all
appear in the listing; the difference between a working and a non-working
OSD is that the non-working one is missing its block.db and block.wal symlinks.
See https://pastebin.canonical.com/p/rW3VgMMkmY/ for some info.
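A quick way to spot the broken OSDs after a reboot (sketch only, assuming the
usual /var/lib/ceph/osd/ceph-* layout) is to look for data directories whose
symlinks are gone:
# list OSD data dirs missing their block.db/block.wal symlinks
for d in /var/lib/ceph/osd/ceph-*; do
    [ -e "$d/block.db" ] && [ -e "$d/block.wal" ] || echo "$d is missing block.db/block.wal"
done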
When I made the links manually:
cd /var/lib/ceph/osd/ceph-4
ln -s /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-db-7478edfc-f321-40a2-a105-8e8a2c8ca3f6 block.db
ln -s /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-wal-7478edfc-f321-40a2-a105-8e8a2c8ca3f6 block.wal
This resulted in a permissions error accessing the device:
"bluestore(/var/lib/ceph/osd/ceph-4) _open_db /var/lib/ceph/osd/ceph-4/block.db symlink exists but target unusable: (13) Permission denied"
ls -l /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/
total 0
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-db-053e000a-76ed-427e-98b3-e5373e263f2d -> ../dm-20
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-db-12e68fcb-d2b6-459f-97f2-d3eb4e28c75e -> ../dm-24
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-db-33de740d-bd8c-4b47-a601-3e6e634e489a -> ../dm-14
lrwxrwxrwx 1 root root 8 May 22 23:04 osd-db-7478edfc-f321-40a2-a105-8e8a2c8ca3f6 -> ../dm-12
lrwxrwxrwx 1 root root 8 May 22 23:04 osd-db-c2669da2-63aa-42e2-b049-cf00a478e076 -> ../dm-22
lrwxrwxrwx 1 root root 8 May 22 23:04 osd-db-d38a7e91-cf06-4607-abbe-53eac89ac5ea -> ../dm-18
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-db-eb5270dc-1110-420f-947e-aab7fae299c9 -> ../dm-16
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-wal-053e000a-76ed-427e-98b3-e5373e263f2d -> ../dm-19
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-wal-12e68fcb-d2b6-459f-97f2-d3eb4e28c75e -> ../dm-23
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-wal-33de740d-bd8c-4b47-a601-3e6e634e489a -> ../dm-13
lrwxrwxrwx 1 root root 8 May 22 23:04 osd-wal-7478edfc-f321-40a2-a105-8e8a2c8ca3f6 -> ../dm-11
lrwxrwxrwx 1 root root 8 May 22 23:04 osd-wal-c2669da2-63aa-42e2-b049-cf00a478e076 -> ../dm-21
lrwxrwxrwx 1 root root 8 May 22 23:04 osd-wal-d38a7e91-cf06-4607-abbe-53eac89ac5ea -> ../dm-17
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-wal-eb5270dc-1110-420f-947e-aab7fae299c9 -> ../dm-15
I tried changing the ownership to ceph:ceph, but that made no difference.
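(For the record, the ownership change would look something like the following;
dm-11 and dm-12 are the targets of the ceph-4 symlinks in the listing above,
so this is a sketch rather than exactly what was run.)
# chown the symlinks themselves (-h) and the underlying device-mapper nodes
chown -h ceph:ceph /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-db-7478edfc-f321-40a2-a105-8e8a2c8ca3f6
chown -h ceph:ceph /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-wal-7478edfc-f321-40a2-a105-8e8a2c8ca3f6
chown ceph:ceph /dev/dm-11 /dev/dm-12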
I have also tried adding the following override (via `systemctl edit
lvm2-monitor.service`), but that hasn't changed the behavior either:
# cat /etc/systemd/system/lvm2-monitor.service.d/override.conf
[Service]
ExecStartPre=/bin/sleep 60
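Another variant that might be worth trying (untested sketch) is delaying the
OSD units themselves rather than lvm2-monitor, via a drop-in on the templated
unit:
# /etc/systemd/system/ceph-osd@.service.d/override.conf  (untested sketch)
[Unit]
# make sure LVM monitoring/activation has run before any OSD instance starts
After=lvm2-monitor.service

[Service]
ExecStartPre=/bin/sleep 60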
--
https://bugs.launchpad.net/bugs/1828617
Title:
Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
Status in systemd package in Ubuntu:
Confirmed
Bug description:
Ubuntu 18.04.2 Ceph deployment.
Ceph OSD devices use LVM volumes on top of udev-based physical devices.
LVM is supposed to create the PVs from devices using the links in the /dev/disk/by-dname/ folder, which are created by udev.
However, on reboot it sometimes happens (not always; it looks like a race condition) that the Ceph services cannot start and pvdisplay shows no volumes at all, even though /dev/disk/by-dname/ contains all the necessary device links by the end of the boot process.
The behaviour can be fixed manually by running "/sbin/lvm pvscan --cache --activate ay /dev/nvme0n1" to re-activate the LVM components, after which the services can be started.
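A possible way to automate that workaround (untested sketch; the unit name is
illustrative and the device path would differ per host) is a oneshot unit
ordered before ceph-osd.target:
# /etc/systemd/system/ceph-pvscan-workaround.service  (illustrative name)
[Unit]
Description=Re-activate LVM PVs before starting Ceph OSDs (workaround)
Before=ceph-osd.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/lvm pvscan --cache --activate ay /dev/nvme0n1

[Install]
WantedBy=ceph-osd.target
Such a unit would then need enabling with "systemctl enable" on each affected host.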