[Bug 1925211] Re: Hot-unplug of disks leaves broken block devices around in Hirsute on s390x
Christian Ehrhardt
1925211 at bugs.launchpad.net
Wed Apr 21 05:44:06 UTC 2021
Hi Kaihenfeng,
Thanks for your patch suggestion! Semantically, I'm not sure it is the right thing. To clarify: your theory is that the code previously checked !resuming, and that the !cdev check in front of it was maybe only there to avoid a dereference error. And now you assume that instead of !cdev it should check whether there is a cdev.
I'm unsure. If !cdev was indeed only there to protect the dereference, then no check at all might be better. The code would then read "if the event is IO_SCH_ORPH_UNREG or IO_SCH_UNREG, then do css_sch_device_unregister".
But the fact that I'm not immediately convinced doesn't mean much. It is
easy to test and surely worth a try, so I ran v5.11 (bad) plus your patch;
the result is useful to know in any case. It is working fine, that much I
can tell you.
But if my thought above was right (the check was only there to avoid the potential dereference error), then why check it at all? If cdev==NULL is possible, the code would now skip fully removing the subchannel in that case - we might not need the check at all.
And since I brought up the idea of dropping the cdev check entirely, that was worth a try as well. So the third check of this morning is for:
--- a/drivers/s390/cio/device.c
+++ b/drivers/s390/cio/device.c
@@ -1525,8 +1525,7 @@ static int io_subchannel_sch_event(struct subchannel *sch, int process)
 	switch (action) {
 	case IO_SCH_ORPH_UNREG:
 	case IO_SCH_UNREG:
-		if (!cdev)
-			css_sch_device_unregister(sch);
+		css_sch_device_unregister(sch);
 		break;
 	case IO_SCH_ORPH_ATTACH:
 	case IO_SCH_UNREG_ATTACH:
My patch with that change - in my test - is working as well.
Neither of the solutions has triggered other regressions in my setup - but there are so many potential use-cases that I can't be sure without a further review by subject matter experts.
So a summary of the recent tests:
5.11.0-16-generic #17+lp1925211v202104201520 (Seth's full revert) - working
5.11.0lp1925211-patch-kaihengfeng-dirty - working
5.11.0nocdevcheck-paelzer-dirty - working
I think we'd want an answer from the IBM devs which solution (full
revert, kaihenfeng patch, cpaelzer patch, another approach) they would
prefer - then we can submit it upstream for them to include officially
and we can carry it as delta until we rebase onto a version that has it
applied anyway.
[1]:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8cc0dcfdc1c0e0be107d0288f9c0cf1f4201be62
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1925211
Title:
Hot-unplug of disks leaves broken block devices around in Hirsute on
s390x
Status in Ubuntu on IBM z Systems:
New
Status in linux package in Ubuntu:
Confirmed
Status in systemd package in Ubuntu:
New
Status in udev package in Ubuntu:
New
Status in linux source package in Hirsute:
Confirmed
Status in systemd source package in Hirsute:
New
Status in udev source package in Hirsute:
New
Bug description:
Repro:
#1 Get a guest
$ uvt-kvm create --disk 5 --password=ubuntu h release=hirsute arch=s390x label=daily
$ uvt-kvm wait h release=hirsute arch=s390x label=daily
#2 Attach and Detach disk
$ sudo qemu-img create -f qcow2 /var/lib/libvirt/images/test.qcow2 10M
$ virsh attach-disk h /var/lib/libvirt/images/test.qcow2 vdc
$ virsh detach-disk h vdc
From libvirt's point of view it is gone at this point
$ virsh domblklist h
Target Source
------------------------------------------------------------------
vda /var/lib/uvtool/libvirt/images/hirsute-2nd-zfs.qcow
vdb /var/lib/uvtool/libvirt/images/hirsute-2nd-zfs-ds.qcow
But the guest still thinks it is present
$ uvt-kvm ssh --insecure hirsute-2nd-zfs lsblk
...
vdc 252:32 0 20M 0 disk
This even remains a while after (not a race).
Any access to it in the guest will hang (as you'd expect of a non-existing blockdev)
4 0 1758 1739 20 0 12140 4800 - S+ pts/0 0:00 | \_ sudo mkfs.ext4 /dev/vdc
4 0 1759 1758 20 0 6924 1044 - D+ pts/0 0:00 | \_ mkfs.ext4 /dev/vdc
The result above was originally found with hirsute-guest at hirsute-host
on s390x
I do NOT see the same with groovy-guest at hirsute-host on s390x
I DO see the same with hirsute-guest at groovy-host on s390x
=> Guest version dependent, not Host/Hypervisor dependent
I DO see the same with ZFS disks AND LVM disks being added&removed
=> not type dependent
I do NOT see the same on x86.
=> Arch dependent ??
... the evidence slowly points towards an issue in the guest, damn we are so
close to release - but non-fully detaching disks are critical in my POV :-/
Filing this as-is for awareness, but certainly this will need more debugging.
I'm unsure where this will eventually end up, so I'll file it for kernel/udev/systemd for now.
If there are any known issues/components that are related let me know please!
---
ProblemType: Bug
AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access '/dev/snd/': No such file or directory
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.11-0ubuntu65
Architecture: s390x
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
CRDA: N/A
CasperMD5CheckResult: unknown
DistroRelease: Ubuntu 21.04
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lspci:
Lspci-vt: -[0000:00]-
Lsusb: Error: command ['lsusb'] failed with exit code 1:
Lsusb-t: Error: command ['lsusb', '-t'] failed with exit code 1: /sys/bus/usb/devices: No such file or directory
Lsusb-v: Error: command ['lsusb', '-v'] failed with exit code 1:
Package: udev
PackageArchitecture: s390x
PciMultimedia:
ProcFB:
ProcKernelCmdLine: root=LABEL=cloudimg-rootfs
ProcVersionSignature: Ubuntu 5.11.0-14.15-generic 5.11.12
RelatedPackageVersions:
linux-restricted-modules-5.11.0-14-generic N/A
linux-backports-modules-5.11.0-14-generic N/A
linux-firmware N/A
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
Tags: hirsute uec-images
Uname: Linux 5.11.0-14-generic s390x
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm audio cdrom dialout dip floppy lxd netdev plugdev sudo video
_MarkForUpload: True
acpidump: