[Bug 1647067] Re: Dangling UDEV links after removing FC LUNs
wondra
wondra at volny.cz
Mon Dec 5 11:54:28 UTC 2016
My feeling is that this is a race condition inside udev. There are a lot of
multipath devices on the system, and I sometimes see messages that
udev is reaching its max children limit of 40.
The multipath -l <wwid> call issues a lot of change events to udev.
OpenStack calls it and immediately begins deleting the SCSI devices that
it gets in the reply. Udev processes the events in parallel, so the
change event may end up being handled after the delete event. Is that possible?
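To make the suspected ordering concrete, here is a minimal Python sketch. It is a toy model under my own assumptions, not udev's actual logic: the `apply_events` helper and the event tuples are hypothetical, and the only real values are the device node and the by-path link name taken from the udev log below.

```python
# Toy model of the suspected ordering problem; this is NOT udev's real logic.
# `links` stands in for /dev/disk/by-path, `nodes` for the device nodes in /dev.

def apply_events(events):
    """Apply (action, node, link) events in order and return the dangling links."""
    nodes = {"/dev/sdbc"}                  # the device node exists at the start
    links = {}                             # symlink path -> target node
    for action, node, link in events:
        if action == "change":
            links[link] = node             # a change event (re)creates the symlink
        elif action == "remove":
            nodes.discard(node)            # the device node goes away
            if links.get(link) == node:
                del links[link]            # and its symlink is cleaned up
    # A link is dangling when its target node no longer exists.
    return sorted(l for l, target in links.items() if target not in nodes)

LINK = "/dev/disk/by-path/pci-0000:05:00.0-fc-0x21120002ac014ee8-lun-11"
change = ("change", "/dev/sdbc", LINK)
remove = ("remove", "/dev/sdbc", LINK)

in_order = apply_events([change, remove])    # change handled before the remove
reordered = apply_events([remove, change])   # change handled after the remove

print(in_order)     # no leftover link
print(reordered)    # the by-path link is left behind
```

In this model the link only survives when the change event is applied after the remove, which is the ordering I suspect is happening.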
OpenStack does this and the device sdbc is the one affected by the bug:
2016-12-05 10:38:40.646 500 INFO nova.compute.manager [req-f5e6d9ff-1a0e-4455-8f60-0f9baea9d413 35c0bc8aaf3e43cea3f265b89c1216ee e18099f000534ae89b8a978ec8e9b82c - - -] [instance: a25caee0-2c74-4f91-a87b-e872cbd237e5] Detach volume 2992050a-d40d-4a6a-baa9-1a8f8624c838 from mountpoint /dev/vdb
2016-12-05 10:38:41.008 500 DEBUG oslo_concurrency.lockutils [req-f5e6d9ff-1a0e-4455-8f60-0f9baea9d413 35c0bc8aaf3e43cea3f265b89c1216ee e18099f000534ae89b8a978ec8e9b82c - - -] Lock "connect_volume" acquired by "disconnect_volume" :: waited 0.000s inner /usr/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:444
2016-12-05 10:38:41.009 500 DEBUG oslo_concurrency.processutils [req-f5e6d9ff-1a0e-4455-8f60-0f9baea9d413 35c0bc8aaf3e43cea3f265b89c1216ee e18099f000534ae89b8a978ec8e9b82c - - -] Running cmd (subprocess): sudo nova-rootwrap /etc/nova/rootwrap.conf multipath -l 360002ac00000000000006b3800014ee8 execute /usr/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:223
2016-12-05 10:38:41.333 500 DEBUG oslo_concurrency.processutils [req-f5e6d9ff-1a0e-4455-8f60-0f9baea9d413 35c0bc8aaf3e43cea3f265b89c1216ee e18099f000534ae89b8a978ec8e9b82c - - -] CMD "sudo nova-rootwrap /etc/nova/rootwrap.conf multipath -l 360002ac00000000000006b3800014ee8" returned: 0 in 0.324s execute /usr/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:254
2016-12-05 10:38:41.334 500 DEBUG nova.storage.linuxscsi [req-f5e6d9ff-1a0e-4455-8f60-0f9baea9d413 35c0bc8aaf3e43cea3f265b89c1216ee e18099f000534ae89b8a978ec8e9b82c - - -] Found multipath device = /dev/mapper/360002ac00000000000006b3800014ee8 find_multipath_device /usr/lib/python2.7/dist-packages/nova/storage/linuxscsi.py:135
2016-12-05 10:38:41.334 500 DEBUG nova.virt.libvirt.volume [req-f5e6d9ff-1a0e-4455-8f60-0f9baea9d413 35c0bc8aaf3e43cea3f265b89c1216ee e18099f000534ae89b8a978ec8e9b82c - - -] devices to remove = [{'device': '/dev/sdbc', 'host': '1', 'id': '1', 'channel': '0', 'lun': '11'}, {'device': '/dev/sdbb', 'host': '1', 'id': '0', 'channel': '0', 'lun': '11'}, {'device': '/dev/sdbg', 'host': '2', 'id': '1', 'channel': '0', 'lun': '11'}, {'device': '/dev/sdbd', 'host': '2', 'id': '0', 'channel': '0', 'lun': '11'}] disconnect_volume /usr/lib/python2.7/dist-packages/nova/virt/libvirt/volume.py:1403
2016-12-05 10:38:41.335 500 DEBUG nova.storage.linuxscsi [-] Trying (1) to remove device /dev/sdbc _wait_for_remove /usr/lib/python2.7/dist-packages/nova/storage/linuxscsi.py:77
2016-12-05 10:38:41.335 500 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): sudo nova-rootwrap /etc/nova/rootwrap.conf tee -a /sys/bus/scsi/drivers/sd/1:0:1:11/delete execute /usr/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:223
2016-12-05 10:38:41.455 500 DEBUG oslo_concurrency.processutils [-] CMD "sudo nova-rootwrap /etc/nova/rootwrap.conf tee -a /sys/bus/scsi/drivers/sd/1:0:1:11/delete" returned: 0 in 0.119s execute /usr/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:254
2016-12-05 10:38:41.456 500 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): sudo nova-rootwrap /etc/nova/rootwrap.conf sginfo -r execute /usr/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:223
2016-12-05 10:38:41.543 500 DEBUG oslo_concurrency.processutils [-] CMD "sudo nova-rootwrap /etc/nova/rootwrap.conf sginfo -r" returned: 0 in 0.087s execute /usr/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:254
... and the same for the other 3 devices
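For reference, the sysfs path in the log is built from the host:channel:id:lun values of each entry in the "devices to remove" list. The Python sketch below reproduces that mapping. The helper name `scsi_delete_path` is hypothetical (it is not nova's function), and the paths for the other three devices are derived from the pattern rather than copied from the log.

```python
def scsi_delete_path(dev):
    """Build the sysfs delete path for one 'devices to remove' entry."""
    # The directory name is host:channel:id:lun, e.g. 1:0:1:11 for /dev/sdbc.
    hctl = "{host}:{channel}:{id}:{lun}".format(**dev)
    return "/sys/bus/scsi/drivers/sd/{}/delete".format(hctl)

# Entries copied from the "devices to remove" log line above.
devices = [
    {'device': '/dev/sdbc', 'host': '1', 'id': '1', 'channel': '0', 'lun': '11'},
    {'device': '/dev/sdbb', 'host': '1', 'id': '0', 'channel': '0', 'lun': '11'},
    {'device': '/dev/sdbg', 'host': '2', 'id': '1', 'channel': '0', 'lun': '11'},
    {'device': '/dev/sdbd', 'host': '2', 'id': '0', 'channel': '0', 'lun': '11'},
]

for dev in devices:
    print(dev['device'], '->', scsi_delete_path(dev))
```

The sketch only prints the paths; it does not write to sysfs.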
The udev log, which is pretty unreadable because the output of the parallel workers is interleaved, shows this:
found 'b67:96' claiming '/run/udev/links/\x2fdisk\x2fby-path\x2fpci-0000:05:00.0-fc-0x21120002ac014ee8-lun-11'
creating link '/dev/disk/by-path/pci-0000:05:00.0-fc-0x21120002ac014ee8-lun-11' to '/dev/sdbc'
preserve already existing symlink '/dev/disk/by-path/pci-0000:05:00.0-fc-0x21120002ac014ee8-lun-11' to '../../sdbc'
...
handling device node '/dev/sdbc', devnum=b67:96, mode=0600, uid=0, gid=0
can not stat() node '/dev/sdbc' (No such file or directory)
created db file '/run/udev/data/b67:96' for '/devices/pci0000:00/0000:00:02.0/0000:05:00.0/host1/rport-1:0-1/target1:0:1/1:0:1:11/block/sdbc'
adding watch on '/dev/sdbc'
inotify_add_watch(6, /dev/sdbc, 10) failed: No such file or directory
created db file '/run/udev/data/b67:96' for '/devices/pci0000:00/0000:00:02.0/0000:05:00.0/host1/rport-1:0-1/target1:0:1/1:0:1:11/block/sdbc'
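As a cross-check that devnum b67:96 in these lines really is sdbc, the device number can be recomputed from the name. This is a small Python sketch assuming the standard sd driver numbering (block majors 8, 65-71 and 128-135, 16 disks per major, 16 minors per disk); the function names are mine, and the sketch only covers the first 256 disks.

```python
# Block majors used by the sd driver; each covers 16 disks of 16 minors each.
SD_MAJORS = [8] + list(range(65, 72)) + list(range(128, 136))

def sd_index(name):
    """Convert 'sda' -> 0, 'sdz' -> 25, 'sdaa' -> 26, ... (bijective base 26)."""
    index = 0
    for ch in name[2:]:
        index = index * 26 + (ord(ch) - ord('a') + 1)
    return index - 1

def sd_devnum(name):
    """Return (major, minor) of the whole-disk node for an sd device name."""
    index = sd_index(name)
    return SD_MAJORS[index // 16], (index % 16) * 16

print(sd_devnum("sdbc"))   # (67, 96)
```

So the db file '/run/udev/data/b67:96' and the node '/dev/sdbc' refer to the same device.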
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to udev in Ubuntu.
https://bugs.launchpad.net/bugs/1647067
Title:
Dangling UDEV links after removing FC LUNs
Status in udev package in Ubuntu:
Confirmed
Bug description:
We're using QLogic QLE2562 Fibre Channel adapters (qla2xxx driver)
against an HPE 3PAR 7400c storage array in an OpenStack environment.
The OpenStack 3PAR driver manages volume attachments from the array to
the servers. There are 4 multipath paths to every volume.
As the LUNs are removed, sometimes udev does not remove all links,
particularly in /run/udev/links and /dev/disk/by-path. The symptoms
are multiple records in one by-path directory under /run/udev/links,
broken links to no longer attached LUNs in /dev/disk/by-path, and links
between wrong LUNs and SCSI devices there.
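The broken-link symptom can be checked with a short script. The following is a minimal Python sketch, not part of OpenStack or udev; `find_dangling_links` is a hypothetical helper name. It only detects links whose target is gone; it does not detect links that point at the wrong LUN.

```python
import os

def find_dangling_links(directory):
    """Return symlinks in `directory` whose target does not exist, sorted by name."""
    if not os.path.isdir(directory):
        return []
    dangling = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # islink() is true for any symlink; exists() follows the link and
        # returns False when the target is gone.
        if os.path.islink(path) and not os.path.exists(path):
            dangling.append(path)
    return dangling

if __name__ == "__main__":
    for path in find_dangling_links("/dev/disk/by-path"):
        print("dangling:", path, "->", os.readlink(path))
```

Running it after a volume detach lists any by-path links that were left behind.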
OpenStack relies on these links. When another volume is attached using
a LUN that has these leftover links and it happens that it is the
first of the 4 paths, OpenStack incorrectly identifies the volume and
attaches the same volume to multiple instances, leading to data loss.
What could be causing this behavior?
Ubuntu version 14.04
Linux version Ubuntu 4.4.0-47.68~14.04.1-generic 4.4.24
udev 204-5ubuntu20.19
---
ApportVersion: 2.14.1-0ubuntu3.21
Architecture: amd64
CustomUdevRuleFiles: 20-3par-unmap.rulez
DistroRelease: Ubuntu 14.04
InstallationDate: Installed on 2015-10-01 (429 days ago)
InstallationMedia: Ubuntu-Server 14.04.3 LTS "Trusty Tahr" - Beta amd64 (20150805)
MachineType: HP ProLiant DL380 Gen9
Package: udev 204-5ubuntu20.19
PackageArchitecture: amd64
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.4.0-47-generic root=/dev/mapper/hostname--vg-root ro
ProcVersionSignature: Ubuntu 4.4.0-47.68~14.04.1-generic 4.4.24
Tags: trusty
UdevLog: Error: [Errno 2] No such file or directory: '/var/log/udev'
Uname: Linux 4.4.0-47-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:
_MarkForUpload: True
dmi.bios.date: 11/03/2014
dmi.bios.vendor: HP
dmi.bios.version: P89
dmi.chassis.type: 23
dmi.chassis.vendor: HP
dmi.modalias: dmi:bvnHP:bvrP89:bd11/03/2014:svnHP:pnProLiantDL380Gen9:pvr:cvnHP:ct23:cvr:
dmi.product.name: ProLiant DL380 Gen9
dmi.sys.vendor: HP
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/udev/+bug/1647067/+subscriptions