[Bug 1647067] Re: Dangling UDEV links after removing FC LUNs

wondra wondra at volny.cz
Mon Dec 5 11:54:28 UTC 2016


My feeling is that it is a race condition inside udev. There are a lot of
multipath devices on the system, and I'm sometimes seeing messages that
udev is reaching its max children limit of 40.

The multipath -l <wwid> call issues a lot of change events to udev.
OpenStack calls it and immediately begins deleting the SCSI devices that
it gets in the reply. Udev processes the events in parallel (in multiple
worker processes), so a change event could be handled after the delete
event. Is that possible?
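The suspected race can be illustrated with a toy model. If a change event for sdbc is handled after its remove event, the by-path link is recreated for a node that no longer exists. This is a simplified assumption about how links are handled, not udev 204's actual code. The Event/replay names are hypothetical, and the link name is copied from the udev log further down.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    action: str   # "change" or "remove"
    devnode: str  # e.g. "/dev/sdbc"
    link: str     # symlink that the rules would create for this device


def replay(events, initial_links, existing_nodes):
    # Toy model, NOT udev's implementation: a "change" (re)creates the
    # link without checking whether the node still exists, and a "remove"
    # drops the link only if it still points at the removed node.
    links = dict(initial_links)
    for ev in events:
        if ev.action == "change":
            links[ev.link] = ev.devnode
        elif ev.action == "remove" and links.get(ev.link) == ev.devnode:
            del links[ev.link]
    # Dangling links: targets that no longer exist as device nodes.
    return {link: dev for link, dev in links.items() if dev not in existing_nodes}


LINK = "/dev/disk/by-path/pci-0000:05:00.0-fc-0x21120002ac014ee8-lun-11"
change = Event("change", "/dev/sdbc", LINK)
remove = Event("remove", "/dev/sdbc", LINK)
start = {LINK: "/dev/sdbc"}

# In both runs sdbc has already been deleted via sysfs, so no node exists.
print(replay([change, remove], start, existing_nodes=set()))  # {}
print(replay([remove, change], start, existing_nodes=set()))  # one dangling link
```

In the expected order the link disappears. In the reordered case it survives and points at the deleted sdbc, which matches the symptom in the bug description.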

OpenStack does the following; sdbc is the device affected by the bug:

2016-12-05 10:38:40.646 500 INFO nova.compute.manager [req-f5e6d9ff-1a0e-4455-8f60-0f9baea9d413 35c0bc8aaf3e43cea3f265b89c1216ee e18099f000534ae89b8a978ec8e9b82c - - -] [instance: a25caee0-2c74-4f91-a87b-e872cbd237e5] Detach volume 2992050a-d40d-4a6a-baa9-1a8f8624c838 from mountpoint /dev/vdb
2016-12-05 10:38:41.008 500 DEBUG oslo_concurrency.lockutils [req-f5e6d9ff-1a0e-4455-8f60-0f9baea9d413 35c0bc8aaf3e43cea3f265b89c1216ee e18099f000534ae89b8a978ec8e9b82c - - -] Lock "connect_volume" acquired by "disconnect_volume" :: waited 0.000s inner /usr/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:444
2016-12-05 10:38:41.009 500 DEBUG oslo_concurrency.processutils [req-f5e6d9ff-1a0e-4455-8f60-0f9baea9d413 35c0bc8aaf3e43cea3f265b89c1216ee e18099f000534ae89b8a978ec8e9b82c - - -] Running cmd (subprocess): sudo nova-rootwrap /etc/nova/rootwrap.conf multipath -l 360002ac00000000000006b3800014ee8 execute /usr/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:223
2016-12-05 10:38:41.333 500 DEBUG oslo_concurrency.processutils [req-f5e6d9ff-1a0e-4455-8f60-0f9baea9d413 35c0bc8aaf3e43cea3f265b89c1216ee e18099f000534ae89b8a978ec8e9b82c - - -] CMD "sudo nova-rootwrap /etc/nova/rootwrap.conf multipath -l 360002ac00000000000006b3800014ee8" returned: 0 in 0.324s execute /usr/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:254
2016-12-05 10:38:41.334 500 DEBUG nova.storage.linuxscsi [req-f5e6d9ff-1a0e-4455-8f60-0f9baea9d413 35c0bc8aaf3e43cea3f265b89c1216ee e18099f000534ae89b8a978ec8e9b82c - - -] Found multipath device = /dev/mapper/360002ac00000000000006b3800014ee8 find_multipath_device /usr/lib/python2.7/dist-packages/nova/storage/linuxscsi.py:135
2016-12-05 10:38:41.334 500 DEBUG nova.virt.libvirt.volume [req-f5e6d9ff-1a0e-4455-8f60-0f9baea9d413 35c0bc8aaf3e43cea3f265b89c1216ee e18099f000534ae89b8a978ec8e9b82c - - -] devices to remove = [{'device': '/dev/sdbc', 'host': '1', 'id': '1', 'channel': '0', 'lun': '11'}, {'device': '/dev/sdbb', 'host': '1', 'id': '0', 'channel': '0', 'lun': '11'}, {'device': '/dev/sdbg', 'host': '2', 'id': '1', 'channel': '0', 'lun': '11'}, {'device': '/dev/sdbd', 'host': '2', 'id': '0', 'channel': '0', 'lun': '11'}] disconnect_volume /usr/lib/python2.7/dist-packages/nova/virt/libvirt/volume.py:1403
2016-12-05 10:38:41.335 500 DEBUG nova.storage.linuxscsi [-] Trying (1) to remove device /dev/sdbc _wait_for_remove /usr/lib/python2.7/dist-packages/nova/storage/linuxscsi.py:77
2016-12-05 10:38:41.335 500 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): sudo nova-rootwrap /etc/nova/rootwrap.conf tee -a /sys/bus/scsi/drivers/sd/1:0:1:11/delete execute /usr/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:223
2016-12-05 10:38:41.455 500 DEBUG oslo_concurrency.processutils [-] CMD "sudo nova-rootwrap /etc/nova/rootwrap.conf tee -a /sys/bus/scsi/drivers/sd/1:0:1:11/delete" returned: 0 in 0.119s execute /usr/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:254
2016-12-05 10:38:41.456 500 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): sudo nova-rootwrap /etc/nova/rootwrap.conf sginfo -r execute /usr/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:223
2016-12-05 10:38:41.543 500 DEBUG oslo_concurrency.processutils [-] CMD "sudo nova-rootwrap /etc/nova/rootwrap.conf sginfo -r" returned: 0 in 0.087s execute /usr/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:254

... and the same for the other 3 devices
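Each "devices to remove" entry maps to a sysfs delete attribute addressed as host:channel:target:lun. Nova logs the target under the key 'id', which is inferred from sdbc (id 1) being deleted at 1:0:1:11. Writing to that attribute removes the SCSI device from the kernel, and that removal is what produces udev's remove event. In this log the first delete starts at 10:38:41.335, about 2 ms after multipath -l returned at 10:38:41.333. The following is a dry-run sketch that only prints the paths; the helper names are hypothetical and nothing is deleted.

```python
# Entries copied from the "devices to remove" line in the nova log above.
DEVICES_TO_REMOVE = [
    {'device': '/dev/sdbc', 'host': '1', 'id': '1', 'channel': '0', 'lun': '11'},
    {'device': '/dev/sdbb', 'host': '1', 'id': '0', 'channel': '0', 'lun': '11'},
    {'device': '/dev/sdbg', 'host': '2', 'id': '1', 'channel': '0', 'lun': '11'},
    {'device': '/dev/sdbd', 'host': '2', 'id': '0', 'channel': '0', 'lun': '11'},
]


def hctl(dev: dict) -> str:
    # SCSI address as host:channel:target:lun; nova logs the target as 'id'.
    return f"{dev['host']}:{dev['channel']}:{dev['id']}:{dev['lun']}"


def delete_path(dev: dict) -> str:
    # sysfs attribute that nova writes to (via tee -a) to delete the device.
    return f"/sys/bus/scsi/drivers/sd/{hctl(dev)}/delete"


# Dry run: print the paths only, never write to them.
for dev in DEVICES_TO_REMOVE:
    print(dev['device'], '->', delete_path(dev))
```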

The udev log, which is hard to read because events are processed in parallel, shows this:
found 'b67:96' claiming '/run/udev/links/\x2fdisk\x2fby-path\x2fpci-0000:05:00.0-fc-0x21120002ac014ee8-lun-11'
creating link '/dev/disk/by-path/pci-0000:05:00.0-fc-0x21120002ac014ee8-lun-11' to '/dev/sdbc'
preserve already existing symlink '/dev/disk/by-path/pci-0000:05:00.0-fc-0x21120002ac014ee8-lun-11' to '../../sdbc'
...
handling device node '/dev/sdbc', devnum=b67:96, mode=0600, uid=0, gid=0
can not stat() node '/dev/sdbc' (No such file or directory)
created db file '/run/udev/data/b67:96' for '/devices/pci0000:00/0000:00:02.0/0000:05:00.0/host1/rport-1:0-1/target1:0:1/1:0:1:11/block/sdbc'
adding watch on '/dev/sdbc'
inotify_add_watch(6, /dev/sdbc, 10) failed: No such file or directory
created db file '/run/udev/data/b67:96' for '/devices/pci0000:00/0000:00:02.0/0000:05:00.0/host1/rport-1:0-1/target1:0:1/1:0:1:11/block/sdbc'
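The two names in the "found ... claiming" line refer to the same device. A small decoder makes that explicit, under two assumptions. First, it uses the classic SCSI disk majors from the kernel's devices.txt (8, 65-71, 128-135, with 16 disks per major and 16 minors per disk), so it does not cover disks beyond the first 256. Second, it assumes that /run/udev/links entries are the symlink path with "/dev" stripped and "/" escaped as \x2f, which is what the log lines above show. The function names are hypothetical.

```python
import re

# Classic SCSI disk block majors (kernel devices.txt); 16 disks per major.
SD_MAJORS = [8, 65, 66, 67, 68, 69, 70, 71] + list(range(128, 136))


def sd_name(major: int, minor: int) -> str:
    # Disk index -> letters in bijective base 26 (a..z, aa..az, ba..).
    n = SD_MAJORS.index(major) * 16 + minor // 16
    letters = ""
    while n >= 0:
        letters = chr(ord("a") + n % 26) + letters
        n = n // 26 - 1
    part = minor % 16  # 0 means the whole disk
    return "sd" + letters + (str(part) if part else "")


def decode_link_dir(name: str) -> str:
    # Undo udev's \xNN escaping and restore the assumed "/dev" prefix.
    return "/dev" + re.sub(r"\\x([0-9a-fA-F]{2})",
                           lambda m: chr(int(m.group(1), 16)), name)


CLAIM = re.compile(r"found 'b(\d+):(\d+)' claiming '/run/udev/links/([^']+)'")


def parse_claim(line: str):
    m = CLAIM.search(line)
    if m is None:
        return None
    major, minor, encoded = m.groups()
    return sd_name(int(major), int(minor)), decode_link_dir(encoded)


LINE = (r"found 'b67:96' claiming '/run/udev/links/"
        r"\x2fdisk\x2fby-path\x2fpci-0000:05:00.0-fc-0x21120002ac014ee8-lun-11'")
print(parse_claim(LINE))
```

Under these assumptions b67:96 decodes to sdbc, which agrees with the sysfs path ending in block/sdbc in the "created db file" lines. The decoded link name is the same by-path link that udev then "preserves" even though it cannot stat() /dev/sdbc.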

-- 
https://bugs.launchpad.net/bugs/1647067

Title:
  Dangling UDEV links after removing FC LUNs

Status in udev package in Ubuntu:
  Confirmed

Bug description:
  We're using QLogic QLE2562 Fibre Channel adapters (qla2xxx driver)
  against an HPE 3PAR 7400c storage array in an OpenStack environment.
  The OpenStack 3PAR driver manages volume attachments from the array to
  the servers. Every volume is reached over four multipath paths.

  As LUNs are removed, udev sometimes fails to remove all links,
  particularly in /run/udev/links and /dev/disk/by-path. The symptoms
  are multiple records in a single by-path directory under
  /run/udev/links, broken links in /dev/disk/by-path pointing to LUNs
  that are no longer attached, and links there that map LUNs to the
  wrong SCSI devices.
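The first two symptoms can be checked from userspace. The following is a read-only diagnostic sketch; it is not part of udev or OpenStack, and the function names are hypothetical. It lists by-path symlinks whose target is gone, and by-path entries under /run/udev/links that more than one device claims. The check is limited to by-path names because udev can legitimately keep several claimants for other link names and pick one by priority, whereas a path-based name should normally belong to a single device.

```python
import os


def dangling_links(by_path_dir: str = "/dev/disk/by-path"):
    """Return sorted (name, target) pairs for symlinks whose target is missing."""
    found = []
    with os.scandir(by_path_dir) as it:
        for entry in it:
            # os.path.exists() follows the link, so it is False when the target is gone.
            if entry.is_symlink() and not os.path.exists(entry.path):
                found.append((entry.name, os.readlink(entry.path)))
    return sorted(found)


def multiply_claimed(links_dir: str = "/run/udev/links",
                     prefix: str = r"\x2fdisk\x2fby-path"):
    """Return {encoded link name: [claiming devices]} where more than one device claims it."""
    claims = {}
    with os.scandir(links_dir) as it:
        for entry in it:
            if entry.is_dir() and entry.name.startswith(prefix):
                members = sorted(os.listdir(entry.path))  # e.g. ["b67:96", "b8:0"]
                if len(members) > 1:
                    claims[entry.name] = members
    return claims
```

On an affected host, calling both functions with their defaults (as root, since /run/udev is not always world-readable) should list the stale by-path links and the doubly claimed entries before a new volume is attached.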

  OpenStack relies on these links. When another volume is later attached
  on a LUN that still has leftover links, and the stale link happens to
  be for the first of the four paths, OpenStack misidentifies the volume
  and attaches the same volume to multiple instances, leading to data
  loss.

  What could be causing this behavior?

  Ubuntu version 14.04 
  Linux version Ubuntu 4.4.0-47.68~14.04.1-generic 4.4.24
  udev 204-5ubuntu20.19
  --- 
  ApportVersion: 2.14.1-0ubuntu3.21
  Architecture: amd64
  CustomUdevRuleFiles: 20-3par-unmap.rulez
  DistroRelease: Ubuntu 14.04
  InstallationDate: Installed on 2015-10-01 (429 days ago)
  InstallationMedia: Ubuntu-Server 14.04.3 LTS "Trusty Tahr" - Beta amd64 (20150805)
  MachineType: HP ProLiant DL380 Gen9
  Package: udev 204-5ubuntu20.19
  PackageArchitecture: amd64
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.4.0-47-generic root=/dev/mapper/hostname--vg-root ro
  ProcVersionSignature: Ubuntu 4.4.0-47.68~14.04.1-generic 4.4.24
  Tags:  trusty
  UdevLog: Error: [Errno 2] No such file or directory: '/var/log/udev'
  Uname: Linux 4.4.0-47-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:
   
  _MarkForUpload: True
  dmi.bios.date: 11/03/2014
  dmi.bios.vendor: HP
  dmi.bios.version: P89
  dmi.chassis.type: 23
  dmi.chassis.vendor: HP
  dmi.modalias: dmi:bvnHP:bvrP89:bd11/03/2014:svnHP:pnProLiantDL380Gen9:pvr:cvnHP:ct23:cvr:
  dmi.product.name: ProLiant DL380 Gen9
  dmi.sys.vendor: HP
