[Bug 1540407] Comment bridged from LTC Bugzilla
bugproxy
bugproxy at us.ibm.com
Mon Feb 29 11:30:06 UTC 2016
------- Comment From thorsten.diehl at de.ibm.com 2016-02-29 06:27 EDT-------
(In reply to comment #16)
> > Could you attempt to use the Debian package from Sid on top of the Ubuntu
> system? This will help rule out kernel related issues vs. userspace package
> bugs.
>
> You can download from here
>
> http://ftp.us.debian.org/debian/pool/main/m/multipath-tools/multipath-tools_0.5.0+git1.656f8865-4_s390x.deb
>
> And then force install with
>
> sudo dpkg --force-all --install multipath-tools_0.5.0+git1.656f8865-4_s390x.deb
>
> If this is successful, then I can prepare a package with some updated fixes
> from upstream; there are a number of changes to path related discovery in
> upstream multipath that aren't yet included in the Ubuntu version.
Hi Ryan,
I followed your proposal and installed multipath-tools_0.5.0+git1.656f8865-4_s390x.deb as described. It worked as expected, i.e. the described problem does not occur.
On another system I installed both multipath-tools_0.5.0+git1.656f8865-4_s390x.deb and kpartx_0.5.0+git1.656f8865-4_s390x.deb - with the same result.
I did a detach/attach of the zfcp device and found that it appears for some seconds in status active faulty running, then changes to failed faulty running and remains in that state until the device is reattached. Then it changes back to active ready running.
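The state transitions described above can be logged with a small filter over the output of multipath -ll. This is only a sketch: the function name path_states is made up for illustration, and it assumes that path lines have the layout shown in the bug description (H:C:T:L, device name, major:minor, then three state words).

```shell
# Hypothetical helper: read `multipath -ll` output on stdin and print
# "<device> <dm state> <checker state> <device state>" for every path line.
path_states() {
    awk '{
        for (i = 1; i <= NF; i++) {
            # A path line is recognised by its H:C:T:L field.
            if ($i ~ /^[0-9]+:[0-9]+:[0-9]+:[0-9]+$/) {
                # After H:C:T:L: device name, major:minor, three state words.
                print $(i + 1), $(i + 3), $(i + 4), $(i + 5)
                break
            }
        }
    }'
}

# Possible use while detaching/attaching a zfcp device (needs root):
#   while sleep 1; do date; multipath -ll | path_states; done
```

Running the loop during a detach/attach cycle gives one timestamped line per path, which makes it easy to see whether a path stays listed as failed or disappears.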
Hi Christian,
in addition I did the following tests:
1. On a freshly installed xenial (installer version 427, kernel 4.4.0-8) with the multipath.conf already mentioned, I found that no zfcphbaapi packages are installed by default.
kernel 4.4.0-8
multipath-tools 0.5.0-7ubuntu15 (was 7ubuntu14 before) and kpartx 0.5.0-7ubuntu15
I did a detach/attach of the zfcp device and found that it appears for some seconds in status active faulty offline, then changes to failed faulty offline, vanishes, and does not reappear when the device is reattached. I had configured the zfcp LUNs manually (via sysfs).
2. Then I installed zfcp-hbaapi-utils_2.1.1-0ubuntu1_s390x.deb and
libzfcphbaapi0_2.1.1-0ubuntu1_s390x.deb on top. With these two
additional packages I could not reproduce this problem.
3. I installed a system with kernel 4.4.0-7 and multipath-tools
0.5.0-7ubuntu14 + the above mentioned zfcphbaapi stuff. Problem was
reproducible.
4. I updated that system to multipath-tools 0.5.0-7ubuntu15. Problem was
still reproducible.
5. I updated that system to kernel 4.4.0-8. It should now be similar to
case #2, but the problem was still reproducible. Strange.
6. I switched several times back and forth between multipath-tools
0.5.0-7ubuntu15 and multipath-tools_0.5.0+git1.656f8865-4_s390x.deb.
With git1 it worked always, with 7ubuntu15 never. (although it did on a
freshly installed system, see #2)
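Since the outcome depends on the exact combination of kernel and packages, it may help to record that combination alongside each test case. A minimal sketch follows; the function name show_versions and the DPKG_QUERY override are made up for illustration, and the package list is simply the set of packages named in the cases above.

```shell
# Hypothetical helper: print the kernel release and the installed versions
# of the packages involved in the test cases above.
# DPKG_QUERY is an illustrative override so the function can be exercised
# on a machine without dpkg; by default the real dpkg-query is used.
show_versions() {
    echo "kernel $(uname -r)"
    # Packages that are not installed are skipped (the error is discarded).
    "${DPKG_QUERY:-dpkg-query}" -W -f='${Package} ${Version}\n' \
        multipath-tools kpartx zfcp-hbaapi-utils libzfcphbaapi0 2>/dev/null
}

# Possible use before each test run:
#   show_versions
```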
So, for now we have two ways to solve/circumvent this problem:
a) use an updated multipath-tools package (from debian sid)
b) use updated multipath-tools/kpartx packages from ubuntu (>= version 0.5.0-7ubuntu15) AND zfcp-hbaapi-utils_2.1.1-0ubuntu1_s390x.deb and libzfcphbaapi0_2.1.1-0ubuntu1_s390x.deb on top. But that did not always help (see case #5 above).
I found different values for path_selector in multipath-tools versions of ubuntu and debian:
ubuntu15: round-robin 0
debian-git: service-time 0
Then I set path_selector in multipath.conf to the (usual) default "round-robin 0".
With the debian-git version the problem was still solved, so this setting should not matter.
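To confirm which selector is actually in effect after such a change, the policy= field in the multipath -ll output (as in the topology quoted in the bug description) can be read out. This is a sketch only; the function name active_policies is made up for illustration, and it assumes the policy='...' notation shown in that topology.

```shell
# Hypothetical helper: read `multipath -ll` output on stdin and print the
# distinct path selector policies found in the path group lines.
active_policies() {
    sed -n "s/.*policy='\([^']*\)'.*/\1/p" | sort -u
}

# Possible use (needs root):
#   multipath -ll | active_policies
```

On the topology from the bug description this would print "round-robin 0".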
Ryan,
although the fixes in multipath-tools_0.5.0+git1.656f8865-4 are evidently sufficient to fix this problem, please consider whether it would be better to take the complete set of patches for both multipath-tools AND kpartx from debian upstream (sid) instead of cherry-picking individual fixes.
The debian sid combination of multipath-tools AND kpartx evidently does no harm and is already 6 weeks old.
Anyway, I will do some further tests with this combination.
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to multipath-tools in Ubuntu.
https://bugs.launchpad.net/bugs/1540407
Title:
multipathd drops paths of a temporarily lost device
Status in multipath-tools package in Ubuntu:
New
Bug description:
== Comment: #0 - Thorsten Diehl <thorsten.diehl at de.ibm.com> - 2016-02-01 08:57:28 ==
# uname -a
Linux s83lp31 4.4.0-1-generic #15-Ubuntu SMP Thu Jan 21 22:19:04 UTC 2016 s390x s390x s390x GNU/Linux
# dpkg -s multipath-tools|grep ^Version:
Version: 0.5.0-7ubuntu9
# cat /etc/multipath.conf
defaults {
default_features "1 queue_if_no_path"
user_friendly_names yes
path_grouping_policy multibus
dev_loss_tmo 2147483647
fast_io_fail_tmo 5
}
blacklist {
devnode '*'
}
blacklist_exceptions {
devnode "^sd[a-z]+"
}
---------------------------------------
On a z Systems LPAR with a single LUN, 2 zfcp devices, 2 storage ports, and the following multipath topology:
mpatha (36005076304ffc3e80000000000003050) dm-0 IBM,2107900
size=1.0G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
|- 0:0:0:1079001136 sda 8:0 active ready running
|- 0:0:1:1079001136 sdb 8:16 active ready running
|- 1:0:0:1079001136 sdc 8:32 active ready running
`- 1:0:1:1079001136 sdd 8:48 active ready running
I observed the following:
When I deconfigure one of the two zfcp devices (e.g. via chchp -c 0, or directly on the HMC), multipathd removes the two paths via this device from the path group after 10 seconds. When the zfcp device comes back, it runs through zfcp error recovery and is set up properly, and the mid-layer objects also look fine. However, multipathd does not add the paths to the path group again.
Expected behaviour: multipathd does not remove the paths from topology
list, but holds them as "failed faulty offline" until dev_loss_tmo
timeout is reached (which is infinite here).
I already discussed this with zfcp development, and it most likely
looks like a problem with multipathd, rather than zfcp or mid-layer.
Easy to reproduce: you need two zfcp devices, one LUN, and preferably
two ports on the storage server (WWPNs). Configure LUN via 2 zfcp
devices * 2 WWPNs = 4 paths.
This can also be reproduced on a z/VM guest. Instead of configuring the
CHPID off, just detach one zfcp device and re-attach it after 30 to 60
seconds. Same problem.
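Whether the problem has occurred can be checked by counting the paths that multipath -ll still lists after the device has been re-attached. The following is a sketch under assumptions: the function names count_paths and check_paths are made up for illustration, and path lines are recognised only by their H:C:T:L field, as in the topology shown above.

```shell
# Hypothetical helper: count the path lines in `multipath -ll` output (stdin).
count_paths() {
    awk '
        {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^[0-9]+:[0-9]+:[0-9]+:[0-9]+$/) { n++; break }
        }
        END { print n + 0 }
    '
}

# Hypothetical helper: succeed only if at least $1 paths are listed on stdin.
check_paths() {
    expected=$1
    actual=$(count_paths)
    if [ "$actual" -lt "$expected" ]; then
        echo "only $actual of $expected paths listed"
        return 1
    fi
    echo "$actual paths listed (expected at least $expected)"
}

# Possible use for the 4-path setup above, after re-attaching the device
# (needs root):
#   multipath -ll mpatha | check_paths 4
```

With the behaviour reported here, the check would be expected to fail with 2 of 4 paths after re-attachment, whereas paths that are merely held as failed would still be counted.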
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1540407/+subscriptions