[Bug 1032550] Re: [multipath] failed to get sysfs information
Ronald Moesbergen
intercommit at gmail.com
Thu Jan 3 09:47:21 UTC 2013
Peter,
First: happy new year!
I've been doing some more tests to track down the cause of this bug.
Since it looks like a kernel bug, I tried reproducing it with kernel
3.5.0, version 3.5.0-21.32~precise1. I could still reproduce the faulty
paths that multipathd was unable to remove. However, there were no
hanging processes this time and thus no kernel crash, which is an
improvement. During the test I saw the following:
LUN-DATABASE (36006016061e02e003cf1aca4ae07e211) dm-1 DGC,VRAID
size=200G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=130 status=active
| |- 4:0:1:1 sdi 8:128 active ready running
| `- #:#:#:# - #:# active faulty running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 4:0:0:1 sdg 8:96 active ready running
  `- #:#:#:# - #:# active faulty running
LUN-LOGGING (36006016061e02e000286c1adae07e211) dm-2 ,
size=20G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=130 status=active
| |- #:#:#:# - #:# failed faulty running
| `- 4:0:0:0 sde 8:64 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- #:#:#:# - #:# active faulty running
  `- 4:0:1:0 sdh 8:112 active ready running
As you can see, multipathd again fails to remove the 'faulty' paths from
the device mapping. However, for some reason this didn't lead to
processes stuck in 'D' state this time. While the paths were down, the
following messages were logged repeatedly:
Jan 3 10:24:14 ealxs00161 multipathd: sdd: failed to get sysfs information
Jan 3 10:24:14 ealxs00161 multipathd: sdd: unusable path
So multipathd kept retrying the removal, but it failed every time. After
I brought the path back up, it was restored OK and everything was fine
again:
LUN-DATABASE (36006016061e02e003cf1aca4ae07e211) dm-1 DGC,VRAID
size=200G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=130 status=active
| |- 4:0:1:1 sdi 8:128 active ready running
| `- 3:0:0:1 sdc 8:32 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 4:0:0:1 sdg 8:96 active ready running
  `- 3:0:1:1 sdf 8:80 active ready running
LUN-LOGGING (36006016061e02e000286c1adae07e211) dm-2 DGC,VRAID
size=20G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=130 status=active
| |- 3:0:1:0 sdd 8:48 active ready running
| `- 4:0:0:0 sde 8:64 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 3:0:0:0 sdb 8:16 active ready running
  `- 4:0:1:0 sdh 8:112 active ready running
After this, failing over again worked just fine: the paths that failed
to be removed the last time were now removed without problems. Both
machines survived about 10 up/down test runs. I'll attach the syslog of
this run shortly.
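
For anyone who wants to check for this condition during their own
failover tests, here is a minimal Python sketch that scans output in the
format shown above for path lines whose H:C:T:L is displayed as
'#:#:#:#', i.e. paths that are still listed in a map but can no longer
be resolved. The function name find_stale_paths and the parsing rules
are assumptions of mine, based only on the output in this mail; nothing
here is taken from multipath-tools itself.

```python
import re

# A map header looks like "<alias> (<wwid>) dm-<N> <vendor>,<product>".
# Assumption: this pattern is derived only from the sample output in this mail.
MAP_HEADER = re.compile(r"^(?P<alias>\S+) \((?P<wwid>[0-9a-fA-F]+)\) (?P<dm>dm-\d+)")

# A stale path line shows the placeholder "#:#:#:#" instead of a real H:C:T:L.
STALE_PATH = re.compile(r"#:#:#:#")


def find_stale_paths(output):
    """Return {map alias: number of '#:#:#:#' path lines} for affected maps."""
    stale = {}
    current = None
    for line in output.splitlines():
        header = MAP_HEADER.match(line.strip())
        if header:
            # Start of a new multipath map; remember its alias.
            current = header.group("alias")
            continue
        if current is not None and STALE_PATH.search(line):
            stale[current] = stale.get(current, 0) + 1
    return stale


# Sample copied from the failed-over state reported above.
SAMPLE = """\
LUN-DATABASE (36006016061e02e003cf1aca4ae07e211) dm-1 DGC,VRAID
size=200G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=130 status=active
| |- 4:0:1:1 sdi 8:128 active ready running
| `- #:#:#:# - #:# active faulty running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 4:0:0:1 sdg 8:96 active ready running
  `- #:#:#:# - #:# active faulty running
"""

if __name__ == "__main__":
    for alias, count in find_stale_paths(SAMPLE).items():
        print(f"{alias}: {count} stale path(s)")
```

Feeding it the captured output after each up/down cycle should make it
easy to spot the runs in which the faulty paths were not removed.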
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to multipath-tools in Ubuntu.
https://bugs.launchpad.net/bugs/1032550
Title:
[multipath] failed to get sysfs information
Status in “multipath-tools” package in Ubuntu:
In Progress
Bug description:
When the switch port of the host HBA is shut down, multipath-tools can't
get correct information about the subpaths. Checking the "multipath"
output shows that some storage device type info disappears, and the
failed paths always stay in the path group and are never cleared out.
mpath2 (3600601601c102900944737e4a73fe011) dm-51 ,
size=6.0G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| |- #:#:#:# - #:# failed faulty running
| `- 5:0:2:5 sdcu 70:32 active ready running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 5:0:3:5 sdfa 129:192 active ready running
  `- #:#:#:# - #:# failed faulty running
mpath38 (3600601601c1029008eb6dbe8ae3fe011) dm-59 DGC,VRAID
size=5.0G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| `- 5:0:2:13 sddf 70:208 active ready running
`-+- policy='round-robin 0' prio=0 status=enabled
  `- 5:0:3:13 sdfk 130:96 active ready running
mpath63 (360000970000198700131533030303932) dm-13 EMC,SYMMETRIX
size=5.6G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 5:0:0:8 sdl 8:176 active ready running
  `- 5:0:1:8 sdbd 67:112 active ready running
mpath95 (360000970000198700131533030323445) dm-43 ,
size=898M features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- #:#:#:# - #:# failed faulty running
  |- #:#:#:# - #:# failed faulty running
  |- 5:0:0:38 sdas 66:192 active ready running
  `- 5:0:1:38 sdck 69:128 active ready running
At the same time, the syslog shows many messages like:
---------------
Aug 2 18:25:16 Linux51 multipathd: sdht: failed to get sysfs information
Aug 2 18:25:16 Linux51 multipathd: sdht: unusable path
... ...
---------------
After the paths recover, all failed paths come back without problems.
No I/O was blocked and no error happened during the fail/recover period.
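
To get a rough idea of which devices are affected, the repeated syslog
messages can be tallied per device. The following is a minimal Python
sketch, assuming the syslog line format shown above (optionally with a
PID after "multipathd"); the function name count_sysfs_failures is
hypothetical and only meant as an illustration.

```python
import re
from collections import Counter

# Matches e.g. "... multipathd: sdht: failed to get sysfs information".
# Assumption: the optional "[pid]" part is allowed for syslog setups that add it.
FAILURE = re.compile(
    r"multipathd(?:\[\d+\])?: (?P<dev>\w+): failed to get sysfs information"
)


def count_sysfs_failures(lines):
    """Count 'failed to get sysfs information' messages per device name."""
    counts = Counter()
    for line in lines:
        match = FAILURE.search(line)
        if match:
            counts[match.group("dev")] += 1
    return counts


# Sample lines copied from the syslog excerpt above.
SAMPLE_LOG = [
    "Aug 2 18:25:16 Linux51 multipathd: sdht: failed to get sysfs information",
    "Aug 2 18:25:16 Linux51 multipathd: sdht: unusable path",
]

if __name__ == "__main__":
    for dev, count in count_sysfs_failures(SAMPLE_LOG).most_common():
        print(f"{dev}: {count}")
```

Passing an open syslog file object instead of SAMPLE_LOG works the same
way, since the function only iterates over lines.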
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1032550/+subscriptions