[Bug 1032550] Re: [multipath] failed to get sysfs information

Thu Jan 3 09:47:21 UTC 2013

Peter,

First: happy new year!

I've been doing some more tests to track down the cause of this bug.
Since it looks like a kernel bug, I tried reproducing it with kernel
3.5.0, version 3.5.0-21.32~precise1. I could reproduce the faulty paths
that multipathd was unable to remove. However, there were no hanging
processes this time and thus no kernel crash, which is an improvement.
During the test I did see this happening:

LUN-DATABASE (36006016061e02e003cf1aca4ae07e211) dm-1 DGC,VRAID
size=200G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=130 status=active
| |- 4:0:1:1 sdi 8:128 active ready running
| `- #:#:#:# -   #:#   active faulty running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 4:0:0:1 sdg 8:96  active ready running
  `- #:#:#:# -   #:#   active faulty running
LUN-LOGGING (36006016061e02e000286c1adae07e211) dm-2 ,
size=20G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=130 status=active
| |- #:#:#:# -   #:#   failed faulty running
| `- 4:0:0:0 sde 8:64  active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- #:#:#:# -   #:#   active faulty running
  `- 4:0:1:0 sdh 8:112 active ready running

As you can see, multipathd again fails to remove the 'faulty' paths from
the device mapping. However, for some reason this didn't lead to
processes stuck in 'D' state this time. While the paths were down, the
following messages were logged repeatedly:

Jan  3 10:24:14 ealxs00161 multipathd: sdd: failed to get sysfs information
Jan  3 10:24:14 ealxs00161 multipathd: sdd: unusable path
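
A minimal sketch of how these repeated messages could be tallied per
device when going through a syslog capture. The regular expression only
assumes the message layout of the two lines quoted above, and the
function name count_sysfs_failures is made up for this illustration; it
is not part of multipath-tools.

```python
import re
from collections import Counter

# Matches the multipathd message quoted above and captures the device name.
# Assumption: other log lines follow the same "multipathd: <dev>: ..." layout.
SYSFS_FAIL = re.compile(r"multipathd: (\w+): failed to get sysfs information")


def count_sysfs_failures(lines):
    """Return a Counter mapping device name -> number of sysfs failures."""
    counts = Counter()
    for line in lines:
        match = SYSFS_FAIL.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "Jan  3 10:24:14 ealxs00161 multipathd: sdd: failed to get sysfs information",
    "Jan  3 10:24:14 ealxs00161 multipathd: sdd: unusable path",
]
print(count_sysfs_failures(sample))
```

The same function accepts an open file object, so a saved syslog can be
passed in directly. A count that keeps growing for one device while its
path is down would match the retry behaviour seen here.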

So multipathd was retrying the removal, but it failed every time. After
bringing the path back up, it was restored correctly and everything was
fine again:

LUN-DATABASE (36006016061e02e003cf1aca4ae07e211) dm-1 DGC,VRAID
size=200G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=130 status=active
| |- 4:0:1:1 sdi 8:128 active ready running
| `- 3:0:0:1 sdc 8:32  active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 4:0:0:1 sdg 8:96  active ready running
  `- 3:0:1:1 sdf 8:80  active ready running
LUN-LOGGING (36006016061e02e000286c1adae07e211) dm-2 DGC,VRAID
size=20G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=130 status=active
| |- 3:0:1:0 sdd 8:48  active ready running
| `- 4:0:0:0 sde 8:64  active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 3:0:0:0 sdb 8:16  active ready running
  `- 4:0:1:0 sdh 8:112 active ready running

After this, failing over again worked just fine: the paths that failed
to be removed the last time were now removed without problems. Both
machines survived about 10 up/down test runs. I'll attach the syslog of
this run shortly.
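
For repeated up/down runs, the stale entries (the '#:#:#:#' lines) can be
spotted automatically. The following is a rough sketch that assumes the
topology output looks like the listings in this message; the helper name
find_orphaned_paths and the parsing rules are illustrative only and are
not taken from multipath-tools.

```python
import re

# A map header looks like "LUN-LOGGING (3600...) dm-2 ,".
# Assumption: the WWID is printed in lowercase hex, as in the output above.
MAP_HEADER = re.compile(r"^(\S+) \(([0-9a-f]+)\) (dm-\d+)")

# Placeholder printed in place of the H:C:T:L address of a path whose
# sysfs information could not be read.
ORPHAN_MARK = "#:#:#:#"


def find_orphaned_paths(output):
    """Return {map name: number of '#:#:#:#' path entries} for affected maps."""
    orphans = {}
    current = None
    for line in output.splitlines():
        header = MAP_HEADER.match(line.strip())
        if header:
            current = header.group(1)
        elif current is not None and ORPHAN_MARK in line:
            orphans[current] = orphans.get(current, 0) + 1
    return orphans


SAMPLE = """LUN-LOGGING (36006016061e02e000286c1adae07e211) dm-2 ,
size=20G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=130 status=active
| |- #:#:#:# -   #:#   failed faulty running
| `- 4:0:0:0 sde 8:64  active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- #:#:#:# -   #:#   active faulty running
  `- 4:0:1:0 sdh 8:112 active ready running
"""

print(find_orphaned_paths(SAMPLE))  # {'LUN-LOGGING': 2}
```

An empty result after the path is brought back up would correspond to the
restored state shown above.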

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to multipath-tools in Ubuntu.
https://bugs.launchpad.net/bugs/1032550

Title:
  [multipath]  failed to get sysfs information

Status in “multipath-tools” package in Ubuntu:
  In Progress

Bug description:
  When the switch port of the host HBA is shut down, multipath-tools
  can't get correct information about the subpaths. Checking the
  "multipath" output shows that the device type info of some storage
  devices disappears, and the failed paths always stay in the path
  group and are never cleared out.

  mpath2 (3600601601c102900944737e4a73fe011) dm-51 ,
  size=6.0G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
  |-+- policy='round-robin 0' prio=1 status=active
  | |- #:#:#:#  -    #:#     failed faulty running
  | `- 5:0:2:5  sdcu 70:32   active ready running
  `-+- policy='round-robin 0' prio=0 status=enabled
    |- 5:0:3:5  sdfa 129:192 active ready running
    `- #:#:#:#  -    #:#     failed faulty running
  mpath38 (3600601601c1029008eb6dbe8ae3fe011) dm-59 DGC,VRAID
  size=5.0G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
  |-+- policy='round-robin 0' prio=1 status=active
  | `- 5:0:2:13 sddf 70:208  active ready running
  `-+- policy='round-robin 0' prio=0 status=enabled
    `- 5:0:3:13 sdfk 130:96  active ready running
  mpath63 (360000970000198700131533030303932) dm-13 EMC,SYMMETRIX
  size=5.6G features='0' hwhandler='0' wp=rw
  `-+- policy='round-robin 0' prio=1 status=active
    |- 5:0:0:8  sdl  8:176   active ready running
    `- 5:0:1:8  sdbd 67:112  active ready running
  mpath95 (360000970000198700131533030323445) dm-43 ,
  size=898M features='0' hwhandler='0' wp=rw
  `-+- policy='round-robin 0' prio=1 status=active
    |- #:#:#:#  -    #:#     failed faulty running
    |- #:#:#:#  -    #:#     failed faulty running
    |- 5:0:0:38 sdas 66:192  active ready running
    `- 5:0:1:38 sdck 69:128  active ready running

  At the same time, the syslog shows many messages like these:

  ---------------
  Aug  2 18:25:16 Linux51 multipathd: sdht: failed to get sysfs information
  Aug  2 18:25:16 Linux51 multipathd: sdht: unusable path
  ... ...
  ---------------

  After the paths recover, all failed paths come back without problems.
  No I/O was blocked and no errors happened during the fail/recover
  period.
