[Bug 1020436] Re: Cannot read superblock after FC multipath failover
Peter Petrakis
peter.petrakis at canonical.com
Tue Jul 24 17:00:55 UTC 2012
It doesn't look like your lvm.conf filter is working.
(pvdisplay)
/dev/sdb: block size is 4096 bytes
/dev/sdb: lvm2 label detected
Ignoring duplicate PV RLOhdDURbD7uK2La3MDK2olkP0BF2Tu7 on /dev/sdb - using dm /dev/mapper/mpath0
Closed /dev/sdb
Opened /dev/sdc RO O_DIRECT
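As a sanity check you can also dump the filter LVM actually loaded before going any further (dumpconfig is the old spelling; newer releases call it lvmconfig):

sudo lvm dumpconfig devices/filter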
The sd names shouldn't even be seen. In the output of vgscan -vvv you should see something like this:
/dev/ram15: No label detected
Closed /dev/ram15
/dev/sdc: Skipping (regex)
/dev/sdc1: Skipping (regex)
Change your filter to what's documented in the multipath guide.
https://help.ubuntu.com/12.04/serverguide/multipath-devices.html#multipath-devices-in-logical-volumes
filter = [ "r/block/", "r/disk/", "r/sd.*/", "a/.*/" ]
Run vgscan -vvv afterwards to verify; those pvdisplay errors should also go
away.
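For example, something like this cuts the noise down (the grep is just a suggestion):

sudo vgscan -vvv 2>&1 | grep '/dev/sd'

With the filter in place every sd entry should show up only as "Skipping (regex)".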
If you want it to be extra quiet, you can filter out the ram devices too; here's the filter from one of my HP servers:
filter = [ "r/block/", "r/ram.*/", "r/disk/", "r/cciss/", "r/sd.*/", "a/.*/" ]
The thing to know about sd names is that they are never deterministic, so to keep sda as
part of your vg set you'll need to determine its unique udev name and filter that in. Looking
back at your original filter, I can see now that it's wrong.
# By default we accept every block device:
filter = [ "a/.*/", "r|/dev/sd[b-z]|" ]
It should be
# By default we accept every block device:
filter = [ "a/.*/", "r/sd.[b-z]/" ]
Again, if you hotplug sda or reverse scan your busses, this ceases to filter accurately; use the
unique udev names instead.
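A sketch of what that looks like, with a made-up id; substitute whatever /dev/disk/by-id reports for your disk:

ls -l /dev/disk/by-id/ | grep -w sda

# accept the stable by-id name first, then reject the sd names
filter = [ "a|/dev/disk/by-id/scsi-<your-disk-id>|", "r/sd.*/", "a/.*/" ]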
Beyond that, everything looks good; all the block devices that are left after failover are
responding.
I tested failover with lvm + xfs on multipath 0.4.8, using both multibus and priority grouping, and I
could not reproduce the issue. The load test was iozone -a in a loop. I just realized you didn't describe
what load these volumes were under during the failure, or how many failovers it takes to reach the fault.
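For reference, my load was just something along these lines, with the target file sitting on the xfs filesystem on top of the multipathed vg (the path here is illustrative):

while true; do iozone -a -f /mnt/test/iozone.tmp; done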
Your dm tables look sane.
-0 1048576000 multipath 1 queue_if_no_path 0 1 1 round-robin 0 4 1 8:64 1000 8:16 1000 8:96 1000 8:160 1000
+0 1048576000 multipath 1 queue_if_no_path 0 1 1 round-robin 0 2 1 8:64 1000 8:16 1000
You can tell that it's all in one group from the single round-robin directive, which marks the beginning
of the paths in that group; clearly you went from 4 paths to 2. The fact that the map was updated
at all means multipathd did its job. multipathd's main job is path checking and managing the
policy we define for those paths by updating the dm tables as needed. DM then simply operates
on whatever table definition it has and keeps chugging along until that table changes.
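If you want to watch that happen on your end, the map name from your output is mpath0, so during a failover:

sudo multipath -ll
sudo dmsetup table mpath0
sudo dmsetup status mpath0

The table and status output will show the path count and path states changing while the map itself stays in place.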
Going back to the original multipathd output you provided: DM_STATE=active and DM_SUSPENDED=0, so that
map was never deactivated. I also tried pushing rr_min_io up from 128 to 1000 on my setup; no change.
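You can re-check that directly after the next failover; dmsetup reports the state those udev properties are derived from:

sudo dmsetup info mpath0
udevadm info --query=property --name=/dev/mapper/mpath0 | grep ^DM_

The State line from dmsetup should read ACTIVE, not SUSPENDED.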
If it's not the lvm filter then it's something else very subtle. Unless you're comfortable upgrading
to multipath 0.4.9 via apt pinning as I described in comment #16 (I've done this before with lucid; very stable,
and better), the only way we'll get to the bottom of this is hands on, which necessitates an
Ubuntu Advantage agreement.
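For what it's worth, the pin itself is just a preferences snippet once a source carrying 0.4.9 is in your sources.list; roughly this shape, so treat it as a template rather than a recipe:

# /etc/apt/preferences.d/multipath-tools
Package: multipath-tools multipath-tools-boot kpartx
Pin: version 0.4.9*
Pin-Priority: 1001

Then apt-get update and install multipath-tools as usual.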