Dynamic osd-devices selection for Ceph charm
Kapil Thangavelu
kapil.thangavelu at canonical.com
Mon Dec 1 10:02:02 UTC 2014
On Sat, Nov 29, 2014 at 11:25 AM, John McEleney <
john.mceleney at netservers.co.uk> wrote:
> Hi all,
>
> I've been working on the Ceph charm with the intention of making it much
> more powerful when it comes to the selection of OSD devices. I wanted to
> knock a few ideas around to see what might be possible.
>
> The main problem I'm trying to address is that with the existing
> implementation, when a new SAS controller is added, or drive caddies get
> swapped around, drive letters (/dev/sd[a-z]) get swapped around. As the
> current charm just asks for a list of devices, and that list of devices
> is global across the entire cluster, it pretty much requires all
> machines to be identical and unchanging. I also looked into using
> /dev/disk/by-id, but found it too inflexible.
> Below I've pasted a patch I wrote as a stop-gap for myself. This patch
> allows you to list model numbers for your drives instead of /dev/XXXX
> devices. It then dynamically generates the list of /dev/ devices on each
> host. The patch is pretty unsophisticated, but it solves my immediate
> problem. However, I think we can do better than this.
> I've been thinking that XPath expressions might be a better way to go. I
> played around with this idea a little. This should give some idea of how
> it could work:
> ==========================================
> root@ceph-store1:~# lshw -xml -class disk > /tmp/disk.xml
> root@ceph-store1:~# echo 'cat //node[contains(product,"MG03SCA400")]/logicalname/text()' | xmllint --shell /tmp/disk.xml | grep '^/dev/'
> /dev/sdc
> /dev/sdd
> /dev/sde
> /dev/sdf
> /dev/sdg
> /dev/sdh
> /dev/sdi
> /dev/sdj
> /dev/sdk
> /dev/sdl
> ==========================================
>
> So, that takes care of selecting by model number. How about selecting
> drives that are larger than 3TB?
>
> ==========================================
> root@ceph-store1:~# echo 'cat //node[size>3000000000000]/logicalname/text()' | xmllint --shell /tmp/disk.xml | grep '^/dev/'
> /dev/sdc
> /dev/sdd
> /dev/sde
> /dev/sdf
> /dev/sdg
> /dev/sdh
> /dev/sdi
> /dev/sdj
> /dev/sdk
> /dev/sdl
> ==========================================
>
> Just to give some idea of the power of this, take a look at the info
> lshw compiles:
>
> <node id="disk:3" claimed="true" class="disk"
> handle="GUID:aaaaaaaa-a5c7-4657-924d-8ed94e1b1aaa">
> <description>SCSI Disk</description>
> <product>MG03SCA400</product>
> <vendor>TOSHIBA</vendor>
> <physid>0.3.0</physid>
> <businfo>scsi@1:0.3.0</businfo>
> <logicalname>/dev/sdf</logicalname>
> <dev>8:80</dev>
> <version>DG02</version>
> <serial>X470A0XXXXXX</serial>
> <size units="bytes">4000787030016</size>
> <capacity units="bytes">5334969415680</capacity>
> <configuration>
> <setting id="ansiversion" value="6" />
> <setting id="guid" value="aaaaaaaa-a5c7-4657-924d-8ed94e1b1aaa" />
> <setting id="sectorsize" value="512" />
> </configuration>
> <capabilities>
> <capability id="7200rpm" >7200 rotations per minute</capability>
> <capability id="gpt-1.00" >GUID Partition Table version
> 1.00</capability>
> <capability id="partitioned" >Partitioned disk</capability>
> <capability id="partitioned:gpt" >GUID partition table</capability>
> </capabilities>
> </node>
>
> So you could select your drives by vendor, size, model, sector size,
> or any combination of these and other attributes.
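>
> As a rough, untested sketch (the function name and criteria below are
> only illustrative), the same selection could be done from Python with
> just the standard library, assuming the lshw XML has been saved to
> /tmp/disk.xml as above and has a single root element, as recent lshw
> versions emit:
>
> ==========================================
> import xml.etree.ElementTree as ET
>
> def matching_disks(xml_path, model=None, vendor=None, min_size=0):
>     """Return /dev names of disk nodes matching all given criteria."""
>     devices = []
>     for node in ET.parse(xml_path).getroot().iter('node'):
>         if node.get('class') != 'disk':
>             continue
>         if model and model not in (node.findtext('product') or ''):
>             continue
>         if vendor and node.findtext('vendor') != vendor:
>             continue
>         if int(node.findtext('size', default='0') or 0) < min_size:
>             continue
>         logical = node.findtext('logicalname')
>         if logical:
>             devices.append(logical)
>     return devices
>
> # e.g. TOSHIBA disks larger than 3TB:
> # matching_disks('/tmp/disk.xml', vendor='TOSHIBA', min_size=3 * 10**12)
> ==========================================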
>
> The only reason I didn't go any further with this idea yet is that "lshw
> -C disk" is incredibly slow. I tried messing around with disabling
> tests, but it still crawls along. I figure that this wouldn't be that
> big a deal if you could cache the resulting XML file, but that's not
> fully satisfactory either. What if I want to hot-plug a new hard drive
> into the system? lshw would need to be run again. I thought that maybe
> udev could be used for this, but I certainly don't want udev
> running lshw once per drive at boot time as the drives are detected.
>
> I'm really wondering if anyone else has any advice on either speeding up
> lshw, or if there's any other simple way of pulling this kind of
> functionality off. Maybe I'm worrying too much about this. As long as
> the charm only fires this hook rarely, and caches the data for the
> duration of the hook run, maybe I don't need to worry?
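>
> To make "cache for the duration of the hook run" concrete, here is a
> rough sketch (the helper name and module-level cache are just
> placeholders):
>
> ==========================================
> import subprocess
>
> _LSHW_XML_CACHE = None
>
> def lshw_disk_xml():
>     """Run 'lshw -xml -class disk' at most once per hook invocation
>     and reuse the output for any further device queries."""
>     global _LSHW_XML_CACHE
>     if _LSHW_XML_CACHE is None:
>         _LSHW_XML_CACHE = subprocess.check_output(
>             ['lshw', '-xml', '-class', 'disk'])
>     return _LSHW_XML_CACHE
> ==========================================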
>
I'm wondering if, instead of lshw and the time it consumes, we could
continue with lsblk. There's a bit more information there (size, model,
rotational, etc.) which seems to cover most of the lshw examples you've
given, and it's relatively fast in comparison, i.e.
https://gist.github.com/kapilt/d0485d6fac3be6caaed2
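
Something along these lines, purely as a sketch (untested; the helper
name is made up, and the columns are just what lsblk exposes):

import subprocess

def list_block_devices():
    """One fast lsblk call, returning (device, size_bytes, rotational,
    model) for whole disks only (-d skips partitions, -n the header,
    -b gives sizes in bytes)."""
    out = subprocess.check_output(
        ['lsblk', '-d', '-n', '-b', '-o', 'KNAME,SIZE,ROTA,MODEL'],
        universal_newlines=True)
    devices = []
    for line in out.splitlines():
        fields = line.split(None, 3)
        if len(fields) < 3:
            continue
        kname, size, rota = fields[0], int(fields[1]), fields[2] == '1'
        model = fields[3].strip() if len(fields) > 3 else ''
        devices.append(('/dev/' + kname, size, rota, model))
    return devices

# e.g. spinning disks larger than 3TB:
# [dev for dev, size, rota, _ in list_block_devices()
#  if rota and size > 3 * 10**12]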
Another option: here's a script built around a similar use case; it gives
hierarchical info about drives from the controller on down and supports
layered block devices.
http://www.spinics.net/lists/raid/msg34460.html
current implementation @ https://github.com/pturmel/lsdrv/blob/master/lsdrv
cheers,
Kapil
> John
>
> Patch to match against model number (NOT REGRESSION TESTED):
> === modified file 'config.yaml'
> --- config.yaml 2014-10-06 22:07:41 +0000
> +++ config.yaml 2014-11-29 15:42:41 +0000
> @@ -42,16 +42,35 @@
> These devices are the range of devices that will be checked for and
> used across all service units.
> .
> + This can be a list of devices, or a list of model numbers which will
> + be used to automatically compile a list of matching devices.
> + .
> For ceph >= 0.56.6 these can also be directories instead of devices - the
> charm assumes anything not starting with /dev is a directory instead.
> + Any device not starting with a / is assumed to be a model number
> osd-journal:
> type: string
> default:
>
> === modified file 'hooks/charmhelpers/contrib/storage/linux/utils.py'
> --- hooks/charmhelpers/contrib/storage/linux/utils.py 2014-09-22 08:51:15 +0000
> +++ hooks/charmhelpers/contrib/storage/linux/utils.py 2014-11-29 15:30:25 +0000
> @@ -1,5 +1,6 @@
> import os
> import re
> +import subprocess
> from stat import S_ISBLK
>
> from subprocess import (
> @@ -51,3 +52,7 @@
> if is_partition:
> return bool(re.search(device + r"\b", out))
> return bool(re.search(device + r"[0-9]+\b", out))
> +
> +def devices_by_model(model):
> + proc = subprocess.Popen(['lsblk', '-nio', 'KNAME,MODEL'], stdout=subprocess.PIPE)
> + return ['/dev/' + dev.split()[0] for dev in [line.strip() for line in proc.stdout] if re.search(model + '$', dev)]
>
> === modified file 'hooks/hooks.py'
> --- hooks/hooks.py 2014-09-30 03:06:10 +0000
> +++ hooks/hooks.py 2014-11-29 15:22:48 +0000
> @@ -44,6 +44,9 @@
> get_ipv6_addr,
> format_ipv6_addr
> )
> +from charmhelpers.contrib.storage.linux.utils import (
> + devices_by_model
> +)
>
> from utils import (
> render_template,
> @@ -166,14 +169,18 @@
> else:
> return False
>
> -
> def get_devices():
> if config('osd-devices'):
> - return config('osd-devices').split(' ')
> + results = []
> + for dev in config('osd-devices').split(' '):
> + if dev.startswith('/'):
> + results.append(dev)
> + else:
> + results += devices_by_model(dev)
> + return results
> else:
> return []
>
> -
> @hooks.hook('mon-relation-joined')
> def mon_relation_joined():
> for relid in relation_ids('mon'):
>
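> For illustration, with this patch a mixed setting such as
>
>     osd-devices: "/dev/vdb MG03SCA400"
>
> would keep /dev/vdb as-is and expand the model number into whatever
> /dev/sdX devices lsblk reports with a matching model on each unit.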
>
> --
> -----------------------------
> John McEleney
> Netservers Ltd.
> 21 Signet Court
> Cambridge
> CB5 8LA
> http://www.netservers.co.uk
> -----------------------------
> Tel. 01223 446000
> Fax. 0870 4861970
> -----------------------------
> Registered in England
> Number: 04028770
> -----------------------------
>
>