Dynamic osd-devices selection for Ceph charm
Kapil Thangavelu
kapil.thangavelu at canonical.com
Mon Dec 1 10:02:02 UTC 2014
On Sat, Nov 29, 2014 at 11:25 AM, John McEleney <
john.mceleney at netservers.co.uk> wrote:
> Hi all,
>
> I've been working on the Ceph charm with the intention of making it much
> more powerful when it comes to the selection of OSD devices. I wanted to
> knock a few ideas around to see what might be possible.
>
> The main problem I'm trying to address is that with the existing
> implementation, when a new SAS controller is added, or drive caddies get
> swapped around, drive letters (/dev/sd[a-z]) get swapped around. As the
> current charm just asks for a list of devices, and that list of devices
> is global across the entire cluster, it pretty much requires all
> machines to be identical and unchanging. I also looked into using
> /dev/disk/by-id, but found it too inflexible.
> Below I've pasted a patch I wrote as a stop-gap for myself. This patch
> allows you to list model numbers for your drives instead of /dev/XXXX
> devices. It then dynamically generates the list of /dev/ devices on each
> host. The patch is pretty unsophisticated, but it solves my immediate
> problem. However, I think we can do better than this.
> I've been thinking that XPath expressions might be a better way to go. I
> played around with this idea a little. This should give some idea of how
> it could work:
> ==========================================
> root@ceph-store1:~# lshw -xml -class disk > /tmp/disk.xml
> root@ceph-store1:~# echo 'cat //node[contains(product,"MG03SCA400")]/logicalname/text()' | xmllint --shell /tmp/disk.xml | grep '^/dev/'
> /dev/sdc
> /dev/sdd
> /dev/sde
> /dev/sdf
> /dev/sdg
> /dev/sdh
> /dev/sdi
> /dev/sdj
> /dev/sdk
> /dev/sdl
> ==========================================
>
> So, that takes care of selecting by model number. How about selecting
> drives that are larger than 3TB?
>
> ==========================================
> root@ceph-store1:~# echo 'cat //node[size>3000000000000]/logicalname/text()' | xmllint --shell /tmp/disk.xml | grep '^/dev/'
> /dev/sdc
> /dev/sdd
> /dev/sde
> /dev/sdf
> /dev/sdg
> /dev/sdh
> /dev/sdi
> /dev/sdj
> /dev/sdk
> /dev/sdl
> ==========================================
>
> Just to give some idea of the power of this, take a look at the info
> lshw compiles:
>
> <node id="disk:3" claimed="true" class="disk"
> handle="GUID:aaaaaaaa-a5c7-4657-924d-8ed94e1b1aaa">
> <description>SCSI Disk</description>
> <product>MG03SCA400</product>
> <vendor>TOSHIBA</vendor>
> <physid>0.3.0</physid>
> <businfo>scsi@1:0.3.0</businfo>
> <logicalname>/dev/sdf</logicalname>
> <dev>8:80</dev>
> <version>DG02</version>
> <serial>X470A0XXXXXX</serial>
> <size units="bytes">4000787030016</size>
> <capacity units="bytes">5334969415680</capacity>
> <configuration>
> <setting id="ansiversion" value="6" />
> <setting id="guid" value="aaaaaaaa-a5c7-4657-924d-8ed94e1b1aaa" />
> <setting id="sectorsize" value="512" />
> </configuration>
> <capabilities>
> <capability id="7200rpm" >7200 rotations per minute</capability>
> <capability id="gpt-1.00" >GUID Partition Table version
> 1.00</capability>
> <capability id="partitioned" >Partitioned disk</capability>
> <capability id="partitioned:gpt" >GUID partition table</capability>
> </capabilities>
> </node>
>
> So you could select your drives by vendor, size, model, sector size,
> or any combination of these and other attributes.
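>
> As a rough, untested sketch (the function name and criteria below are
> only illustrative), the same selection could be done from Python with
> just the standard library, assuming the lshw XML has been saved to
> /tmp/disk.xml as above and has a single root element, as recent lshw
> versions emit:
>
> ==========================================
> import xml.etree.ElementTree as ET
>
> def matching_disks(xml_path, model=None, vendor=None, min_size=0):
>     """Return /dev names of disk nodes matching all given criteria."""
>     devices = []
>     for node in ET.parse(xml_path).getroot().iter('node'):
>         if node.get('class') != 'disk':
>             continue
>         if model and model not in (node.findtext('product') or ''):
>             continue
>         if vendor and node.findtext('vendor') != vendor:
>             continue
>         if int(node.findtext('size', default='0') or 0) < min_size:
>             continue
>         logical = node.findtext('logicalname')
>         if logical:
>             devices.append(logical)
>     return devices
>
> # e.g. TOSHIBA disks larger than 3TB:
> # matching_disks('/tmp/disk.xml', vendor='TOSHIBA', min_size=3 * 10**12)
> ==========================================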
>
> The only reason I didn't go any further with this idea yet is that "lshw
> -C disk" is incredibly slow. I tried messing around with disabling
> tests, but it still crawls along. I figure that this wouldn't be that
> big a deal if you could cache the resulting XML file, but that's not
> fully satisfactory either. What if I want to hot-plug a new hard drive
> into the system? lshw would need to be run again. I thought that maybe
> udev could be used for this, but I certainly don't want udev
> running lshw once per drive at boot time as the drives are detected.
>
> I'm really wondering if anyone else has any advice on either speeding up
> lshw, or if there's any other simple way of pulling this kind of
> functionality off. Maybe I'm worrying too much about this. As long as
> the charm only fires this hook rarely, and caches the data for the
> duration of the hook run, maybe I don't need to worry?
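>
> To make "cache for the duration of the hook run" concrete, here is a
> rough sketch (the helper name and module-level cache are just
> placeholders):
>
> ==========================================
> import subprocess
>
> _LSHW_XML_CACHE = None
>
> def lshw_disk_xml():
>     """Run 'lshw -xml -class disk' at most once per hook invocation
>     and reuse the output for any further device queries."""
>     global _LSHW_XML_CACHE
>     if _LSHW_XML_CACHE is None:
>         _LSHW_XML_CACHE = subprocess.check_output(
>             ['lshw', '-xml', '-class', 'disk'])
>     return _LSHW_XML_CACHE
> ==========================================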
>
I'm wondering if, instead of lshw and the time it consumes, we could
continue with lsblk. There's a bit more information there (size, model,
rotational, etc.) which seems to cover most of the lshw examples you've
given, and it's relatively fast in comparison, i.e.
https://gist.github.com/kapilt/d0485d6fac3be6caaed2
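
Something along these lines, purely as a sketch (untested; the helper
name is made up, and the columns are just what lsblk exposes):

import subprocess

def list_block_devices():
    """One fast lsblk call, returning (device, size_bytes, rotational,
    model) for whole disks only (-d skips partitions, -n the header,
    -b gives sizes in bytes)."""
    out = subprocess.check_output(
        ['lsblk', '-d', '-n', '-b', '-o', 'KNAME,SIZE,ROTA,MODEL'],
        universal_newlines=True)
    devices = []
    for line in out.splitlines():
        fields = line.split(None, 3)
        if len(fields) < 3:
            continue
        kname, size, rota = fields[0], int(fields[1]), fields[2] == '1'
        model = fields[3].strip() if len(fields) > 3 else ''
        devices.append(('/dev/' + kname, size, rota, model))
    return devices

# e.g. spinning disks larger than 3TB:
# [dev for dev, size, rota, _ in list_block_devices()
#  if rota and size > 3 * 10**12]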
Another option: here's a script built around a similar use case; it gives
hierarchical info about drives from the controller on down and supports
layered block devices.
http://www.spinics.net/lists/raid/msg34460.html
current implementation @ https://github.com/pturmel/lsdrv/blob/master/lsdrv
cheers,
Kapil
> John
>
> Patch to match against model number (NOT REGRESSION TESTED):
> === modified file 'config.yaml'
> --- config.yaml 2014-10-06 22:07:41 +0000
> +++ config.yaml 2014-11-29 15:42:41 +0000
> @@ -42,16 +42,35 @@
> These devices are the range of devices that will be checked for and
> used across all service units.
> .
> + This can be a list of devices, or a list of model numbers which will
> + be used to automatically compile a list of matching devices.
> + .
> For ceph >= 0.56.6 these can also be directories instead of devices - the
> charm assumes anything not starting with /dev is a directory instead.
> + Any device not starting with a / is assumed to be a model number
> osd-journal:
> type: string
> default:
>
> === modified file 'hooks/charmhelpers/contrib/storage/linux/utils.py'
> --- hooks/charmhelpers/contrib/storage/linux/utils.py 2014-09-22 08:51:15 +0000
> +++ hooks/charmhelpers/contrib/storage/linux/utils.py 2014-11-29 15:30:25 +0000
> @@ -1,5 +1,6 @@
> import os
> import re
> +import subprocess
> from stat import S_ISBLK
>
> from subprocess import (
> @@ -51,3 +52,7 @@
> if is_partition:
> return bool(re.search(device + r"\b", out))
> return bool(re.search(device + r"[0-9]+\b", out))
> +
> +def devices_by_model(model):
> + proc = subprocess.Popen(['lsblk', '-nio', 'KNAME,MODEL'], stdout=subprocess.PIPE)
> + return ['/dev/' + dev.split()[0] for dev in [line.strip() for line in proc.stdout] if re.search(model + '$', dev)]
>
> === modified file 'hooks/hooks.py'
> --- hooks/hooks.py 2014-09-30 03:06:10 +0000
> +++ hooks/hooks.py 2014-11-29 15:22:48 +0000
> @@ -44,6 +44,9 @@
> get_ipv6_addr,
> format_ipv6_addr
> )
> +from charmhelpers.contrib.storage.linux.utils import (
> + devices_by_model
> +)
>
> from utils import (
> render_template,
> @@ -166,14 +169,18 @@
> else:
> return False
>
> -
> def get_devices():
> if config('osd-devices'):
> - return config('osd-devices').split(' ')
> + results = []
> + for dev in config('osd-devices').split(' '):
> + if dev.startswith('/'):
> + results.append(dev)
> + else:
> + results += devices_by_model(dev)
> + return results
> else:
> return []
>
> -
> @hooks.hook('mon-relation-joined')
> def mon_relation_joined():
> for relid in relation_ids('mon'):
>
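> For illustration, with this patch a mixed setting such as
>
>     osd-devices: "/dev/vdb MG03SCA400"
>
> would keep /dev/vdb as-is and expand the model number into whatever
> /dev/sdX devices lsblk reports with a matching model on each unit.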
>
> --
> -----------------------------
> John McEleney
> Netservers Ltd.
> 21 Signet Court
> Cambridge
> CB5 8LA
> http://www.netservers.co.uk
> -----------------------------
> Tel. 01223 446000
> Fax. 0870 4861970
> -----------------------------
> Registered in England
> Number: 04028770
> -----------------------------
>
>