preliminary machine placement discussion

William Reade william.reade at canonical.com
Tue Nov 8 15:51:56 UTC 2011


Hi all

First of all, and most importantly: the machine placement feature is
definitely *not* confirmed for 12.04. We'd really like to fit it in, but
we can't guarantee anything; what we can do, at least, is try to open up
the discussion a little so we have a chance of figuring out a generally
useful set of minimal requirements.

I've made a start on something a little more like a real spec, but for
now I'd appreciate a sanity check on my interpretation, hereunder, of
some of our recent design discussions.


Generic constraints
===================

To start off uncontroversially, I think there is general agreement that
the set of machine properties exposed by *all* providers is pretty
small: we can be reasonably sure that we can know about a provider
machine's CPU, memory, and storage, and that's it [0].

Given this, it's pretty easy to imagine a really minimal vocabulary for
describing machines; something like the following (note, actual syntax
may vary):

  juju deploy nova-compute --constraint cores=128,ram=16G

Notice that the deploy command places no constraint on the machine's
available storage: an unset constraint ("storage", in this case) will be
taken to mean "at least 0 bytes".
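
To pin down the intended semantics, here is a minimal Python sketch (mine,
not juju internals; the names and units are assumptions) of how a machine
would be matched: every constraint the user actually set is a minimum, and
anything unset defaults to zero, i.e. "don't care":

  # Illustrative sketch only, not juju code.
  def satisfies(constraints, machine):
      """True if `machine` meets every specified constraint.

      Both arguments are dicts such as {"cores": 128, "ram_mb": 16384};
      an unset constraint is treated as a minimum of zero.
      """
      return all(machine.get(key, 0) >= minimum
                 for key, minimum in constraints.items())

  # Storage left unset: any machine with enough cores and RAM matches.
  satisfies({"cores": 128, "ram_mb": 16384},
            {"cores": 128, "ram_mb": 32768, "storage_gb": 0})  # -> True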

However, there's already a bit of a problem here, and it's actually
quite a significant one: what exactly does the constraint apply to? The
service, or just the unit?

If constraints are settable only per unit, I think our users will quite
reasonably come to hate us: the last thing they want is to specify the
same constraints every time they add-unit, but if they ever forget
they'll end up deploying nova-compute to a bunch of m1.smalls.
Therefore, I think, it *must* be a service-level setting, despite the
additional hassle we'll encounter.

However, users *will* sometimes want to deploy additional units of a
service to machines that differ from the first unit's, so machine
constraints cannot be settable *only* at the service level. But this
immediately creates a potential ambiguity of expression when a user
comes to do the following:

  juju add-unit nova-compute --constraint cores=64

Our choices are to shadow the service-level setting (so that there's no
RAM constraint on the new machine), or to combine them in a "helpful"
way, such that the "ram=16G" constraint is kept while the "cores=128"
constraint is overwritten.
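
To make the two options concrete, here is a tiny Python sketch (hypothetical
dicts, not real juju structures) of the effective constraints each
interpretation would produce for the example above:

  # Illustrative only: two ways to resolve the add-unit constraints.
  service = {"cores": 128, "ram": "16G"}   # set at deploy time
  unit = {"cores": 64}                     # given to add-unit

  shadowed = dict(unit)               # {"cores": 64}; no RAM constraint
  combined = dict(service, **unit)    # {"cores": 64, "ram": "16G"}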

I do not know what the right answer is here, but I know it's an
important distinction, and we should do our best to pick the least
potentially annoying option; especially since this feature will almost
certainly end up allowing for constraints at the environment level [1].

However, I'd like to quietly acknowledge this problem and move on for
now. We'll need to solve it regardless, and the problem of *how* we
specify is subordinate to the really meaty problem of *what* we specify.


Provider-specific constraints
=============================

The sad fact is that the generic constraints defined above are not
sufficient to solve the placement problem to everyone's satisfaction.
Scenarios that cannot be captured include:

* I want haproxy in rack c7 of my datacentre, because that's got a
really fast connection out.
* I'm on a budget, and I need to deploy on m1.smalls rather than
m1.mediums, and this constraint actually has *nothing* to do with the
actual machine resources that would be ideal for my task.
* I want swift to run on some machine which has 8TB of storage *in some
specific RAID configuration*.
* I'm just playing, and I'm happy with a proof of concept deployed on
t1.micros, even if they do just stop working quite often. 
* I want to deploy this mongodb unit in EC2 availability zone B.

I think that all these constraints can be expressed with a single
mechanism: a provider-derived machine "class".

* On EC2, the available classes are published anyway, so it's not hard
to translate a "zone-b" or "m1.large" class into the appropriate
request.
* On Orchestra, the mgmt-classes field can hold arbitrary information of
this nature (which would be defined by the sysadmin ahead of time; juju
just accesses what's available). We can easily specify machines by
mgmt-class (or classes) [2].
* On OpenStack... um, I *hope* there's some way to query for this sort
of thing, but I don't actually know how to do it, or what's available.
Informed opinion, anyone?
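
Purely for illustration (the "class" constraint key is my invention, not
settled syntax), using classes might look something like:

  juju deploy mongodb --constraint class=zone-b
  juju deploy haproxy --constraint class=rack-c7
  juju deploy wordpress --constraint class=t1.micro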

Now, classes also intersect uncomfortably with the "how" problem above
-- there are groups of classes which are mutually exclusive, and others
that aren't [3], and I don't think there's any way for us to tell the
difference at juju level [4]. Again, we need thought and care to figure
out how to combine requirements expressed at different levels, but that
isn't what I'm fundamentally concerned with here.


Capturing constraints
=====================

When we come to implement stacks, the choices we make here and now will
rather, er, constrain what we're able to do, and it's important to me
that we don't accidentally damage the utility of our future stacks
implementation.

Clearly, we could restrict ourselves to simple machine parameters only,
but I feel these are tailor-made for screwing things up: this is
because, IMO, people's hardware choices are inevitably strongly
influenced by what they have available. When I pick an m1.medium I'm
*not* picking it for one simple parameter only; I'm picking it because
it's the best *of the available options*, and there's no guarantee that
a machine tailor-made for my use case would precisely match -- or even
equal -- an m1.medium's parameters.

That is to say: it's reasonable for us to translate from concrete
requirements into provider-specific resources, but not the other way
round; put another way, if we attempt to determine what people want
merely by inspecting what they happen to have already, we're unlikely to
get it right.

So... what can we do? While I hate to introduce another new concept, I
think it's justified here: we want to be able to group constraints as
"roles". This has two notable advantages:

* We can simplify command lines -- considering all the other possible
options we already handle, it'll be quite convenient to be able to do
things like:

  juju set-role compute --constraint cores=128,ram=64G
  juju deploy nova-compute --role compute
  juju set-role compute-spike --constraint cores=32
  juju add-unit nova-compute --role compute-spike

...or even:

  juju set-role compute --constraint cores=128,ram=64G
  juju set-role compute-spike --constraint cores=32
  juju deploy nova-compute --role compute
  ...
  juju set nova-compute --role compute-spike
  juju add-unit nova-compute
  juju add-unit nova-compute
  ...
  juju add-unit nova-compute

* More importantly, it gives us a mechanism for capturing the *intent*
of a set of constraints: so, even if we can't turn "rack-c7" into a
provider-independent constraint, we *can* encode the fact that we'd
prefer to deploy haproxy to a well-connected machine by specifying (say)
a "fat-pipe-proxy" role, as sketched below.
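
A hypothetical example (same caveat about syntax), with the
provider-specific class attached to the role alongside whatever generic
constraints apply:

  juju set-role fat-pipe-proxy --constraint class=rack-c7
  juju deploy haproxy --role fat-pipe-proxy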

When we come to implement stacks, I think this gives us the best of both
worlds: a role is both a place to store what provider-independent
preferences we can, *and* a hook off which we can hang additional
provider-specific information; again, for a first cut at plausible
syntax, consider:

  juju set-role nova-cluster:compute --constraint cores=128,ram=16G
  juju set-role nova-cluster:compute-spike --constraint cores=32
  juju deploy nova-cluster


So... does any of this make sense to anybody? I don't think the "role"
mechanism is an appropriate part of the basic placement story, but I
think it's a natural extension that will become useful when we implement
stacks, and I'd prefer to ensure we don't accidentally implement
something that will make things harder for us when the time comes.

From my perspective, the really important questions are "what do you
need to express?", and "can you do so with the vocabulary above,
excluding roles, which we definitely won't have time for?".

Cheers
William

----------------------------------

[0] My understanding is that we can depend on getting this information
out of orchestra somehow, even if the mechanism isn't yet defined.

[1] And one day at the stack level, too, which intersects uncomfortably
with the simple preference ranking of [environment < service < unit].

[2] In fact, we could fulfil a *big* sysadmin-perspective requirement
here very easily with special class syntax understood by orchestra: if,
in addition to the mgmt-classes, we make available a set of
pseudo-classes like "name:node01", "name:node02", etc [5], the orchestra
sysadmin is free to pick out specific machines when he wants to.

And, based on my conversations, sysadmins really *really* want to be
able to specify individual machines; IMO we ignore this requirement at
our peril.
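
As a sketch (hypothetical syntax again), targeting a particular node might
then look like:

  juju deploy mysql --constraint class=name:node01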

[3] For example, specifying "m1.small" conflicts with "m1.large", but
not with "zone-b".

[4] OK, we could easily encode knowledge about image type vs
availability zone into the EC2 provider, but we have *no* such
guarantees for an orchestra provider.

[5] Hm: we could, I suppose, expose an awful lot of cobbler state by
this mechanism, if we were so inclined. Can anyone chime in with whether
this could be useful to them, or whether just exposing names is all they
can imagine needing?




