High Availability command line interface - future plans.

Wed Nov 6 22:37:19 UTC 2013

On Thu, Nov 7, 2013 at 5:54 AM, Tim Penhey <tim.penhey at canonical.com> wrote:

> On 07/11/13 09:11, David Cheney wrote:
> > +1 (million), this solution keeps coming up, and I still feel it is
> > the right one.
> >
> > On Thu, Nov 7, 2013 at 7:07 AM, Kapil Thangavelu
> > <kapil.thangavelu at canonical.com> wrote:
> >>
> >> instead of adding more complexity and concepts, it would be ideal if we
> >> could reuse the primitives we already have. ie juju environments have three
> >> user exposed services, that users can add-unit / remove-unit etc.  they have
> >> a juju prefix and therefore are omitted by default from status listing.
> >> That's a much simpler story to document. how do i scale my state server..
> >> juju add-unit juju-db... my provisioner juju add-unit juju-provisioner.
> >>
> >> -k
>
> NOTE: removed juju at lists.ubuntu.com from the recipients;
>   PLEASE DON'T CROSS-POST
> Seriously!
>
>
> For future direction I agree with this.  We talked about the idea behind
> having the core parts of juju exposed as special services with units.
> We talked about using namespaces.
>

Yes, hierarchical namespaces would be ideal to convey this separation of
services. As a short-term stand-in for proper namespace support, a
special-cased 'juju' prefix acting as a namespace would suffice.
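
As a rough illustration of what the special-cased prefix could mean for a
status listing, here is a minimal sketch. Everything in it is hypothetical:
the function name, the flag, and the two prefix spellings ("juju-" as in
juju-db, "juju:" as in juju:db) are assumptions for the example, not
existing juju behavior.

```python
# Assumed prefix spellings; neither is confirmed as juju's actual convention.
INTERNAL_PREFIXES = ("juju-", "juju:")


def visible_services(names, include_internal=False):
    """Return the service names a default status listing would show.

    Services carrying the reserved prefix are hidden unless the caller
    explicitly asks for them (hypothetical flag).
    """
    if include_internal:
        return list(names)
    # str.startswith accepts a tuple and matches any of its entries.
    return [name for name in names if not name.startswith(INTERNAL_PREFIXES)]


if __name__ == "__main__":
    services = ["juju-db", "wordpress", "juju:api", "mysql"]
    print(visible_services(services))
    print(visible_services(services, include_internal=True))
```

The point is only that the internal services stay ordinary services; the
prefix changes what is listed by default, not how they are managed.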

> I recall that Gustavo's point at the time is that we don't *need* this
> to get HA, and that we can get HA much simpler to start with.
>

Simpler to implement perhaps, but at what cost in complexity to end users?

To be clear, I don't mind if the implementation is job-based internally.
What I care about is orthogonality of the interface for end users.

>
> I fully support an approach where we have a simple command to get us
> over the initial hump of managing support.
>
>   juju ensure-ha  (note: not ensure-ha-state)
>

>
> This brings up multiple manager nodes.
>

How many nodes? How do I know which machines are managers? How does a user
see if there's an internal error on one of these? How do they resolve
errors on them?

We have known solutions for all of these things in juju; let's not invent a
parallel syntax or, even worse, assume it always just works as a black box.
Roger's proposal tries to address some of these, but at the cost of a
parallel syntax via add-machine/remove-machine/status and a new concept of
end user job management (although internally job management would be useful
for internal schema upgrades). Trying to isolate it solely to ensure-ha
obscures visibility and behavior imo.

> I like the idea that we treat manager nodes as special, and that
> destroy-machine on them doesn't work the same way.
>
> Consider this:
>
>   juju bootstrap
>   juju ensure-ha
>
> later machine-2 (a manager node) goes down
>
>   juju ensure-ha
>
> removes machine-2, and brings up machine-x to take its place.  I was
> talking with William, and I think we both agreed that we don't want to
> restart manager nodes by magic, but wait for user intervention.
>
>
You want to avoid magic, but removing and adding machines with special
behavior isn't magic? remove-machine wouldn't work on manager machines, and
status needs to grow new behavior (else how do I even know I need to run
ensure-ha again?).

What if there were other workloads placed on those machines?

> Now, looking to the future:
>
> We would have services like:
>   juju:db
>   juju:api
>   juju:something-else (for the other manager worker tasks)
>
> bootstrap would then give machine-0 with a unit of each of these.
>

sounds good.

> ensure-ha would bring up new machines with units of each of these.
>
> A user could add two more api servers by going:
>
>   juju add-unit juju:api -n 2
>
>
There's some notion of a unit step count missing for the db service, i.e.
you ideally always have an odd number of units for the quorum; otherwise
leader election votes need an arbiter or weighting.
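
To make the odd-count point concrete, here is a small sketch of the
majority arithmetic. It assumes plain one-vote-per-unit majority voting
with no arbiter or weighting; the function names, including the "step"
helper, are made up for the example and are not part of juju.

```python
def majority(voters: int) -> int:
    """Smallest number of votes that is a strict majority of `voters`."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voters can be lost while a majority can still be formed."""
    return voters - majority(voters)


def units_to_next_odd(current: int) -> int:
    """Hypothetical 'step count': units to add to reach the next odd size."""
    return 1 if current % 2 == 0 else 2


if __name__ == "__main__":
    # Columns: voters, votes needed for a majority, failures tolerated.
    for n in range(1, 6):
        print(n, majority(n), tolerated_failures(n))
```

Under these assumptions 3 and 4 voters both tolerate exactly one failure,
so the fourth unit buys no extra availability, and an even split (2 vs 2)
has no winner without a tie-breaker. That is why add-unit on the db
service would want to move in steps that land on an odd count.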

> I think this gives us a clean, and understandable way of doing things,
> but we SHOULD NOT do this first.
>
>
I'm not convinced, but implementor's choice. OTOH, if we're doing stopgaps
to expose internal mechanisms, then perhaps we should distribute them as
plugins.

cheers,

Kapil