[Bug 829412] Re: Integrated DRBD support in Ensemble
Clint Byrum
clint at ubuntu.com
Fri Sep 30 18:27:17 UTC 2011
Moving discussion from bug to mailing list, as this deserves
a wider audience than just the bug subscribers. See bug
https://bugs.launchpad.net/bugs/829412 for Adam's comments.
Excerpts from Andres Rodriguez's message of Fri Sep 30 16:14:32 UTC 2011:
> I agree with Adam here.
>
> Deploying an HA cluster that involves DRBD, Pacemaker, and a service on top
> of them with JuJu is not something that we can easily achieve with
> formulas. Deploying an HA cluster with JuJu will indeed be a complicated
> task, and one I personally wouldn't trust, as from my point of view it
> involves lots of manual verification before being sure that the deployed
> services are actually in HA. I also agree with Adam that deploying HA
> services with JuJu should be left to an external resource.
>
Using charms to deploy this doesn't preclude manual verification. In fact
what it does is allow a user to know the exact steps to verify that this
particular deployment is HA, since it will be exactly like the one that
the charm authors created.
> What I do believe though, is that JuJu should have the knowledge about
> nodes that are being deployed in HA, by, for example, providing the
> knowledge that two deployed machines are related to each other in a HA
> cluster, and not just a simple unit.
>
That's what peer relations do.
A service which sits on top of DRBD can define DRBD peer relationships
and then all units will be aware of each other. add-unit isn't just
about scaling for load balancing, it's also about availability.
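
To make the peer-awareness point concrete, here is a minimal sketch of how
a peer relation hook might decide which unit acts as the initial DRBD
primary. It assumes the hook already has the local unit name (for example
from JUJU_UNIT_NAME) and the list of peer unit names (for example from
relation-list). The function names and the "lowest unit number wins" rule
are hypothetical, chosen only for illustration; once Pacemaker is running
it would make this decision itself.

```python
def unit_number(unit_name):
    """Return the numeric suffix of a unit name such as 'mysql/3'."""
    return int(unit_name.rsplit("/", 1)[1])


def initial_primary(local_unit, peer_units):
    """Pick the unit with the lowest unit number among all members.

    Hypothetical rule: every unit sees the same membership, so every
    unit arrives at the same answer without extra coordination.
    """
    members = [local_unit] + list(peer_units)
    return min(members, key=unit_number)


def is_initial_primary(local_unit, peer_units):
    """True if the local unit should be the first DRBD primary."""
    return initial_primary(local_unit, peer_units) == local_unit
```

Because the comparison is numeric rather than lexical, mysql/2 sorts before
mysql/10, so adding more units later does not change the outcome
unexpectedly.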
> For example, let's consider we wanted to deploy DRBD (master/slave) with
> JuJu. Then, once 1 machine is deployed we will need a second machine
> that will act as a backup node. This means that both DRBD servers will
> have to communicate with each other, have partitions or extra disks
> configured, and then we have to connect the resources over the network,
> do the initial synchronization and verification. I believe that this
> process requires manual intervention. However I do also believe that
> JuJu could make life easier by preparing everything to let the
> administrator configure the resources within DRBD.
>
One can also simply add drbd capabilities to the mysql charm. A config
setting can define whether or not you want to make use of them. The
charm can delay creating mysql databases until the DRBD relationship
is set up and the DRBD volume is mounted at /var/lib/mysql. If there's
a need to verify the DRBD before the service is used (wise choice)
then simply leave the service un-related until you've done the proper
testing. I do understand that juju agents can't handle reboots yet;
I opened bug #863526 to remind us of this pressing issue (it may be a
dupe of some other issues, not sure).
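
As a rough sketch of the gating described above, a charm hook could check
something like the following before creating any databases. The config key
"use-drbd" and the function name are hypothetical; in a real hook the
config would come from config-get and the peer settings from relation-get.
The mount check is passed in as a parameter so the logic can be exercised
without a real DRBD device.

```python
import os


def drbd_ready(config, peer_settings, mount_point="/var/lib/mysql",
               ismount=os.path.ismount):
    """Return True when it is safe to create MySQL databases.

    config        -- dict of charm config values ("use-drbd" is a
                     hypothetical key, not an existing charm option)
    peer_settings -- dict of settings seen on the DRBD peer relation;
                     empty until a peer has joined
    """
    # DRBD not requested: behave like a plain single-node MySQL charm.
    if not config.get("use-drbd", False):
        return True
    # DRBD requested but no peer has joined the relation yet: wait.
    if not peer_settings:
        return False
    # A peer exists; only proceed once the volume is actually mounted.
    return ismount(mount_point)
```

A hook that gets False back would simply exit and try again the next time a
relation or config event fires, which keeps the databases off the local
disk until the replicated volume is in place.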
I also think this type of thing will make sense for a lot of different
charms, which is why I'd like to see bug #806241 worked on soon. However,
most of what is needed is in the DRBD packages, and the rest can be
packaged up, added to the distro or a PPA, and used inside the charms
that need it. Call it charm-helper-drbd, and just install it when you
need to add drbd capabilities to a charm. This isn't insurmountable by
design.
> Now, once we have DRBD up and running, and let's say we would like to
> deploy a MySQL server on top of it, we would need to be able to install
> that service on those two machines, and then put the databases in the
> DRBD resources that are being replicated, then make sure that the slave
> has the same information as the primary and that MySQL in the slave can
> access that information.
>
> Furthermore, once you do that, you need to add pacemaker to the picture
> and configure pacemaker in such a way that it knows about the resource
> to manage (DRBD, MySQL), and configure the constraints to bring up
> resources in a certain order, create VIPs, and define the constraints for
> failover purposes.
Again, not precluded by juju at all. Pacemaker would also be in the charm
and simply know about all the units in the service. It does its own thing
to inform related machines about who is the leader. Juju doesn't have
to do anything differently to support this.
>
> All of these, from my point of view, require manual intervention as
> these are intended for mission critical applications that need to be
> thoroughly tested and verified before putting them in production. And
> we also need to consider various other variables such as network
> configuration, power management and fencing, etc. However, I do believe
> that JuJu should be able to prepare the environment by knowing which
> machines are in a HA mode (Master/Slave) in this case, and leave the
> administrator to manually do the final configuration of services.
>
Juju is intended to be mission critical as well! It may not be ready to
do it today, but defining those things is important and a huge target for
the near term development of Juju. Note that I've started tagging bugs
that I think are critical for Juju's use in production as "production":
https://bugs.launchpad.net/juju/+bugs?field.tag=production
I'd encourage everyone to review that list and add/remove/comment so we
know what's necessary to get juju into a state where we can confidently
recommend that people build their business on top of it. I've been careful
to only add things to it that would be hard objections to deploying
something in production using juju, not things that would just be
"nice to have".