Newbie Question: How do I replace a machine in a deployed Model?

Tue Sep 5 17:02:29 UTC 2017

Hi Rick,

Thank you very much for your detailed response. It makes sense that Kubernetes Core is not designed for HA and that there could be data loss in case of a machine failure. My question was more general: how do I build self-healing into the system so that broken machines can be replaced? In the scenario you describe below, will the remaining workers be correctly associated with the new master and etcd? Will it correctly trigger the charm logic on the worker machines to point them to the new etcd and master?

Thanks,

--Raghu

From: Rick Harding <rick.harding at canonical.com>
Date: Tuesday, September 5, 2017 at 7:42 AM
To: Raghurama Bhat <rbhat at proofpoint.com>, "juju at lists.ubuntu.com" <juju at lists.ubuntu.com>
Subject: Re: Newbie Question: How do I replace a machine in a deployed Model?

On Thu, Aug 31, 2017 at 11:07 AM Raghurama Bhat <rbhat at proofpoint.com> wrote:
Hi,

I have a newbie question. I deployed a two-node Kubernetes Core cluster using Juju into a MAAS setup. Now, if one of the machines has a hardware failure, what is the process for replacing it with another machine? Does the Juju controller monitor the cluster and request a new machine from MAAS if it detects that one of the machines is gone? Even if this has to be done manually, I did not see a replace-machine option in Juju, only options to add and remove units and machines. How does this work?

Juju does not automatically do anything here. The best thing is to have proper monitoring on your machines so that, in case of a failure such as this, you can update things in a number of ways. If you want to replace exactly what was on the failed machine, the steps depend on whether it was the first or the second machine the bundle uses.

If it was the second one, it looks like that machine runs only the kubernetes-worker, so you can create a new one with:

    juju add-unit kubernetes-worker

The constraints from the bundle should still be in place, the config will be the same, etc.
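A minimal sketch of the whole worker replacement as a script. It assumes you also want the dead machine cleared out of the model with `juju remove-machine --force` (an extra step beyond the reply above), and that you know the failed machine's ID from `juju status`. The helper names `run_juju` and `replace_worker` and the `JUJU` dry-run variable are hypothetical, not part of Juju:

```shell
#!/bin/sh
# Sketch only: run_juju, replace_worker and the JUJU override are hypothetical.
# Set JUJU="echo juju" to print the commands instead of running them.
run_juju() { ${JUJU:-juju} "$@"; }

# $1: ID of the failed machine (take the real value from `juju status`).
replace_worker() {
  failed_machine="$1"
  # Assumption: drop the dead machine, and the units on it, from the model.
  run_juju remove-machine --force "$failed_machine"
  # Allocate a fresh machine; bundle constraints and charm config still apply.
  run_juju add-unit kubernetes-worker
}
```

For example, `JUJU="echo juju" replace_worker 1` only prints the two commands, which is a cheap way to review them before running the helper for real.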

If it's the first machine that went down, that's trickier because it has some colocated applications on it. So you'd want to add a unit of the application that gets its own newly allocated machine:

    juju add-unit kubernetes-master

And then put back the other services using placement directives [1]:

    juju add-unit etcd --to=3 # this assumes that the newly created machine for the kubernetes-master is #3
    juju add-unit easyrsa --to lxd:3
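The placement step can be sketched so the new machine's ID is passed in rather than hard-coded as 3. This assumes you run `juju add-unit kubernetes-master` first and read the allocated machine ID from `juju status kubernetes-master`; `run_juju`, `restore_colocated` and the `JUJU` dry-run variable are hypothetical names:

```shell
#!/bin/sh
# Sketch only: run_juju, restore_colocated and the JUJU override are hypothetical.
# Set JUJU="echo juju" to print the commands instead of running them.
run_juju() { ${JUJU:-juju} "$@"; }

# $1: ID of the machine Juju allocated for the new kubernetes-master unit.
restore_colocated() {
  new_machine="$1"
  # Refuse anything that is not a plain numeric machine ID.
  case "$new_machine" in
    ''|*[!0-9]*) echo "usage: restore_colocated MACHINE_ID" >&2; return 1 ;;
  esac
  # etcd goes directly onto the new machine...
  run_juju add-unit etcd --to "$new_machine"
  # ...and easyrsa into an LXD container on that same machine.
  run_juju add-unit easyrsa --to "lxd:$new_machine"
}
```

Taking the ID as an argument avoids silently placing etcd and easyrsa on the wrong machine if the new master did not land on #3.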

One thing to note is that, since you're using the non-HA production bundle, there's no failover of the actual data running in the applications. Your Kubernetes master will get any charm config that was set before, but it won't have any work you did directly against the Kubernetes cluster.

Typically, we'd suggest that you run things in an HA setup, with monitoring, so that you can detect a failure and respond to it however you wish. You might add units into containers on existing machines, allocate new machines, or respond to the failure in some other way.

Rick

1: https://jujucharms.com/docs/2.2/charms-deploying#deploying-to-specific-machines-and-containers