Re: Multi install with existing MAAS starts all services except for “IP Pending” on Glance Simplestreams Image Sync
Adam Stokes
adam.stokes at canonical.com
Mon Aug 3 13:09:38 UTC 2015
Thanks Jeff,
We'll look into the series options to make sure the charms are deploying on
supported series.
On Fri, Jul 31, 2015 at 3:19 PM, Jeff McLamb <mclamb at gmail.com> wrote:
> So I failed at English there a bit… for the sake of the list if others
> are trying to parse it:
>
> The juju deployment host must be *trusty*, otherwise juju will attempt
> to deploy the glance sync charm based on vivid, which wreaks havoc,
> perhaps due to a systemd issue.
>
> Jeff
>
> On Fri, Jul 31, 2015 at 3:15 PM, Jeff McLamb <mclamb at gmail.com> wrote:
> > Success!
> >
> > Sorry it took a while to get back, but just wanted to follow up and
> > say I finally have a start-to-finish working Multi install! The TL;DR
> > of it is that I need to have a trusty-based juju deployment host
> > running the latest openstack-installer from the experimental ppa. The
> > trusty requirement is there because the installer tries to deploy the
> > glance sync charm from vivid when the host is on vivid, and the
> > experimental ppa is required because the stable branch does not seem
> > to honor the http-proxy and https-proxy command line arguments. It is
> > also necessary that the juju deployment host be on the UTC timezone
> > in order to match the machines deployed by juju/MAAS.
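> > (For anyone following along: the host's timezone can be switched to
> > UTC with the stock Ubuntu tooling, roughly like the following -- a
> > generic sketch, not a step the installer runs for you:
> >
> >     echo "Etc/UTC" | sudo tee /etc/timezone
> >     sudo dpkg-reconfigure -f noninteractive tzdata
> >
> > and `date` should then report UTC.)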
> >
> > The last few failed iterations were due to a physical machine failing
> > to deploy, say the compute node. This issue was on my end as sometimes
> > my physical servers do not boot without manual interaction due to some
> > bizarre PSU voltage too low warning. If the compute node does not come
> > up within a reasonable amount of time, it seems some of the scripts
> > get run improperly, hence the issues with keystone users, roles, etc.
> > not being available.
> >
> > This last time I manually made sure all servers came up and stepped
> > in whenever one got stuck.
> >
> > The end result is I now have a seemingly working copy of Juno (I used
> > --openstack-release juno) and all interactions on the horizon dashboard
> > seem to be good! I will keep messing around with it and try to deploy
> > some VMs when I get a chance. I will also likely try a re-deploy of
> > Kilo and see how that works.
> >
> > Thanks so much for the help, Mike and Adam, I really appreciate it.
> > Hopefully we can feed some of this stuff back into the process to make
> > it easier. I’ll follow up with more… I think the main page on
> > ubuntu.com for deploying the Canonical Distribution needs some
> > updating. For example, it fails to say you must install juju before
> > openstack ;)
> >
> > Thank you, thank you!
> >
> > Jeff
> >
> >
> > On Thu, Jul 30, 2015 at 5:01 PM, Mike McCracken
> > <mike.mccracken at canonical.com> wrote:
> >> It definitely looks like the initial failed compute node deployment
> >> caused some problems.
> >>
> >> It looks like the script was being run repeatedly and failing on the
> >> following command:
> >>
> >> keystone user-role-add --user ubuntu --role Member --tenant ubuntu
> >>
> >>>> No role with a name or ID of 'Member' exists.
> >>
> >> which is the same thing that happened when you tried it again just now.
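> >> If the role really is missing, a rough way to check and recreate it
> >> by hand (assuming admin credentials are sourced on the keystone
> >> unit; just a sketch, not something we've verified here) would be:
> >>
> >>     keystone role-list
> >>     keystone role-create --name Member
> >>     keystone user-role-add --user ubuntu --role Member --tenant ubuntu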
> >>
> >> Then you apparently killed the install and tried again, at which
> >> point the log is flooded with errors about not finding the machine
> >> ID that it recorded in the placement. It's pretty clear that it
> >> doesn't deal well with a machine leaving MAAS after a service has
> >> been placed on it.
> >>
> >> The setup script doesn't run again because after restarting, the
> >> nova-cloud-controller service is marked as having been deployed,
> >> even though the script never actually completed successfully.
> >>
> >> Off the top of my head I don't know what might be going on with
> >> keystone, I thought the Member role was created by default.
> >> Maybe the keystone unit's debug log has a clue, but at this point I'd be
> >> tempted to just try again and avoid the broken machine.
> >>
> >> I'm sorry this has been such an ordeal, thanks for testing things out!
> >> -mike
> >>
> >> On Thu, Jul 30, 2015 at 12:02 PM, Jeff McLamb <mclamb at gmail.com> wrote:
> >>>
> >>> Here is commands.log, which definitely has complaints about
> >>> nova-controller-setup.sh:
> >>>
> >>> http://paste.ubuntu.com/11968600/
> >>>
> >>> And after running nova-controller-setup.sh again via juju as you
> >>> mentioned:
> >>>
> >>> http://paste.ubuntu.com/11968544/
> >>>
> >>>
> >>> So I guess because the compute node failed to deploy in the first
> >>> place, the installer still tried to issue the nova-controller-setup.sh
> >>> script but it failed without a compute node? Or is that not involved
> >>> in that process? And then, when I re-commissioned and deployed the
> >>> compute node, it failed to re-run the script?
> >>>
> >>> Thanks,
> >>>
> >>> Jeff
> >>>
> >>>
> >>> On Thu, Jul 30, 2015 at 1:41 PM, Mike McCracken
> >>> <mike.mccracken at canonical.com> wrote:
> >>> > Hi Jeff, the ubuntu user and roles etc. are created by a script
> >>> > that the installer runs after deploying nova-cloud-controller.
> >>> > The file ~/.cloud-install/commands.log will have any errors
> >>> > encountered while trying to run that script.
> >>> > You can also look at the script that would run in
> >>> > ~/.cloud-install/nova-controller-setup.sh, and optionally try
> >>> > running it yourself - it should be present on the
> >>> > nova-cloud-controller unit in /tmp, so you can do e.g.
> >>> >
> >>> >   % juju run --unit nova-cloud-controller/0 \
> >>> >       "/tmp/nova-controller-setup.sh <the password you used in the installer> Single"
> >>> >
> >>> > to try it again.
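> >>> > A quick way to see whether it actually worked this time is to
> >>> > look at the tail of that same log afterwards, e.g.:
> >>> >
> >>> >   tail -n 50 ~/.cloud-install/commands.log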
> >>> >
> >>> > On Thu, Jul 30, 2015 at 10:14 AM, Jeff McLamb <mclamb at gmail.com> wrote:
> >>> >>
> >>> >> So it was easy enough to pick up from the single failed node.
> >>> >> After Deleting it, re-enlisting, commissioning, etc. I was
> >>> >> presented with a Ready node with a new name, etc.
> >>> >>
> >>> >> I went into openstack-status and simply added the Compute
> >>> >> service that was missing and deployed it to this new node. After
> >>> >> a while it was up, all services looked good.
> >>> >>
> >>> >> I issued a `juju machine remove 1` to remove the pending failed
> >>> >> machine from juju that was no longer in the MAAS database — it had
> >>> >> nothing running on it obviously, so I figured it would be best to
> >>> >> remove it from juju. The new machine is machine 4.
> >>> >>
> >>> >> Now when I try to login to horizon, I get "An error occurred
> >>> >> authenticating. Please try again later.”
> >>> >>
> >>> >> The keystone logs suggest user ubuntu and various roles and
> >>> >> projects were not created, even though openstack-installer tells
> >>> >> me to log in to horizon with user ubuntu and the password I gave
> >>> >> it.
> >>> >>
> >>> >> Here are the keystone logs:
> >>> >>
> >>> >> http://paste.ubuntu.com/11967913/
> >>> >>
> >>> >> Here are the apache error logs on the openstack-dashboard container:
> >>> >>
> >>> >> http://paste.ubuntu.com/11967922/
> >>> >>
> >>> >> Any ideas here?
> >>> >>
> >>> >>
> >>> >> On Thu, Jul 30, 2015 at 12:34 PM, Jeff McLamb <mclamb at gmail.com> wrote:
> >>> >> > Just to give you an update where I am:
> >>> >> >
> >>> >> > I tried various forms still using the underlying vivid MAAS/juju
> >>> >> > deployment host, tried --edit-placement, which errored out,
> >>> >> > tried removing Glance Sync again, etc., all to no avail.
> >>> >> >
> >>> >> > Then I created a trusty VM on the MAAS host and installed the
> >>> >> > stable juju and cloud-install ppa's. The problem with the
> >>> >> > stable version of openstack-install is that it does not honor
> >>> >> > the http_proxy and https_proxy lines passed on the
> >>> >> > command-line. I can see that they do not get put into the
> >>> >> > environments.yaml file, so I ended up with the same issue there
> >>> >> > as I had originally, where it could not download the tools.
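> >>> >> > (If anyone else hits this with the stable branch, the proxy
> >>> >> > settings can also be added to ~/.juju/environments.yaml by
> >>> >> > hand before bootstrapping -- roughly like the snippet below,
> >>> >> > assuming an environment named "maas" and a made-up proxy URL;
> >>> >> > the key names are the standard juju 1.x ones:
> >>> >> >
> >>> >> >     environments:
> >>> >> >       maas:
> >>> >> >         type: maas
> >>> >> >         http-proxy: http://squid.example.com:3128
> >>> >> >         https-proxy: http://squid.example.com:3128
> >>> >> >         apt-http-proxy: http://squid.example.com:3128
> >>> >> >
> >>> >> > plus whatever maas-server/maas-oauth settings you already have.)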
> >>> >> >
> >>> >> > So I updated the cloud-install on the trusty juju deployment
> >>> >> > VM to the experimental ppa and used the latest version, which
> >>> >> > worked fine with http_proxy and https_proxy. I have played
> >>> >> > around with trying to deploy both juno and kilo as well.
> >>> >> >
> >>> >> > My latest attempt on trusty, deploying juno, has left one physical
> >>> >> > node in a Failed Deployment state, apparently because MAAS
> >>> >> > keeps saying the BMC is busy, so it can't control power. I
> >>> >> > tried releasing it, which failed, so I ultimately had to Delete
> >>> >> > it and re-enlist and re-commission.
> >>> >> >
> >>> >> > Now I am at a point where the machine is back to Ready and the
> >>> >> > openstack-install is still waiting on 1 last machine (the other 2
> >>> >> > deployed just fine)... When something like this happens, is it
> >>> >> > possible to re-deploy the last remaining host, or must I start
> >>> >> > over deploying all machines again?
> >>> >> >
> >>> >> > Thanks,
> >>> >> >
> >>> >> > Jeff
> >>> >> >
> >>> >> >
> >>> >> > On Thu, Jul 30, 2015 at 1:21 AM, Mike McCracken
> >>> >> > <mike.mccracken at canonical.com> wrote:
> >>> >> >>
> >>> >> >>
> >>> >> >> On Wed, Jul 29, 2015 at 5:30 PM, Jeff McLamb <mclamb at gmail.com>
> >>> >> >> wrote:
> >>> >> >>>
> >>> >> >>> OK a quick look at the neutron-api/0 /var/log/neutron just shows
> >>> >> >>> the
> >>> >> >>> neutron-server.log as before… but since I stepped away in
> >>> >> >>> the past hour it’s now at 800MB and counting! ;)
> >>> >> >>>
> >>> >> >>> I will play around with the relations a bit just to learn what’s
> >>> >> >>> going
> >>> >> >>> on, but then I will take your advice and try various
> >>> >> >>> alternatives with --edit-placement first, then finally just
> >>> >> >>> changing the underlying MAAS deployment server to trusty and
> >>> >> >>> see where it takes me.
> >>> >> >>
> >>> >> >>
> >>> >> >> Sounds good
> >>> >> >>
> >>> >> >>>
> >>> >> >>> Could also try
> >>> >> >>> to install without --upstream-ppa, which I imagine will
> >>> >> >>> install juno instead of kilo?
> >>> >> >>
> >>> >> >>
> >>> >> >> oh, --upstream-ppa doesn't do anything for the MAAS install
> >>> >> >> path, it's only applicable to the containerized single
> >>> >> >> install. It's harmless, though. On the single install, it's
> >>> >> >> used to specify that the version of the "openstack" package
> >>> >> >> (which contains openstack-install) that will be installed on
> >>> >> >> the container to run the second half of the process should
> >>> >> >> come from our experimental PPA. It could use some better
> >>> >> >> docs/usage string.
> >>> >> >>
> >>> >> >> If you're interested in trying out other openstack release
> >>> >> >> versions, you want to look at --openstack-release.
> >>> >> >>
> >>> >> >> -mike
> >>> >> >>
> >>> >> >>>
> >>> >> >>> Will keep you posted and continued thanks for all the help.
> >>> >> >>>
> >>> >> >>> Jeff
> >>> >> >>>
> >>> >> >>> On Wed, Jul 29, 2015 at 7:08 PM, Mike McCracken
> >>> >> >>> <mike.mccracken at canonical.com> wrote:
> >>> >> >>> > Jeff, based on the other logs you sent me, e.g.
> >>> >> >>> > neutron-metadata-agent.log, it was pointed out to me that
> >>> >> >>> > it's trying to connect to rabbitMQ on localhost, which is
> >>> >> >>> > wrong.
> >>> >> >>> > So something is failing to complete the juju relations.
> >>> >> >>> > My hypothesis is that the failing vivid-series charm is
> >>> >> >>> > messing up juju's relations.
> >>> >> >>> > If you want to dig further, you can start looking at the
> >>> >> >>> > relations using e.g. 'juju run --unit <unit> "relation-get
> >>> >> >>> > amqp:rabbitmq"' (might just be 'amqp').
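> >>> >> >>> > Spelled out a bit more, that would look roughly like the
> >>> >> >>> > following (substitute whichever unit, relation id and
> >>> >> >>> > remote unit you actually have; this is only a sketch):
> >>> >> >>> >
> >>> >> >>> >     juju run --unit quantum-gateway/0 'relation-ids amqp'
> >>> >> >>> >     juju run --unit quantum-gateway/0 \
> >>> >> >>> >       'relation-get -r amqp:0 - rabbitmq-server/0'
> >>> >> >>> >
> >>> >> >>> > The second command should show the rabbitmq hostname the
> >>> >> >>> > unit was actually handed, which ought not to be localhost.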
> >>> >> >>> >
> >>> >> >>> > Or if you'd like to try just redeploying without the sync
> >>> >> >>> > charm using --edit-placement, that might get a healthy
> >>> >> >>> > cluster going, just one without glance images.
> >>> >> >>> > Then you could pretty easily deploy the charm manually, or
> >>> >> >>> > just do without it and upload images you get from
> >>> >> >>> > cloud-images.ubuntu.com manually.
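> >>> >> >>> > The manual upload would be roughly: fetch an image from
> >>> >> >>> > cloud-images.ubuntu.com and load it with the glance client
> >>> >> >>> > with your admin credentials sourced -- just a sketch, and
> >>> >> >>> > the exact filename may differ:
> >>> >> >>> >
> >>> >> >>> >     wget http://cloud-images.ubuntu.com/trusty/current/trusty-server-cloudimg-amd64-disk1.img
> >>> >> >>> >     glance image-create --name trusty --disk-format qcow2 \
> >>> >> >>> >       --container-format bare --is-public True \
> >>> >> >>> >       --file trusty-server-cloudimg-amd64-disk1.img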
> >>> >> >>> >
> >>> >> >>> > Sorry this is not as simple as it should be, yet :)
> >>> >> >>> > -mike
> >>> >> >>> >
> >>> >> >>> > On Wed, Jul 29, 2015 at 4:00 PM, Mike McCracken
> >>> >> >>> > <mike.mccracken at canonical.com> wrote:
> >>> >> >>> >>
> >>> >> >>> >> ok, so I just learned that the neutron-manage log should
> >>> >> >>> >> be in the neutron-api unit, so can you 'juju ssh
> >>> >> >>> >> neutron-api/0' and look in /var/log/neutron there?
> >>> >> >>> >>
> >>> >> >>> >> On Wed, Jul 29, 2015 at 3:34 PM, Jeff McLamb <mclamb at gmail.com> wrote:
> >>> >> >>> >>>
> >>> >> >>> >>> The neutron-server.log that is 500MB+ and growing is
> >>> >> >>> >>> nonstop repeated output of the following, due to a
> >>> >> >>> >>> database table that does not exist:
> >>> >> >>> >>>
> >>> >> >>> >>> http://paste.ubuntu.com/11962679/
> >>> >> >>> >>>
> >>> >> >>> >>> On Wed, Jul 29, 2015 at 6:30 PM, Jeff McLamb <mclamb at gmail.com> wrote:
> >>> >> >>> >>> > Hey Mike -
> >>> >> >>> >>> >
> >>> >> >>> >>> > OK so here is the juju status output. The
> >>> >> >>> >>> > quantum-gateway doesn’t look too strange, but I am new.
> >>> >> >>> >>> > The exposed status is false, but so it is for all
> >>> >> >>> >>> > services, and I can definitely access, say, the
> >>> >> >>> >>> > dashboard, even though it is not “exposed”. One thing
> >>> >> >>> >>> > of note is the public-address lines that sometimes use
> >>> >> >>> >>> > the domain names, e.g. downright-feet.maas in this
> >>> >> >>> >>> > case, whereas some services use IP addresses. I have
> >>> >> >>> >>> > noticed that I cannot resolve the maas names from the
> >>> >> >>> >>> > MAAS server (because I use the ISP’s DNS servers) but I
> >>> >> >>> >>> > can resolve them from the deployed nodes. Here is the
> >>> >> >>> >>> > output:
> >>> >> >>> >>> >
> >>> >> >>> >>> > http://paste.ubuntu.com/11962631/
> >>> >> >>> >>> >
> >>> >> >>> >>> > Here is the quantum gateway replay:
> >>> >> >>> >>> >
> >>> >> >>> >>> > http://paste.ubuntu.com/11962644/
> >>> >> >>> >>> >
> >>> >> >>> >>> > Where are the neutron-manage logs? I see lots of
> >>> >> >>> >>> > neutron stuff on various containers and nodes — the
> >>> >> >>> >>> > neutron-server.log is what I pasted before and it is
> >>> >> >>> >>> > 500+MB and growing across a few nodes, but I can’t seem
> >>> >> >>> >>> > to find neutron-manage.
> >>> >> >>> >>> >
> >>> >> >>> >>> > Thanks!
> >>> >> >>> >>> >
> >>> >> >>> >>> > Jeff
> >>> >> >>> >>> >
> >>> >> >>> >>> >
> >>> >> >>> >>> > On Wed, Jul 29, 2015 at 5:26 PM, Mike McCracken
> >>> >> >>> >>> > <mike.mccracken at canonical.com> wrote:
> >>> >> >>> >>> >> Hi Jeff, I asked internally and was asked if you
> >>> >> >>> >>> >> could share the juju charm logs from quantum-gateway
> >>> >> >>> >>> >> and the neutron-manage logs in /var/log/neutron.
> >>> >> >>> >>> >>
> >>> >> >>> >>> >> the charm log can be replayed by using
> >>> >> >>> >>> >> 'juju debug-log -i quantum-gateway/0 --replay'
> >>> >> >>> >>> >>
> >>> >> >>> >>> >> On Wed, Jul 29, 2015 at 2:03 PM, Mike McCracken
> >>> >> >>> >>> >> <mike.mccracken at canonical.com> wrote:
> >>> >> >>> >>> >>>
> >>> >> >>> >>> >>> Sorry this is so frustrating.
> >>> >> >>> >>> >>> Can you check 'juju status' for this environment and
> >>> >> >>> >>> >>> see if it says anything useful about the
> >>> >> >>> >>> >>> quantum-gateway service (aka neutron, the juju
> >>> >> >>> >>> >>> service name will be updated soon).
> >>> >> >>> >>> >>>
> >>> >> >>> >>> >>> -mike
> >>> >> >>> >>> >>>
> >>> >> >>> >>> >>> On Wed, Jul 29, 2015 at 1:15 PM, Jeff McLamb
> >>> >> >>> >>> >>> <mclamb at gmail.com>
> >>> >> >>> >>> >>> wrote:
> >>> >> >>> >>> >>>>
> >>> >> >>> >>> >>>> OK, making progress now. Per your recommendation I
> >>> >> >>> >>> >>>> removed and added back in the trusty sync charm
> >>> >> >>> >>> >>>> manually.
> >>> >> >>> >>> >>>>
> >>> >> >>> >>> >>>> Now, I can log in to the horizon dashboard!
> >>> >> >>> >>> >>>>
> >>> >> >>> >>> >>>> However, several tabs result in a generic OpenStack
> >>> >> >>> >>> >>>> (not Ubuntu-customized like the general dashboard
> >>> >> >>> >>> >>>> pages) "Something went wrong! An unexpected error
> >>> >> >>> >>> >>>> has occurred. Try refreshing the page..."
> >>> >> >>> >>> >>>>
> >>> >> >>> >>> >>>> The tabs in question that give those results are
> >>> >> >>> >>> >>>> Compute -> Access & Security, Network -> Network
> >>> >> >>> >>> >>>> Topology,
> >>> >> >>> >>> >>>>
> >>> >> >>> >>> >>>> When I go to pages like Network -> Routers, it does
> >>> >> >>> >>> >>>> render, but there are error popup boxes in the page
> >>> >> >>> >>> >>>> itself with:
> >>> >> >>> >>> >>>>
> >>> >> >>> >>> >>>> Error: Unable to retrieve router list.
> >>> >> >>> >>> >>>>
> >>> >> >>> >>> >>>> and
> >>> >> >>> >>> >>>>
> >>> >> >>> >>> >>>> Error: Unable to retrieve a list of external
> >>> >> >>> >>> >>>> networks "Connection to neutron failed:
> >>> >> >>> >>> >>>> HTTPConnectionPool(host='192.168.1.45', port=9696):
> >>> >> >>> >>> >>>> Max retries exceeded with url:
> >>> >> >>> >>> >>>> /v2.0/networks.json?router%3Aexternal=True (Caused
> >>> >> >>> >>> >>>> by <class 'httplib.BadStatusLine'>: '')”.
> >>> >> >>> >>> >>>>
> >>> >> >>> >>> >>>> If I do a `juju ssh openstack-dashboard/0` and
> >>> >> >>> >>> >>>> tail -f /var/log/apache2/error.log I get the
> >>> >> >>> >>> >>>> following when accessing one of the failed pages:
> >>> >> >>> >>> >>>>
> >>> >> >>> >>> >>>> http://paste.ubuntu.com/11961863/
> >>> >> >>> >>> >>>>
> >>> >> >>> >>> >>>> Furthermore, looking at the neutron server logs, I
> >>> >> >>> >>> >>>> see non-stop traces about the
> >>> >> >>> >>> >>>> neutron.ml2_gre_allocations table not existing:
> >>> >> >>> >>> >>>>
> >>> >> >>> >>> >>>> http://paste.ubuntu.com/11961891/
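> >>> >> >>> >>> >>>> That makes me wonder whether the ml2 database
> >>> >> >>> >>> >>>> migrations simply never ran on this deploy. I
> >>> >> >>> >>> >>>> haven't tried it yet, but something along these
> >>> >> >>> >>> >>>> lines on the neutron-api unit looks like it would
> >>> >> >>> >>> >>>> populate the missing tables:
> >>> >> >>> >>> >>>>
> >>> >> >>> >>> >>>>     juju ssh neutron-api/0
> >>> >> >>> >>> >>>>     sudo neutron-db-manage \
> >>> >> >>> >>> >>>>       --config-file /etc/neutron/neutron.conf \
> >>> >> >>> >>> >>>>       --config-file /etc/neutron/plugins/ml2/ml2_conf.ini \
> >>> >> >>> >>> >>>>       upgrade head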
> >>> >> >>> >>> >>>>
> >>> >> >>> >>> >>>> Getting closer, bit by bit.
> >>> >> >>> >>> >>>>
> >>> >> >>> >>> >>>> Thanks for all the help,
> >>> >> >>> >>> >>>>
> >>> >> >>> >>> >>>> Jeff
> >>> >> >>> >>> >>>
> >>> >> >>> >>> >>>
> >>> >> >>> >>> >>>
> >>> >> >>> >>> >>
> >>> >> >>> >>
> >>> >> >>> >>
> >>> >> >>> >
> >>> >> >>
> >>> >> >>
> >>> >
> >>> >
> >>
> >>
>