[Bug 1743249] Re: Failed Deployment after timeout trying to retrieve grub cfg

Andres Rodriguez andreserl at ubuntu-pe.org
Tue Feb 6 17:24:27 UTC 2018


The deadlock is not expected behavior.

Due to the isolation level, the number of workers (e.g. 12
workers/3regions) and the fact that there could be IO starvation, its
surfacing this issue. That said, changes to improve this and prevent the
deadlocks are not backportable to 2.3 and are targetted for 2.4

On Tue, Feb 6, 2018 at 11:43 AM, Jason Hobbs <jason.hobbs at canonical.com>
wrote:

> Can you please comment on the deadlock detected error from the db log in
> posted in #36
>
> http://paste.ubuntu.com/26530761/
>
> That is not expected behavior is it?  Also the fact that MAAS thinks its
> losing rack/region connections seems like it could be related to this
> behavior.
>
> --
> You received this bug notification because you are subscribed to MAAS.
> https://bugs.launchpad.net/bugs/1743249
>
> Title:
>   Failed Deployment after timeout trying to retrieve grub cfg
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/maas/+bug/1743249/+subscriptions
>
> Launchpad-Notification-Type: bug
> Launchpad-Bug: product=maas; milestone=2.4.x; status=New;
> importance=Undecided; assignee=None;
> Launchpad-Bug: distribution=ubuntu; sourcepackage=grub2; component=main;
> status=In Progress; importance=Medium; assignee=mathieu.tl at gmail.com;
> Launchpad-Bug-Tags: cdo-qa cdo-qa-blocker foundations-engine patch
> Launchpad-Bug-Information-Type: Public
> Launchpad-Bug-Private: no
> Launchpad-Bug-Security-Vulnerability: no
> Launchpad-Bug-Commenters: andreserl blake-rouse cgregan jason-hobbs
> mpontillo vorlon
> Launchpad-Bug-Reporter: Jason Hobbs (jason-hobbs)
> Launchpad-Bug-Modifier: Jason Hobbs (jason-hobbs)
> Launchpad-Message-Rationale: Subscriber (MAAS)
> Launchpad-Message-For: andreserl
>


-- 
Andres Rodriguez (RoAkSoAx)
Ubuntu Server Developer
MSc. Telecom & Networking
Systems Engineer

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to grub2 in Ubuntu.
https://bugs.launchpad.net/bugs/1743249

Title:
  Failed Deployment after timeout trying to retrieve grub cfg

Status in MAAS:
  New
Status in grub2 package in Ubuntu:
  In Progress

Bug description:
  A node failed to deploy after it failed to retrieve a grub.cfg from
  MAAS due to a timeout.  In the logs, it's clear that the server tried
  to retrieve the grub cfg many times, over about 30 seconds:

  http://paste.ubuntu.com/26387256/

  We see the same thing for other hosts around the same time:

  http://paste.ubuntu.com/26387262/

  It seems like MAAS is taking way too long to respond to these
  requests.

  This is very similar to bug 1724677, which was happening pre-
  metldown/spectre. The only difference is we don't see "[critical] TFTP
  back-end failed" in the logs anymore.

  I connected to the console on this system and it had errors about
  timing out retrieving the grub-cfg, then it had an error message along
  the lines of "error not an ip" and then "double free".  After I
  connected but before I could get a screenshot the system rebooted and
  was directed by maas to power off, which it did successfully after
  booting to linux.

  Full logs are available here:
  https://10.245.162.101/artifacts/14a34b5a-9321-4d1a-b2fa-
  ed277a020e7c/cpe_cloud_395/infra-logs.tar

  This is with 2.3.0-6434-gd354690-0ubuntu1~16.04.1.

To manage notifications about this bug go to:
https://bugs.launchpad.net/maas/+bug/1743249/+subscriptions



More information about the foundations-bugs mailing list