[Bug 1743249] Re: Failed Deployment after timeout trying to retrieve grub cfg

Steve Langasek steve.langasek at canonical.com
Thu Feb 1 18:49:37 UTC 2018


On Thu, Feb 01, 2018 at 06:15:31PM -0000, Andres Rodriguez wrote:
> @Jason,

> Packet 90573 doesn't seem to me to be an indication of what you are
> describing. What I see is this:

> 1. grub makes ~30 requests for PXE config on grub.cfg-<mac>, after which it gives up because it didn't receive a response.
> 2. grub moves on and requests grub.cfg-default-amd64, and it receives a response from MAAS.

> Now, the difference between the above, is that 1 does *database*
> lookups, while 2 does not. In other words, 1 causes a request to obtain
> the 'node' object based on the MAC to provide, and if grub is making 30+
> requests, then this can definitely flood the db with requests.

Then, as I've said on IRC, this is a bug in MAAS, because 30 UDP retries
should not generate 30 requests to the database.

GRUB is *not* wrong to retransmit its UDP packets when it doesn't get a
response.  If each retransmission increases the load on MAAS, then MAAS
should be fixed.
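
To illustrate the point, here is a rough sketch of how a server could
coalesce retransmitted requests so that N retries for the same file
result in a single backend lookup.  This is not MAAS code: the class
name CoalescingLookup, the (client address, filename) key, the example
MAC address and the slow_db_lookup stand-in are all hypothetical, and
MAAS's real TFTP backend may be structured quite differently.

```python
import asyncio


class CoalescingLookup:
    """Share one backend lookup among duplicate in-flight requests."""

    def __init__(self, backend):
        self._backend = backend   # async callable: key -> result
        self._in_flight = {}      # key -> asyncio.Task running the lookup

    async def get(self, key):
        task = self._in_flight.get(key)
        if task is None:
            # First request for this key: start the one real lookup.
            task = asyncio.ensure_future(self._backend(key))
            self._in_flight[key] = task
            # Forget the task once it finishes, so later requests
            # trigger a fresh lookup.
            task.add_done_callback(
                lambda _t: self._in_flight.pop(key, None))
        # Retransmissions wait on the same task.  shield() keeps one
        # cancelled waiter from cancelling the shared lookup.
        return await asyncio.shield(task)


async def demo(retries=30):
    calls = []

    async def slow_db_lookup(key):
        # Hypothetical stand-in for the per-MAC node lookup.
        calls.append(key)
        await asyncio.sleep(0.05)
        return "config for %s" % (key[1],)

    lookup = CoalescingLookup(slow_db_lookup)
    # Hypothetical key: (client address, requested filename).
    key = ("192.0.2.10", "grub.cfg-00:11:22:33:44:55")
    results = await asyncio.gather(
        *(lookup.get(key) for _ in range(retries)))
    return len(calls), results


if __name__ == "__main__":
    n_lookups, results = asyncio.run(demo())
    print(n_lookups, len(results))
```

In this sketch, 30 concurrent requests for the same key are all answered
from one lookup.  Whether the right key is the client address, the MAC
in the filename, or something else is a design decision for MAAS.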

The case where GRUB retrieves the same file multiple times is a GRUB bug,
but I don't see any evidence linking this GRUB bug to the timeout and
fallback problem in Jason's latest trace.

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to grub2 in Ubuntu.
https://bugs.launchpad.net/bugs/1743249

Title:
  Failed Deployment after timeout trying to retrieve grub cfg

Status in MAAS:
  Incomplete
Status in grub2 package in Ubuntu:
  New

Bug description:
  A node failed to deploy after it failed to retrieve a grub.cfg from
  MAAS due to a timeout.  In the logs, it's clear that the server tried
  to retrieve the grub cfg many times, over about 30 seconds:

  http://paste.ubuntu.com/26387256/

  We see the same thing for other hosts around the same time:

  http://paste.ubuntu.com/26387262/

  It seems like MAAS is taking way too long to respond to these
  requests.

  This is very similar to bug 1724677, which was happening
  pre-Meltdown/Spectre. The only difference is we don't see "[critical]
  TFTP back-end failed" in the logs anymore.

  I connected to the console on this system and it had errors about
  timing out retrieving the grub cfg, then it had an error message along
  the lines of "error not an ip" and then "double free".  After I
  connected but before I could get a screenshot, the system rebooted and
  was directed by MAAS to power off, which it did successfully after
  booting to Linux.

  Full logs are available here:
  https://10.245.162.101/artifacts/14a34b5a-9321-4d1a-b2fa-ed277a020e7c/cpe_cloud_395/infra-logs.tar

  This is with 2.3.0-6434-gd354690-0ubuntu1~16.04.1.

To manage notifications about this bug go to:
https://bugs.launchpad.net/maas/+bug/1743249/+subscriptions


