[Bug 1978489] Re: libvirt / cgroups v2: cannot boot instance with more than 16 CPUs

Jan Graichen 1978489 at bugs.launchpad.net
Thu Jan 18 13:05:51 UTC 2024


Hello,

We're affected by this bug too. Unfortunately, the patch changes the
behavior for instances by completely removing the default cputune.
Therefore, instances are no longer weighted against each other at all.

We tried adding `quota:cpu_shares` to our flavors (vcpus * 100), but
that isn't applied to any existing instance. They stay unweighted and
are now overloaded by new instances.
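
As an illustration of that weighting (a sketch only: `cpu_shares_for` is
a hypothetical helper, not a nova API), the value scales linearly with
the vCPU count and stays within the cgroups v2 range of 1 to 10000 up to
100 vCPUs:

```python
# Hypothetical helper, not part of nova: computes a quota:cpu_shares value
# as vcpus * 100 and checks it against the cgroups v2 cpu.weight range.
CGROUPS_V2_MIN_WEIGHT = 1
CGROUPS_V2_MAX_WEIGHT = 10000


def cpu_shares_for(vcpus: int, per_vcpu: int = 100) -> int:
    """Return the cpu_shares value for a flavor with the given vCPU count."""
    shares = vcpus * per_vcpu
    if not CGROUPS_V2_MIN_WEIGHT <= shares <= CGROUPS_V2_MAX_WEIGHT:
        raise ValueError(f"cpu_shares {shares} is outside 1..10000")
    return shares


# Example flavor sizes, chosen only for illustration.
for vcpus in (2, 16, 64):
    print(f"{vcpus} vCPUs -> quota:cpu_shares={cpu_shares_for(vcpus)}")
```

The resulting number is what would go into the `quota:cpu_shares` extra
spec of the matching flavor.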

As far as we know, updating a flavor was never intended to affect
existing instances, even when only extra specs change. Here, however, a
bug/change breaks all existing instances, and fixing the flavor doesn't
help at all.

Some other flavors already had `quota:cpu_shares` > 10000. Their
instances broke completely as well and cannot be fixed without patching
the nova database in roughly three places.

Is there any workaround that avoids rebuilding hundreds of instances,
such as forcing nova to override the flavors of existing instances?

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to nova in Ubuntu.
https://bugs.launchpad.net/bugs/1978489

Title:
  libvirt / cgroups v2: cannot boot instance with more than 16 CPUs

Status in OpenStack Compute (nova):
  In Progress
Status in nova package in Ubuntu:
  Confirmed
Status in nova source package in Jammy:
  Triaged

Bug description:
  Description
  ===========

  Using the libvirt driver and a host OS that uses cgroups v2 (RHEL 9,
  Ubuntu Jammy), an instance with more than 16 CPUs cannot be booted.

  Steps to reproduce
  ==================

  1. Boot an instance with 10 (or more) CPUs on RHEL 9 or Ubuntu Jammy
  using Nova with the libvirt driver.

  Expected result
  ===============

  Instance boots.

  Actual result
  =============

  Instance fails to boot with a 'Value specified in CPUWeight is out of
  range' error.

  Environment
  ===========

  Originally reported as a libvirt bug in RHEL 9 [1]

  Additional information
  ======================

  This is happening because Nova defaults to 1024 * (# of CPUs) for the
  value of domain/cputune/shares in the libvirt XML. This is then passed
  directly by libvirt to the cgroups API, but cgroups v2 has a maximum
  value of 10000. 10000 / 1024 ~= 9.77, so the default exceeds the
  maximum from 10 CPUs upward.
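
  The arithmetic can be checked with a short sketch. The constant and
  function names below are illustrative assumptions, not nova or libvirt
  identifiers; only the 1024 multiplier and the 10000 maximum come from
  the description above.

```python
# Illustrative names only; the two numbers are taken from the bug description.
DEFAULT_SHARES_PER_VCPU = 1024   # Nova's default multiplier for cputune/shares
CGROUPS_V2_MAX_WEIGHT = 10000    # upper bound accepted by cgroups v2


def default_shares(vcpus: int) -> int:
    """Default cputune/shares value for an instance with `vcpus` CPUs."""
    return DEFAULT_SHARES_PER_VCPU * vcpus


def exceeds_limit(vcpus: int) -> bool:
    """True when the default value is above the cgroups v2 maximum."""
    return default_shares(vcpus) > CGROUPS_V2_MAX_WEIGHT


# Smallest vCPU count whose default value no longer fits.
first_failing = next(n for n in range(1, 129) if exceeds_limit(n))

print(default_shares(9))    # 9216, still within the limit
print(default_shares(10))   # 10240, above the limit
print(first_failing)        # 10
```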

  [1] https://bugzilla.redhat.com/show_bug.cgi?id=2035518

  
  ====================================

  Ubuntu SRU Details:

  [Impact]
  See above.

  [Test Case]
  See above.

  [Regression Potential]
  We've had this change in other jammy-based versions of the nova package for a while now, including zed, antelope, bobcat.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1978489/+subscriptions



