[Bug 1977933] Re: nova fails to re-create mediated devices after reboot

OpenStack Infra 1977933 at bugs.launchpad.net
Mon Oct 6 15:21:56 UTC 2025


Reviewed:  https://review.opendev.org/c/openstack/charm-nova-compute-nvidia-vgpu/+/962465
Committed: https://opendev.org/openstack/charm-nova-compute-nvidia-vgpu/commit/64c1239481f328e7c09e2dbf3a0cd925d7a5b1c0
Submitter: "Zuul (22348)"
Branch:    stable/zed

commit 64c1239481f328e7c09e2dbf3a0cd925d7a5b1c0
Author: Edward Hope-Morley <edward.hope-morley at canonical.com>
Date:   Thu Jul 10 18:05:30 2025 +0100

    Add Nova mdev initialisation workaround
    
    Earlier versions of Nova did not support re-initialisation
    of domain GPU mdevs such that a node reboot rendered vms
    unable to boot since their local uuid mismatched with
    the new host uuids. This patch adds a workaround that
    installs a systemd service to initialise all used mdevs to
    match the host and update the Placement API resource
    providers to match these allocations.
    
    Closes-Bug: #1977933
    Change-Id: I902c18895679737c4a9dc20b98affdc98af33659
    Signed-off-by: Edward Hope-Morley <edward.hope-morley at canonical.com>
    (cherry picked from commit c87c8bf7b850973e8d59ff042723ef314949cb23)


** Changed in: charm-nova-compute-nvidia-vgpu/zed
       Status: In Progress => Fix Committed

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to nova in Ubuntu.
https://bugs.launchpad.net/bugs/1977933

Title:
  nova fails to re-create mediated devices after reboot

Status in OpenStack Nova Compute NVIDIA vGPU Plugin Charm:
  Fix Committed
Status in OpenStack Nova Compute NVIDIA vGPU Plugin Charm 2023.1 series:
  Fix Committed
Status in OpenStack Nova Compute NVIDIA vGPU Plugin Charm 2023.2 series:
  Fix Released
Status in OpenStack Nova Compute NVIDIA vGPU Plugin Charm 2024.1 series:
  Fix Released
Status in OpenStack Nova Compute NVIDIA vGPU Plugin Charm yoga series:
  In Progress
Status in OpenStack Nova Compute NVIDIA vGPU Plugin Charm zed series:
  Fix Committed
Status in nova package in Ubuntu:
  Confirmed

Bug description:
  OpenStack Xena
  Ubuntu 20.04

  After rebooting a nova-compute node that has running instances with
  attached vGPU devices, the nova-compute daemon fails to start because
  the mediated device definitions are missing.

  It looks like the code intends to detect the missing devices and then
  re-create them, but the libvirt Python module throws an exception for
  the missing mediated device while the domain definition is being
  inspected.
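
  A minimal sketch of the behaviour described above, in which a failed
  lookup is treated as "device missing" and queued for re-creation
  instead of aborting startup. DeviceNotFound, find_missing_mdevs and
  the lookup callable are hypothetical stand-ins for illustration; they
  are not Nova or libvirt APIs, and this is not the actual fix.

```python
from typing import Callable, Iterable, List


class DeviceNotFound(Exception):
    """Hypothetical stand-in for a 'node device not found' lookup error."""


def find_missing_mdevs(dev_names: Iterable[str],
                       lookup: Callable[[str], object]) -> List[str]:
    """Return the device names whose lookup fails with DeviceNotFound."""
    missing = []
    for name in dev_names:
        try:
            lookup(name)
        except DeviceNotFound:
            # The device is gone (for example after a reboot): remember it so
            # the caller can re-create it instead of letting the error escape.
            missing.append(name)
    return missing


def fake_lookup(name: str) -> object:
    # Simulated host on which no mediated devices exist after a reboot.
    raise DeviceNotFound(name)


missing = find_missing_mdevs(
    ["mdev_9a95927e_f50a_4e34_84fc_3b27508f4241"], fake_lookup)
```

  With the simulated lookup, the device name is reported as missing
  rather than raising out of the loop.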

  2022-06-08 07:24:27.061 2689 ERROR oslo_service.service [-] Error starting thread.: libvirt.libvirtError: Node device not found: no node device with matching name 'mdev_9a95927e_f50a_4e34_84fc_3b27508f4241'
  2022-06-08 07:24:27.061 2689 ERROR oslo_service.service Traceback (most recent call last):
  2022-06-08 07:24:27.061 2689 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/oslo_service/service.py", line 806, in run_service
  2022-06-08 07:24:27.061 2689 ERROR oslo_service.service     service.start()
  2022-06-08 07:24:27.061 2689 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/nova/service.py", line 159, in start
  2022-06-08 07:24:27.061 2689 ERROR oslo_service.service     self.manager.init_host()
  2022-06-08 07:24:27.061 2689 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 1416, in init_host
  2022-06-08 07:24:27.061 2689 ERROR oslo_service.service     self.driver.init_host(host=self.host)
  2022-06-08 07:24:27.061 2689 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 800, in init_host
  2022-06-08 07:24:27.061 2689 ERROR oslo_service.service     self._recreate_assigned_mediated_devices()
  2022-06-08 07:24:27.061 2689 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 980, in _recreate_assigned_mediated_devices
  2022-06-08 07:24:27.061 2689 ERROR oslo_service.service     dev_info = self._get_mediated_device_information(dev_name)
  2022-06-08 07:24:27.061 2689 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 7761, in _get_mediated_device_information
  2022-06-08 07:24:27.061 2689 ERROR oslo_service.service     virtdev = self._host.device_lookup_by_name(devname)
  2022-06-08 07:24:27.061 2689 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/nova/virt/libvirt/host.py", line 1216, in device_lookup_by_name
  2022-06-08 07:24:27.061 2689 ERROR oslo_service.service     return self.get_connection().nodeDeviceLookupByName(name)
  2022-06-08 07:24:27.061 2689 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/eventlet/tpool.py", line 193, in doit
  2022-06-08 07:24:27.061 2689 ERROR oslo_service.service     result = proxy_call(self._autowrap, f, *args, **kwargs)
  2022-06-08 07:24:27.061 2689 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/eventlet/tpool.py", line 151, in proxy_call
  2022-06-08 07:24:27.061 2689 ERROR oslo_service.service     rv = execute(f, *args, **kwargs)
  2022-06-08 07:24:27.061 2689 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/eventlet/tpool.py", line 132, in execute
  2022-06-08 07:24:27.061 2689 ERROR oslo_service.service     six.reraise(c, e, tb)
  2022-06-08 07:24:27.061 2689 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/six.py", line 703, in reraise
  2022-06-08 07:24:27.061 2689 ERROR oslo_service.service     raise value
  2022-06-08 07:24:27.061 2689 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/eventlet/tpool.py", line 86, in tworker
  2022-06-08 07:24:27.061 2689 ERROR oslo_service.service     rv = meth(*args, **kwargs)
  2022-06-08 07:24:27.061 2689 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/libvirt.py", line 4612, in nodeDeviceLookupByName
  2022-06-08 07:24:27.061 2689 ERROR oslo_service.service     if ret is None:raise libvirtError('virNodeDeviceLookupByName() failed', conn=self)
  2022-06-08 07:24:27.061 2689 ERROR oslo_service.service libvirt.libvirtError: Node device not found: no node device with matching name 'mdev_9a95927e_f50a_4e34_84fc_3b27508f4241'
  2022-06-08 07:24:27.061 2689 ERROR oslo_service.service
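
  The device name in the error above encodes the mediated device UUID
  with underscores in place of hyphens. A small sketch for recovering
  the UUID from such a log line, assuming the name has the form
  mdev_<uuid> exactly as it appears in this traceback (names carrying
  any extra suffix are not handled). The helper names are illustrative
  only.

```python
import re
import uuid
from typing import Optional

# Final error line from the traceback above, without the log prefix.
LOG_LINE = ("libvirt.libvirtError: Node device not found: no node device "
            "with matching name 'mdev_9a95927e_f50a_4e34_84fc_3b27508f4241'")

# Captures the quoted node device name from the libvirt error text.
NAME_RE = re.compile(r"matching name '(mdev_[0-9a-fA-F_]+)'")


def mdev_name_to_uuid(dev_name: str) -> uuid.UUID:
    """Convert 'mdev_<uuid with underscores>' into a UUID object."""
    prefix = "mdev_"
    if not dev_name.startswith(prefix):
        raise ValueError(f"not an mdev node device name: {dev_name!r}")
    # uuid.UUID raises ValueError if the remainder is not a valid UUID.
    return uuid.UUID(dev_name[len(prefix):].replace("_", "-"))


def missing_mdev_uuid(line: str) -> Optional[uuid.UUID]:
    """Return the UUID of the missing mdev named in a log line, if any."""
    match = NAME_RE.search(line)
    return mdev_name_to_uuid(match.group(1)) if match else None


print(missing_mdev_uuid(LOG_LINE))
```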

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-nova-compute-nvidia-vgpu/+bug/1977933/+subscriptions



