[Bug 1977933] Re: nova fails to re-create mediated devices after reboot
OpenStack Infra
1977933 at bugs.launchpad.net
Mon Oct 6 15:21:56 UTC 2025
Reviewed: https://review.opendev.org/c/openstack/charm-nova-compute-nvidia-vgpu/+/962465
Committed: https://opendev.org/openstack/charm-nova-compute-nvidia-vgpu/commit/64c1239481f328e7c09e2dbf3a0cd925d7a5b1c0
Submitter: "Zuul (22348)"
Branch: stable/zed
commit 64c1239481f328e7c09e2dbf3a0cd925d7a5b1c0
Author: Edward Hope-Morley <edward.hope-morley at canonical.com>
Date: Thu Jul 10 18:05:30 2025 +0100
Add Nova mdev initialisation workaround
Earlier versions of Nova did not support re-initialisation
of domain GPU mdevs, so a node reboot left VMs unable to
boot because their local mdev UUIDs no longer matched the
new host UUIDs. This patch adds a workaround that installs
a systemd service to initialise all used mdevs to match
the host and to update the Placement API resource
providers to match these allocations.
Closes-Bug: #1977933
Change-Id: I902c18895679737c4a9dc20b98affdc98af33659
Signed-off-by: Edward Hope-Morley <edward.hope-morley at canonical.com>
(cherry picked from commit c87c8bf7b850973e8d59ff042723ef314949cb23)
** Changed in: charm-nova-compute-nvidia-vgpu/zed
Status: In Progress => Fix Committed
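The re-initialisation step described in the commit message can be sketched with the kernel's mediated-device sysfs interface: writing a UUID to mdev_supported_types/<type>/create under the parent device asks the vendor driver to create an mdev with that UUID. This is a minimal sketch under stated assumptions, not the charm's actual service. The assignments mapping (mdev UUID to parent PCI address and mdev type name such as nvidia-35) and the function name recreate_mdevs are hypothetical, it needs root and a loaded vGPU host driver on a real system, and it does not touch the Placement API.

```python
import os
from typing import Dict, List, Tuple


def recreate_mdevs(
    assignments: Dict[str, Tuple[str, str]],
    sysfs_root: str = "/sys",
) -> List[str]:
    """Re-create missing mdevs so their UUIDs match existing domain XML.

    assignments maps mdev UUID -> (parent PCI address, mdev type name).
    This input format is a hypothetical assumption for illustration.
    Returns the UUIDs for which a create request was written.
    """
    created = []
    for mdev_uuid, (parent, mdev_type) in assignments.items():
        # An mdev that already exists shows up as /sys/bus/mdev/devices/<uuid>.
        existing = os.path.join(sysfs_root, "bus", "mdev", "devices", mdev_uuid)
        if os.path.exists(existing):
            continue
        create_path = os.path.join(
            sysfs_root, "class", "mdev_bus", parent,
            "mdev_supported_types", mdev_type, "create",
        )
        # Writing the UUID to the parent's "create" attribute requests a new
        # mdev of this type carrying exactly that UUID.
        with open(create_path, "w") as f:
            f.write(mdev_uuid)
        created.append(mdev_uuid)
    return created
```

Skipping UUIDs that already exist keeps the sketch idempotent, which matters for something meant to run on every boot, as a systemd unit would. The sysfs_root parameter only exists so the logic can be exercised against a scratch directory.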
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to nova in Ubuntu.
https://bugs.launchpad.net/bugs/1977933
Title:
nova fails to re-create mediated devices after reboot
Status in OpenStack Nova Compute NVIDIA vGPU Plugin Charm:
Fix Committed
Status in OpenStack Nova Compute NVIDIA vGPU Plugin Charm 2023.1 series:
Fix Committed
Status in OpenStack Nova Compute NVIDIA vGPU Plugin Charm 2023.2 series:
Fix Released
Status in OpenStack Nova Compute NVIDIA vGPU Plugin Charm 2024.1 series:
Fix Released
Status in OpenStack Nova Compute NVIDIA vGPU Plugin Charm yoga series:
In Progress
Status in OpenStack Nova Compute NVIDIA vGPU Plugin Charm zed series:
Fix Committed
Status in nova package in Ubuntu:
Confirmed
Bug description:
OpenStack Xena
Ubuntu 20.04
After a reboot of a nova-compute node with running instances with
attached vGPU devices, the nova-compute daemon fails to start up due to
missing mediated device definitions.
It looks like the code intends to detect the missing devices and then
re-create them, but the libvirt Python module raises an exception for
the missing mediated device while the domain definition is being
inspected.
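The detection step can be done without asking libvirt about each device, which avoids the exception above: collect the mdev UUIDs from the domain XML, where libvirt records them as <hostdev type='mdev'> with the UUID on <source><address uuid='...'/>, then check which of them have no entry under /sys/bus/mdev/devices. This is a minimal sketch, not Nova's code; assigned_mdev_uuids and missing_mdevs are hypothetical helper names.

```python
import os
import xml.etree.ElementTree as ET
from typing import List


def assigned_mdev_uuids(domain_xml: str) -> List[str]:
    """Return the UUIDs of mediated devices attached to a libvirt domain."""
    root = ET.fromstring(domain_xml)
    uuids = []
    # Mediated devices are <hostdev type='mdev'> elements whose
    # <source><address uuid='...'/></source> carries the mdev UUID.
    for hostdev in root.findall("./devices/hostdev[@type='mdev']"):
        address = hostdev.find("./source/address")
        if address is not None and address.get("uuid"):
            uuids.append(address.get("uuid"))
    return uuids


def missing_mdevs(uuids: List[str], sysfs_root: str = "/sys") -> List[str]:
    """Return the UUIDs that have no /sys/bus/mdev/devices/<uuid> entry."""
    return [
        u for u in uuids
        if not os.path.exists(os.path.join(sysfs_root, "bus", "mdev", "devices", u))
    ]
```

Running missing_mdevs over the UUIDs from every defined domain yields the full list of devices that need re-creating, instead of stopping at the first absent one.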
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service [-] Error starting thread.: libvirt.libvirtError: Node device not found: no node device with matching name 'mdev_9a95927e_f50a_4e34_84fc_3b27508f4241'
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service Traceback (most recent call last):
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/oslo_service/service.py", line 806, in run_service
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service     service.start()
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/nova/service.py", line 159, in start
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service     self.manager.init_host()
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 1416, in init_host
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service     self.driver.init_host(host=self.host)
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 800, in init_host
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service     self._recreate_assigned_mediated_devices()
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 980, in _recreate_assigned_mediated_devices
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service     dev_info = self._get_mediated_device_information(dev_name)
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 7761, in _get_mediated_device_information
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service     virtdev = self._host.device_lookup_by_name(devname)
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/nova/virt/libvirt/host.py", line 1216, in device_lookup_by_name
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service     return self.get_connection().nodeDeviceLookupByName(name)
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/eventlet/tpool.py", line 193, in doit
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service     result = proxy_call(self._autowrap, f, *args, **kwargs)
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/eventlet/tpool.py", line 151, in proxy_call
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service     rv = execute(f, *args, **kwargs)
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/eventlet/tpool.py", line 132, in execute
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service     six.reraise(c, e, tb)
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/six.py", line 703, in reraise
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service     raise value
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/eventlet/tpool.py", line 86, in tworker
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service     rv = meth(*args, **kwargs)
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/libvirt.py", line 4612, in nodeDeviceLookupByName
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service     if ret is None:raise libvirtError('virNodeDeviceLookupByName() failed', conn=self)
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service libvirt.libvirtError: Node device not found: no node device with matching name 'mdev_9a95927e_f50a_4e34_84fc_3b27508f4241'
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service
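The device in the error is named by its libvirt node-device name rather than its UUID. In the log above the name is the UUID with hyphens replaced by underscores and an mdev_ prefix; some newer libvirt releases may also append the parent device address after the UUID. A minimal sketch for converting between the two forms, for example to match log entries against sysfs or domain XML; mdev_name_to_uuid and mdev_uuid_to_name are hypothetical helper names.

```python
import uuid

PREFIX = "mdev_"
UUID_LEN = 36  # length of a canonical 8-4-4-4-12 UUID string


def mdev_name_to_uuid(name: str) -> str:
    """Convert a libvirt mdev node-device name to the mdev UUID.

    Accepts 'mdev_<uuid>' and 'mdev_<uuid>_<parent>', with the UUID's
    hyphens written as underscores.
    """
    if not name.startswith(PREFIX):
        raise ValueError(f"not an mdev node device name: {name!r}")
    # Take exactly one UUID's worth of characters so that any trailing
    # parent suffix is ignored; uuid.UUID validates the result.
    raw = name[len(PREFIX):len(PREFIX) + UUID_LEN].replace("_", "-")
    return str(uuid.UUID(raw))


def mdev_uuid_to_name(mdev_uuid: str) -> str:
    """Build the 'mdev_<uuid>' node-device name seen in the log above."""
    return PREFIX + str(uuid.UUID(mdev_uuid)).replace("-", "_")
```

For example, mdev_name_to_uuid("mdev_9a95927e_f50a_4e34_84fc_3b27508f4241") gives "9a95927e-f50a-4e34-84fc-3b27508f4241", the UUID to look for under /sys/bus/mdev/devices.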
More information about the Ubuntu-openstack-bugs mailing list