[Bug 1582278] Re: [SR-IOV][CPU Pinning] nova compute can try to boot VM with CPUs from one NUMA node and PCI device from another NUMA node.
OpenStack Infra
1582278 at bugs.launchpad.net
Tue Oct 4 01:44:03 UTC 2016
Reviewed: https://review.openstack.org/317064
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=257cfb7e6f2f3414640c632909f78db6b71f40b3
Submitter: Jenkins
Branch: stable/mitaka
commit 257cfb7e6f2f3414640c632909f78db6b71f40b3
Author: Jay Pipes <jaypipes at gmail.com>
Date: Fri Apr 1 16:03:47 2016 -0700
pci: pass in instance PCI requests to claim
Removes the calls to InstancePCIRequests.get_XXX() from within the
claims.Claim and claims.MoveClaim constructors and instead has the
resource tracker construct the PCI requests and pass them into the
constructor.
This allows us to remove the needlessly duplicative _test_pci() method
in claims.MoveClaim and will allow the next patch in the series to
remove the call in nova.pci.manager.PciDevTracker.claim_instance() that
re-fetches PCI requests for the supplied instance.
Related-Bug: #1368201
Related-Bug: #1582278
Change-Id: Ib2cc7c985839fbf88b5e6e437c4b395ab484b1b6
(cherry picked from commit 74fbff88639891269f6a0752e70b78340cf87e9a)
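
The refactoring described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not nova's code: the class bodies, the fetch_requests callable, the test_pci() helper, and the "sriov-vf" request value are all hypothetical stand-ins, chosen only to show the caller building the PCI requests once and handing them to the claim constructor.

```python
from typing import Callable, List


class Claim:
    """Hypothetical stand-in for a resource claim (not nova's real class)."""

    def __init__(self, instance_uuid: str, pci_requests: List[str]) -> None:
        # The requests are handed in by the caller; nothing is fetched here.
        self.instance_uuid = instance_uuid
        self.pci_requests = pci_requests

    def test_pci(self, free_devices: int) -> bool:
        # One shared check: every request needs one free device.
        return len(self.pci_requests) <= free_devices


class MoveClaim(Claim):
    """Reuses Claim.test_pci() instead of carrying a duplicate copy."""


class ResourceTracker:
    """Hypothetical tracker that owns the lookup of PCI requests."""

    def __init__(self, fetch_requests: Callable[[str], List[str]]) -> None:
        self._fetch_requests = fetch_requests

    def instance_claim(self, instance_uuid: str) -> Claim:
        # The tracker constructs the PCI requests and passes them in.
        return Claim(instance_uuid, self._fetch_requests(instance_uuid))

    def move_claim(self, instance_uuid: str) -> MoveClaim:
        return MoveClaim(instance_uuid, self._fetch_requests(instance_uuid))


lookups: List[str] = []


def fake_fetch(instance_uuid: str) -> List[str]:
    # Records each lookup so the number of fetches is visible.
    lookups.append(instance_uuid)
    return ["sriov-vf"]


tracker = ResourceTracker(fake_fetch)
claim = tracker.instance_claim("vm23")
move = tracker.move_claim("vm23")
```

With the lookup kept outside the constructors, the claim classes can be exercised with plain lists, and the subclass needs no PCI test of its own.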
** Tags added: in-stable-mitaka
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to nova in Ubuntu.
https://bugs.launchpad.net/bugs/1582278
Title:
[SR-IOV][CPU Pinning] nova compute can try to boot VM with CPUs from
one NUMA node and PCI device from another NUMA node.
Status in OpenStack Compute (nova):
Fix Released
Status in nova package in Ubuntu:
Fix Released
Status in nova source package in Xenial:
Triaged
Status in nova source package in Yakkety:
Fix Released
Bug description:
Environment:
Two NUMA nodes on compute host (node-0 and node-1).
One SR-IOV PCI device associated with NUMA node-1.
Steps to reproduce:
1) Deploy an environment with SR-IOV and CPU pinning enabled
2) Create a new flavor with CPU pinning:
nova flavor-show m1.small.performance
+----------------------------+------------------------------------------------------+
| Property                   | Value                                                |
+----------------------------+------------------------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                                |
| OS-FLV-EXT-DATA:ephemeral  | 0                                                    |
| disk                       | 20                                                   |
| extra_specs                | {"hw:cpu_policy": "dedicated", "hw:numa_nodes": "1"} |
| id                         | 7b0e5ee0-0bf7-4a46-9653-9279a947c650                 |
| name                       | m1.small.performance                                 |
| os-flavor-access:is_public | True                                                 |
| ram                        | 2048                                                 |
| rxtx_factor                | 1.0                                                  |
| swap                       |                                                      |
| vcpus                      | 1                                                    |
+----------------------------+------------------------------------------------------+
3) Download an Ubuntu image
4) Create an SR-IOV port and boot a VM on this port with the m1.small.performance flavor:
NODE_1='node-4.test.domain.local'
NODE_2='node-5.test.domain.local'
NET_ID_1=$(neutron net-list | grep net_EW_2 | awk '{print$2}')
neutron port-create $NET_ID_1 --binding:vnic-type direct --device_owner nova-compute --name sriov_23
port_id=$(neutron port-list | grep 'sriov_23' | awk '{print$2}')
nova boot vm23 --flavor m1.small.performance --image ubuntu_image --availability-zone nova:$NODE_1 --nic port-id=$port_id --key-name vm_key
Expected results:
VM is in ACTIVE state
Actual result:
In most cases the state is ERROR, with the following logs:
2016-05-13 08:25:56.598 29097 ERROR nova.pci.stats [req-26138c0b-fa55-4ff8-8f3a-aad980e3c815 d864c4308b104454b7b46fb652f4f377 9322dead0b5d440986b12596d9cbff5b - - -] Failed to allocate PCI devices for instance. Unassigning devices back to pools. This should not happen, since the scheduler should have accurate information, and allocation during claims is controlled via a hold on the compute node semaphore
2016-05-13 08:25:57.502 29097 INFO nova.virt.libvirt.driver [req-26138c0b-fa55-4ff8-8f3a-aad980e3c815 d864c4308b104454b7b46fb652f4f377 9322dead0b5d440986b12596d9cbff5b - - -] [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] Creating image
2016-05-13 08:25:57.664 29097 ERROR nova.compute.manager [req-26138c0b-fa55-4ff8-8f3a-aad980e3c815 d864c4308b104454b7b46fb652f4f377 9322dead0b5d440986b12596d9cbff5b - - -] Instance failed network setup after 1 attempt(s)
2016-05-13 08:25:57.664 29097 ERROR nova.compute.manager Traceback (most recent call last):
2016-05-13 08:25:57.664 29097 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1570, in _allocate_network_async
2016-05-13 08:25:57.664 29097 ERROR nova.compute.manager bind_host_id=bind_host_id)
2016-05-13 08:25:57.664 29097 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/network/neutronv2/api.py", line 666, in allocate_for_instance
2016-05-13 08:25:57.664 29097 ERROR nova.compute.manager self._delete_ports(neutron, instance, created_port_ids)
2016-05-13 08:25:57.664 29097 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2016-05-13 08:25:57.664 29097 ERROR nova.compute.manager self.force_reraise()
2016-05-13 08:25:57.664 29097 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2016-05-13 08:25:57.664 29097 ERROR nova.compute.manager six.reraise(self.type_, self.value, self.tb)
2016-05-13 08:25:57.664 29097 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/network/neutronv2/api.py", line 645, in allocate_for_instance
2016-05-13 08:25:57.664 29097 ERROR nova.compute.manager bind_host_id=bind_host_id)
2016-05-13 08:25:57.664 29097 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/network/neutronv2/api.py", line 738, in _populate_neutron_extension_values
2016-05-13 08:25:57.664 29097 ERROR nova.compute.manager port_req_body)
2016-05-13 08:25:57.664 29097 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/network/neutronv2/api.py", line 709, in _populate_neutron_binding_profile
2016-05-13 08:25:57.664 29097 ERROR nova.compute.manager instance, pci_request_id).pop()
2016-05-13 08:25:57.664 29097 ERROR nova.compute.manager IndexError: pop from empty list
2016-05-13 08:25:57.664 29097 ERROR nova.compute.manager
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [req-26138c0b-fa55-4ff8-8f3a-aad980e3c815 d864c4308b104454b7b46fb652f4f377 9322dead0b5d440986b12596d9cbff5b - - -] [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] Instance failed to spawn
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] Traceback (most recent call last):
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2218, in _build_resources
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] yield resources
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2064, in _build_and_run_instance
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] block_device_info=block_device_info)
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 2761, in spawn
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] admin_pass=admin_password)
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 3287, in _create_image
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] instance, network_info, admin_pass, files, suffix)
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 3066, in _inject_data
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] network_info, libvirt_virt_type=CONF.libvirt.virt_type)
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] File "/usr/lib/python2.7/dist-packages/nova/virt/netutils.py", line 78, in get_injected_network_template
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] if not (network_info and template):
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] File "/usr/lib/python2.7/dist-packages/nova/network/model.py", line 517, in __len__
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] return self._sync_wrapper(fn, *args, **kwargs)
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] File "/usr/lib/python2.7/dist-packages/nova/network/model.py", line 504, in _sync_wrapper
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] self.wait()
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] File "/usr/lib/python2.7/dist-packages/nova/network/model.py", line 536, in wait
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] self[:] = self._gt.wait()
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 175, in wait
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] return self._exit_event.wait()
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] File "/usr/lib/python2.7/dist-packages/eventlet/event.py", line 125, in wait
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] current.throw(*self._exc)
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 214, in main
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] result = function(*args, **kwargs)
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] File "/usr/lib/python2.7/dist-packages/nova/utils.py", line 1145, in context_wrapper
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] return func(*args, **kwargs)
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1587, in _allocate_network_async
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] six.reraise(*exc_info)
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1570, in _allocate_network_async
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] bind_host_id=bind_host_id)
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] File "/usr/lib/python2.7/dist-packages/nova/network/neutronv2/api.py", line 666, in allocate_for_instance
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] self._delete_ports(neutron, instance, created_port_ids)
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] self.force_reraise()
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] six.reraise(self.type_, self.value, self.tb)
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] File "/usr/lib/python2.7/dist-packages/nova/network/neutronv2/api.py", line 645, in allocate_for_instance
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] bind_host_id=bind_host_id)
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] File "/usr/lib/python2.7/dist-packages/nova/network/neutronv2/api.py", line 738, in _populate_neutron_extension_values
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] port_req_body)
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] File "/usr/lib/python2.7/dist-packages/nova/network/neutronv2/api.py", line 709, in _populate_neutron_binding_profile
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] instance, pci_request_id).pop()
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] IndexError: pop from empty list
2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]
2016-05-13 08:25:57.939 29097 INFO nova.compute.manager [req-26138c0b-fa55-4ff8-8f3a-aad980e3c815 d864c4308b104454b7b46fb652f4f377 9322dead0b5d440986b12596d9cbff5b - - -] [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] Terminating instance
The problem is in nova/compute/resource_tracker.py, in the
instance_claim() method:
claim = claims.Claim(context, instance_ref, self, self.compute_node,
                     overhead=overhead, limits=limits)
if self.pci_tracker:
    self.pci_tracker.claim_instance(context, instance_ref)
instance_ref.numa_topology = claim.claimed_numa_topology
self._set_instance_host_and_node(instance_ref)
1) Here nova creates a claim with the correct NUMA node for CPU pinning and PCI devices (in our case, node-1).
2) nova calls pci_tracker.claim_instance() with instance_ref, but instance_ref does not yet contain the required NUMA node, so claim_instance() chooses node-0. The requested PCI devices then cannot be associated with the instance, because they are attached to node-1.
3) nova sets the correct NUMA node (node-1) from step 1 on instance_ref.
4) The result is an instance without PCI devices.
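
The four steps above can be reproduced with a small, self-contained simulation. Everything in it is an assumption made for illustration: the PciDevice and Instance classes, the fit_numa_node() and claim_pci() helpers, and the PCI address are hypothetical and do not mirror nova's real objects. The reordered variant shows only one possible way to avoid the mismatch and is not necessarily the fix that was merged.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PciDevice:
    address: str
    numa_node: int


@dataclass
class Instance:
    numa_node: Optional[int] = None  # unknown until the claim result is copied in
    pci_devices: List[PciDevice] = field(default_factory=list)


# Host from the report: two NUMA nodes, one SR-IOV device on node-1.
HOST_NUMA_NODES = [0, 1]
HOST_PCI_DEVICES = [PciDevice("0000:81:10.0", numa_node=1)]  # illustrative address


def fit_numa_node(pci_aware: bool) -> int:
    """Pick a host NUMA node; only the PCI-aware fit looks at device locality."""
    if pci_aware:
        nodes_with_devices = {dev.numa_node for dev in HOST_PCI_DEVICES}
        return next(n for n in HOST_NUMA_NODES if n in nodes_with_devices)
    return HOST_NUMA_NODES[0]


def claim_pci(instance: Instance) -> None:
    """Attach devices from the instance's node, guessing a node if it is unset."""
    node = instance.numa_node
    if node is None:
        node = fit_numa_node(pci_aware=False)  # falls back to node-0
    instance.pci_devices = [d for d in HOST_PCI_DEVICES if d.numa_node == node]


def buggy_instance_claim() -> Instance:
    instance = Instance()
    claimed_node = fit_numa_node(pci_aware=True)  # step 1: claim picks node-1
    claim_pci(instance)                           # step 2: tracker guesses node-0
    instance.numa_node = claimed_node             # step 3: node-1 is set too late
    return instance                               # step 4: no PCI devices


def reordered_instance_claim() -> Instance:
    instance = Instance()
    instance.numa_node = fit_numa_node(pci_aware=True)  # set the node first
    claim_pci(instance)                                 # tracker now sees node-1
    return instance


buggy = buggy_instance_claim()
fixed = reordered_instance_claim()
```

In the buggy ordering the instance ends up pinned to node-1 with an empty device list, which matches the "pop from empty list" failure in the log; in the reordered variant the single device on node-1 is attached.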
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1582278/+subscriptions