[Bug 1582278] Re: [SR-IOV][CPU Pinning] nova compute can try to boot VM with CPUs from one NUMA node and PCI device from another NUMA node.

OpenStack Infra 1582278 at bugs.launchpad.net
Tue Oct 4 01:44:03 UTC 2016


Reviewed:  https://review.openstack.org/317064
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=257cfb7e6f2f3414640c632909f78db6b71f40b3
Submitter: Jenkins
Branch:    stable/mitaka

commit 257cfb7e6f2f3414640c632909f78db6b71f40b3
Author: Jay Pipes <jaypipes at gmail.com>
Date:   Fri Apr 1 16:03:47 2016 -0700

    pci: pass in instance PCI requests to claim
    
    Removes the calls to InstancePCIRequests.get_XXX() from within the
    claims.Claim and claims.MoveClaim constructors and instead has the
    resource tracker construct the PCI requests and pass them into the
    constructor.
    
    This allows us to remove the needlessly duplicative _test_pci() method
    in claims.MoveClaim and will allow the next patch in the series to
    remove the call in nova.pci.manager.PciDevTracker.claim_instance() that
    re-fetches PCI requests for the supplied instance.
    
    Related-Bug: #1368201
    Related-Bug: #1582278
    
    Change-Id: Ib2cc7c985839fbf88b5e6e437c4b395ab484b1b6
    (cherry picked from commit 74fbff88639891269f6a0752e70b78340cf87e9a)
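
For readers unfamiliar with the refactoring described in the commit message, here is a minimal sketch of the general idea: the caller looks up the PCI requests once and hands them to the claim constructor, so the constructor performs no lookup of its own. The class and function names below (Claim, MoveClaim, fetch_pci_requests, instance_claim, test_pci) and the failure message are illustrative assumptions and do not reproduce nova's actual code or signatures.

```python
# Hypothetical sketch: names and messages are illustrative, not nova's API.


class Claim:
    def __init__(self, instance, pci_requests):
        # The requests are supplied by the caller; no lookup happens here.
        self.instance = instance
        self.pci_requests = list(pci_requests)

    def test_pci(self, can_satisfy):
        """Return None if the requests can be satisfied, else a reason string."""
        if self.pci_requests and not can_satisfy(self.pci_requests):
            return "PCI requests cannot be satisfied"
        return None


class MoveClaim(Claim):
    # No PCI-specific override is needed in this sketch: the caller decides
    # which requests to pass in, so the inherited test_pci() is reused.
    pass


def fetch_pci_requests(instance):
    # Stand-in for the lookup, performed once by the caller (hypothetical).
    return instance.get("pci_requests", [])


def instance_claim(instance):
    # The "resource tracker" side: build the requests, then pass them in.
    requests = fetch_pci_requests(instance)
    return Claim(instance, requests)
```

With this shape, a move claim and a regular claim differ only in which requests the caller supplies, which is why a duplicated test method becomes unnecessary in the sketch.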


** Tags added: in-stable-mitaka

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to nova in Ubuntu.
https://bugs.launchpad.net/bugs/1582278

Title:
  [SR-IOV][CPU Pinning] nova compute can try to boot VM with CPUs from
  one NUMA node and PCI device from another NUMA node.

Status in OpenStack Compute (nova):
  Fix Released
Status in nova package in Ubuntu:
  Fix Released
Status in nova source package in Xenial:
  Triaged
Status in nova source package in Yakkety:
  Fix Released

Bug description:
  Environment:
  Two NUMA nodes on compute host (node-0 and node-1).
  One SR-IOV PCI device associated with NUMA node-1.

  Steps to reproduce:

   1) Deploy an environment with SR-IOV and CPU pinning enabled
   2) Create a new flavor with CPU pinning:
  nova flavor-show m1.small.performance
  +----------------------------+------------------------------------------------------+
  | Property                   | Value                                                |
  +----------------------------+------------------------------------------------------+
  | OS-FLV-DISABLED:disabled   | False                                                |
  | OS-FLV-EXT-DATA:ephemeral  | 0                                                    |
  | disk                       | 20                                                   |
  | extra_specs                | {"hw:cpu_policy": "dedicated", "hw:numa_nodes": "1"} |
  | id                         | 7b0e5ee0-0bf7-4a46-9653-9279a947c650                 |
  | name                       | m1.small.performance                                 |
  | os-flavor-access:is_public | True                                                 |
  | ram                        | 2048                                                 |
  | rxtx_factor                | 1.0                                                  |
  | swap                       |                                                      |
  | vcpus                      | 1                                                    |
  +----------------------------+------------------------------------------------------+
   3) Download an Ubuntu image
   4) Create an SR-IOV port and boot a VM on this port with the m1.small.performance flavor:
  NODE_1='node-4.test.domain.local'
  NODE_2='node-5.test.domain.local'
  NET_ID_1=$(neutron net-list | grep net_EW_2 | awk '{print$2}')
  neutron port-create $NET_ID_1 --binding:vnic-type direct --device_owner nova-compute --name sriov_23
  port_id=$(neutron port-list | grep 'sriov_23' | awk '{print$2}')
  nova boot vm23 --flavor m1.small.performance --image ubuntu_image --availability-zone nova:$NODE_1 --nic port-id=$port_id --key-name vm_key

  Expected result:
   The VM is in the ACTIVE state
  Actual result:
   In most cases the VM ends up in the ERROR state with the following logs:

  2016-05-13 08:25:56.598 29097 ERROR nova.pci.stats [req-26138c0b-fa55-4ff8-8f3a-aad980e3c815 d864c4308b104454b7b46fb652f4f377 9322dead0b5d440986b12596d9cbff5b - - -] Failed to allocate PCI devices for instance. Unassigning devices back to pools. This should not happen, since the scheduler should have accurate information, and allocation during claims is controlled via a hold on the compute node semaphore
  2016-05-13 08:25:57.502 29097 INFO nova.virt.libvirt.driver [req-26138c0b-fa55-4ff8-8f3a-aad980e3c815 d864c4308b104454b7b46fb652f4f377 9322dead0b5d440986b12596d9cbff5b - - -] [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] Creating image
  2016-05-13 08:25:57.664 29097 ERROR nova.compute.manager [req-26138c0b-fa55-4ff8-8f3a-aad980e3c815 d864c4308b104454b7b46fb652f4f377 9322dead0b5d440986b12596d9cbff5b - - -] Instance failed network setup after 1 attempt(s)
  2016-05-13 08:25:57.664 29097 ERROR nova.compute.manager Traceback (most recent call last):
  2016-05-13 08:25:57.664 29097 ERROR nova.compute.manager   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1570, in _allocate_network_async
  2016-05-13 08:25:57.664 29097 ERROR nova.compute.manager     bind_host_id=bind_host_id)
  2016-05-13 08:25:57.664 29097 ERROR nova.compute.manager   File "/usr/lib/python2.7/dist-packages/nova/network/neutronv2/api.py", line 666, in allocate_for_instance
  2016-05-13 08:25:57.664 29097 ERROR nova.compute.manager     self._delete_ports(neutron, instance, created_port_ids)
  2016-05-13 08:25:57.664 29097 ERROR nova.compute.manager   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
  2016-05-13 08:25:57.664 29097 ERROR nova.compute.manager     self.force_reraise()
  2016-05-13 08:25:57.664 29097 ERROR nova.compute.manager   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
  2016-05-13 08:25:57.664 29097 ERROR nova.compute.manager     six.reraise(self.type_, self.value, self.tb)
  2016-05-13 08:25:57.664 29097 ERROR nova.compute.manager   File "/usr/lib/python2.7/dist-packages/nova/network/neutronv2/api.py", line 645, in allocate_for_instance
  2016-05-13 08:25:57.664 29097 ERROR nova.compute.manager     bind_host_id=bind_host_id)
  2016-05-13 08:25:57.664 29097 ERROR nova.compute.manager   File "/usr/lib/python2.7/dist-packages/nova/network/neutronv2/api.py", line 738, in _populate_neutron_extension_values
  2016-05-13 08:25:57.664 29097 ERROR nova.compute.manager     port_req_body)
  2016-05-13 08:25:57.664 29097 ERROR nova.compute.manager   File "/usr/lib/python2.7/dist-packages/nova/network/neutronv2/api.py", line 709, in _populate_neutron_binding_profile
  2016-05-13 08:25:57.664 29097 ERROR nova.compute.manager     instance, pci_request_id).pop()
  2016-05-13 08:25:57.664 29097 ERROR nova.compute.manager IndexError: pop from empty list
  2016-05-13 08:25:57.664 29097 ERROR nova.compute.manager
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [req-26138c0b-fa55-4ff8-8f3a-aad980e3c815 d864c4308b104454b7b46fb652f4f377 9322dead0b5d440986b12596d9cbff5b - - -] [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] Instance failed to spawn
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] Traceback (most recent call last):
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2218, in _build_resources
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]     yield resources
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2064, in _build_and_run_instance
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]     block_device_info=block_device_info)
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 2761, in spawn
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]     admin_pass=admin_password)
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 3287, in _create_image
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]     instance, network_info, admin_pass, files, suffix)
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 3066, in _inject_data
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]     network_info, libvirt_virt_type=CONF.libvirt.virt_type)
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]   File "/usr/lib/python2.7/dist-packages/nova/virt/netutils.py", line 78, in get_injected_network_template
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]     if not (network_info and template):
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]   File "/usr/lib/python2.7/dist-packages/nova/network/model.py", line 517, in __len__
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]     return self._sync_wrapper(fn, *args, **kwargs)
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]   File "/usr/lib/python2.7/dist-packages/nova/network/model.py", line 504, in _sync_wrapper
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]     self.wait()
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]   File "/usr/lib/python2.7/dist-packages/nova/network/model.py", line 536, in wait
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]     self[:] = self._gt.wait()
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]   File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 175, in wait
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]     return self._exit_event.wait()
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]   File "/usr/lib/python2.7/dist-packages/eventlet/event.py", line 125, in wait
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]     current.throw(*self._exc)
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]   File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 214, in main
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]     result = function(*args, **kwargs)
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]   File "/usr/lib/python2.7/dist-packages/nova/utils.py", line 1145, in context_wrapper
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]     return func(*args, **kwargs)
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1587, in _allocate_network_async
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]     six.reraise(*exc_info)
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1570, in _allocate_network_async
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]     bind_host_id=bind_host_id)
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]   File "/usr/lib/python2.7/dist-packages/nova/network/neutronv2/api.py", line 666, in allocate_for_instance
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]     self._delete_ports(neutron, instance, created_port_ids)
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]     self.force_reraise()
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]     six.reraise(self.type_, self.value, self.tb)
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]   File "/usr/lib/python2.7/dist-packages/nova/network/neutronv2/api.py", line 645, in allocate_for_instance
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]     bind_host_id=bind_host_id)
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]   File "/usr/lib/python2.7/dist-packages/nova/network/neutronv2/api.py", line 738, in _populate_neutron_extension_values
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]     port_req_body)
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]   File "/usr/lib/python2.7/dist-packages/nova/network/neutronv2/api.py", line 709, in _populate_neutron_binding_profile
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]     instance, pci_request_id).pop()
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] IndexError: pop from empty list
  2016-05-13 08:25:57.937 29097 ERROR nova.compute.manager [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566]
  2016-05-13 08:25:57.939 29097 INFO nova.compute.manager [req-26138c0b-fa55-4ff8-8f3a-aad980e3c815 d864c4308b104454b7b46fb652f4f377 9322dead0b5d440986b12596d9cbff5b - - -] [instance: 4e691469-893d-4b24-a0a8-00bbee0fa566] Terminating instance

  The problem is in the instance_claim() method of
  nova/compute/resource_tracker.py:

  claim = claims.Claim(context, instance_ref, self, self.compute_node,
                       overhead=overhead, limits=limits)
  if self.pci_tracker:
      self.pci_tracker.claim_instance(context, instance_ref)

  instance_ref.numa_topology = claim.claimed_numa_topology
  self._set_instance_host_and_node(instance_ref)

  1) Here nova creates a claim with the correct NUMA node for CPU pinning and PCI devices (in our case, node-1).
  2) nova calls pci_tracker.claim_instance() with instance_ref, but instance_ref does not yet contain the required NUMA topology. As a result, claim_instance() chooses node-0 and cannot associate the requested PCI devices with the instance, because those devices belong to node-1.
  3) nova stores the correct NUMA topology (node-1, from step 1) in instance_ref.
  4) The result is an instance without PCI devices.
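
The ordering problem in the four steps above can be reproduced with a toy model. This is only a sketch under stated assumptions: PciDevice, Instance, claim_pci, buggy_order and numa_aware_order are hypothetical names, the fallback to node 0 simply mirrors what the report says happens, and the second ordering is one possible way to avoid the mismatch, not necessarily how the fix was implemented in nova.

```python
# Hypothetical, simplified model of the claim ordering; not nova's real code.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PciDevice:
    address: str
    numa_node: int


@dataclass
class Instance:
    numa_node: Optional[int] = None  # None until the claim result is stored
    pci_devices: List[PciDevice] = field(default_factory=list)


def claim_pci(instance, free_devices, default_node=0):
    """Toy PCI claim: take one free device on the instance's NUMA node.

    default_node=0 is an assumption that mirrors the report's statement
    that node-0 is chosen when the instance carries no NUMA information.
    """
    node = instance.numa_node if instance.numa_node is not None else default_node
    matched = [d for d in free_devices if d.numa_node == node]
    instance.pci_devices = matched[:1]
    return instance.pci_devices


def buggy_order(claimed_node, free_devices):
    instance = Instance()
    claim_pci(instance, free_devices)   # step 2: no NUMA info on the instance yet
    instance.numa_node = claimed_node   # step 3: topology stored too late
    return instance


def numa_aware_order(claimed_node, free_devices):
    instance = Instance()
    instance.numa_node = claimed_node   # make the claimed node visible first
    claim_pci(instance, free_devices)
    return instance
```

With a single free device on node 1, buggy_order(1, devices) leaves the instance with an empty pci_devices list (step 4 of the report), while numa_aware_order(1, devices) assigns the device.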

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1582278/+subscriptions


