[Bug 1420572] Re: [SRU] race between neutron-ovs-cleanup and nova-compute

Gustavo Randich gustavo.randich at gmail.com
Thu Sep 22 16:03:06 UTC 2016


Testing Mitaka on Ubuntu Xenial and rebooting hosts with > 30 instances, we
recently hit a race condition that looks similar to the one in this
issue; perhaps we need a wait/ordering condition in nova-compute's
systemd unit file?
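
For what it's worth, a minimal sketch of such an ordering constraint as a
systemd drop-in, assuming the cleanup job on Xenial ships as
neutron-ovs-cleanup.service (the unit name and drop-in path here are
assumptions, not the packaged fix):

    # /etc/systemd/system/nova-compute.service.d/ovs-cleanup-order.conf
    # Hypothetical drop-in: only start nova-compute after the boot-time
    # neutron-ovs-cleanup run has completed.
    [Unit]
    After=neutron-ovs-cleanup.service

After a systemctl daemon-reload, and provided the cleanup unit is a
oneshot, After= makes nova-compute wait until the cleanup has actually
exited rather than merely been launched.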


ERROR oslo_service.service [req-34d48ca5-bd93-4d10-a80a-bafad4228467 - - - - -] Error starting thread.
ERROR oslo_service.service Traceback (most recent call last):
ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/oslo_service/service.py", line 680, in run_service
ERROR oslo_service.service     service.start()
ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/nova/service.py", line 198, in start
ERROR oslo_service.service     self.manager.init_host()
ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1329, in init_host
ERROR oslo_service.service     self._init_instance(context, instance)
ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1142, in _init_instance
ERROR oslo_service.service     self.driver.plug_vifs(instance, net_info)
ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 880, in plug_vifs
ERROR oslo_service.service     self.vif_driver.plug(instance, vif)
ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/vif.py", line 756, in plug
ERROR oslo_service.service     func(instance, vif)
ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/vif.py", line 529, in plug_ovs
ERROR oslo_service.service     self.plug_ovs_hybrid(instance, vif)
ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/vif.py", line 525, in plug_ovs_hybrid
ERROR oslo_service.service     self._plug_bridge_with_port(instance, vif, port='ovs')
ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/vif.py", line 505, in _plug_bridge_with_port
ERROR oslo_service.service     linux_net._create_veth_pair(v1_name, v2_name, mtu)
ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/nova/network/linux_net.py", line 1356, in _create_veth_pair
ERROR oslo_service.service     _set_device_mtu(dev, mtu)
ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/nova/network/linux_net.py", line 1340, in _set_device_mtu
ERROR oslo_service.service     check_exit_code=[0, 2, 254])
ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/nova/utils.py", line 388, in execute
ERROR oslo_service.service     return RootwrapProcessHelper().execute(*cmd, **kwargs)
ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/nova/utils.py", line 271, in execute
ERROR oslo_service.service     return processutils.execute(*cmd, **kwargs)
ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/oslo_concurrency/processutils.py", line 389, in execute
ERROR oslo_service.service     cmd=sanitized_cmd)
ERROR oslo_service.service ProcessExecutionError: Unexpected error while running command.
ERROR oslo_service.service Command: sudo nova-rootwrap /etc/nova/rootwrap.conf ip link set qvo5ab170bb-a8 mtu 8950
ERROR oslo_service.service Exit code: 1
ERROR oslo_service.service Stdout: u''
ERROR oslo_service.service Stderr: u'Cannot find device "qvo5ab170bb-a8"\n'
ERROR oslo_service.service

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to nova in Ubuntu.
https://bugs.launchpad.net/bugs/1420572

Title:
  [SRU] race between neutron-ovs-cleanup and nova-compute

Status in nova package in Ubuntu:
  Fix Released
Status in nova source package in Trusty:
  Fix Released
Status in nova source package in Utopic:
  Fix Released
Status in nova source package in Vivid:
  Fix Released

Bug description:
  [Impact]

   * We run neutron-ovs-cleanup at startup if neutron is installed. If
     nova-compute does not wait for it to complete, it will try to use
     veth/bridge devices that may be in the process of being deleted.

  [Test Case]

   * Create a neutron (OVS) network and boot an instance with this network
     passed as --nic.

   * Check that creation was successful and the network is functional. Also
     make a note of the corresponding veth and bridge devices (ip a).

   * Reboot the system, check that the expected veth and bridge devices are
     still there and that nova-compute is happy, e.g. try sshing to your
     instance. Also check /var/log/upstart/nova-compute.log to see whether
     the service waited for ovs-cleanup to finish (a rough command sequence
     is sketched after this list).
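
  An illustrative command sequence for the steps above; the CLI flavour,
  the placeholder values and the qvo/qvb/qbr device prefixes are
  assumptions based on the tooling of these releases, not part of the
  packaged test:

    # create a neutron (OVS) network/subnet and boot an instance on it
    neutron net-create test-net
    neutron subnet-create test-net 192.0.2.0/24 --name test-subnet
    nova boot --image <image> --flavor <flavor> \
        --nic net-id=<test-net-uuid> test-vm

    # note the veth/bridge devices created for the instance
    ip a | grep -E 'qvo|qvb|qbr'

    # reboot, then confirm the devices come back and the service waited
    sudo reboot
    ip a | grep -E 'qvo|qvb|qbr'
    grep -i cleanup /var/log/upstart/nova-compute.log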

  [Regression Potential]

   * None

  ---- ---- ---- ----

  There is a race when both neutron-ovs-cleanup and nova-compute try to
  operate on the qvb*** and qvo*** devices. Below is a scenario I
  recently encountered:

  1. nova-compute started and was creating the veth pairs for the VM
  instances running on the host -
  https://github.com/openstack/nova/blob/stable/icehouse/nova/network/linux_net.py#L1298

  2. neutron-ovs-cleanup was kicked off and deleted all the ports.

  3. when nova-compute tried to set the MTU at
  https://github.com/openstack/nova/blob/stable/icehouse/nova/network/linux_net.py#L1280
  , Stderr: u'Cannot find device "qvo***"\n' was reported, because the
  device that had just been created had already been deleted again by
  neutron-ovs-cleanup.

  As they both operate on the same resources, there needs to be a way to
  synchronize the operations the two processes perform on those
  resources.
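
  For the upstart-based releases targeted by this SRU, one way to
  serialize the two is to make nova-compute's job wait in pre-start until
  the cleanup job has finished. The fragment below is only an illustrative
  sketch under that assumption - the job name and status check are not
  taken from the actual packaged change:

    # hypothetical fragment of /etc/init/nova-compute.conf
    pre-start script
        # if the neutron-ovs-cleanup job exists, block until its one-shot
        # boot-time run has finished before starting nova-compute
        if status neutron-ovs-cleanup >/dev/null 2>&1; then
            while status neutron-ovs-cleanup | grep -q "start/"; do
                echo "Waiting for neutron-ovs-cleanup to finish"
                sleep 1
            done
        fi
    end script

  This only helps if the cleanup job has already been started by the time
  the loop runs; expressing the ordering in the jobs' start on conditions
  would be more robust.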

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/nova/+bug/1420572/+subscriptions


