[Bug 1518430] Re: liberty: ~busy loop on epoll_wait being called with zero timeout

Aleksandr Shaposhnikov ashaposhnikov at mirantis.com
Wed Apr 20 17:34:22 UTC 2016


I can confirm this bug on Liberty, definitely for the following services: neutron-server and nova-conductor. I could not confirm it for the others because that would require additional monitoring/debugging.
Steps to reproduce for neutron-server:

1. Deploy a new cloud.
2. Measure CPU and memory usage of neutron-server and look at an strace of any pid associated with an RPC worker (see the measurement sketch after this list).
3. Create 200-400 networks with routers.
4. Spawn 10000 VMs (not all at once; you can spawn and delete them in batches of 10-20).
5. Delete everything (networks, routers, VMs).
6. Measure CPU/memory usage again and look at an strace of the neutron-server RPC worker.
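
For steps 2 and 6, a minimal measurement sketch in Python, assuming the psutil library and a hypothetical WORKER_PID for the RPC worker (strace itself is run separately against the same pid):

    import psutil

    WORKER_PID = 12345  # hypothetical: pid of a neutron-server RPC worker

    proc = psutil.Process(WORKER_PID)
    cpu = proc.cpu_percent(interval=1.0)        # % of one core over 1 second
    rss = proc.memory_info().rss / (1024 ** 2)  # resident set size in MiB
    print("cpu=%.1f%%  rss=%.1f MiB" % (cpu, rss))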

Expected behavior:
Everything should look the same as at step 2, with insignificant changes in CPU/memory usage.

Observed behavior:
Huge CPU load (1-2 cores) and huge memory usage (6-7 GB) while the service does nothing useful. Stracing the RPC thread shows ~1k epoll_wait calls per second with a zero timeout.
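
For context on those strace numbers: a zero timeout makes epoll_wait return immediately instead of blocking, so a loop that keeps passing 0 burns a full core while accomplishing nothing. A minimal Python sketch of the same syscall pattern (illustration only, not the actual service code):

    import select

    ep = select.epoll()  # nothing registered, so no fd is ever ready

    # Each poll(0) issues epoll_wait(..., timeout=0), which returns
    # immediately; under strace this shows up as the flood of
    # zero-timeout epoll_wait calls described above.
    for _ in range(1000):
        ep.poll(0)
    ep.close()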

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to nova in Ubuntu.
https://bugs.launchpad.net/bugs/1518430

Title:
  liberty: ~busy loop on epoll_wait being called with zero timeout

Status in Mirantis OpenStack:
  New
Status in OpenStack Compute (nova):
  New
Status in nova package in Ubuntu:
  Confirmed

Bug description:
  Context: an OpenStack juju/maas deploy using the 1510 charms release
  on trusty, with:
    openstack-origin: "cloud:trusty-liberty"
    source: "cloud:trusty-updates/liberty"

  * Several OpenStack nova- and neutron- services, at least
  nova-compute, neutron-server, nova-conductor,
  neutron-openvswitch-agent, and neutron-vpn-agent,
  show near-busy looping on epoll_wait() calls, most
  frequently with a zero timeout set.
  - nova-compute (chosen because it is single-process) strace and ltrace captures:
    http://paste.ubuntu.com/13371248/ (ltrace, strace)

  For comparison, this is how it looks on a kilo deploy:
  - http://paste.ubuntu.com/13371635/

  * 'top' sample from a nova-cloud-controller unit on
     this completely idle stack:
    http://paste.ubuntu.com/13371809/

  FYI, I am *not* seeing this behavior on keystone, glance, cinder,
  or ceilometer-api.

  As this issue is present in several components, it likely comes
  from a common library (oslo.concurrency?); FYI, I filed the bug
  against nova itself as a starting point for debugging.
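
  One plausible mechanism, offered here as an assumption rather than
  something confirmed by these traces: green-thread hubs such as
  eventlet derive the poll timeout from the soonest scheduled timer,
  so if some timer is always already due, every loop iteration
  computes a zero timeout. A simplified, hypothetical hub loop (not
  eventlet's actual code):

      import select
      import time

      poller = select.epoll()

      def next_timer_deadline():
          # hypothetical: returns the soonest timer deadline; here it
          # mimics the pathological case of a timer that is always due
          return time.monotonic()

      for _ in range(1000):
          timeout = max(0.0, next_timer_deadline() - time.monotonic())
          poller.poll(timeout)  # strace: epoll_wait(..., ..., 0) repeatedly
      poller.close()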

To manage notifications about this bug go to:
https://bugs.launchpad.net/mos/+bug/1518430/+subscriptions



More information about the Ubuntu-openstack-bugs mailing list