[Bug 1518430] Re: liberty: ~busy loop on epoll_wait being called with zero timeout
Aleksandr Shaposhnikov
ashaposhnikov at mirantis.com
Wed Apr 20 17:34:22 UTC 2016
I can confirm that I definitely observed this bug on Liberty with the following services: neutron-server and nova-conductor. I couldn't confirm it for the others because that would require additional monitoring/debugging.
Steps to reproduce for neutron-server:
1. Deploy a new cloud.
2. Measure CPU and MEM usage of neutron-server and look at an strace of any PID associated with an RPC worker (see the sketch after this list).
3. Create 200-400 networks with routers.
4. Spawn 10000 VMs (not all at once; you can spawn and delete them in batches of 10-20).
5. Delete everything (networks, routers, VMs).
6. Measure CPU/MEM usage again and look at an strace of a neutron-server RPC worker.
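For steps 2 and 6, a rough way to put a number on the epoll_wait rate is strace's counting mode. The snippet below is just my own throwaway wrapper (not part of neutron or of this report) around what you would otherwise do by hand with "strace -c" and Ctrl-C; it assumes the argument is the PID of an RPC worker taken from ps, and it needs root to attach:

    #!/usr/bin/env python
    # Throwaway wrapper: attach strace in counting mode to a worker PID for a
    # few seconds, then detach; strace prints a per-syscall summary table on
    # SIGINT, and the "calls" column for epoll_wait gives the rate over ~5s.
    import signal
    import subprocess
    import sys
    import time

    pid = sys.argv[1]
    proc = subprocess.Popen(
        ["strace", "-c", "-f", "-e", "trace=epoll_wait", "-p", pid])
    time.sleep(5)
    proc.send_signal(signal.SIGINT)   # makes strace detach and print its summary
    proc.wait()

On an affected RPC worker I would expect the calls column to show on the order of 5000 over those 5 seconds; on a healthy worker it should be a small fraction of that.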
Expected behavior:
Everything should be roughly the same as at step 2, with insignificant changes in CPU/MEM.
Observed behavior:
Huge CPU load (1-2 cores) and huge memory usage (6-7 GB) without doing anything useful at the moment ;) Stracing the RPC thread shows ~1k epoll_wait calls per second with a 0 timeout.
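For reference, that strace pattern corresponds to an event loop that polls with a zero timeout on every iteration instead of blocking. A minimal toy loop (my own illustration, not the actual service code) produces the same stream of epoll_wait(..., 0) = 0 lines while pegging a core:

    import select

    # Toy reproduction of the symptom: polling with timeout=0 returns
    # immediately whether or not any descriptor is ready, so the loop spins
    # and burns a full core. A healthy loop would pass a positive timeout
    # (or -1) and block until there is actual work to do.
    ep = select.epoll()
    while True:
        ep.poll(0)   # epoll_wait(..., 0) = 0, over and over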
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to nova in Ubuntu.
https://bugs.launchpad.net/bugs/1518430
Title:
liberty: ~busy loop on epoll_wait being called with zero timeout
Status in Mirantis OpenStack:
New
Status in OpenStack Compute (nova):
New
Status in nova package in Ubuntu:
Confirmed
Bug description:
Context: openstack juju/maas deploy using 1510 charms release
on trusty, with:
openstack-origin: "cloud:trusty-liberty"
source: "cloud:trusty-updates/liberty
* Several OpenStack nova- and neutron- services, at least:
nova-compute, neutron-server, nova-conductor,
neutron-openvswitch-agent, neutron-vpn-agent
are almost busy-looping on epoll_wait() calls, most frequently
with a zero timeout set.
- nova-compute (chosen because it is single-process) strace and ltrace captures:
http://paste.ubuntu.com/13371248/ (ltrace, strace)
For comparison, this is how it looks on a Kilo deploy:
- http://paste.ubuntu.com/13371635/
* 'top' sample from a nova-cloud-controller unit from
this completely idle stack:
http://paste.ubuntu.com/13371809/
FYI *not* seeing this behavior on keystone, glance, cinder,
ceilometer-api.
As this issue is present in several components, it likely comes
from common libraries (oslo concurrency?); FYI, I filed the bug against
nova itself as a starting point for debugging.
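One generic way a shared event-loop layer ends up in this state (purely illustrative, and not confirmed to be the mechanism behind this bug) is a poll timeout derived from a timer that is always already due, so the loop never blocks even when idle:

    import select
    import time

    # Illustrative only: event loops typically derive their poll timeout from
    # the earliest pending timer. If some component keeps re-arming a timer
    # that is already due, the computed timeout is always ~0 and every
    # iteration becomes a non-blocking epoll_wait, even with nothing to do.
    ep = select.epoll()
    next_timer = time.time()                    # a timer that is already due
    while True:
        timeout = max(0.0, next_timer - time.time())
        ep.poll(timeout)                        # effectively ep.poll(0)
        next_timer = time.time()                # buggy component re-arms it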
To manage notifications about this bug go to:
https://bugs.launchpad.net/mos/+bug/1518430/+subscriptions