[Bug 2009138] Re: Heartbeat in pthreads still using greenthreads

OpenStack Infra 2009138 at bugs.launchpad.net
Mon Jan 29 18:56:20 UTC 2024


Reviewed:  https://review.opendev.org/c/openstack/oslo.messaging/+/880189
Committed: https://opendev.org/openstack/oslo.messaging/commit/15779aa0733f3c9bd1f85fa8aea25e3bd8915a1c
Submitter: "Zuul (22348)"
Branch:    stable/yoga

commit 15779aa0733f3c9bd1f85fa8aea25e3bd8915a1c
Author: Arnaud Morin <arnaud.morin at ovhcloud.com>
Date:   Fri Mar 3 11:16:56 2023 +0100

    Disable greenthreads for RabbitDriver "listen" connections
    
    When enabling heartbeat_in_pthread, we were restoring the "threading"
    python library from eventlet to original one in RabbitDriver but we
    forgot to do the same in AMQPDriverBase (RabbitDriver is subclass of
    AMQPDriverBase).
    
    We also need to use the original "queue" so that queues are not going to
    use greenthreads as well.
    
    Related-bug: #1961402
    Related-bug: #1934937
    Closes-bug: #2009138
    
    Signed-off-by: Arnaud Morin <arnaud.morin at ovhcloud.com>
    Change-Id: I34ea0d1381e934297df2f793e0d2594ef8254f00
    (cherry picked from commit 4d15b7c4fe0c14e285484d23c15fe5531e952679)


** Tags added: in-stable-yoga

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/2009138

Title:
  Heartbeat in pthreads still using greenthreads

Status in Ubuntu Cloud Archive:
  New
Status in oslo.messaging:
  Fix Released

Bug description:
  Context
  =======
  OpenStack Yoga
  Nova API behind apache2 with mod_wsgi
  RabbitMQ 3.9.12

  Explanation
  ===========
  When using nova with apache2/mod_wsgi, we need to set 'heartbeat_in_pthread=True' to avoid using green threads (eventlet monkey-patched threads).

  A native python thread is required to keep sending heartbeats;
  otherwise rabbit will close the connection.
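As a rough illustration of why a native thread matters, here is a minimal sketch of a heartbeat loop built on the standard library "threading" module. The class name HeartbeatThread and the send_heartbeat callback are hypothetical and are not taken from oslo.messaging; the sketch also assumes that "threading" has not been monkey patched by eventlet.

```python
import threading
import time


class HeartbeatThread:
    """Call send_heartbeat() every `interval` seconds from a native thread.

    Hypothetical helper for illustration only; not the oslo.messaging code.
    """

    def __init__(self, send_heartbeat, interval):
        self._send = send_heartbeat
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        # Event.wait() returns False on timeout and True once stop() is called,
        # so the loop sends one heartbeat per interval until it is stopped.
        while not self._stop.wait(self._interval):
            self._send()


if __name__ == "__main__":
    sent = []
    hb = HeartbeatThread(lambda: sent.append(time.monotonic()), interval=0.05)
    hb.start()
    time.sleep(0.3)
    hb.stop()
    print(f"sent {len(sent)} heartbeats")
```

Because the loop runs in an OS thread, it keeps firing even when no request is being served, which is the situation under mod_wsgi where a greenthread may never get scheduled.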

  Another option is to disable heartbeats completely, so the
  connection relies only on tcp keepalive. But keeping both mechanisms
  is safer.

  The problem with the current heartbeat_in_pthread implementation is that some threads are still greenthreads.
  As a result, some connections send heartbeats correctly while others do not (and are still killed by rabbitmq after the heartbeat timeout).

  We identified that oslo_messaging connects to rabbit for two different purposes:
  - send
  - listen

  The current heartbeat_in_pthread=True parameter switches the heartbeat from a greenthread to a python thread *only for the send purpose* (done in impl_rabbit.py).
  For the listen purpose, the thread is created by the parent class (in amqpdriver.py), which still uses greenthreads.
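The general technique for getting back the unpatched modules can be sketched as follows. This is only an assumption-laden illustration, not the actual patch: the helper name stdlib_module is hypothetical, and the sketch falls back to a plain import when eventlet is not installed. It relies on eventlet.patcher.original() and eventlet.patcher.is_monkey_patched().

```python
import importlib


def stdlib_module(name, use_pthread):
    """Return module `name`, bypassing eventlet's patched copy if asked.

    Hypothetical helper for illustration; oslo.messaging may differ.
    """
    try:
        from eventlet import patcher
    except ImportError:
        # eventlet is not installed, so nothing can be monkey patched.
        return importlib.import_module(name)
    if use_pthread and patcher.is_monkey_patched("thread"):
        # original() returns the module as it was before monkey patching.
        return patcher.original(name)
    return importlib.import_module(name)


# Both the thread primitives and the queues must come from the original
# modules, otherwise the listen side would still end up on greenthreads.
threading = stdlib_module("threading", use_pthread=True)
queue = stdlib_module("queue", use_pthread=True)
```

The point of the sketch is that the same choice has to be made in every class that creates threads or queues, including the base driver.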

  As a result, the connections used for the listen purpose are killed by rabbit.
  We can see in rabbit logs:
  missed heartbeats from client, timeout: 60s

  We can see in nova-api logs:
  Server unexpectedly closed connection.


  How to reproduce
  ================
  Start nova-api with apache mod_wsgi and set heartbeat_in_pthread=True

  Monitor the current rabbitmq connection from nova:
  $ ss -tnep  |grep 5672

  (this can be empty if nova did nothing yet)

  Make a nova API call that needs rabbit, e.g. ask for a console url:
  $ openstack console url show 5700ecbc-adff-41d3-88a4-f24e0b885b2e

  
  This will create two connections:
  ESTAB 0      0        10.42.1.165:58206 10.43.216.243:5672 timer:(keepalive,46sec,0) uid:42436 ino:422570487 sk:1a cgroup:/ <->   
  ESTAB 0      0        10.42.1.165:58204 10.43.216.243:5672 timer:(keepalive,46sec,0) uid:42436 ino:422570486 sk:1b cgroup:/ <->   
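To follow these connections over time without reading the raw output by eye, a small parser can help. The sketch below assumes the "ss -tnep" column layout shown above (state, two queue counters, local address, peer address); the function name amqp_connections is hypothetical.

```python
import re

# Matches: STATE RECV-Q SEND-Q LOCAL:PORT PEER:PORT ...
SS_LINE = re.compile(
    r"^\s*(?P<state>\S+)\s+\d+\s+\d+\s+"
    r"(?P<local>\S+):(?P<lport>\d+)\s+"
    r"(?P<peer>\S+):(?P<pport>\d+)\s"
)

SAMPLE = """\
ESTAB 0      0        10.42.1.165:58206 10.43.216.243:5672 timer:(keepalive,46sec,0) uid:42436 ino:422570487 sk:1a cgroup:/ <->
ESTAB 0      0        10.42.1.165:58204 10.43.216.243:5672 timer:(keepalive,46sec,0) uid:42436 ino:422570486 sk:1b cgroup:/ <->
"""


def amqp_connections(ss_output, port=5672):
    """Return (state, local endpoint) for each connection to `port`."""
    conns = []
    for line in ss_output.splitlines():
        m = SS_LINE.match(line)
        if m and int(m.group("pport")) == port:
            local = f"{m.group('local')}:{m.group('lport')}"
            conns.append((m.group("state"), local))
    return conns


if __name__ == "__main__":
    for state, local in amqp_connections(SAMPLE):
        print(state, local)
```

Running it periodically and comparing the local ports makes it easy to spot when one of the two connections disappears.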

  One is for the "send" purpose, the other is for the "listen" purpose.

  You can also see them in rabbit logs:
  connection <0.21408.594> (10.42.1.165:58206 -> 10.42.0.21:5672 - mod_wsgi:88239:41e4b74d-c3be-47f5-8b8f-d3bd99871f46): user 'openstack' authenticated and granted access to vhost '/'
  connection <0.21390.594> (10.42.1.165:58204 -> 10.42.0.21:5672 - mod_wsgi:88239:2b8345ca-fc75-442f-9271-1448352bb2d2): user 'openstack' authenticated and granted access to vhost '/'

  You can also monitor the heartbeats going from/to rabbit:
  $ tcpdump -i eth0 -nn port 5672
  ...
  You will see that both connections are receiving heartbeats every 30sec, but *only one* is sending heartbeats (the one in pthread).

  
  After a few minutes, rabbit kills the "listen" connection, as seen in rabbit logs:
  2023-03-03 09:54:27.932885+00:00 [erro] <0.21390.594> closing AMQP connection <0.21390.594> (10.42.1.165:58204 -> 10.42.0.21:5672 - mod_wsgi:88239:2b8345ca-fc75-442f-9271-1448352bb2d2):
  2023-03-03 09:54:27.932885+00:00 [erro] <0.21390.594> missed heartbeats from client, timeout: 60s
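To find which client endpoints are being dropped this way, the two log lines can be correlated by their Erlang pid. The following is a sketch that assumes the log format is exactly as in the excerpt above; the function name missed_heartbeat_clients is hypothetical.

```python
import re

CLOSING = re.compile(
    r"(?P<pid><[\d.]+>) closing AMQP connection <[\d.]+> "
    r"\((?P<client>\S+) -> (?P<server>\S+)"
)
MISSED = re.compile(
    r"(?P<pid><[\d.]+>) missed heartbeats from client, "
    r"timeout: (?P<timeout>\d+)s"
)

SAMPLE_LOG = """\
2023-03-03 09:54:27.932885+00:00 [erro] <0.21390.594> closing AMQP connection <0.21390.594> (10.42.1.165:58204 -> 10.42.0.21:5672 - mod_wsgi:88239:2b8345ca-fc75-442f-9271-1448352bb2d2):
2023-03-03 09:54:27.932885+00:00 [erro] <0.21390.594> missed heartbeats from client, timeout: 60s
"""


def missed_heartbeat_clients(log_text):
    """Return (client endpoint, timeout in seconds) per missed-heartbeat close."""
    clients = {}  # pid -> client endpoint from the "closing" line
    hits = []
    for line in log_text.splitlines():
        m = CLOSING.search(line)
        if m:
            clients[m.group("pid")] = m.group("client")
            continue
        m = MISSED.search(line)
        if m:
            client = clients.get(m.group("pid"), "unknown")
            hits.append((client, int(m.group("timeout"))))
    return hits


if __name__ == "__main__":
    for client, timeout in missed_heartbeat_clients(SAMPLE_LOG):
        print(f"{client} closed after missing heartbeats ({timeout}s)")
```

The client port reported here (58204) can then be matched against the ss output to confirm that it is the "listen" connection.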

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/2009138/+subscriptions



