[Bug 2143920] Re: [SRU] Uncaught SSL errors can crash worker threads

Fri Mar 13 01:04:08 UTC 2026

Uploaded, with rich git history (hopefully), to noble, thanks.

Uploading python-cheroot_10.0.0+ds1-1ubuntu0.1.dsc
Uploading python-cheroot_10.0.0+ds1-1ubuntu0.1.debian.tar.xz
Uploading python-cheroot_10.0.0+ds1-1ubuntu0.1_source.buildinfo
Uploading python-cheroot_10.0.0+ds1-1ubuntu0.1_source.changes

** Changed in: ceph (Ubuntu)
       Status: New => Fix Released

** Changed in: ceph (Ubuntu Jammy)
       Status: New => Invalid

** Changed in: ceph (Ubuntu Noble)
       Status: New => Invalid

** Changed in: python-cheroot (Ubuntu)
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to ceph in Ubuntu.
https://bugs.launchpad.net/bugs/2143920

Title:
  [SRU] Uncaught SSL errors can crash worker threads

Status in ceph package in Ubuntu:
  Fix Released
Status in python-cheroot package in Ubuntu:
  Fix Released
Status in ceph source package in Jammy:
  Invalid
Status in python-cheroot source package in Jammy:
  In Progress
Status in ceph source package in Noble:
  Invalid
Status in python-cheroot source package in Noble:
  In Progress

Bug description:
  [ Impact ]

  * Security scanners commonly trigger this problem: malformed requests raise uncaught SSL errors that kill every worker thread, and the server becomes unresponsive.
      - The ceph dashboard on quincy and squid is susceptible to this issue.
  * A malicious attacker could use the same technique to mount a denial-of-service (DoS) attack on the server.
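
  A minimal sketch of the failure mode, assuming a toy thread pool rather
  than cheroot's actual implementation: each worker pulls tasks from a
  queue, and an exception that escapes the worker loop ends that thread
  for good. The names `worker`, `surviving_workers`, and `bad_request`
  are hypothetical and used only for illustration.

```python
import queue
import ssl
import threading


def bad_request():
    # Stand-in for a malformed TLS request; not a real handshake.
    raise ssl.SSLError("simulated malformed TLS record")


def worker(tasks, guard):
    # Toy worker loop: take a task, run it, repeat until a None sentinel.
    while True:
        task = tasks.get()
        try:
            if task is None:
                return
            if guard:
                try:
                    task()
                except Exception:
                    pass  # a real server would log the error here
            else:
                task()  # an exception here propagates and ends the thread
        finally:
            tasks.task_done()


def surviving_workers(guard, n_workers=3):
    # Send one bad request per worker, then count threads still alive.
    tasks = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, guard), daemon=True)
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for _ in range(n_workers):
        tasks.put(bad_request)
    tasks.join()  # every bad request has been picked up and handled
    for t in threads:
        t.join(timeout=0.5)  # give crashed threads time to exit
    alive = sum(t.is_alive() for t in threads)
    for _ in range(alive):
        tasks.put(None)  # stop the survivors
    return alive
```

  With `guard=False` every worker dies after its first bad request
  (Python prints the tracebacks to stderr); with `guard=True` all
  workers remain available.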

  [ Test Plan ]

  I reproduced the issue against both a minimal cheroot server and the
  ceph dashboard.

  In both cases, I used tlsfuzzer [1] to reproduce the bug, by running
  `scripts/test-tls13-ccs.py -h <IP> -p <PORT>`.

  For cheroot
  ===========

  I did the following in an lxd container.

  1. Create a minimal cheroot server file

  server.py
  ---------
  from cheroot.wsgi import Server as WSGIServer
  from cheroot.ssl.builtin import BuiltinSSLAdapter

  def app(environ, start_response):
      status = '200 OK'
      response_headers = [('Content-type', 'text/plain')]
      start_response(status, response_headers)
      return [b"Ok."]

  server = WSGIServer(('0.0.0.0', 8443), app)

  server.ssl_adapter = BuiltinSSLAdapter(
      certificate='cert.pem',
      private_key='key.pem'
  )

  if __name__ == '__main__':
      try:
          server.start()
      except KeyboardInterrupt:
          server.stop()

  -----------

  2. Create a self-signed certificate and key

  `openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem
  -days 365 -nodes`

  3. Start the server: `sudo python3 server.py`

  4. Verify that the server responds correctly
  `curl -k -I https://<IP>:8443`

  Output
  ---------------
  HTTP/1.1 200 OK
  Content-type: text/plain
  Connection: close
  Date: Tue, 10 Mar 2026 15:44:09 GMT
  Server: Cheroot/8.5.2
  ----------------

  and check the number of worker threads

  `grep -i threads /proc/$(pgrep -f server.py)/status`

  Expected Output
  ---------------
  Threads:	11
  ---------------

  5. Run the tls13-ccs script of tlsfuzzer repeatedly until it times
  out.

  `scripts/test-tls13-ccs.py -h <IP> -p 8443`

  After several runs you will see:

  > AssertionError: Timeout when waiting for peer message

  6. Observe that connections to the server now time out (or hang if no
  timeout is specified)

  `curl -k -I https://<IP>:8443 --max-time 5`

  Expected Output
  ---------------
  HTTP/1.1 200 OK
  Content-type: text/plain
  Connection: close
  Date: Tue, 10 Mar 2026 15:44:09 GMT
  Server: Cheroot/8.5.2
  ----------------

  Actual Output
  ------------
  curl: (28) Operation timed out after 5002 milliseconds with 0 bytes received
  -------------

  and check the number of threads for the server process

  `grep -i threads /proc/$(pgrep -f server.py)/status`

  Expected Output
  ---------------
  Threads:	11
  ---------------

  Actual Output
  ---------------
  Threads:	1
  ---------------

  Note that all of the worker threads have died.
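
  The thread-count check can also be scripted. A minimal sketch,
  assuming a Linux /proc filesystem; `parse_thread_count` and
  `thread_count` are hypothetical helper names, not part of cheroot or
  ceph.

```python
import re
from pathlib import Path


def parse_thread_count(status_text):
    # Extract the integer from the "Threads:" field of /proc/<pid>/status.
    match = re.search(r"^Threads:\s+(\d+)$", status_text, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no Threads: field found")
    return int(match.group(1))


def thread_count(pid):
    # Read the status file of a running process (Linux only).
    return parse_thread_count(Path(f"/proc/{pid}/status").read_text())
```

  Calling `thread_count` with the server's PID before and after the
  tlsfuzzer runs gives the two numbers to compare.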

  For ceph-dashboard
  ==================

  1. Deploy a minimal ceph lab on lxd [2]

  2. Add ceph-dashboard to the model [3]

  3. Note down the <IP> of one of the ceph-mon nodes which also hosts
  the dashboard.

  4. Verify that the ceph dashboard is reachable at https://<IP>:8443
  either in the browser or with curl

  `curl -k -I https://<IP>:8443 --max-time 5`

  Output
  ---------------
  HTTP/1.1 200 OK
  Content-Type: text/html;charset=utf-8
  Server: Ceph-Dashboard
  Date: Tue, 10 Mar 2026 15:55:47 GMT
  Content-Security-Policy: frame-ancestors 'self';
  X-Content-Type-Options: nosniff
  Strict-Transport-Security: max-age=63072000; includeSubDomains; preload
  Content-Language: en-US
  Vary: Accept-Language, Accept-Encoding
  Cache-Control: no-cache
  Last-Modified: Fri, 12 Jul 2024 14:10:44 GMT
  Accept-Ranges: bytes
  Content-Length: 6466
  ---------------

  5. Run the tls13-ccs script of tlsfuzzer repeatedly until it times
  out.

  `scripts/test-tls13-ccs.py -h <IP> -p 8443`

  After several runs you will see:

  > AssertionError: Timeout when waiting for peer message

  6. The ceph-dashboard is now unreachable from the browser and curl

  `curl -k -I https://<IP>:8443 --max-time 5`

  Expected Output
  ---------------
  HTTP/1.1 200 OK
  Content-Type: text/html;charset=utf-8
  Server: Ceph-Dashboard
  Date: Tue, 10 Mar 2026 15:55:47 GMT
  Content-Security-Policy: frame-ancestors 'self';
  X-Content-Type-Options: nosniff
  Strict-Transport-Security: max-age=63072000; includeSubDomains; preload
  Content-Language: en-US
  Vary: Accept-Language, Accept-Encoding
  Cache-Control: no-cache
  Last-Modified: Fri, 12 Jul 2024 14:10:44 GMT
  Accept-Ranges: bytes
  Content-Length: 6466
  ---------------

  Actual Output
  -------------
  curl: (28) Operation timed out after 5002 milliseconds with 0 bytes received
  -------------

  7. Read the syslog on the mon unit and observe uncaught exceptions in
  the cheroot server threads

  e.g., `sudo grep "Thread" -A10 /var/log/syslog`

  Expected Output: <No Thread Errors>

  Actual Output
  --------
  Mar 10 15:57:31 juju-73b987-2 ceph-mgr[65287]: Exception in thread ('CP Server Thread-11',):
  Mar 10 15:57:31 juju-73b987-2 ceph-mgr[65287]: Traceback (most recent call last):
  Mar 10 15:57:31 juju-73b987-2 ceph-mgr[65287]:   File "/lib/python3/dist-packages/cheroot/server.py", line 1277, in communicate
  Mar 10 15:57:31 juju-73b987-2 ceph-mgr[65287]:     req.parse_request()
  Mar 10 15:57:31 juju-73b987-2 ceph-mgr[65287]:   File "/lib/python3/dist-packages/cheroot/server.py", line 706, in parse_request
  Mar 10 15:57:31 juju-73b987-2 ceph-mgr[65287]:     success = self.read_request_line()
  Mar 10 15:57:31 juju-73b987-2 ceph-mgr[65287]:   File "/lib/python3/dist-packages/cheroot/server.py", line 747, in read_request_line
  Mar 10 15:57:31 juju-73b987-2 ceph-mgr[65287]:     request_line = self.rfile.readline()
  Mar 10 15:57:31 juju-73b987-2 ceph-mgr[65287]:   File "/lib/python3/dist-packages/cheroot/server.py", line 304, in readline
  Mar 10 15:57:31 juju-73b987-2 ceph-mgr[65287]:     data = self.rfile.readline(256)
  Mar 10 15:57:31 juju-73b987-2 ceph-mgr[65287]:   File "/lib/python3.10/_pyio.py", line 582, in readline
  ---------
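
  To count crashed workers rather than read the log by eye, the
  `Exception in thread` lines can be matched with a regular expression.
  A minimal sketch, assuming the log format shown above;
  `crashed_threads` is a hypothetical helper.

```python
import re

# Matches e.g.: Exception in thread ('CP Server Thread-11',):
THREAD_EXC = re.compile(r"Exception in thread \('([^']+)',\):")


def crashed_threads(lines):
    # Return the thread names from every "Exception in thread" log line.
    names = []
    for line in lines:
        match = THREAD_EXC.search(line)
        if match:
            names.append(match.group(1))
    return names
```

  Passing an open file object for /var/log/syslog as `lines` yields one
  entry per crashed worker thread.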

  Fix Verification
  ================

  As an extra verification of the fixes in the SRU we can run the whole
  tlsfuzzer suite against the patched server and verify that the number
  of worker threads remains the same. This is a good indicator that
  there are no obvious additional vulnerabilities that could crash the
  worker thread.

  1. Perform steps 1-4 of the "For cheroot" test plan above.
  2. From the tlsfuzzer directory, run all scripts against the server. Some of these will error out because they are missing required arguments, but this does no harm.

  `for s in scripts/test-*.py; do $s -h <IP> -p 8443; done`

  NOTE: the server will log many errors, but none should crash the
  worker threads

  3. Verify that the number of threads is the same as before running the
  suite

  `grep -i threads /proc/$(pgrep -f server.py)/status`

  Expected Output
  ---------------
  Threads:	11
  ---------------
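
  The shell loop in step 2 can also be driven from Python so that each
  script's exit status is recorded. A minimal sketch, assuming it is run
  from the tlsfuzzer checkout with tlsfuzzer importable; `run_suite` is
  a hypothetical helper, and only the `-h` and `-p` options used above
  are passed. Unlike the shell loop, it invokes each script through the
  current Python interpreter.

```python
import subprocess
import sys
from pathlib import Path


def run_suite(scripts_dir, host, port, timeout=120):
    # Run every test-*.py script and map its file name to its exit code.
    # A script that exceeds the timeout is recorded as None.
    results = {}
    for script in sorted(Path(scripts_dir).glob("test-*.py")):
        try:
            proc = subprocess.run(
                [sys.executable, str(script), "-h", host, "-p", str(port)],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
            results[script.name] = proc.returncode
        except subprocess.TimeoutExpired:
            results[script.name] = None
    return results
```

  Non-zero exit codes are expected for scripts that need extra
  arguments; the check that matters is still the thread count in step 3.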

  [ Where problems could occur ]

  * Because errors are now swallowed instead of crashing the thread, a
  worker may be left running in a bad state where it previously would
  have died.

  * The smaller patch does not necessarily catch every exception type,
  so it would fix this specific incarnation while leaving other errors
  of the same kind unhandled.
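
  The second point can be illustrated with a minimal sketch. It assumes,
  purely for illustration, that a narrow handler catches only
  `ssl.SSLError` while a broad one catches `Exception`; the exception
  types the upstream patches actually handle are not stated here, and
  `survives` is a hypothetical helper.

```python
import ssl


def raise_ssl_error():
    # Simulated TLS-level failure.
    raise ssl.SSLError("simulated TLS failure")


def raise_reset():
    # Simulated socket-level failure that is not an SSLError.
    raise ConnectionResetError("simulated peer reset")


def survives(handler, caught_types):
    # True if a worker guarding handler with `except caught_types`
    # would keep running after the handler fails.
    try:
        try:
            handler()
        except caught_types:
            pass  # caught: the worker would log this and move on
    except Exception:
        return False  # escaped the guard: the worker thread would die
    return True
```

  Here `survives(raise_reset, ssl.SSLError)` is False, while
  `survives(raise_reset, Exception)` is True: the narrow guard handles
  the SSL case but lets the connection reset through.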

  [ Other Info ]

  * This issue was fixed in upstream version 10.0.1, which is in questing and above.
  * Upstream ceph has fixed this issue in reef+[4] by bumping the cheroot dependency to 10.0.1.
  * There are two variants of an upstream patch
      - One simply catches and logs SSL errors in the threadpool [5]. This patch was proposed but not merged.
      - The other is a more holistic revisiting of the error handling in the threadpool [6], and is the patch that landed in 10.0.1.
  * I have cherry-picked the latter patch, which was applied upstream, in my merge requests.

  [1]: https://github.com/tlsfuzzer/tlsfuzzer
  [2]: https://ubuntu.com/ceph/docs/tutorial
  [3]: https://ubuntu.com/ceph/docs/install-dashboard
  [4]: https://github.com/ceph/ceph/pull/57001
  [5]: https://github.com/cherrypy/cheroot/pull/365
  [6]: https://github.com/cherrypy/cheroot/pull/649

  Related Upstream Issues
  -----------------------
  https://github.com/cherrypy/cherrypy/issues/1989
  https://github.com/cherrypy/cheroot/issues/358
