[Bug 2143920] Re: [SRU] Uncaught SSL errors can crash worker threads
Matthew Ruffell
2143920 at bugs.launchpad.net
Fri Mar 13 01:04:08 UTC 2026
Uploaded, with rich git history (hopefully), to noble, thanks.
Uploading python-cheroot_10.0.0+ds1-1ubuntu0.1.dsc
Uploading python-cheroot_10.0.0+ds1-1ubuntu0.1.debian.tar.xz
Uploading python-cheroot_10.0.0+ds1-1ubuntu0.1_source.buildinfo
Uploading python-cheroot_10.0.0+ds1-1ubuntu0.1_source.changes
** Changed in: ceph (Ubuntu)
Status: New => Fix Released
** Changed in: ceph (Ubuntu Jammy)
Status: New => Invalid
** Changed in: ceph (Ubuntu Noble)
Status: New => Invalid
** Changed in: python-cheroot (Ubuntu)
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to ceph in Ubuntu.
https://bugs.launchpad.net/bugs/2143920
Title:
[SRU] Uncaught SSL errors can crash worker threads
Status in ceph package in Ubuntu:
Fix Released
Status in python-cheroot package in Ubuntu:
Fix Released
Status in ceph source package in Jammy:
Invalid
Status in python-cheroot source package in Jammy:
In Progress
Status in ceph source package in Noble:
Invalid
Status in python-cheroot source package in Noble:
In Progress
Bug description:
[ Impact ]
* Security scanners commonly trigger this problem: malformed requests raise uncaught SSL errors that kill all worker threads, and the server becomes unresponsive.
- The ceph dashboard on quincy and squid is susceptible to this issue.
* A malicious attacker could use the same technique to DoS the server.
[ Test Plan ]
I reproduced the issue against both a minimal cheroot server and the
ceph dashboard.
In both cases, I used tlsfuzzer [1] to reproduce the bug, by running
`scripts/test-tls13-ccs.py -h <IP> -p <PORT>`.
For cheroot
===========
I did the following in an lxd container.
1. Create a minimal cheroot server file
server.py
---------
from cheroot.wsgi import Server as WSGIServer
from cheroot.ssl.builtin import BuiltinSSLAdapter

def app(environ, start_response):
    status = '200 OK'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    return [b"Ok."]

server = WSGIServer(('0.0.0.0', 8443), app)
server.ssl_adapter = BuiltinSSLAdapter(
    certificate='cert.pem',
    private_key='key.pem'
)

if __name__ == '__main__':
    try:
        server.start()
    except KeyboardInterrupt:
        server.stop()
-----------
2. Create a self-signed certificate and key
`openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -days 365 -nodes`
3. Start the server: `sudo python3 server.py`
4. Verify that the server responds correctly
`curl -k -I https://<IP>:8443`
Output
---------------
HTTP/1.1 200 OK
Content-type: text/plain
Connection: close
Date: Tue, 10 Mar 2026 15:44:09 GMT
Server: Cheroot/8.5.2
----------------
and check the number of worker threads
`grep -i threads /proc/$(pgrep -f server.py)/status`
Expected Output
---------------
Threads: 11
---------------
5. Run the tls13-ccs script of tlsfuzzer repeatedly until it times
out.
`scripts/test-tls13-ccs.py -h <IP> -p 8443`
After several runs you will see:
> AssertionError: Timeout when waiting for peer message
6. Observe that connections to the server now time out (or hang if no
timeout is specified)
`curl -k -I https://<IP>:8443 --max-time 5`
Expected Output
---------------
HTTP/1.1 200 OK
Content-type: text/plain
Connection: close
Date: Tue, 10 Mar 2026 15:44:09 GMT
Server: Cheroot/8.5.2
----------------
Actual Output
------------
curl: (28) Operation timed out after 5002 milliseconds with 0 bytes received
-------------
and check the number of threads for server process
`grep -i threads /proc/$(pgrep -f server.py)/status`
Expected Output
---------------
Threads: 11
---------------
Actual Output
---------------
Threads: 1
---------------
Note that all of the worker threads have died.
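The `grep` check in steps 4 and 6 can also be scripted. This is a minimal sketch, assuming a Linux `/proc` filesystem; the helper names `parse_thread_count` and `thread_count` are illustrative and not part of cheroot or tlsfuzzer.

```python
import re
from pathlib import Path


def parse_thread_count(status_text: str) -> int:
    """Return the value of the 'Threads:' field from /proc/<pid>/status text."""
    match = re.search(r"^Threads:\s+(\d+)$", status_text, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no 'Threads:' field found")
    return int(match.group(1))


def thread_count(pid: int) -> int:
    """Read the current thread count of a running process (Linux only)."""
    return parse_thread_count(Path(f"/proc/{pid}/status").read_text())
```

Calling `thread_count(<pid of server.py>)` before and after the tlsfuzzer runs should give the same number on a patched server.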
For ceph-dashboard
==================
1. Deploy a minimal ceph lab on lxd [2]
2. Add ceph-dashboard to the model [3]
3. Note down the <IP> of one of the ceph-mon nodes which also hosts
the dashboard.
4. Verify that the ceph dashboard is reachable at https://<IP>:8443
either in the browser or with curl
curl -k -I https://<IP>:8443 --max-time 5
Output
---------------
HTTP/1.1 200 OK
Content-Type: text/html;charset=utf-8
Server: Ceph-Dashboard
Date: Tue, 10 Mar 2026 15:55:47 GMT
Content-Security-Policy: frame-ancestors 'self';
X-Content-Type-Options: nosniff
Strict-Transport-Security: max-age=63072000; includeSubDomains; preload
Content-Language: en-US
Vary: Accept-Language, Accept-Encoding
Cache-Control: no-cache
Last-Modified: Fri, 12 Jul 2024 14:10:44 GMT
Accept-Ranges: bytes
Content-Length: 6466
---------------
5. Run the tls13-ccs script of tlsfuzzer repeatedly until it times
out.
`scripts/test-tls13-ccs.py -h <IP> -p 8443`
After several runs you will see:
> AssertionError: Timeout when waiting for peer message
6. The ceph-dashboard is now unreachable from the browser and curl
curl -k -I https://<IP>:8443 --max-time 5
Expected Output
---------------
HTTP/1.1 200 OK
Content-Type: text/html;charset=utf-8
Server: Ceph-Dashboard
Date: Tue, 10 Mar 2026 15:55:47 GMT
Content-Security-Policy: frame-ancestors 'self';
X-Content-Type-Options: nosniff
Strict-Transport-Security: max-age=63072000; includeSubDomains; preload
Content-Language: en-US
Vary: Accept-Language, Accept-Encoding
Cache-Control: no-cache
Last-Modified: Fri, 12 Jul 2024 14:10:44 GMT
Accept-Ranges: bytes
Content-Length: 6466
---------------
Actual Output
-------------
curl: (28) Operation timed out after 5002 milliseconds with 0 bytes received
-------------
7. Read the syslog on the mon unit and observe uncaught exceptions in the
cheroot server threads
e.g., sudo grep "Thread" -A10 /var/log/syslog
Expected Output: <No Thread Errors>
Actual Output
--------
Mar 10 15:57:31 juju-73b987-2 ceph-mgr[65287]: Exception in thread ('CP Server Thread-11',):
Mar 10 15:57:31 juju-73b987-2 ceph-mgr[65287]: Traceback (most recent call last):
Mar 10 15:57:31 juju-73b987-2 ceph-mgr[65287]: File "/lib/python3/dist-packages/cheroot/server.py", line 1277, in communicate
Mar 10 15:57:31 juju-73b987-2 ceph-mgr[65287]: req.parse_request()
Mar 10 15:57:31 juju-73b987-2 ceph-mgr[65287]: File "/lib/python3/dist-packages/cheroot/server.py", line 706, in parse_request
Mar 10 15:57:31 juju-73b987-2 ceph-mgr[65287]: success = self.read_request_line()
Mar 10 15:57:31 juju-73b987-2 ceph-mgr[65287]: File "/lib/python3/dist-packages/cheroot/server.py", line 747, in read_request_line
Mar 10 15:57:31 juju-73b987-2 ceph-mgr[65287]: request_line = self.rfile.readline()
Mar 10 15:57:31 juju-73b987-2 ceph-mgr[65287]: File "/lib/python3/dist-packages/cheroot/server.py", line 304, in readline
Mar 10 15:57:31 juju-73b987-2 ceph-mgr[65287]: data = self.rfile.readline(256)
Mar 10 15:57:31 juju-73b987-2 ceph-mgr[65287]: File "/lib/python3.10/_pyio.py", line 582, in readline
---------
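To spot these crashes without reading the log by eye, the `Exception in thread` lines can be extracted. This is a rough sketch that assumes the log lines look like the excerpt above; the function name `crashed_threads` is hypothetical.

```python
import re

# Matches e.g.: Exception in thread ('CP Server Thread-11',):
THREAD_EXCEPTION = re.compile(r"Exception in thread \('(?P<name>[^']+)',\)")


def crashed_threads(log_text: str) -> list[str]:
    """Return the names of threads that logged an uncaught exception, in order."""
    return [m.group("name") for m in THREAD_EXCEPTION.finditer(log_text)]
```

An empty list for the syslog contents corresponds to the expected "<No Thread Errors>" result.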
Fix Verification
================
As an extra verification of the fixes in the SRU we can run the whole
tlsfuzzer suite against the patched server and verify that the number
of worker threads remains the same. This is a good indicator that
there are no obvious additional vulnerabilities that could crash the
worker thread.
1. Perform steps 1-4 of the "For cheroot" test plan above.
2. From the tlsfuzzer directory run all scripts targeting the server (some of these will error out because they are missing required arguments but this does no harm)
`for s in scripts/test-*.py; do $s -h <IP> -p 8443; done`
NOTE: the server will log many errors, but none should crash the
worker threads
3. Verify that the number of threads is the same as before running the
suite
`grep -i threads /proc/$(pgrep -f server.py)/status`
Expected Output
---------------
Threads: 11
---------------
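The verification loop above can be sketched in Python as follows. The function names, the injected `count_threads` callable, and the per-script timeout are assumptions made for illustration; only the `-h <IP> -p <PORT>` invocation comes from the test plan.

```python
import subprocess
from typing import Callable, Iterable


def build_command(script: str, host: str, port: int) -> list[str]:
    """Mirror the shell invocation `$s -h <IP> -p <PORT>`."""
    return [script, "-h", host, "-p", str(port)]


def suite_preserves_threads(
    scripts: Iterable[str],
    host: str,
    port: int,
    count_threads: Callable[[], int],
    timeout: float = 300.0,  # assumed per-script limit, not from the test plan
) -> bool:
    """Run every script and report whether the thread count is unchanged."""
    before = count_threads()
    for script in scripts:
        try:
            # check=False: scripts that error out (e.g. missing required
            # arguments) are ignored, as in the shell loop.
            subprocess.run(
                build_command(script, host, port), timeout=timeout, check=False
            )
        except (subprocess.TimeoutExpired, OSError):
            continue
    return count_threads() == before
```

For example, `scripts` could be `sorted(glob.glob("scripts/test-*.py"))`, and `count_threads` a function that reads the `Threads:` field from `/proc/<pid>/status` on the server host.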
[ Where problems could occur ]
* Because we are now swallowing errors, there is the potential for
threads to be left in a bad state when they would have previously
crashed.
* The smaller patch does not necessarily catch all exception types, so
it would solve this specific incarnation while leaving other errors of
this kind unhandled.
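The trade-off between a crashing worker and one that swallows errors can be illustrated with a generic worker loop. This is only a sketch of the general pattern, not the upstream cheroot patch; every name in it (`fragile_worker`, `resilient_worker`, `run_jobs`) is hypothetical.

```python
import logging
import queue
import threading

log = logging.getLogger("worker-sketch")
_STOP = object()  # sentinel telling the worker to exit cleanly


def fragile_worker(tasks: queue.Queue) -> None:
    """Any exception raised by a task propagates and ends the thread."""
    while (task := tasks.get()) is not _STOP:
        task()


def resilient_worker(tasks: queue.Queue) -> None:
    """Exceptions are logged and the thread moves on to the next task."""
    while (task := tasks.get()) is not _STOP:
        try:
            task()
        except Exception:
            # The thread survives, but any state the task left half-updated
            # is not repaired here.
            log.exception("task failed; worker keeps running")


def run_jobs(worker, jobs) -> None:
    """Feed jobs to a single worker thread and wait for it to finish."""
    tasks: queue.Queue = queue.Queue()
    for job in jobs:
        tasks.put(job)
    tasks.put(_STOP)
    thread = threading.Thread(target=worker, args=(tasks,))
    thread.start()
    thread.join(timeout=5)
```

With a failing job followed by a healthy one, the fragile worker dies before reaching the second job, while the resilient worker completes it. The catch in the resilient loop is also where a thread could carry on in a bad state, which is the risk noted in the first bullet.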
[ Other Info ]
* This issue was fixed in upstream version 10.0.1, which is in questing and above.
* Upstream ceph has fixed this issue in reef+[4] by bumping the cheroot dependency to 10.0.1.
* There are two variants of an upstream patch
- One simply catches and logs SSL errors in the threadpool [5]. This patch was proposed but not merged.
- The other is a more holistic revisiting of the error handling in the threadpool [6], and is the patch that landed in 10.0.1.
* I have cherry-picked the latter patch, which was applied upstream, in my merge requests.
[1]: https://github.com/tlsfuzzer/tlsfuzzer
[2]: https://ubuntu.com/ceph/docs/tutorial
[3]: https://ubuntu.com/ceph/docs/install-dashboard
[4]: https://github.com/ceph/ceph/pull/57001
[5]: https://github.com/cherrypy/cheroot/pull/365
[6]: https://github.com/cherrypy/cheroot/pull/649
Related Upstream Issues
-----------------------
https://github.com/cherrypy/cherrypy/issues/1989
https://github.com/cherrypy/cheroot/issues/358
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/2143920/+subscriptions