[Bug 2119987] Re: haproxy reload triggers OOM-killer for TERMINATED_HTTPS loadbalancers
Wesley Hershberger
2119987 at bugs.launchpad.net
Fri Jan 16 17:45:18 UTC 2026
Plucky Puffin is EOL.
** Description changed:
[ Impact ]
Creating a TERMINATED_HTTPS listener in an amphora with >=32GB of memory
triggers the OOM-killer during listener startup (and any subsequent
`systemctl reload` of haproxy in the amphora).
```
os loadbalancer listener create --name thttps_xxlarge --protocol TERMINATED_HTTPS --protocol-port 443 --default-tls-container-ref <URL> --wait xxlarge1
```
This was originally reported in a Caracal cloud using an Ubuntu 22.04
Amphora image.
I've been able to reproduce this reliably in my lab using the latest
devstack and an Ubuntu 24.04 Amphora image.
Workaround: set a higher connection limit on one listener, in proportion
to the 50000 default and the memory on the system. For example, for an
amphora with 32GiB of RAM, use --connection-limit 200000 on one
listener.
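For example (using the listener name from the Impact command above; the
same flag can also be supplied to `listener create`):
```
os loadbalancer listener set --connection-limit 200000 thttps_xxlarge
```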
+
+ [ Test Plan ]
+
+ Deploy charmed Jammy/Caracal with Octavia and Barbican. For Noble
+ verification, upgrade (do-release-upgrade) the Octavia unit to Noble
+ before completing the rest of the test plan.
+
+ Create a compute flavor to use for the Octavia flavorprofile:
+
+ $ openstack flavor create --ram 32768 --disk 10 --vcpus 4 --id 10 m1.xxlarge
+ +----------------------------+------------+
+ | Field | Value |
+ +----------------------------+------------+
+ | OS-FLV-DISABLED:disabled | False |
+ | OS-FLV-EXT-DATA:ephemeral | 0 |
+ | description | None |
+ | disk | 10 |
+ | id | 10 |
+ | name | m1.xxlarge |
+ | os-flavor-access:is_public | True |
+ | properties | |
+ | ram | 32768 |
+ | rxtx_factor | 1.0 |
+ | swap | 0 |
+ | vcpus | 4 |
+ +----------------------------+------------+
+
+ Create the flavor profile:
+
+ $ openstack loadbalancer flavorprofile create \
+ --name o1.xxlarge \
+ --provider amphora \
+ --flavor-data '{"compute_flavor": "10"}'
+ +---------------+--------------------------------------+
+ | Field | Value |
+ +---------------+--------------------------------------+
+ | id | f3aac848-7e77-449a-96af-bf0312c45ef9 |
+ | name | o1.xxlarge |
+ | provider_name | amphora |
+ | flavor_data | {"compute_flavor": "10"} |
+ +---------------+--------------------------------------+
+
+ And the Amphora flavor:
+
+ $ openstack loadbalancer flavor create \
+ --name o1.xxlarge \
+ --flavorprofile o1.xxlarge \
+ --description "Extra large LB" \
+ --enable
+ +-------------------+--------------------------------------+
+ | Field | Value |
+ +-------------------+--------------------------------------+
+ | id | 350f607c-9189-49ac-a585-84960322137a |
+ | name | o1.xxlarge |
+ | flavor_profile_id | f3aac848-7e77-449a-96af-bf0312c45ef9 |
+ | enabled | True |
+ | description | Extra large LB |
+ +-------------------+--------------------------------------+
+
+ For certificates, I created a complete PKI following Jamie's guide [1];
+ a self-signed cert could also be used. Attaching xxlarge.devstack.p12
+ for convenience (this test plan does not require the cert to function).
+
+ $ openssl pkcs12 -export \
+ -inkey xxlarge.devstack.key.pem \
+ -in xxlarge.devstack.cert.pem \
+ -certfile ca-chain.cert.pem \
+ -passout pass: -out xxlarge.devstack.p12
+ $ openstack secret store \
+ --name="xxlarge.devstack.p12" \
+ -t 'application/octet-stream' \
+ -e 'base64' \
+ --payload="$(base64 < xxlarge.devstack.p12)"
+ +---------------+-------------------------------------------------------------------+
+ | Field | Value |
+ +---------------+-------------------------------------------------------------------+
+ | Secret href | https://None:9312/v1/secrets/86f4113d-e9f6-43e3-aa10-459a6acc14b3 |
+ | Name | xxlarge.devstack.p12 |
+ | Created | None |
+ | Status | None |
+ | Content types | {'default': 'application/octet-stream'} |
+ | Algorithm | aes |
+ | Bit length | 256 |
+ | Secret type | opaque |
+ | Mode | cbc |
+ | Expiration | None |
+ +---------------+-------------------------------------------------------------------+
+
+ Create an LB (provisions the Amphora):
+
+ $ openstack loadbalancer create --wait \
+ --name lb1-xxlarge \
+ --vip-subnet-id ext_net_subnet \
+ --flavor o1.xxlarge
+
+ Create the listener (this configures haproxy in the Amphora):
+
+ $ openstack loadbalancer listener create --wait \
+ --name thttps \
+ --protocol TERMINATED_HTTPS \
+ --protocol-port 443 \
+ --default-tls-container-ref https://None:9312/v1/secrets/86f4113d-e9f6-43e3-aa10-459a6acc14b3 \
+ lb1-xxlarge
+
+ Expected behavior:
+
+ Success
+
+ Actual behavior:
+
+ The resource did not successfully reach ACTIVE status. (HTTP n/a)
+ (Request-ID: None)
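+
+ To confirm the failure mode, check the kernel log inside the amphora
+ (e.g. over SSH from a host with access to the lb-mgmt-net, or via the
+ console; the access method varies by deployment):
+
+ $ journalctl -k | grep -iE 'out of memory|oom'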
+
+ [1] https://jamielinux.com/docs/openssl-certificate-authority/introduction.html
+
+ [ Where problems could occur ]
+
+ The patch affects only the code that produces the `tune.ssl.cachesize`
+ haproxy configuration option. `tune.ssl.cachesize` is only included in
+ the haproxy configuration for listeners with protocol TERMINATED_HTTPS.
+
+ If the change is wrong, we'd expect to see failures related to memory
+ consumption in Amphorae that host TERMINATED_HTTPS listeners; most
+ likely these would be OOM-killer invocations visible in the system log,
+ but issues could also manifest as haproxy failing as a result of an
+ invalid configuration.
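+
+ An invalid configuration can be ruled out with haproxy's built-in
+ config check, run against the rendered file inside the amphora (the
+ path shown is an assumption and may differ between Octavia releases):
+
+ $ sudo haproxy -c -f /var/lib/octavia/<lb_id>/haproxy.cfg
+ $ grep tune.ssl.cachesize /var/lib/octavia/<lb_id>/haproxy.cfg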
+
+ Under very specific circumstances (a TERMINATED_HTTPS loadbalancer
+ whose recurring TLS connections are well matched to the existing
+ `tune.ssl.cachesize`), it's also possible that this change could
+ moderately regress performance, since a smaller cache holds fewer
+ resumable sessions. A performance regression of this nature can be
+ prevented by provisioning an Amphora with more available memory.
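+
+ Memory behaviour around a reload can be observed directly in the
+ amphora with the same ps invocation used in [ Root Cause ] below; two
+ haproxy worker processes should coexist briefly during the reload (the
+ unit name is per load balancer, so the glob here is an assumption):
+
+ $ ps -ax -o pid,vsz,rss,uss,pmem,args | grep haproxy
+ $ sudo systemctl reload 'haproxy-*.service'
+ $ ps -ax -o pid,vsz,rss,uss,pmem,args | grep haproxy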
+
[ Root Cause ]
Commit 454cff5 (in Zed and later, IIUC) introduced the use of haproxy's
`tune.ssl.cachesize` for TERMINATED_HTTPS listeners [1][2].
The commit does not account for the fact that during a reload of haproxy
(SIGUSR2), the old worker process stays running until the new worker
process is ready [3][4]. This means that two TLS session caches are
allocated and held simultaneously during a reload of the service [5].
For small Amphorae, this works fine. The default connection limit is
50000, which takes enough of a chunk out of the 50% allocation that
there is enough wiggle room for the new haproxy worker to allocate its
cache and coexist with the old worker for some time.
However, as the available memory in the system increases, the memory
consumed by the session cache approaches 50% of RAM, and the worker's
total memory usage climbs beyond 50% (something else in the worker also
uses memory in proportion to the configured cachesize).
I tested 11 values of tune.ssl.cachesize (including 0 as a baseline) in
an amphora with 32GiB of RAM, reloading the haproxy service each time:
- tune.ssl.cachesize_MiB is tune.ssl.cachesize entries * 200 bytes each, converted to MiB
- vsz here is the value reported by `ps -ax -o pid,vsz,rss,uss,pmem,args | grep haproxy`
- overhead is `vsz_MiB - tune.ssl.cachesize_MiB - 261` (261 MiB being the baseline vsz with the cache disabled)
- overhead% is `floor((overhead / tune.ssl.cachesize_MiB) * 100)`
tune.ssl.cachesize | tune.ssl.cachesize_MiB | vsz | vsz_MiB | overhead | overhead%
0 | 0 | 267416 | 261 | 0 | 0%
7741606 | 1476 | 2142472 | 2092 | 355 | 24%
15483212 | 2953 | 4017260 | 3923 | 709 | 24%
23224818 | 4429 | 5892180 | 5754 | 1064 | 24%
30966424 | 5906 | 7767100 | 7585 | 1418 | 24%
38708030 | 7382 | 9642020 | 9416 | 1773 | 24%
46449636 | 8859 | 11516940 | 11247 | 2127 | 24%
54191242 | 10336 | 13391860 | 13077 | 2480 | 23%
61932848 | 11812 | 15266780 | 14908 | 2835 | 24%
69674454 | 13289 | 17141700 | 16739 | 3189 | 23%
77416060 | 14765 | 19016744 | 18571 | 3545 | 24%
Note that this listener was not configured with a pool, so there was no
load on the system when I gathered this data.
As shown, haproxy consumes additional memory proportional to the size
of the TLS session cache. The allocation for the cache occurs at [6],
which refers to [7].
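For illustration, take the last row of the table: a cachesize of
~14765 MiB (roughly what a 50% allocation produces on a 32 GiB amphora)
gives a single worker a vsz of ~18571 MiB, about 18.1 GiB. During a
reload the old and new workers coexist, so peak demand is roughly
2 * 18.1 GiB, about 36 GiB, which exceeds the amphora's 32 GiB of RAM
and invokes the OOM-killer.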
I verified the documentation's assertion that each tune.ssl.cachesize
entry is 200 bytes on amd64; sizeof(struct shared_block) is 48 bytes on
the same hardware [8].
Octavia should allocate closer to 1/3 than 1/2 for the TLS session
cache. I'll test and propose a patch against master shortly.
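As a rough sanity check of the 1/3 figure, using the ~24% overhead
measured above: two workers sized at 1/2 of RAM each need about
2 * 0.5 * 1.24 = 1.24x RAM during a reload, while two workers sized at
1/3 need about 2 * 0.33 * 1.24 = 0.83x RAM, leaving headroom for the
baseline process size and the rest of the system.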
[1] https://opendev.org/openstack/octavia/commit/454cff587ed10b5e504da93b074b77cb85055b13
[2] https://www.haproxy.com/documentation/haproxy-configuration-manual/new/2-8r1/#section-3.2.-tunesslcachesize
[3] https://github.com/haproxy/haproxy/issues/217#issuecomment-544515990
[4] https://manpages.ubuntu.com/manpages/jammy/en/man1/haproxy.1.html
[5] https://opendev.org/openstack/octavia/src/branch/master/octavia/amphorae/backends/agent/api_server/templates/systemd.conf.j2
[6] https://git.launchpad.net/ubuntu/+source/haproxy/tree/src/ssl_sock.c?h=applied/ubuntu/noble-devel#n5346
[7] https://git.launchpad.net/ubuntu/+source/haproxy/tree/src/shctx.c?h=applied/ubuntu/noble-devel#n300
[8] https://git.launchpad.net/ubuntu/+source/haproxy/tree/include/haproxy/shctx-t.h?h=applied/ubuntu/noble-devel#n38
** Attachment added: "xxlarge.devstack.p12"
https://bugs.launchpad.net/octavia/+bug/2119987/+attachment/5939549/+files/xxlarge.devstack.p12
** Changed in: octavia (Ubuntu Plucky)
Status: Triaged => Won't Fix
** Changed in: octavia (Ubuntu Noble)
Assignee: Jorge Merlino (jorge-merlino) => Wesley Hershberger (whershberger)
** Changed in: octavia (Ubuntu Noble)
Status: Triaged => In Progress
** Changed in: cloud-archive/caracal
Status: Fix Released => In Progress
** Changed in: cloud-archive/flamingo
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/2119987
Title:
haproxy reload triggers OOM-killer for TERMINATED_HTTPS loadbalancers
Status in Ubuntu Cloud Archive:
Fix Released
Status in Ubuntu Cloud Archive caracal series:
In Progress
Status in Ubuntu Cloud Archive epoxy series:
Fix Released
Status in Ubuntu Cloud Archive flamingo series:
Fix Released
Status in octavia:
Fix Released
Status in octavia package in Ubuntu:
Fix Released
Status in octavia source package in Noble:
In Progress
Status in octavia source package in Plucky:
Won't Fix
Status in octavia source package in Questing:
Fix Released
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/2119987/+subscriptions
More information about the Ubuntu-openstack-bugs
mailing list