[Bug 2148798] Re: gnocchi-metricd enters permanent crash loop on LZ4BlockError from truncated Carbonara object
Launchpad Bug Tracker
2148798 at bugs.launchpad.net
Thu May 21 10:18:10 UTC 2026
** Merge proposal linked:
https://code.launchpad.net/~seyeongkim/ubuntu/+source/gnocchi/+git/gnocchi/+merge/505220
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/2148798
Title:
gnocchi-metricd enters permanent crash loop on LZ4BlockError from
truncated Carbonara object
Status in Ubuntu Cloud Archive:
New
Status in Ubuntu Cloud Archive epoxy series:
In Progress
Status in gnocchi package in Ubuntu:
In Progress
Status in gnocchi source package in Jammy:
In Progress
Status in gnocchi source package in Noble:
In Progress
Status in gnocchi source package in Questing:
In Progress
Status in gnocchi source package in Resolute:
In Progress
Bug description:
[Impact]
gnocchi-metricd workers enter a permanent crash loop when a Carbonara object in the Ceph RADOS storage pool is truncated (e.g. when metricd is OOM-killed or restarted mid-write). On the next cycle, unserialize() calls lz4.block.decompress() on the short buffer.
This raises lz4.block.LZ4BlockError, which is a direct subclass of Exception (not ValueError), so the existing "except ValueError" guard in carbonara.py does not catch it and the worker crashes.
metricd retries the same object every cycle, so a single truncated object blocks the entire telemetry pipeline indefinitely.
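A toy model of this failure mode, for illustration only. The names `unserialize` and `process_cycle` are hypothetical. Python's stdlib `zlib` stands in for `lz4.block` because `zlib.error`, like `LZ4BlockError`, derives from `Exception` rather than `ValueError`. The error escapes the `except ValueError` guard, so every cycle aborts at the truncated object, and the object queued behind it is never processed:

```python
import os
import zlib


def unserialize(data):
    """Hypothetical stand-in for a Carbonara unserialize() before the fix."""
    try:
        return zlib.decompress(data)
    except ValueError:
        # Mirrors the pre-fix guard: only ValueError is handled, so a
        # zlib.error (like LZ4BlockError) propagates to the caller.
        return None


def process_cycle(objects, processed):
    """Process objects in a fixed order; an uncaught error aborts the cycle."""
    for name, data in objects:
        unserialize(data)
        processed.append(name)


good = zlib.compress(b"x" * 1024)
truncated = zlib.compress(os.urandom(16384))[:20]  # simulated partial write
objects = [("metric-a", good), ("metric-b", truncated), ("metric-c", good)]

processed = []
crashes = 0
for _ in range(3):
    try:
        process_cycle(objects, processed)
    except zlib.error:
        # The worker dies, is restarted, and hits the same object next cycle.
        crashes += 1

print(issubclass(zlib.error, ValueError))  # False
print(crashes, processed)  # 3 ['metric-a', 'metric-a', 'metric-a']
```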
A PR has also been uploaded to GitHub:
https://github.com/gnocchixyz/gnocchi/issues/1348
[Test case]
On a Jammy, Noble, or Questing container with python3-gnocchi and
python3-lz4 installed, run:
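    import os

    import lz4.block
    from gnocchi import carbonara

    # b"c" is the marker byte for the compressed AggregatedTimeSerie format.
    data = b"c" + lz4.block.compress(os.urandom(16384))
    # Keep only the first 20 bytes to simulate a truncated object.
    truncated = data[:20]
    try:
        carbonara.AggregatedTimeSerie.unserialize(
            truncated, key=None, aggregation=None)
    except lz4.block.LZ4BlockError as e:
        # Unpatched: the raw LZ4 error escapes unserialize().
        print("BUG:", e)
    except carbonara.InvalidData as e:
        # Patched: the error is translated into InvalidData.
        print("FIXED:", e)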
Before the patch, the script prints "BUG: Decompression failed: ...";
the LZ4BlockError is raised from AggregatedTimeSerie.unserialize() in
carbonara.py.
After the patch, it prints "FIXED: Unable to unpack, invalid data".
Repeat the same test against carbonara.BoundTimeSerie.unserialize(), passing the compressed data directly with no marker byte.
It must also change from BUG to FIXED.
[Where problems could occur]
The fix wraps lz4.block.decompress() in both unserialize() methods with "except lz4.block.LZ4BlockError: raise InvalidData".
InvalidData is already the sentinel used by the surrounding numpy.frombuffer ValueError branch, so no caller changes are needed.
metricd already skips metrics that raise InvalidData.
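A minimal sketch of the fix pattern described above. This is not the actual patch. `zlib` stands in for `lz4.block`, with `zlib.error` playing the role of `LZ4BlockError`. `InvalidData`, `unserialize_aggregated` and `unserialize_bound` are hypothetical simplifications of the Carbonara code. `array.frombytes()` plays the part of `numpy.frombuffer`, since both raise `ValueError` on a badly sized buffer.

The aggregated variant skips the one-byte marker before decompressing. The bound variant decompresses the buffer directly. In both, the codec error ends up as the same sentinel the decode branch already raises:

```python
import array
import os
import zlib


class InvalidData(Exception):
    """Hypothetical stand-in for carbonara.InvalidData."""

    def __init__(self):
        super().__init__("Unable to unpack, invalid data")


def _decompress(payload):
    try:
        return zlib.decompress(payload)
    except zlib.error:
        # The fix: a codec failure (e.g. a truncated object) is mapped to
        # the same sentinel instead of escaping and crashing the worker.
        raise InvalidData()


def _decode_points(uncompressed):
    # Stand-in for numpy.frombuffer: array.frombytes() raises ValueError
    # when the length is not a multiple of the item size (8 bytes here).
    values = array.array("d")
    try:
        values.frombytes(uncompressed)
    except ValueError:
        raise InvalidData()
    return values


def unserialize_aggregated(data):
    # Aggregated payloads start with a one-byte b"c" marker; skip it.
    return _decode_points(_decompress(data[1:]))


def unserialize_bound(data):
    # Bound payloads are compressed directly, with no marker byte.
    return _decode_points(_decompress(data))


payload = zlib.compress(os.urandom(8 * 2048))
results = {}
for fn, blob in ((unserialize_aggregated, b"c" + payload),
                 (unserialize_bound, payload)):
    try:
        fn(blob[:20])  # truncated object
        results[fn.__name__] = "no error"
    except InvalidData as e:
        results[fn.__name__] = "FIXED: %s" % e

print(results)
```

Mapping both the codec error and the decode error to one exception type is what lets existing callers stay unchanged.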
The only new behavior is that a non-truncation corruption (bitflip,
disk error) is now silently skipped instead of crashing the worker,
which is consistent with how the existing ValueError guard already
treats downstream decode corruption.
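A sketch of the post-fix behaviour, using hypothetical names (`unserialize`, `process_cycle`, `InvalidData`), with `zlib` standing in for `lz4.block`. Once the codec error surfaces as `InvalidData`, the loop skips the bad object and finishes the cycle. A bitflipped object, here one with a corrupted checksum trailer, is skipped exactly like a truncated one. That is the trade-off described above: it is only visible if the skip is logged. Whether and how metricd itself logs such skips is not modelled here.

```python
import logging
import os
import zlib

LOG = logging.getLogger("metricd-sketch")


class InvalidData(Exception):
    """Hypothetical stand-in for carbonara.InvalidData."""


def unserialize(data):
    # Post-fix behaviour: codec errors surface as InvalidData.
    try:
        return zlib.decompress(data)
    except zlib.error:
        raise InvalidData("Unable to unpack, invalid data")


def process_cycle(objects):
    """Process every object; skip (and log) the ones that are corrupt."""
    processed, skipped = [], []
    for name, data in objects:
        try:
            unserialize(data)
        except InvalidData:
            LOG.warning("Skipping %s: corrupt or truncated object", name)
            skipped.append(name)
            continue
        processed.append(name)
    return processed, skipped


good = zlib.compress(b"x" * 1024)
truncated = zlib.compress(os.urandom(16384))[:20]
bitflipped = bytearray(good)
bitflipped[-1] ^= 0xFF  # corrupt the Adler-32 checksum trailer
objects = [("metric-a", good), ("metric-b", truncated),
           ("metric-c", bytes(bitflipped)), ("metric-d", good)]

processed, skipped = process_cycle(objects)
print(processed)  # ['metric-a', 'metric-d']
print(skipped)    # ['metric-b', 'metric-c']
```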
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/2148798/+subscriptions