[Bug 2148798] Re: gnocchi-metricd enters permanent crash loop on LZ4BlockError from truncated Carbonara object

Seyeong Kim 2148798 at bugs.launchpad.net
Tue Apr 28 04:42:51 UTC 2026


** Also affects: cloud-archive
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/2148798

Title:
  gnocchi-metricd enters permanent crash loop on LZ4BlockError from
  truncated Carbonara object

Status in Ubuntu Cloud Archive:
  New
Status in gnocchi package in Ubuntu:
  New
Status in gnocchi source package in Jammy:
  New
Status in gnocchi source package in Noble:
  New
Status in gnocchi source package in Questing:
  New
Status in gnocchi source package in Resolute:
  New

Bug description:
  [Impact]

  gnocchi-metricd workers enter a permanent crash loop when a Carbonara object in the Ceph RADOS storage pool is truncated (e.g. when metricd is OOM-killed or restarted mid-write). On the next processing cycle, unserialize() calls lz4.block.decompress() on the short buffer.
  This raises lz4.block.LZ4BlockError, which is a direct subclass of Exception (not ValueError), so the existing "except ValueError" guard in carbonara.py does not catch it and the worker crashes.
  metricd retries the same object every cycle, so a single truncated object blocks the entire telemetry pipeline indefinitely.
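
  The exception-hierarchy point above can be illustrated without lz4 or
  gnocchi installed. This is only a sketch: StandInLZ4BlockError,
  decompress_stub and unserialize_unpatched are hypothetical stand-ins, and
  the only property assumed (taken from this report) is that the real
  lz4.block.LZ4BlockError derives directly from Exception.

```python
class StandInLZ4BlockError(Exception):
    """Hypothetical stand-in for lz4.block.LZ4BlockError.

    Assumption taken from the report: the real class derives directly
    from Exception, not from ValueError.
    """


def decompress_stub(buf):
    """Hypothetical stand-in for lz4.block.decompress() on a short buffer."""
    raise StandInLZ4BlockError("Decompression failed")


def unserialize_unpatched(buf):
    """Mimic a guard that only handles ValueError, as described above."""
    try:
        return decompress_stub(buf)
    except ValueError:
        # Never reached: StandInLZ4BlockError is not a ValueError.
        return None


def escapes_guard(buf):
    """Return True if the error propagates past the ValueError guard."""
    try:
        unserialize_unpatched(buf)
    except StandInLZ4BlockError:
        return True
    return False


if __name__ == "__main__":
    print(issubclass(StandInLZ4BlockError, ValueError))  # False
    print(escapes_guard(b"short"))  # True
```

  Because the guard names ValueError only, the stand-in error reaches the
  caller, which corresponds to the worker crash described in this report.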

  This has also been submitted upstream on GitHub:
  https://github.com/gnocchixyz/gnocchi/issues/1348

  [Test case]

  On a Jammy, Noble, or Questing container with python3-gnocchi and
  python3-lz4 installed:

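      import os
      import lz4.block
      from gnocchi import carbonara

      # Marker byte "c" followed by an LZ4 block, then cut short to
      # simulate an object truncated mid-write.
      data = b"c" + lz4.block.compress(os.urandom(16384))
      truncated = data[:20]
      try:
          carbonara.AggregatedTimeSerie.unserialize(
              truncated, key=None, aggregation=None)
      except lz4.block.LZ4BlockError as e:
          # Unpatched: the raw LZ4 error escapes unserialize().
          print("BUG:", e)
      except carbonara.InvalidData as e:
          # Patched: the error is translated to InvalidData.
          print("FIXED:", e)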

  Before the patch, the script prints "BUG: Decompression failed: ...",
  with the exception raised from AggregatedTimeSerie.unserialize() in
  carbonara.py.

  After the patch, it prints "FIXED: Unable to unpack, invalid data".

  Repeat the same test against carbonara.BoundTimeSerie.unserialize() (data passed directly, no marker byte).
  It must also transition from BUG to FIXED.

  [Where problems could occur]

  The fix wraps lz4.block.decompress() in both unserialize() methods with "except lz4.block.LZ4BlockError: raise InvalidData".
  InvalidData is already the sentinel used by the surrounding numpy.frombuffer ValueError branch, so no caller changes are needed.
  metricd already skips metrics that raise InvalidData.

  The only new behavior is that a non-truncation corruption (bitflip,
  disk error) is now silently skipped instead of crashing the worker,
  which is consistent with how the existing ValueError guard already
  treats downstream decode corruption.
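
  The behavior described in this section can be sketched as follows. This is
  a rough model under stated assumptions, not the actual patch: InvalidData,
  StandInLZ4BlockError, decompress_stub, unserialize_patched and
  process_cycle are hypothetical stand-ins, and the 32-byte threshold is
  arbitrary. It shows the decompression error being translated to
  InvalidData and the caller skipping the affected metric instead of
  crashing.

```python
class InvalidData(Exception):
    """Hypothetical stand-in for carbonara.InvalidData."""

    def __init__(self):
        super().__init__("Unable to unpack, invalid data")


class StandInLZ4BlockError(Exception):
    """Hypothetical stand-in for lz4.block.LZ4BlockError."""


def decompress_stub(buf):
    """Hypothetical decompressor: rejects buffers shorter than 32 bytes."""
    if len(buf) < 32:  # arbitrary threshold, for illustration only
        raise StandInLZ4BlockError("Decompression failed")
    return buf


def unserialize_patched(buf):
    """Translate the decompression error into InvalidData."""
    try:
        return decompress_stub(buf)
    except StandInLZ4BlockError:
        raise InvalidData()


def process_cycle(objects):
    """Model of a caller that skips metrics raising InvalidData."""
    processed, skipped = [], []
    for name, buf in objects.items():
        try:
            unserialize_patched(buf)
        except InvalidData:
            skipped.append(name)
        else:
            processed.append(name)
    return processed, skipped


if __name__ == "__main__":
    objects = {
        "metric-a": b"x" * 64,
        "metric-b": b"x" * 20,  # simulated truncated object
        "metric-c": b"x" * 64,
    }
    print(process_cycle(objects))
```

  In this model the truncated entry ends up in the skipped list while the
  other metrics are still processed, so one bad object no longer stops the
  whole cycle.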

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/2148798/+subscriptions



