[Bug 1931004] Re: Add support for Pacific to RBD driver

OpenStack Infra 1931004 at bugs.launchpad.net
Thu Sep 23 22:10:31 UTC 2021


Reviewed:  https://review.opendev.org/c/openstack/cinder/+/809181
Committed: https://opendev.org/openstack/cinder/commit/06b32da4be8b69e626eb7eb8091f695cbdcd92e7
Submitter: "Zuul (22348)"
Branch:    stable/ussuri

commit 06b32da4be8b69e626eb7eb8091f695cbdcd92e7
Author: Jon Bernard <jobernar at redhat.com>
Date:   Wed Apr 14 11:14:13 2021 -0400

    RBD: use correct stripe unit in clone operation
    
    The recent release of Ceph Pacific changed the clone() logic so that
    an invalid stripe unit value causes an error to be returned, where
    previous versions would correct the value at runtime.  This becomes
    a problem when creating a volume from an image, where the source RBD
    image may have a larger stripe unit than cinder's RBD driver is
    configured for.  When this happens, clone() is called with a stripe
    unit that is too small relative to that of the source image, and the
    clone fails.
    
    The RBD driver in Cinder has a configuration parameter
    'rbd_store_chunk_size' that stores the preferred object size for cloned
    images.  If clone() is called without a stripe_unit passed in, the
    stripe unit defaults to the object size, which is 4MB by default.  The
    issue arises when creating a volume from a Glance image, where Glance is
    creating images with a default stripe unit of 8MB (distinctly larger
    than that of Cinder).  If we do not consider the incoming stripe unit
    and select the larger of the two, Ceph cannot clone an RBD image with a
    smaller stripe unit and raises an error.
    
    This patch adds a function in our driver's clone logic to select the
    larger of the two stripe unit values so that the appropriate stripe unit
    is chosen.
    
    It should also be noted that we're determining the correct stripe unit,
    but using the 'order' argument to clone().  Ceph will set the stripe
    unit equal to the object size (order) by default and we rely on this
    behaviour for the following reason: passing stripe-unit alone or with
    an order argument causes an invalid argument exception to be raised in
    pre-Pacific releases of Ceph, as its argument parsing appears to have
    limitations.
    
    Closes-Bug: #1931004
    Change-Id: Iec111ab83e9ed8182c9679c911e3d90927d5a7c3
    (cherry picked from commit 49a2c85eda9fd3cddc75fd904fe62c87a6b50735)
    (cherry picked from commit 5db58159feec3d2d39d1abf3637310f5ac60a3cf)
    Conflicts:
            cinder/tests/unit/volume/drivers/test_rbd.py
    (cherry picked from commit 07ead73eec0ac6b962b533b07861d6a81226fa37)
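The selection logic the commit describes can be sketched as follows (a hedged illustration with made-up names, not Cinder's actual driver code; assumes both sizes are powers of two):

```python
import math

def choose_order(configured_chunk_size_mb, parent_stripe_unit_bytes):
    """Pick the larger of the configured chunk size and the parent
    image's stripe unit, expressed as an RBD 'order' (log2 of bytes)."""
    configured_bytes = configured_chunk_size_mb * 1024 * 1024
    stripe_unit = max(configured_bytes, parent_stripe_unit_bytes)
    return int(math.log2(stripe_unit))

# Cinder configured for 4 MB chunks cloning a Glance image striped at
# 8 MB: the clone must use the parent's 8 MB unit (order 23), not 22.
```

Passing the result as the 'order' argument (rather than a stripe_unit) sidesteps the pre-Pacific argument-parsing limitation noted above, since Ceph defaults the stripe unit to the object size.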


** Tags added: in-stable-ussuri

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to glance in Ubuntu.
https://bugs.launchpad.net/bugs/1931004

Title:
  Add support for Pacific to RBD driver

Status in Cinder:
  Fix Released
Status in Ubuntu Cloud Archive:
  New
Status in Ubuntu Cloud Archive wallaby series:
  New
Status in Ubuntu Cloud Archive xena series:
  New
Status in glance package in Ubuntu:
  Confirmed
Status in glance source package in Hirsute:
  Confirmed
Status in glance source package in Impish:
  Confirmed

Bug description:
  When using ceph pacific, volume-from-image operations where both
  glance and cinder are configured to use RBD result in an exception
  when calling clone():

      rbd.InvalidArgument: [errno 22] RBD invalid argument (error
  creating clone)

      ERROR cinder.volume.manager Traceback (most recent call last):
      ERROR cinder.volume.manager   File "/usr/local/lib/python3.9/site-packages/taskflow/engines/action_engine/executor.py", line 53, in _execute_task
      ERROR cinder.volume.manager     result = task.execute(**arguments)
      ERROR cinder.volume.manager   File "/opt/stack/cinder/cinder/volume/flows/manager/create_volume.py", line 1132, in execute
      ERROR cinder.volume.manager     model_update = self._create_from_image(context,
      ERROR cinder.volume.manager   File "/opt/stack/cinder/cinder/utils.py", line 638, in _wrapper
      ERROR cinder.volume.manager     return r.call(f, *args, **kwargs)
      ERROR cinder.volume.manager   File "/usr/local/lib/python3.9/site-packages/tenacity/__init__.py", line 411, in call
      ERROR cinder.volume.manager     return self.__call__(*args, **kwargs)
      ERROR cinder.volume.manager   File "/usr/local/lib/python3.9/site-packages/tenacity/__init__.py", line 423, in __call__
      ERROR cinder.volume.manager     do = self.iter(retry_state=retry_state)
      ERROR cinder.volume.manager   File "/usr/local/lib/python3.9/site-packages/tenacity/__init__.py", line 360, in iter
      ERROR cinder.volume.manager     return fut.result()
      ERROR cinder.volume.manager   File "/usr/lib64/python3.9/concurrent/futures/_base.py", line 438, in result
      ERROR cinder.volume.manager     return self.__get_result()
      ERROR cinder.volume.manager   File "/usr/lib64/python3.9/concurrent/futures/_base.py", line 390, in __get_result
      ERROR cinder.volume.manager     raise self._exception
      ERROR cinder.volume.manager   File "/usr/local/lib/python3.9/site-packages/tenacity/__init__.py", line 426, in __call__
      ERROR cinder.volume.manager     result = fn(*args, **kwargs)
      ERROR cinder.volume.manager   File "/opt/stack/cinder/cinder/volume/flows/manager/create_volume.py", line 998, in _create_from_image
      ERROR cinder.volume.manager     model_update, cloned = self.driver.clone_image(context,
      ERROR cinder.volume.manager   File "/opt/stack/cinder/cinder/volume/drivers/rbd.py", line 1571, in clone_image
      ERROR cinder.volume.manager     volume_update = self._clone(volume, pool, image, snapshot)
      ERROR cinder.volume.manager   File "/opt/stack/cinder/cinder/volume/drivers/rbd.py", line 1023, in _clone
      ERROR cinder.volume.manager     self.RBDProxy().clone(src_client.ioctx,
      ERROR cinder.volume.manager   File "/usr/local/lib/python3.9/site-packages/eventlet/tpool.py", line 190, in doit
      ERROR cinder.volume.manager     result = proxy_call(self._autowrap, f, *args, **kwargs)
      ERROR cinder.volume.manager   File "/usr/local/lib/python3.9/site-packages/eventlet/tpool.py", line 148, in proxy_call
      ERROR cinder.volume.manager     rv = execute(f, *args, **kwargs)
      ERROR cinder.volume.manager   File "/usr/local/lib/python3.9/site-packages/eventlet/tpool.py", line 129, in execute
      ERROR cinder.volume.manager     six.reraise(c, e, tb)
      ERROR cinder.volume.manager   File "/usr/local/lib/python3.9/site-packages/six.py", line 719, in reraise
      ERROR cinder.volume.manager     raise value
      ERROR cinder.volume.manager   File "/usr/local/lib/python3.9/site-packages/eventlet/tpool.py", line 83, in tworker
      ERROR cinder.volume.manager     rv = meth(*args, **kwargs)
      ERROR cinder.volume.manager   File "rbd.pyx", line 698, in rbd.RBD.clone
      ERROR cinder.volume.manager rbd.InvalidArgument: [errno 22] RBD invalid argument (error creating clone)
      ERROR cinder.volume.manager

  In Pacific, a check was added to ensure during a clone operation that
  the child's stripe unit is not less than that of its parent.  Failing
  this condition returns -EINVAL, which is then raised by python-rbd as
  an exception.  This maps to the 'order' argument in clone(), where
  order is log base 2 of the stripe unit.  Ceph's default is 4
  megabytes.  The reason we're seeing EINVAL exceptions in the Pacific
  CI is that when OpenStack is configured to use Ceph for both cinder
  and glance, volume-from-image tests fail because Glance's default
  stripe unit is 8 MB (distinctly larger than Cinder's 4 MB).  This
  results in an order calculation of 22, which is too small to be
  valid for clone() against such a parent.
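The order arithmetic can be checked directly (an illustration, not driver code): order is log base 2 of the stripe unit in bytes, so 4 MB gives 22 and 8 MB gives 23.

```python
import math

four_mb = 4 * 1024 * 1024   # Cinder's default object/stripe size
eight_mb = 8 * 1024 * 1024  # Glance's default stripe unit

print(int(math.log2(four_mb)))   # 22
print(int(math.log2(eight_mb)))  # 23

# Pacific rejects a clone whose stripe unit (order 22 here) is smaller
# than the parent's (order 23), returning -EINVAL.
```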

  I see two possible solutions and have proposed patches:

  1. Increase Cinder's default chunk size to match Glance's.  I think
  this makes sense for both consistency and performance.

  2. When doing a clone(), consider the configured chunk size /and/ the
  stripe unit of the parent volume and choose the higher value.

  Either of these approaches prevents the failures we're seeing, and I
  think they are both useful individually as well.
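Solution 1 would amount to a cinder.conf change along these lines (a sketch only; `rbd_store_chunk_size` is the existing option named above and takes a value in MB, while the `[ceph]` backend section name is an assumption about the deployment):

```ini
[ceph]
# Match Glance's 8 MB default stripe unit so clone() never sees a
# child stripe unit smaller than the parent's
rbd_store_chunk_size = 8
```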

To manage notifications about this bug go to:
https://bugs.launchpad.net/cinder/+bug/1931004/+subscriptions




More information about the Ubuntu-openstack-bugs mailing list