[Bug 1983716] Re: Improve performances of glance when using rbd backend

Cedric Lemarchand 1983716 at bugs.launchpad.net
Fri Sep 9 08:09:43 UTC 2022


Sorry for the delay, I was on vacation. Below is the information required
by the SRU policy.

[Impact]

This affects image upload performance, specifically in the instance
snapshot use case, where the upload step to glance takes a very long
time with a large ephemeral volume (tens or hundreds of GB).

Backporting the fix will improve performance of image upload to glance
and thus reduce the whole snapshot duration.

When the image size is not known, which is the case when new images are
uploaded to glance, and when the glance backend is Ceph, the rbd volume
needs to be grown step by step during the upload. This fix increases the
size of those steps in order to reduce resize calls on the Ceph backend.
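
To give a feeling for why larger steps help, here is a rough sketch that
only counts resize calls. It is not the glance_store code: the function
name, the 8 MiB starting step and the "double after each resize, capped
at 8 GiB" policy are assumptions made for illustration (the 8GB cap is
the only figure taken from this report).

```python
# Illustrative model only: counts how many resize calls are needed to grow
# an image of unknown size, it does not talk to Ceph.
MiB = 1024 ** 2
GiB = 1024 ** 3


def count_resizes(total_bytes, first_step, max_step=None):
    """Return the number of resize calls needed to reach total_bytes.

    With max_step=None the step stays fixed at first_step.
    Otherwise the step doubles after every resize, up to max_step
    (assumed policy, see the lead-in above).
    """
    size = 0
    step = first_step
    resizes = 0
    while size < total_bytes:
        size += step
        resizes += 1
        if max_step is not None:
            step = min(step * 2, max_step)
    return resizes


if __name__ == "__main__":
    total = 50 * GiB  # same order of magnitude as the test plan below
    print("fixed 8 MiB steps:", count_resizes(total, 8 * MiB))
    print("doubling up to 8 GiB:", count_resizes(total, 8 * MiB, max_step=8 * GiB))
```

Under these assumptions the number of resize calls for a 50 GiB upload
drops by several orders of magnitude, which is where the time saving is
expected to come from.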

[Test Plan]


On a functional OpenStack Ussuri cloud running on Focal:

1) Initial snapshot time measurement without the fix:
- spawn an instance with an ephemeral root volume and fill it with ~50GB of data: dd if=/dev/urandom of=~/random bs=1M count=50k
- snapshot the instance, then look for the string "seconds to snapshot" in /var/log/nova/nova-compute.log on the Nova host where the instance is running:
'''
nova-compute.log.53.gz:2022-07-11 09:54:11.298 3656801 INFO nova.compute.manager [req-0c9e71e9-c17e-4069-aa68-f7928fab9166 f9ec6328f6646c4c9310ff86ff6c45fca1ead9845dfa8a8dc6c4e461e5355a75 385521b179ea48068fbe5b8ccc3c396c - 24d8399e5ee54c8484cdbf79b8ee7394 24d8399e5ee54c8484cdbf79b8ee7394] [instance: 067acb11-34e6-4626-9c33-e7afa4294dbf] Took 866.04 seconds to snapshot the instance on the hypervisor.
'''
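
If several runs need to be compared, the duration can be pulled out of
the log with a few lines of Python. This is only a convenience sketch:
the regular expression is derived from the single log line shown above,
and the function name is made up for this example.

```python
import re

# Pattern based on the nova-compute log line quoted above.
SNAPSHOT_RE = re.compile(
    r"\[instance: (?P<instance>[0-9a-f-]+)\] "
    r"Took (?P<seconds>\d+(?:\.\d+)?) seconds to snapshot"
)


def snapshot_durations(lines):
    """Return a list of (instance_uuid, seconds) found in the given lines."""
    results = []
    for line in lines:
        match = SNAPSHOT_RE.search(line)
        if match:
            results.append((match.group("instance"), float(match.group("seconds"))))
    return results


if __name__ == "__main__":
    # Shortened copy of the log line above (request ids left out).
    sample = (
        "2022-07-11 09:54:11.298 3656801 INFO nova.compute.manager "
        "[instance: 067acb11-34e6-4626-9c33-e7afa4294dbf] "
        "Took 866.04 seconds to snapshot the instance on the hypervisor."
    )
    print(snapshot_durations([sample]))
```

For rotated logs such as nova-compute.log.53.gz, the lines can be read
with gzip.open(path, "rt") and passed to the same function.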

2) On the glance-api controller, manually patch python-glance-store 2.0.0:
- check glance version:

dpkg -l |grep glance
ii  glance                                2:20.2.0-0ubuntu1                               all          OpenStack Image Registry and Delivery Service - Daemons
ii  glance-api                            2:20.2.0-0ubuntu1                               all          OpenStack Image Registry and Delivery Service - API
ii  glance-common                         2:20.2.0-0ubuntu1                               all          OpenStack Image Registry and Delivery Service - Common
ii  python3-glance                        2:20.2.0-0ubuntu1                               all          OpenStack Image Registry and Delivery Service - Python 3 library
ii  python3-glance-store                  2.0.0-0ubuntu3                                  all          OpenStack Image Service store library - Python 3.x
ii  python3-glanceclient                  1:3.1.1-0ubuntu1                                all          Client library for Openstack glance server - Python 3.x


- git clone https://opendev.org/openstack/glance_store.git -b stable/ussuri /usr/lib/python3/dist-packages/glance_store_trunk/
- cd /usr/lib/python3/dist-packages/glance_store_trunk/ && git checkout tags/2.0.0 && git cherry-pick ca0c58b
- systemctl stop glance-api.service
- mv /usr/lib/python3/dist-packages/glance_store /usr/lib/python3/dist-packages/glance_store_orig && ln -s /usr/lib/python3/dist-packages/glance_store_trunk/glance_store /usr/lib/python3/dist-packages/glance_store
- systemctl start glance-api.service
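
Before redoing the measurement it may be worth checking that Python
really picks up the patched tree through the symlink. A possible check,
to be run with the same python3 as glance-api (the helper name is made
up for this example):

```python
import importlib.util
import os


def resolved_module_path(module_name):
    """Return the real on-disk path of a module, with symlinks resolved."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        raise ImportError("cannot locate module %r" % module_name)
    return os.path.realpath(spec.origin)


if __name__ == "__main__":
    try:
        # With the symlink from step 2 in place, the printed path is
        # expected to contain "glance_store_trunk".
        print(resolved_module_path("glance_store"))
    except ImportError as exc:
        print(exc)
```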

3) Redo step 1)

The time taken to complete the whole snapshot should be roughly 15% to
30% shorter. Ensure there is no bottleneck on the data path from the
hypervisor's drive to the Ceph cluster.
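
The gain between the two measurements can be computed as below. The
helper name is made up for this example, and the figures used are the
10GB ones quoted in the bug description (230 and 165 seconds).

```python
def snapshot_gain_percent(before_seconds, after_seconds):
    """Return the relative reduction in snapshot time, in percent."""
    if before_seconds <= 0:
        raise ValueError("before_seconds must be positive")
    return (before_seconds - after_seconds) / before_seconds * 100.0


if __name__ == "__main__":
    gain = snapshot_gain_percent(230, 165)
    print("gain: %.1f%%" % gain)
    # Range expected by this test plan.
    print("within 15-30%:", 15.0 <= gain <= 30.0)
```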

[Other Info]

 * Anything else you think is useful to include
 * Anticipate questions from users, SRU, +1 maintenance, security teams and the Technical Board
 * and address these questions in advance


As a Ceph cluster (and more specifically the RADOS sub-layer of RBD) only accounts for written bytes, raising the resize step up to 8GB is not an issue, because the declared image size is not accounted for. If the cluster is close to full, the error will happen during the upload, not on the resize.

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to python-glance-store in Ubuntu.
https://bugs.launchpad.net/bugs/1983716

Title:
  Improve performances of glance when using rbd backend

Status in python-glance-store package in Ubuntu:
  Invalid
Status in python-glance-store source package in Focal:
  Incomplete

Bug description:
  Hello,

  In order to significantly improve the performance of image uploads to
  the rbd store, it would be great if commit [1] could be backported from
  release 2.0.1 to the focal package (currently 2.0.0-0ubuntu3).

  Besides image upload, the real use case here is to speed up instance
  snapshots: benchmarks between 2.0.0 and 2.0.1 report a performance
  gain of ~30%, with the snapshot time dropping from 230 to 165 seconds
  for a 10GB image (the metric shows up in nova-compute.log on the host
  where the snapshot occurs).

  
  [1] commit ca0c58b52756058b6d51bf6a47aeac3d525c1e16 (HEAD -> stable/ussuri, tag: ussuri-em, tag: 2.0.1, origin/stable/ussuri)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/python-glance-store/+bug/1983716/+subscriptions



