[Bug 2100567] Re: [SRU] rgw: fix lock scope in ObjectCache::get()

Dan Hill 2100567 at bugs.launchpad.net
Thu Feb 27 23:47:02 UTC 2025


** Also affects: cloud-archive
   Importance: Undecided
       Status: New

** Also affects: cloud-archive/ussuri
   Importance: Undecided
       Status: New

** Tags added: sts

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to ceph in Ubuntu.
https://bugs.launchpad.net/bugs/2100567

Title:
  [SRU] rgw: fix lock scope in ObjectCache::get()

Status in Ubuntu Cloud Archive:
  New
Status in Ubuntu Cloud Archive ussuri series:
  New
Status in ceph package in Ubuntu:
  New
Status in ceph source package in Focal:
  New

Bug description:
  [ Impact ]

  Improper locking in ObjectCache::get() allows for a race condition
  when accessing the cache map. This can cause the RGW service to crash,
  see: issue#51927 [0].

  ```
      "backtrace": [
          "(()+0x46210) [0x7f7c25b8c210]",
          "(ceph::buffer::v15_2_0::ptr::ptr(ceph::buffer::v15_2_0::ptr const&)+0x17) [0x7f7c25888bb7]",
          "(ceph::buffer::v15_2_0::ptr_node::cloner::operator()(ceph::buffer::v15_2_0::ptr_node const&)+0x2d) [0x7f7c2588bf0d]",
          "(std::_Rb_tree_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> >* std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > >::_M_copy<std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > >::_Reuse_or_alloc_node>(std::_Rb_tree_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > const*, std::_Rb_tree_node_base*, std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > >::_Reuse_or_alloc_node&)+0x4c3) [0x7f7c2623b8c3]",
          "(std::_Rb_tree_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> >* std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > >::_M_copy<std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > >::_Reuse_or_alloc_node>(std::_Rb_tree_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > const*, std::_Rb_tree_node_base*, std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > >::_Reuse_or_alloc_node&)+0x182) [0x7f7c2623b582]",
          "(std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > >::operator=(std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > > const&)+0x97) [0x7f7c2623bb57]",
          "(ObjectCache::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ObjectCacheInfo&, unsigned int, rgw_cache_entry_info*)+0x4fe) [0x7f7c2637085e]",
          "(RGWSI_SysObj_Cache::raw_stat(rgw_raw_obj const&, unsigned long*, std::chrono::time_point<ceph::time_detail::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >*, unsigned long*, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > >*, ceph::buffer::v15_2_0::list*, RGWObjVersionTracker*, optional_yield)+0x23c) [0x7f7c267c9d1c]",
          "(RGWSI_SysObj_Core::get_system_obj_state_impl(RGWSysObjectCtxBase*, rgw_raw_obj const&, RGWSysObjState**, RGWObjVersionTracker*, optional_yield)+0x7f8) [0x7f7c262d9078]",
          "(RGWSI_SysObj_Core::get_system_obj_state(RGWSysObjectCtxBase*, rgw_raw_obj const&, RGWSysObjState**, RGWObjVersionTracker*, optional_yield)+0x4a) [0x7f7c262d9ada]",
          "(RGWSI_SysObj_Core::stat(RGWSysObjectCtxBase&, RGWSI_SysObj_Obj_GetObjState&, rgw_raw_obj const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > >*, bool, std::chrono::time_point<ceph::time_detail::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >*, unsigned long*, RGWObjVersionTracker*, optional_yield)+0x6a) [0x7f7c262d9b6a]",
          "(RGWSI_SysObj::Obj::ROp::stat(optional_yield)+0x66) [0x7f7c262d0506]",
          "(rgw_get_system_obj(RGWSysObjectCtx&, rgw_pool const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ceph::buffer::v15_2_0::list&, RGWObjVersionTracker*, std::chrono::time_point<ceph::time_detail::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >*, optional_yield, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > >*, rgw_cache_entry_info*, boost::optional<obj_version>)+0x1ef) [0x7f7c266e16ef]",
          "(RGWSI_MetaBackend_SObj::get_entry(RGWSI_MetaBackend::Context*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, RGWSI_MetaBackend::GetParams&, RGWObjVersionTracker*, optional_yield)+0xfb) [0x7f7c267bab1b]",
          "(RGWSI_Bucket_SObj::do_read_bucket_instance_info(ptr_wrapper<RGWSI_MetaBackend::Context, 4>&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, RGWBucketInfo*, std::chrono::time_point<ceph::time_detail::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >*, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > >*, rgw_cache_entry_info*, boost::optional<obj_version>, optional_yield)+0x1a8) [0x7f7c2679d718]",
          "(RGWSI_Bucket_SObj::read_bucket_instance_info(ptr_wrapper<RGWSI_MetaBackend::Context, 4>&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, RGWBucketInfo*, std::chrono::time_point<ceph::time_detail::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >*, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > >*, optional_yield, rgw_cache_entry_info*, boost::optional<obj_version>)+0x637) [0x7f7c2679df17]",
          "(()+0x5f2035) [0x7f7c2632a035]",
          "(std::_Function_handler<int (RGWSI_MetaBackend_Handler::Op*), RGWBucketInstanceMetadataHandler::call(std::optional<std::variant<RGWSI_MetaBackend_CtxParams_SObj> >, std::function<int (ptr_wrapper<RGWSI_MetaBackend::Context, 4>&)>)::{lambda(RGWSI_MetaBackend_Handler::Op*)#1}>::_M_invoke(std::_Any_data const&, RGWSI_MetaBackend_Handler::Op*&&)+0x36) [0x7f7c26349466]",
          "(()+0xa807de) [0x7f7c267b87de]",
          "(RGWSI_MetaBackend_SObj::call(std::optional<std::variant<RGWSI_MetaBackend_CtxParams_SObj> >, std::function<int (RGWSI_MetaBackend::Context*)>)+0x9e) [0x7f7c267bb3fe]",
          "(RGWSI_MetaBackend_Handler::call(std::optional<std::variant<RGWSI_MetaBackend_CtxParams_SObj> >, std::function<int (RGWSI_MetaBackend_Handler::Op*)>)+0x5f) [0x7f7c267b85bf]",
          "(RGWBucketCtl::read_bucket_info(rgw_bucket const&, RGWBucketInfo*, optional_yield, RGWBucketCtl::BucketInstance::GetParams const&, RGWObjVersionTracker*)+0x1fb) [0x7f7c2632888b]",
          "(rgw_build_bucket_policies(rgw::sal::RGWRadosStore*, req_state*)+0xcc0) [0x7f7c265ac0d0]",
          "(RGWHandler::do_init_permissions()+0x32) [0x7f7c265ad832]",
          "(RGWHandler_REST::init_permissions(RGWOp*)+0x178) [0x7f7c2665ff98]",
          "(rgw_process_authenticated(RGWHandler_REST*, RGWOp*&, RGWRequest*, req_state*, bool)+0x248) [0x7f7c261e1e28]",
          "(process_request(rgw::sal::RGWRadosStore*, RGWREST*, RGWRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rgw::auth::StrategyRegistry const&, RGWRestfulIO*, OpsLogSocket*, optional_yield, rgw::dmclock::Scheduler*, int*)+0x2d34) [0x7f7c261e78a4]",
          "(()+0x3dfb53) [0x7f7c26117b53]",
          "(()+0x3e1051) [0x7f7c26119051]",
          "(()+0x3e11ac) [0x7f7c261191ac]",
          "(make_fcontext()+0x2f) [0x7f7c268c439f]" 
      ],
  ```

  The fix ensures that the unique_lock for writes remains held until the
  function returns, preventing unsafe modifications.

  This fix was not backported to Octopus because the release reached EOL
  upstream and no further updates were accepted [1]. This fix was first
  applied in v16.2.8 and has been stable in production deployments since
  2022-06-21, with no reported regressions or issues in the field.

  [ Test Plan ]

  This issue is a race condition, making it rare and difficult to
  reproduce reliably.

  It is more likely to manifest under high concurrency, when multiple
  clients are accessing and modifying the object cache simultaneously.

  To increase the likelihood of triggering this issue, concurrent fio S3
  benchmarking tests will be run [3].

  [ Where problems could occur ]

  This change modifies the locking mechanism within ObjectCache::get(),
  with potential risks including deadlocks, performance regressions due
  to increased contention, or unintended behavior from incorrect lock
  promotion.

  Regression symptoms could include increased request latency,
  unexpected thread contention, or crashes if deadlocks occur. Testing
  will focus on performance benchmarks and stress testing under high
  concurrency.

  [ Other Info ]

  The fix follows C++ best practices for locking (RAII with
  std::defer_lock) and is consistent with locking patterns used
  elsewhere in Ceph.

  [0] https://tracker.ceph.com/issues/52800
  [1] https://tracker.ceph.com/issues/53071
  [2] https://github.com/axboe/fio/blob/master/examples/http-s3.fio

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/2100567/+subscriptions




More information about the Ubuntu-openstack-bugs mailing list