[Bug 1227327] [NEW] ceph osd repair fails with assert(missing.num_missing() == 0)

James Troup james.troup at canonical.com
Wed Sep 18 19:20:57 UTC 2013


Public bug reported:

After an unfortunate incident with dhcpd going away, we lost 3 of the 6
machines in our ceph cluster and had to remotely power cycle them to
get them back.  Now that everything is back up, the cluster has mostly
recovered, but a couple of PGs were stuck in an inconsistent state, so
I ran 'ceph osd repair' on one of the OSDs involved in the inconsistent
PGs.  It ran for a while and fixed some things, and then exploded with
this:

2013-09-18 18:52:24.116439 7fdf4e2d9700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::recover_got(hobject_t, eversion_t)' thread 7fdf4e2d9700 time 2013-09-18 18:52:24.035055
osd/ReplicatedPG.cc: 5351: FAILED assert(missing.num_missing() == 0)

 ceph version 0.48.3argonaut (commit:920f82e805efec2cae05b79c155c07df0f3ed5dd)
 1: (ReplicatedPG::recover_got(hobject_t, eversion_t)+0x4d4) [0x7fdf60c29794]
 2: (ReplicatedPG::submit_push_complete(ObjectRecoveryInfo&, ObjectStore::Transaction*)+0x490) [0x7fdf60c2c950]
 3: (ReplicatedPG::handle_pull_response(std::tr1::shared_ptr<OpRequest>)+0x4c6) [0x7fdf60c4ac26]
 4: (ReplicatedPG::sub_op_push(std::tr1::shared_ptr<OpRequest>)+0x96) [0x7fdf60c4ba66]
 5: (ReplicatedPG::do_sub_op(std::tr1::shared_ptr<OpRequest>)+0x3f7) [0x7fdf60c4bf17]
 6: (PG::do_request(std::tr1::shared_ptr<OpRequest>)+0xa7) [0x7fdf60d03a07]
 7: (OSD::dequeue_op(PG*)+0x23a) [0x7fdf60cc156a]
 8: (ThreadPool::worker()+0x4c4) [0x7fdf60e86dd4]
 9: (ThreadPool::WorkThread::entry()+0xd) [0x7fdf60cdab2d]
 10: (()+0x7e9a) [0x7fdf604aee9a]
 11: (clone()+0x6d) [0x7fdf5e9baccd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

The log also contains about 10K more lines of output about what the OSD
was doing.  This is ceph 0.48.3-0ubuntu1~cloud0 from the Folsom pocket
of the Ubuntu Cloud Archive, and the machine is running Ubuntu 12.04
LTS.
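
Since the interesting part is buried in a very large log, here is a
rough sketch of how the failed assert and its backtrace could be pulled
out of an OSD log.  It is not part of ceph; the function name
extract_crash and both regular expressions are my own assumptions,
based only on the format of the excerpt above, and may need adjusting
for other ceph versions.

```python
import re

# Assumed format of the assertion line, taken from the excerpt above, e.g.
# "osd/ReplicatedPG.cc: 5351: FAILED assert(missing.num_missing() == 0)"
ASSERT_RE = re.compile(
    r"^(?P<file>\S+): (?P<line>\d+): FAILED assert\((?P<expr>.*)\)$"
)
# Assumed format of a numbered backtrace frame, e.g.
# " 7: (OSD::dequeue_op(PG*)+0x23a) [0x7fdf60cc156a]"
FRAME_RE = re.compile(
    r"^\s*(?P<index>\d+): \((?P<symbol>.*)\) \[(?P<addr>0x[0-9a-f]+)\]$"
)


def extract_crash(log_text):
    """Return (assertion, frames) for the first failed assert in log_text.

    assertion is a dict with the keys file, line and expr, or None if no
    assert was found.  frames is the list of symbol+offset strings from
    the backtrace that follows the assert.
    """
    assertion = None
    frames = []
    for raw in log_text.splitlines():
        line = raw.rstrip()
        if assertion is None:
            # Still looking for the first "FAILED assert(...)" line.
            match = ASSERT_RE.match(line)
            if match:
                assertion = match.groupdict()
            continue
        match = FRAME_RE.match(line)
        if match:
            frames.append(match.group("symbol"))
        elif frames:
            # The first non-frame line after the frames ends the backtrace.
            break
    return assertion, frames


# Usage (log_text is the full contents of the OSD log, read however
# is convenient):
#   assertion, frames = extract_crash(log_text)
```

Only the first assert is reported, and lines between the assert and the
first frame (such as the "ceph version" line) are skipped.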

** Affects: ceph (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to ceph in Ubuntu.
https://bugs.launchpad.net/bugs/1227327