[Bug 1994002] Re: [SRU] migration was active, but no RAM info was set

Mauricio Faria de Oliveira 1994002 at bugs.launchpad.net
Sat Oct 28 18:18:43 UTC 2023


Verification done on ussuri-proposed.
Steps explained in previous comments.

With the synthetic reproducer in GDB, the migration status now remains
'SETUP' (which is not expected to have RAM statistics) instead of
becoming 'ACTIVE' (which is expected to have them, and which caused the issue):

(qemu) info migrate
...
Migration status: setup

...

$ lsb_release -cs
bionic

$ apt policy qemu-system-x86
qemu-system-x86:
  Installed: 1:4.2-3ubuntu6.27~cloud0
  Candidate: 1:4.2-3ubuntu6.27~cloud0
  Version table:
 *** 1:4.2-3ubuntu6.27~cloud0 500
        500 http://ubuntu-cloud.archive.canonical.com/ubuntu bionic-proposed/ussuri/main amd64 Packages
        100 /var/lib/dpkg/status
...

$ file $(which qemu-system-x86_64) | grep -o 'BuildID.*,'
BuildID[sha1]=82a4159294ae653e770be24bbcfbb35703e60005,

(Corey provided the .ddeb packages, which are not yet exposed/published in
PPAs/archive.)

$ dpkg-deb -c qemu-system-x86-dbgsym_4.2-3ubuntu6.27~cloud0_amd64.ddeb | fgrep .debug
-rw-r--r-- root/root  21271712 2023-10-26 14:08 ./usr/lib/debug/.build-id/48/bd78ceee4a669d37efd9ac8d851947205de4f7.debug
-rw-r--r-- root/root  21321832 2023-10-26 14:08 ./usr/lib/debug/.build-id/82/a4159294ae653e770be24bbcfbb35703e60005.debug

$ sudo apt install ./qemu-system-x86-dbgsym_4.2-3ubuntu6.27~cloud0_amd64.ddeb

$ apt source qemu

$ head -n1 qemu-4.2/debian/changelog
qemu (1:4.2-3ubuntu6.27~cloud0) bionic-ussuri; urgency=medium

 915 static void fill_source_migration_info(MigrationInfo *info)
...
 926     case MIGRATION_STATUS_SETUP:
 927         info->has_status = true;
 928         info->has_total_time = false;
 929         break;
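
The breakpoint at line 928 sits inside the SETUP case, so the reproducer
can pause the monitor thread there while the migration thread moves the
state on to ACTIVE. The toy model below is only a sketch of why that window
matters. It assumes (this is not stated in this comment) that the pre-fix
code read the state a second time when filling in the reported status. All
function names here are hypothetical and are not QEMU code.

```python
from typing import Callable, Dict


def fill_info_read_twice(read_state: Callable[[], str],
                         window: Callable[[], None]) -> Dict[str, object]:
    """Assumed pre-fix shape: the state is read twice."""
    info: Dict[str, object] = {}
    if read_state() == "active":       # first read picks the fields to fill
        info["ram"] = {"transferred": 0}
    window()                           # stands in for the pause at line 928
    info["status"] = read_state()      # second read may see a newer state
    return info


def fill_info_read_once(read_state: Callable[[], str],
                        window: Callable[[], None]) -> Dict[str, object]:
    """Assumed post-fix shape: one read, reused for fields and status."""
    state = read_state()
    info: Dict[str, object] = {}
    if state == "active":
        info["ram"] = {"transferred": 0}
    window()
    info["status"] = state
    return info


def simulate(fill) -> Dict[str, object]:
    """Run `fill` while the state moves from setup to active mid-call."""
    shared = {"state": "setup"}

    def window() -> None:
        shared["state"] = "active"     # the migration thread's transition

    return fill(lambda: shared["state"], window)
```

In this model, `simulate(fill_info_read_twice)` reports an active status
with no RAM entry, while `simulate(fill_info_read_once)` stays on setup,
which matches the 'setup' result seen in this verification.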

Terminal 1)

$ qemu-system-x86_64 -nodefaults -nographic -S -incoming tcp:0:4444

Terminal 2)

$ gdb \
  -ex 'set non-stop on' -ex 'set pagination off' -ex 'set confirm off' \
  qemu-system-x86_64
...
Reading symbols from qemu-system-x86_64...Reading symbols from /usr/lib/debug/.build-id/82/a4159294ae653e770be24bbcfbb35703e60005.debug...done.
done.

(gdb) b migrate_set_state
Breakpoint 1 at 0x6ba8c0: file ./migration/migration.c, line 1464.

(gdb) b migration/migration.c:928
Breakpoint 2 at 0x6b9fb3: file ./migration/migration.c, line 928.

(gdb) run -nodefaults -nographic -S -monitor tcp:0:3333,server,wait=off
...

Terminal 3)

$ nc 127.0.0.1 3333
QEMU 4.2.1 monitor - type 'help' for more information
(qemu) migrate -d tcp:127.0.0.1:4444

Terminal 2)

Thread 1 "qemu-system-x86" hit Breakpoint 1, migrate_set_state (state=0x5555566a11d8, old_state=0, new_state=1) at ./migration/migration.c:1464
1464    ./migration/migration.c: No such file or directory.

(gdb) p (MigrationStatus) 0
$1 = MIGRATION_STATUS_NONE

(gdb) p (MigrationStatus) 1
$2 = MIGRATION_STATUS_SETUP

(gdb) c
Continuing.
...

Thread 5 "qemu-system-x86" hit Breakpoint 1, migrate_set_state (state=0x5555566a11d8, old_state=1, new_state=4) at ./migration/migration.c:1464
1464    in ./migration/migration.c

(gdb) p (MigrationStatus) 1
$3 = MIGRATION_STATUS_SETUP

(gdb) p (MigrationStatus) 4
$4 = MIGRATION_STATUS_ACTIVE

(gdb)
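
For readers following along without debug symbols, the numeric
old_state/new_state arguments can be decoded with a small lookup. This is a
sketch only: the values 0, 1 and 4 are confirmed by the GDB output above,
while 2 and 3 are assumed from the order of the QAPI MigrationStatus enum
and later values are omitted.

```python
from enum import IntEnum


class MigrationStatus(IntEnum):
    NONE = 0         # confirmed by "p (MigrationStatus) 0" above
    SETUP = 1        # confirmed by "p (MigrationStatus) 1" above
    CANCELLING = 2   # assumption: taken from the QAPI enum order
    CANCELLED = 3    # assumption: taken from the QAPI enum order
    ACTIVE = 4       # confirmed by "p (MigrationStatus) 4" above


def describe_transition(old_state: int, new_state: int) -> str:
    """Render a migrate_set_state() call as 'OLD -> NEW'."""
    old_name = MigrationStatus(old_state).name
    new_name = MigrationStatus(new_state).name
    return f"{old_name} -> {new_name}"
```

The two breakpoint hits above correspond to `describe_transition(0, 1)` and
`describe_transition(1, 4)`.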

Terminal 3)

(qemu) info migrate

Terminal 2)

Thread 1 "qemu-system-x86" hit Breakpoint 2, fill_source_migration_info (info=0x5555572d29b0) at ./migration/migration.c:928
928     in ./migration/migration.c

(gdb) info threads
  Id   Target Id         Frame
* 1    Thread 0x7ffff7fcdcc0 (LWP 1477) "qemu-system-x86" fill_source_migration_info (info=0x5555572d29b0) at ./migration/migration.c:928
  2    Thread 0x7fffe61ff700 (LWP 1481) "qemu-system-x86" (running)
  3    Thread 0x7fffe59fe700 (LWP 1482) "qemu-system-x86" (running)
  5    Thread 0x7fffdd7fe700 (LWP 1485) "qemu-system-x86" migrate_set_state (state=0x5555566a11d8, old_state=1, new_state=4) at ./migration/migration.c:1464
  
(gdb) thread 5
[Switching to thread 5 (Thread 0x7fffdd7fe700 (LWP 1485))]
#0  migrate_set_state (state=0x5555566a11d8, old_state=1, new_state=4) at ./migration/migration.c:1464
1464    in ./migration/migration.c

(gdb) continue &
Continuing.

(gdb) info threads
  Id   Target Id         Frame
  1    Thread 0x7ffff7fcdcc0 (LWP 1477) "qemu-system-x86" fill_source_migration_info (info=0x5555572d29b0) at ./migration/migration.c:928
  2    Thread 0x7fffe61ff700 (LWP 1481) "qemu-system-x86" (running)
  3    Thread 0x7fffe59fe700 (LWP 1482) "qemu-system-x86" (running)
* 5    Thread 0x7fffdd7fe700 (LWP 1485) "qemu-system-x86" (running)

(gdb) thread 1
[Switching to thread 1 (Thread 0x7ffff7fcdcc0 (LWP 1477))]
#0  fill_source_migration_info (info=0x5555572d29b0) at ./migration/migration.c:928
928     in ./migration/migration.c

(gdb) c
Continuing.

Terminal 3)

(qemu) info migrate
info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
clear-bitmap-shift: 18
Migration status: setup
total time: 0 milliseconds

The status now remains 'SETUP' (which is not expected to have RAM
statistics), not 'ACTIVE' (which is expected to have them, and which
caused the issue).
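
The same check can be scripted against the monitor output. The sketch below
parses the 'Migration status:' line and applies a consumer-style
consistency check modeled on the error message in this bug. It is not
libvirt's implementation, and the 'transferred ram:' marker used to detect
RAM statistics is an assumption about the HMP output format.

```python
import re
from typing import Optional


def parse_migration_status(hmp_output: str) -> Optional[str]:
    """Return the value of the 'Migration status:' line, if present."""
    match = re.search(r"^Migration status:\s*(\S+)", hmp_output, re.MULTILINE)
    return match.group(1) if match else None


def has_ram_info(hmp_output: str) -> bool:
    # Assumption: RAM statistics show up as a 'transferred ram:' line.
    return "transferred ram:" in hmp_output


def check_consistency(hmp_output: str) -> str:
    """Raise if the status is 'active' but no RAM statistics are present."""
    status = parse_migration_status(hmp_output)
    if status == "active" and not has_ram_info(hmp_output):
        raise RuntimeError("migration was active, but no RAM info was set")
    return status or "unknown"


SAMPLE = """globals:
store-global-state: on
Migration status: setup
total time: 0 milliseconds
"""
```

With the output captured above (abbreviated in `SAMPLE`),
`check_consistency(SAMPLE)` returns the setup status without raising.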

** Tags removed: verification-ussuri-needed
** Tags added: verification-ussuri-done

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1994002

Title:
  [SRU] migration was active, but no RAM info was set

Status in Ubuntu Cloud Archive:
  Invalid
Status in Ubuntu Cloud Archive ussuri series:
  Fix Committed
Status in qemu package in Ubuntu:
  Fix Released
Status in qemu source package in Bionic:
  Fix Released
Status in qemu source package in Focal:
  Fix Released
Status in qemu source package in Jammy:
  Fix Released
Status in qemu source package in Kinetic:
  Fix Released

Bug description:
  [Impact]

   * While live-migrating many instances concurrently, libvirt sometimes
  returns `internal error: migration was active, but no RAM info was
  set:`

   * Effects of this bug are mostly observed in large scale clusters
  with a lot of live migration activity.

   * Has second-order effects for consumers of the migration monitor,
  such as libvirt and OpenStack.

  [Test Case]

  Synthetic reproducer with GDB in comment #21.

  Steps to Reproduce:
  1. live evacuate a compute
  2. live migration of one or more instances fails with the above error

  N.B. Due to the nature of this bug, it is difficult to reproduce consistently.
  In an environment where it has been observed, it is estimated to occur in approximately 1 in 1000 migrations.
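
  To put the 1 in 1000 estimate in perspective, the chance of seeing at
  least one failure grows quickly with the number of migrations. The sketch
  below assumes each migration fails independently at that rate, which is a
  simplification and not something measured in this bug.

```python
def p_at_least_one_failure(migrations: int,
                           failure_rate: float = 1 / 1000) -> float:
    """Probability that at least one of `migrations` attempts fails.

    Assumes independent attempts with the same failure rate.
    """
    return 1.0 - (1.0 - failure_rate) ** migrations
```

  Under that assumption, a batch of 1000 migrations has roughly a 63%
  chance of hitting the error at least once, which is consistent with the
  bug being noticed mainly in large clusters.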

  [Where problems could occur]
   * In the event of a regression the migration monitor may report an inconsistent state.

  [Original Bug Description]

  While live-migrating many instances concurrently, libvirt sometimes returns internal error: migration was active, but no RAM info was set:
  ~~~
  2022-03-30 06:08:37.197 7 WARNING nova.virt.libvirt.driver [req-5c3296cf-88ee-4af6-ae6a-ddba99935e23 - - - - -] [instance: af339c99-1182-4489-b15c-21e52f50f724] Error monitoring migration: internal error: migration was active, but no RAM info was set: libvirt.libvirtError: internal error: migration was active, but no RAM info was set
  ~~~

  From upstream bug: https://bugzilla.redhat.com/show_bug.cgi?id=2074205

  [Other Information]
  Related bug: https://bugs.launchpad.net/nova/+bug/1982284

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1994002/+subscriptions



