[SRU][E][aws][PULL v2] Xen / hibernation: xen-netfront panic + resume hangs
Andrea Righi
andrea.righi at canonical.com
Fri Jun 5 16:20:28 UTC 2020
BugLink: https://bugs.launchpad.net/bugs/1881869
[Impact]
During our AWS testing we were able to trigger hibernation failures on
some Xen instance types.
One problem is a kernel panic in the resume callback of the xen-netfront
driver. A workaround for this problem is to compile the driver as a
module and reload it at resume (we were already doing this reload with
the bionic kernel, which had this driver compiled as a module, but for
some reason eoan and focal had it statically compiled).
Other issues showed up as hangs on resume; these seem to be prevented by
the new Xen/hibernation patch set that Anchal posted to the LKML:
https://lore.kernel.org/lkml/cover.1589926004.git.anchalag@amazon.com/
This new patch set is still being reviewed, but according to our tests
it really seems to fix some of these hangs on resume.
In addition to that we can improve hibernation reliability and
performance even more by applying the updated swapoff optimization patch
(that has been merged upstream).
[Test case]
Create a Xen instance in AWS and hibernate/resume it multiple times.
[Fix]
The following set of fixes can be used to improve hibernation
performance and reliability:
- new Xen/hibernation patch set from the LKML (see link above)
- config change to compile xen-netfront as a module
- new swapoff optimization patch
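For reference, the xen-netfront config change amounts to switching the
driver from built-in to modular. A minimal sketch of the resulting kernel
config fragment, assuming the standard upstream Kconfig symbol name for
the driver (the actual change is made in the Ubuntu config and
annotations files listed in the diffstat):

```
# Assumption: CONFIG_XEN_NETFRONT is the upstream Kconfig symbol for xen-netfront.
# Before (eoan/focal): CONFIG_XEN_NETFRONT=y, built in, so it cannot be reloaded.
# After: built as a module, so it can be unloaded and reloaded at resume.
CONFIG_XEN_NETFRONT=m
```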
[Regression potential]
The xen-netfront config change and the new swapoff optimization patch
are pretty safe (one is a config change that affects only the
xen-netfront driver, the other is a clean cherry-pick of an upstream
commit).
The new Xen/hibernation update is pretty big and the new patches are
still under review; however, according to our tests it really seems to
fix some of the hang issues (it definitely makes things better).
Moreover, all the changes affect Xen and are restricted to the
hibernation/resume code paths, so the overall regression potential is
minimal.
[See also]
NOTE: the fix mentioned in LP: #1879711 (disable CONFIG_DMA_CMA) was
also applied during our tests and it is also required to make
hibernation stable in Xen.
Changes in v2:
- fixed git repository URL
- fixed glitched pull request
----------------------------------------------------------------
The following changes since commit d59828b58949abaac0cd4c769d547d841d48b33e:

  UBUNTU: Ubuntu-aws-5.3.0-1020.22 (2020-05-27 17:15:03 -0500)

are available in the Git repository at:

  git://git.launchpad.net/~arighi/+git/eoan-linux aws-arighi

for you to fetch changes up to 3c99292a75a9b318b5aba5b0da3b453741e0925c:

  UBUNTU SAUCE [aws]: mm: swap: increase default swap readahead size (2020-06-05 18:17:13 +0200)

----------------------------------------------------------------
Anchal Agarwal (4):
      UBUNTU: SAUCE: x86/xen: Introduce new function to map HYPERVISOR_shared_info on Resume
      UBUNTU: SAUCE: genirq: Shutdown irq chips in suspend/resume during hibernation
      UBUNTU: SAUCE: xen: Introduce wrapper for save/restore sched clock offset
      UBUNTU: SAUCE: xen: Update sched clock offset to avoid system instability in hibernation

Andrea Righi (18):
      Revert "UBUNTU SAUCE [aws]: xen: Only restore the ACPI SCI interrupt in xen_restore_pirqs."
      Revert "UBUNTU SAUCE [aws]: xen: restore pirqs on resume from hibernation."
      Revert "UBUNTU SAUCE [aws]: ACPICA: Enable sleep button on ACPI legacy wake"
      Revert "UBUNTU SAUCE [aws]: mm: swap: improve swap readahead heuristic"
      Revert "UBUNTU SAUCE [aws] PM / hibernate: reduce memory pressure during image writing"
      Revert "UBUNTU: SAUCE [aws] x86/xen: close event channels for PIRQs in system core suspend callback"
      Revert "UBUNTU: SAUCE [aws] xen/events: add xen_shutdown_pirqs helper function"
      Revert "UBUNTU: SAUCE [aws] x86/xen: save and restore steal clock"
      Revert "UBUNTU: SAUCE [aws] xen-time-introduce-xen_-save-restore-_steal_clock"
      Revert "UBUNTU: SAUCE [aws] xen-netfront: add callbacks for PM suspend and hibernation support"
      Revert "UBUNTU: SAUCE [aws] x86/xen: add system core suspend and resume callbacks"
      Revert "UBUNTU: SAUCE [aws] x86/xen: Introduce new function to map HYPERVISOR_shared_info on Resume"
      Revert "UBUNTU: SAUCE: xenbus: add freeze/thaw/restore callbacks support"
      Revert "UBUNTU: SAUCE: xen/manage: introduce helper function to know the on-going suspend mode"
      Revert "UBUNTU: SAUCE: xen/manage: keep track of the on-going suspend mode"
      UBUNTU: [Config] aws: compile xen-netfront as module
      mm: swap: properly update readahead statistics in unuse_pte_range()
      UBUNTU SAUCE [aws]: mm: swap: increase default swap readahead size

Juergen Gross (1):
      xen/blkfront: fix ring info addressing

Munehisa Kamata (7):
      UBUNTU: SAUCE: xen/manage: keep track of the on-going suspend mode
      UBUNTU: SAUCE: xenbus: add freeze/thaw/restore callbacks support
      UBUNTU: SAUCE: x86/xen: add system core suspend and resume callbacks
      UBUNTU: SAUCE: xen-blkfront: add callbacks for PM suspend and hibernation
      UBUNTU: SAUCE: xen-netfront: add callbacks for PM suspend and hibernation
      UBUNTU: SAUCE: xen/time: introduce xen_{save,restore}_steal_clock
      UBUNTU: SAUCE: x86/xen: save and restore steal clock

 arch/x86/xen/suspend.c                    |  12 +-
 arch/x86/xen/time.c                       |  15 ++-
 arch/x86/xen/xen-ops.h                    |   2 +
 debian.aws/config/annotations             |   2 +-
 debian.master/config/config.common.ubuntu |   2 +-
 drivers/acpi/acpica/hwsleep.c             |  11 --
 drivers/block/xen-blkfront.c              | 197 +++++++++++++++++++++++-------
 drivers/net/xen-netfront.c                |  21 ++--
 drivers/xen/events/events_base.c          |  58 +--------
 drivers/xen/manage.c                      |   8 +-
 drivers/xen/time.c                        |   7 +-
 drivers/xen/xenbus/xenbus_probe.c         |  47 +++----
 include/linux/irq.h                       |   2 +
 include/xen/events.h                      |   2 -
 kernel/irq/chip.c                         |   2 +-
 kernel/irq/internals.h                    |   1 +
 kernel/irq/pm.c                           |  31 +++--
 kernel/power/swap.c                       |  24 +++-
 mm/swap_state.c                           |  60 +++++++--
 mm/swapfile.c                             |  12 +-
 20 files changed, 327 insertions(+), 189 deletions(-)