APPLIED: [PULL REQUESTS][focal/jammy/lunar linux-azure] [Azure] Fix VM crash/hang issues due to fast VF add/remove events
Tim Gardner
tim.gardner at canonical.com
Wed Jul 5 19:14:00 UTC 2023
On 6/12/23 3:13 PM, Tim Gardner wrote:
> BugLink: https://bugs.launchpad.net/bugs/2023594
>
> SRU Justification
>
> [Impact]
>
> A Linux guest on Hyper-V/Azure can occasionally crash during early Linux
> kernel boot due to a strange host behavior:
> 1. The host assigns a VF to the guest;
> 2. The host immediately unassigns the VF from the guest; //Dexuan: due
> to race condition bugs in the Linux vPCI driver, Linux can crash.
> 3. The host assigns the VF to the guest again.
>
> Starting in late 2022 (around Nov 2022), Linux guests on Azure started to
> crash more frequently due to a host-side update at that time: a new
> host/hypervisor feature for handling "correctable memory errors" can
> cause many successive VF remove/add events, so the race condition
> bugs in the Linux vPCI driver surface much more easily. The Hyper-V team
> is implementing a batching mechanism so that the guest will get far
> fewer VF remove/add events (ETA: June 2023), but meanwhile we should also
> get the Linux race condition bugs fixed so that Linux guests won't crash
> even if they receive the successive VF remove/add events.
>
> [Test Plan]
>
> Microsoft tested
>
> [Regression potential]
>
> PCI devices may not get registered, or VMs may crash.
>
> [Other Info]
>
> SF: #00349076
>
> -------------------------------------------------------------------------------
> The following changes since commit d250cc0ce73d5582e5eb073fa948567ec2ef67d5:
>
>   UBUNTU: Ubuntu-azure-5.4.0-1110.116 (2023-06-02 12:51:11 -0600)
>
> are available in the Git repository at:
>
>   git://git.launchpad.net/~timg-tpi/ubuntu/+source/linux/+git/focal focal-azure-fix-vm-add-remove-race-condition
>
> for you to fetch changes up to ac03c5aa4ead40832fcd94d814d28ea8087fb906:
>
>   UBUNTU: SAUCE: PCI: hv: Add a per-bus mutex state_lock (2023-06-12 14:57:33 -0600)
>
> ----------------------------------------------------------------
> Dexuan Cui (4):
>       UBUNTU: SAUCE: PCI: hv: Fix a race condition bug in hv_pci_query_relations()
>       UBUNTU: SAUCE: PCI: hv: Fix a race condition in hv_irq_unmask() that can cause panic
>       UBUNTU: SAUCE: PCI: hv: Remove the useless hv_pcichild_state from struct hv_pci_dev
>       UBUNTU: SAUCE: PCI: hv: Add a per-bus mutex state_lock
>
>  drivers/pci/controller/pci-hyperv.c | 58 ++++++++++++++++++++++++++++++++++++++--------------------
>  1 file changed, 38 insertions(+), 20 deletions(-)
> -------------------------------------------------------------------------------
>
> The following changes since commit 98439f7092e414a0534c5687742e2c8309a30204:
>
>   UBUNTU: Ubuntu-azure-5.15.0-1040.47 (2023-06-01 13:13:05 -0600)
>
> are available in the Git repository at:
>
>   git://git.launchpad.net/~timg-tpi/ubuntu/+source/linux/+git/jammy jammy-azure-fix-vm-add-remove-race-condition
>
> for you to fetch changes up to c7caafcc393e6435c8b6b24ecde574f30be95c35:
>
>   PCI: hv: Use async probing to reduce boot time (2023-06-12 14:58:58 -0600)
>
> ----------------------------------------------------------------
> Dexuan Cui (6):
>       PCI: hv: Fix a race condition bug in hv_pci_query_relations()
>       PCI: hv: Fix a race condition in hv_irq_unmask() that can cause panic
>       PCI: hv: Remove the useless hv_pcichild_state from struct hv_pci_dev
>       Revert "PCI: hv: Fix a timing issue which causes kdump to fail occasionally"
>       PCI: hv: Add a per-bus mutex state_lock
>       PCI: hv: Use async probing to reduce boot time
>
>  drivers/pci/controller/pci-hyperv.c | 145 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------------------------------
>  1 file changed, 86 insertions(+), 59 deletions(-)
> -------------------------------------------------------------------------------
>
> The following changes since commit e4972fe7acaf327c1bd496cf2286889bd913c9bf:
>
>   UBUNTU: Ubuntu-azure-6.2.0-1005.5 (2023-06-01 11:42:19 -0600)
>
> are available in the Git repository at:
>
>   git://git.launchpad.net/~timg-tpi/ubuntu/+source/linux-azure/+git/lunar lunar-azure-fix-vm-add-remove-race-condition
>
> for you to fetch changes up to 7e734194962a51084cba9016f0e5a512805f7d0a:
>
>   PCI: hv: Use async probing to reduce boot time (2023-06-12 15:01:00 -0600)
>
> ----------------------------------------------------------------
> Dexuan Cui (6):
>       PCI: hv: Fix a race condition bug in hv_pci_query_relations()
>       PCI: hv: Fix a race condition in hv_irq_unmask() that can cause panic
>       PCI: hv: Remove the useless hv_pcichild_state from struct hv_pci_dev
>       Revert "PCI: hv: Fix a timing issue which causes kdump to fail occasionally"
>       PCI: hv: Add a per-bus mutex state_lock
>       PCI: hv: Use async probing to reduce boot time
>
>  drivers/pci/controller/pci-hyperv.c | 145 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------------------------------
>  1 file changed, 86 insertions(+), 59 deletions(-)
Applied to f/j/l linux-azure:master-next. Thanks.
-rtg
--
-----------
Tim Gardner
Canonical, Inc