[Bug 2090994] Re: Seeing OS hang after running 300+ reboot loops on Turin with MI210 GPUs
Heinrich Schuchardt
2090994 at bugs.launchpad.net
Thu Dec 5 10:18:53 UTC 2024
Brahamaprakash, thanks for reporting the issue.
For the workaround suggested by AMD you can use a configuration file in
/etc/default/grub.d/ to apply your settings. The file name must end with
.cfg. Cf.
https://help.ubuntu.com/community/Grub2/Setup#Scripts:_.2Fetc.2Fgrub.d.2F
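As a minimal sketch of such a drop-in file: the function below writes a .cfg file that appends the two parameters from AMD's workaround to whatever /etc/default/grub already sets. It relies on grub-mkconfig sourcing /etc/default/grub.d/*.cfg after /etc/default/grub. The function name write_grub_dropin and the file name 99-no5lvl.cfg are made up for illustration; only the .cfg suffix matters.

```shell
#!/bin/sh
# Sketch: write a GRUB drop-in that appends the parameters suggested by AMD.
# The file name 99-no5lvl.cfg is hypothetical; any name ending in .cfg works.
write_grub_dropin() {
    dir="${1:-/etc/default/grub.d}"
    # Quoted heredoc delimiter: the $VARIABLES are written literally and are
    # only expanded later, when grub-mkconfig sources the file.
    cat > "$dir/99-no5lvl.cfg" <<'EOF'
# Sourced after /etc/default/grub, so existing values are kept and extended.
GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX no5lvl"
GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT iommu=pt"
EOF
}

# Typical use (as root), followed by regenerating grub.cfg and rebooting:
#   write_grub_dropin /etc/default/grub.d && update-grub
```

Keeping the change in a separate file leaves /etc/default/grub untouched, so the workaround can be reverted by deleting the file and running update-grub again.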
The issue seems to be in the Linux kernel and not in GRUB. Please
provide the kernel version and attach the full kernel log.
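One possible way to gather this information is sketched below. It assumes systemd's journal is persistent, so that the kernel log of the previous boot (the one that hung) is still available after the reboot. The function name collect_kernel_info and the output file names are made up for illustration.

```shell
#!/bin/sh
# Sketch: collect the kernel version and kernel logs into a directory.
# Function name and output file names are hypothetical.
collect_kernel_info() {
    out="${1:-.}"
    # Running kernel version, e.g. for the bug report header.
    uname -r > "$out/kernel-version.txt"
    # Kernel log of the current boot; dmesg may require root on some systems.
    dmesg > "$out/dmesg-current.txt" 2>/dev/null || true
    # Kernel log of the previous boot, if journalctl and a persistent
    # journal are available.
    if command -v journalctl >/dev/null 2>&1; then
        journalctl -k -b -1 --no-pager \
            > "$out/kernel-log-previous-boot.txt" 2>/dev/null || true
    fi
}

# Typical use:
#   mkdir -p /tmp/bug-2090994 && collect_kernel_info /tmp/bug-2090994
```

The resulting files can then be attached to the bug report.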
** Changed in: grub2 (Ubuntu)
Status: New => Incomplete
** Package changed: grub2 (Ubuntu) => linux (Ubuntu)
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to grub2 in Ubuntu.
https://bugs.launchpad.net/bugs/2090994
Title:
Seeing OS hang after running 300+ reboot loops on Turin with MI210
GPUs
Status in linux package in Ubuntu:
Incomplete
Bug description:
1. Run a rebooter script on Turin blade with AMD GPU MI210 installed
in Gdale
Result: After 300+ reboots the OS (Ubuntu) hangs (see image attached)
Rebooting the OS fixes the issue.
Seen on multiple blade setups. No hangs observed with fewer than 300
loops so far.
Here is a Linux grub workaround provided by AMD (also see image
attached):
Please disable 5-level page tables. Here is how to do it:
1. Open the GRUB configuration file: sudo nano /etc/default/grub
2. Add no5lvl to the list of parameters: GRUB_CMDLINE_LINUX="... no5lvl"
3. Add iommu=pt to GRUB_CMDLINE_LINUX_DEFAULT
4. Update GRUB: sudo update-grub
5. Reboot the system.
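After the reboot, whether both parameters took effect can be checked against the running kernel's command line. A minimal sketch, where the helper name has_param is made up for illustration:

```shell
#!/bin/sh
# Sketch: succeed if PARAM appears as a whole word on the kernel command line.
# has_param is a hypothetical helper; /proc/cmdline is the default source.
has_param() {
    file="${2:-/proc/cmdline}"
    # Split the command line into one word per line, then match exactly.
    tr -s ' \n' '\n' < "$file" | grep -qxF -- "$1"
}

# Typical use:
#   has_param no5lvl   && echo "no5lvl is active"
#   has_param iommu=pt && echo "iommu=pt is active"
```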
Ubuntu OS version used here is 22.04
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2090994/+subscriptions