[SRU][N:gke][PATCH 0/5] NVIDIA Grace update for Google (Sep 2025)

Tim Whisonant tim.whisonant at canonical.com
Fri Oct 3 22:21:31 UTC 2025


BugLink: https://bugs.launchpad.net/bugs/2126453

SRU Justification:

[Impact]

cpufreq/cppc: Don't compare desired_perf in target()

There is a corner case where the desired_perf is exactly the same as the
old perf, but the actual current freq is not.

This happens during S3 while the cpufreq governor is set to powersave.
During the cpufreq resume process, the booting CPU's new_freq obtained via
.get() is the highest frequency, while the policy->cur and
cpu->perf_ctrls.desired_perf are set to the lowest level (powersave
governor). This causes the warning: "CPU frequency out of sync:", and
the cpufreq core sets policy->cur to new_freq.

Then governor->limits() calls cppc_cpufreq_set_target() to configure
the CPU frequency, but the function returns early because the
desired_perf converted from target_freq is the same as
cpu->perf_ctrls.desired_perf: both are the lowest_perf.

Since target_freq and policy->cur have already been compared in
__cpufreq_driver_target(), there's no need to compare them again here.

Drop the comparison.
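
The corner case above can be illustrated with a small userspace model.
This is only a sketch, not the driver code: struct model_cpu,
set_target_old() and set_target_new() are hypothetical names, and the
hardware register is modelled as a plain field.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's per-CPU state (not kernel code). */
struct model_cpu {
	uint32_t desired_perf;	/* last value the driver requested */
	uint32_t hw_perf;	/* level the hardware is actually running at */
};

/* Old behaviour: return early when the cached request is unchanged. */
bool set_target_old(struct model_cpu *cpu, uint32_t desired_perf)
{
	if (cpu->desired_perf == desired_perf)
		return false;	/* hardware is left untouched */

	cpu->desired_perf = desired_perf;
	cpu->hw_perf = desired_perf;
	return true;
}

/* New behaviour: always program the request, as the caller already
 * filtered out the "target equals current frequency" case. */
bool set_target_new(struct model_cpu *cpu, uint32_t desired_perf)
{
	cpu->desired_perf = desired_perf;
	cpu->hw_perf = desired_perf;
	return true;
}
```

With desired_perf cached at the lowest level and hw_perf at the highest
(the post-resume state described above), set_target_old() leaves
hw_perf unchanged, while set_target_new() brings it back in line with
the request.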


i2c: tegra: check msg length in SMBUS block read

For SMBUS block read, do not continue to read if the message length
passed from the device is '0' or greater than the maximum allowed bytes.
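
A minimal sketch of the length check described above, assuming the
usual SMBus limit of 32 data bytes (I2C_SMBUS_BLOCK_MAX in the kernel).
The helper name smbus_block_len_valid() and the local SMBUS_BLOCK_MAX
macro are hypothetical; the real change lives in i2c-tegra.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: mirrors the kernel's I2C_SMBUS_BLOCK_MAX (32 bytes). */
#define SMBUS_BLOCK_MAX 32

/*
 * The first byte returned by the device in an SMBus block read is the
 * number of data bytes that follow. Only continue the transfer when
 * that count is non-zero and within the protocol limit.
 */
bool smbus_block_len_valid(uint8_t len)
{
	return len != 0 && len <= SMBUS_BLOCK_MAX;
}
```

A caller would stop the read, presumably with an error, whenever this
returns false.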


iommu: Skip PASID validation for devices without PASID capability

Generally PASID support requires ACS settings that usually create
single device groups, but there are some niche cases where we can get
multi-device groups and still have working PASID support. The primary
issue is that PCI switches are not required to treat PASID tagged TLPs
specially so appropriate ACS settings are required to route all TLPs to
the host bridge if PASID is going to work properly.

pci_enable_pasid() does check that each device that will use PASID has
the proper ACS settings to achieve this routing.

However, no-PASID devices can be combined with PASID capable devices
within the same topology using non-uniform ACS settings. In this case
the no-PASID devices may not have strict route to host ACS flags and
end up being grouped with the PASID devices.

This configuration fails to allow use of the PASID within the iommu
core code which wrongly checks if the no-PASID device supports PASID.

Fix this by ignoring no-PASID devices during the PASID validation. They
will never issue a PASID TLP anyhow so they can be ignored.
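
The validation rule above can be sketched in userspace as follows. The
types and names (struct model_dev, group_pasid_ok()) are hypothetical
and greatly simplified; the actual check in drivers/iommu/iommu.c walks
the group's device list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a device in an IOMMU group (not kernel code). */
struct model_dev {
	bool has_pasid_cap;		/* device advertises PASID support */
	unsigned int max_pasids;	/* number of PASIDs it supports */
};

/*
 * Return true if 'pasid' may be used for the whole group. Devices
 * without PASID capability are skipped: they never issue PASID-tagged
 * TLPs, so they cannot invalidate the configuration.
 */
bool group_pasid_ok(const struct model_dev *devs, size_t n,
		    unsigned int pasid)
{
	for (size_t i = 0; i < n; i++) {
		if (!devs[i].has_pasid_cap)
			continue;
		if (pasid >= devs[i].max_pasids)
			return false;
	}
	return true;
}
```

In this model, a group holding one PASID-capable device and one
no-PASID device passes validation for any PASID the capable device
supports, instead of failing on the no-PASID member.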


mm/gup: handle NULL pages in unpin_user_pages()

The recent addition of "pofs" (pages or folios) handling to gup has a
flaw: it assumes that unpin_user_pages() handles NULL pages in the pages**
array.  That's not the case, as I discovered when I ran on a new
configuration on my test machine.

Fix this by skipping NULL pages in unpin_user_pages(), just like
unpin_folios() already does.

Details: when booting on x86 with "numa=fake=2 movablecore=4G" on Linux
6.12, and running this:

    tools/testing/selftests/mm/gup_longterm

...I get the following crash:

BUG: kernel NULL pointer dereference, address: 0000000000000008
RIP: 0010:sanity_check_pinned_pages+0x3a/0x2d0
...
Call Trace:
 <TASK>
 ? __die_body+0x66/0xb0
 ? page_fault_oops+0x30c/0x3b0
 ? do_user_addr_fault+0x6c3/0x720
 ? irqentry_enter+0x34/0x60
 ? exc_page_fault+0x68/0x100
 ? asm_exc_page_fault+0x22/0x30
 ? sanity_check_pinned_pages+0x3a/0x2d0
 unpin_user_pages+0x24/0xe0
 check_and_migrate_movable_pages_or_folios+0x455/0x4b0
 __gup_longterm_locked+0x3bf/0x820
 ? mmap_read_lock_killable+0x12/0x50
 ? __pfx_mmap_read_lock_killable+0x10/0x10
 pin_user_pages+0x66/0xa0
 gup_test_ioctl+0x358/0xb20
 __se_sys_ioctl+0x6b/0xc0
 do_syscall_64+0x7b/0x150
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
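
The fix described above, skipping NULL entries before touching them,
can be sketched with a simplified userspace model. struct model_page
and unpin_pages_model() are hypothetical; the kernel function operates
on struct page and folios, not on a plain counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a pinned page (not the kernel's struct page). */
struct model_page {
	int pincount;
};

/*
 * Drop one pin from every non-NULL entry in 'pages' and return how
 * many entries were unpinned. NULL slots are skipped rather than
 * dereferenced, which is the behaviour the fix adds.
 */
size_t unpin_pages_model(struct model_page **pages, size_t npages)
{
	size_t unpinned = 0;

	assert(pages != NULL || npages == 0);

	for (size_t i = 0; i < npages; i++) {
		if (!pages[i])
			continue;	/* NULL slot: nothing to unpin */
		pages[i]->pincount--;
		unpinned++;
	}
	return unpinned;
}
```

Without the NULL check, the loop would dereference the empty slot, which
is the kind of NULL pointer dereference shown in the trace.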


UBUNTU: [Packaging] gke: disable CONFIG_IOMMU_DEFAULT_DMA_STRICT enable CONFIG_IOMMU_DEFAULT_DMA_LAZY

NVIDIA recommends this setting for the Grace platform, as of the
25 Sep 2025 list of configs found at [1].

[1] https://docs.nvidia.com/grace/patches-config-guide/platform-software-patches-config.html

[Fix]

This patchset targets noble:linux-gke.

[Test Plan]

Boot tested, with a minimal checkout on Grace.
Google to perform verification once platform is available.

[Where problems could occur]

The changes affect i2c/tegra, mm, cpufreq, and iommu.

Akhil R (1):
  i2c: tegra: check msg length in SMBUS block read

John Hubbard (1):
  mm/gup: handle NULL pages in unpin_user_pages()

Riwen Lu (1):
  cpufreq/cppc: Don't compare desired_perf in target()

Tim Whisonant (1):
  UBUNTU: [Packaging] gke: disable CONFIG_IOMMU_DEFAULT_DMA_STRICT
    enable CONFIG_IOMMU_DEFAULT_DMA_LAZY

Tushar Dave (1):
  iommu: Skip PASID validation for devices without PASID capability

 debian.gke/config/annotations  |  6 ++++++
 drivers/cpufreq/cppc_cpufreq.c |  9 ++-------
 drivers/i2c/busses/i2c-tegra.c |  5 +++++
 drivers/iommu/iommu.c          | 22 ++++++++++++++++------
 mm/gup.c                       | 11 ++++++++++-
 5 files changed, 39 insertions(+), 14 deletions(-)

-- 
2.43.0