[SRU][N:gke][PATCH 000/106] Enable NVIDIA Grace platform for Google Kubernetes Engine

Tim Whisonant tim.whisonant at canonical.com
Mon Jul 21 16:20:43 UTC 2025


BugLink: https://bugs.launchpad.net/bugs/2117098

SRU Justification

[Impact]

Google requested a Grace-enabled 6.8 kernel for GKE with support for a
64k page size. This patchset targets noble:linux-gke and is based on an
earlier patchset of the same name that targeted GCP. All of the patches
included here are identical to those in the GCP patchset, with the three
exceptions listed below. Note that noble:gke had no 64k page size flavor
prior to this patchset.

[SRU][N:gke][PATCH 094/106] UBUNTU: [Packaging] gke: enable CONFIG_CPUFREQ_ARCH_CUR_FREQ
[SRU][N:gke][PATCH 095/106] UBUNTU: [Packaging] gke: enable CONFIG_ARM64_CONTPTE
[SRU][N:gke][PATCH 106/106] UBUNTU: [Packaging] gke: Add 64k page flavor

[Fix]

Add 64k page size flavor to noble:linux-gke.

Add select NVIDIA Grace platform patches, similar to what has been done
for kernels like noble:linux-azure-nvidia. NVIDIA provides a
document [1] listing all of the required and recommended kernel patches for
Grace enablement. This patchset was prepared using version 11
(April 25, 2025) of that document, together with version
Ubuntu-azure-nvidia-6.8.0-1016.17 of the noble:linux-azure-nvidia kernel.

[1] https://docs.nvidia.com/grace-patch-config-guide.pdf

[Test Plan]

Boot testing on arm64 with 64k pages was performed in-house. Further
testing will be done by Google in their Grace-enabled environment once
it is available.
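
As an illustration only (not part of the test plan above), a booted 64k
flavor could be sanity-checked with a small script along these lines. It
is a minimal sketch: the helper names are hypothetical, the config file
location under /boot is an assumption about the installed system, and the
list of options to check is a parameter because which options apply
depends on the architecture. CONFIG_ARM64_CONTPTE is taken from the
packaging patch listed in [Impact].

```python
import os
import re

# A 64k page flavor should report 65536 bytes per page.
EXPECTED_PAGE_SIZE = 64 * 1024


def parse_kconfig(text):
    """Return {option: value} for every 'CONFIG_X=value' line in text."""
    options = {}
    for line in text.splitlines():
        match = re.match(r"^(CONFIG_\w+)=(.*)$", line.strip())
        if match:
            options[match.group(1)] = match.group(2)
    return options


def missing_options(text, expected):
    """List the expected options that are not set to 'y' in the config text."""
    options = parse_kconfig(text)
    return [name for name in expected if options.get(name) != "y"]


def page_size_ok(page_size=None):
    """Compare a page size (default: the running system's) with 64k."""
    if page_size is None:
        page_size = os.sysconf("SC_PAGE_SIZE")
    return page_size == EXPECTED_PAGE_SIZE


def check_running_system(expected=("CONFIG_ARM64_CONTPTE",)):
    """Check the running kernel; the /boot/config-<release> path is assumed."""
    config_path = f"/boot/config-{os.uname().release}"
    with open(config_path) as fh:
        missing = missing_options(fh.read(), expected)
    return page_size_ok(), missing
```

Calling check_running_system() on the target machine returns a pair: a
boolean for the page size check and the list of expected options that
were not found enabled.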

[Regression potential]

The regression potential is considered moderate due to the number of
patches involved and the breadth of the changes. The changes affect
pte, perf, PCI, cpufreq, ACPI, mmu, coresight, and other kernel
subsystems.

[Other]

PIT #427777553

Alexey Kardashevskiy (1):
  PCI/DOE: Support discovery version 2

Arnd Bergmann (1):
  arm64/io: add constant-argument check

Barry Song (1):
  mm: make folio_pte_batch available outside of mm/memory.c

Beata Michalska (6):
  arm64: amu: Delay allocating cpumask for AMU FIE support
  cpufreq: Allow arch_freq_get_on_cpu to return an error
  cpufreq: Introduce an optional cpuinfo_avg_freq sysfs entry
  arm64: Provide an AMU-based version of arch_freq_get_on_cpu
  arm64: Update AMU-based freq scale factor on entering idle
  arm64: Utilize for_each_cpu_wrap for reference lookup

Besar Wicaksono (5):
  perf arm-spe: Add Neoverse-V2 to common data source encoding list
  perf: arm_cspmu: nvidia: remove unsupported SCF events
  perf: arm_cspmu: nvidia: fix sysfs path in the kernel doc
  perf: arm_cspmu: nvidia: enable NVLINK-C2C port filtering
  perf: arm_cspmu: nvidia: monitor all ports by default

Dan Williams (1):
  ACPI/HMAT: Move HMAT messages to pr_debug()

David Hildenbrand (14):
  arm/pgtable: define PFN_PTE_SHIFT
  nios2/pgtable: define PFN_PTE_SHIFT
  powerpc/pgtable: define PFN_PTE_SHIFT
  riscv/pgtable: define PFN_PTE_SHIFT
  s390/pgtable: define PFN_PTE_SHIFT
  sparc/pgtable: define PFN_PTE_SHIFT
  mm/pgtable: make pte_next_pfn() independent of set_ptes()
  arm/mm: use pte_next_pfn() in set_ptes()
  powerpc/mm: use pte_next_pfn() in set_ptes()
  mm/memory: factor out copying the actual PTE in copy_present_pte()
  mm/memory: pass PTE to copy_present_pte()
  mm/memory: optimize fork() with PTE-mapped THP
  mm/memory: ignore dirty/accessed/soft-dirty bits in folio_pte_batch()
  mm/memory: ignore writable bit in folio_pte_batch()

Gavin Shan (2):
  arm64: tlb: Improve __TLBI_VADDR_RANGE()
  arm64: tlb: Allow range operation for MAX_TLBI_RANGE_PAGES

Ian Rogers (1):
  perf arm-spe/cs-etm: Directly iterate CPU maps

Ilkka Koskinen (1):
  perf cs-etm: Fix the assert() to handle captured and unprocessed cpu
    trace

Ionela Voinescu (1):
  arch_topology: init capacity_freq_ref to 0

James Clark (30):
  coresight: Remove unused ETM Perf stubs
  coresight: Clarify comments around the PID of the sink owner
  coresight: Move struct coresight_trace_id_map to common header
  coresight: Expose map arguments in trace ID API
  coresight: Make CPU id map a property of a trace ID map
  coresight: Make language around "activated" sinks consistent
  coresight: Remove ops callback checks
  coresight: Move mode to struct coresight_device
  coresight: Remove the 'enable' field.
  coresight: Move all sysfs code to sysfs file
  coresight: Remove atomic type from refcnt
  coresight: Remove unused stubs
  coresight: Add explicit member initializers to coresight_dev_type
  coresight: Add helper for atomically taking the device
  coresight: Add a helper for getting csdev->mode
  coresight: Use per-sink trace ID maps for Perf sessions
  coresight: Remove pending trace ID release mechanism
  coresight: Emit sink ID in the HW_ID packets
  coresight: Make trace ID map spinlock local to the map
  perf auxtrace: Allow number of queues to be specified
  perf cs-etm: Print error for new PERF_RECORD_AUX_OUTPUT_HW_ID versions
  perf cs-etm: Use struct perf_cpu as much as possible
  perf cs-etm: Create decoders after both AUX and HW_ID search passes
  perf: cs-etm: Allocate queues for all CPUs
  perf: cs-etm: Move traceid_list to each queue
  perf: cs-etm: Create decoders based on the trace ID mappings
  perf: cs-etm: Only save valid trace IDs into files
  perf: cs-etm: Support version 0.1 of HW_ID packets
  perf: cs-etm: Print queue number in raw trace dump
  perf arm-spe: Use old behavior when opening old SPE files

Jason Gunthorpe (5):
  arm64/io: Provide a WC friendly __iowriteXX_copy()
  net: hns3: Remove io_stop_wc() calls after __iowrite64_copy()
  x86: Stop using weak symbols for __iowrite32_copy()
  s390: Implement __iowrite32_copy()
  s390: Stop using weak symbols for __iowrite64_copy()

Jie Zhan (1):
  cppc_cpufreq: Remove HiSilicon CPPC workaround

Kai-Heng Feng (1):
  PCI: Use downstream bridges for distributing resources

Leo Yan (8):
  perf arm-spe: Rename arm_spe__synth_data_source_generic()
  perf arm-spe: Rename the common data source encoding
  perf arm-spe: Support metadata version 2
  perf arm-spe: Introduce arm_spe__is_homogeneous()
  perf arm-spe: Use metadata to decide the data source feature
  perf arm-spe: Remove the unused 'midr' field
  perf arm-spe: Add Cortex CPUs to common data source encoding list
  perf arm-spe: Define metadata header version 2

Namhyung Kim (1):
  tools/include: Sync arm64 headers with the kernel sources

Petr Vaněk (1):
  mm: fix folio_pte_batch() on XEN PV

Piotr Jaroszynski (1):
  Fix mmu notifiers for range-based invalidates

Ryan Roberts (20):
  arm64/mm: make set_ptes() robust when OAs cross 48-bit boundary
  mm: clarify the spec for set_ptes()
  mm: thp: batch-collapse PMD with set_ptes()
  mm: introduce pte_advance_pfn() and use for pte_next_pfn()
  arm64/mm: convert pte_next_pfn() to pte_advance_pfn()
  x86/mm: convert pte_next_pfn() to pte_advance_pfn()
  mm: tidy up pte_next_pfn() definition
  arm64/mm: convert READ_ONCE(*ptep) to ptep_get(ptep)
  arm64/mm: convert set_pte_at() to set_ptes(..., 1)
  arm64/mm: convert ptep_clear() to ptep_get_and_clear()
  arm64/mm: new ptep layer to manage contig bit
  arm64/mm: split __flush_tlb_range() to elide trailing DSB
  arm64/mm: wire up PTE_CONT for user mappings
  arm64/mm: implement new wrprotect_ptes() batch API
  arm64/mm: implement new [get_and_]clear_full_ptes() batch APIs
  mm: add pte_batch_hint() to reduce scanning in folio_pte_batch()
  arm64/mm: implement pte_batch_hint()
  arm64/mm: __always_inline to improve fork() perf
  arm64/mm: automatically fold contpte mappings
  arm64/mm: export contpte symbols only to GPL users

Tim Whisonant (3):
  UBUNTU: [Packaging] gke: enable CONFIG_CPUFREQ_ARCH_CUR_FREQ
  UBUNTU: [Packaging] gke: enable CONFIG_ARM64_CONTPTE
  UBUNTU: [Packaging] gke: Add 64k page flavor

Vidya Sagar (1):
  PCI: Clear Secondary Status errors after enumeration

 Documentation/admin-guide/perf/nvidia-pmu.rst |  52 +-
 Documentation/admin-guide/pm/cpufreq.rst      |  17 +-
 arch/arm/include/asm/pgtable.h                |   2 +
 arch/arm/mm/mmu.c                             |   2 +-
 arch/arm64/Kconfig                            |   9 +
 arch/arm64/include/asm/io.h                   | 128 ++++
 arch/arm64/include/asm/pgtable.h              | 431 ++++++++++--
 arch/arm64/include/asm/tlbflush.h             |  68 +-
 arch/arm64/kernel/efi.c                       |   4 +-
 arch/arm64/kernel/io.c                        |  42 ++
 arch/arm64/kernel/mte.c                       |   2 +-
 arch/arm64/kernel/topology.c                  | 150 ++++-
 arch/arm64/kvm/guest.c                        |   2 +-
 arch/arm64/mm/Makefile                        |   1 +
 arch/arm64/mm/contpte.c                       | 404 +++++++++++
 arch/arm64/mm/fault.c                         |  12 +-
 arch/arm64/mm/fixmap.c                        |   4 +-
 arch/arm64/mm/hugetlbpage.c                   |  40 +-
 arch/arm64/mm/kasan_init.c                    |   6 +-
 arch/arm64/mm/mmu.c                           |  16 +-
 arch/arm64/mm/pageattr.c                      |   6 +-
 arch/arm64/mm/trans_pgd.c                     |   6 +-
 arch/nios2/include/asm/pgtable.h              |   2 +
 arch/powerpc/include/asm/pgtable.h            |   2 +
 arch/powerpc/mm/pgtable.c                     |   5 +-
 arch/riscv/include/asm/pgtable.h              |   2 +
 arch/s390/include/asm/io.h                    |  15 +
 arch/s390/include/asm/pgtable.h               |   2 +
 arch/s390/pci/pci.c                           |   6 -
 arch/sparc/include/asm/pgtable_64.h           |   2 +
 arch/x86/include/asm/io.h                     |  17 +
 arch/x86/include/asm/pgtable.h                |   8 +-
 arch/x86/kernel/cpu/aperfmperf.c              |   2 +-
 arch/x86/kernel/cpu/proc.c                    |   7 +-
 arch/x86/lib/Makefile                         |   1 -
 arch/x86/lib/iomap_copy_64.S                  |  15 -
 debian.gke/config/annotations                 |   9 +-
 debian.gke/control.d/gke-64k.inclusion-list   | 259 +++++++
 debian.gke/control.d/vars.gke-64k             |   6 +
 debian.gke/rules.d/arm64.mk                   |   2 +-
 drivers/acpi/numa/hmat.c                      |  24 +-
 drivers/base/arch_topology.c                  |   8 +-
 drivers/cpufreq/Kconfig.x86                   |  12 +
 drivers/cpufreq/cppc_cpufreq.c                |  73 +-
 drivers/cpufreq/cpufreq.c                     |  38 +-
 drivers/hwtracing/coresight/coresight-core.c  | 515 +-------------
 drivers/hwtracing/coresight/coresight-dummy.c |   3 +-
 drivers/hwtracing/coresight/coresight-etb10.c |  29 +-
 .../hwtracing/coresight/coresight-etm-perf.c  |  43 +-
 .../hwtracing/coresight/coresight-etm-perf.h  |  18 -
 drivers/hwtracing/coresight/coresight-etm.h   |   2 -
 .../coresight/coresight-etm3x-core.c          |  32 +-
 .../coresight/coresight-etm3x-sysfs.c         |   4 +-
 .../coresight/coresight-etm4x-core.c          |  35 +-
 drivers/hwtracing/coresight/coresight-etm4x.h |   1 -
 drivers/hwtracing/coresight/coresight-priv.h  |   8 +-
 drivers/hwtracing/coresight/coresight-stm.c   |  33 +-
 drivers/hwtracing/coresight/coresight-sysfs.c | 392 +++++++++++
 .../hwtracing/coresight/coresight-tmc-core.c  |   2 +-
 .../hwtracing/coresight/coresight-tmc-etf.c   |  46 +-
 .../hwtracing/coresight/coresight-tmc-etr.c   |  38 +-
 drivers/hwtracing/coresight/coresight-tmc.h   |   7 +-
 drivers/hwtracing/coresight/coresight-tpda.c  |  13 +-
 drivers/hwtracing/coresight/coresight-tpdm.c  |   3 +-
 drivers/hwtracing/coresight/coresight-tpiu.c  |  14 +-
 .../hwtracing/coresight/coresight-trace-id.c  | 138 ++--
 .../hwtracing/coresight/coresight-trace-id.h  |  70 +-
 drivers/hwtracing/coresight/ultrasoc-smb.c    |  22 +-
 drivers/hwtracing/coresight/ultrasoc-smb.h    |   2 -
 .../net/ethernet/hisilicon/hns3/hns3_enet.c   |   4 -
 drivers/pci/doe.c                             |  12 +-
 drivers/pci/probe.c                           |   3 +
 drivers/pci/setup-bus.c                       |   3 +-
 drivers/perf/arm_cspmu/nvidia_cspmu.c         |  75 +--
 include/linux/coresight-pmu.h                 |  17 +-
 include/linux/coresight.h                     | 151 ++---
 include/linux/cpufreq.h                       |   2 +-
 include/linux/efi.h                           |   5 +
 include/linux/io.h                            |   8 +-
 include/linux/pgtable.h                       |  65 +-
 include/uapi/linux/pci_regs.h                 |   1 +
 lib/iomap_copy.c                              |  13 +-
 mm/huge_memory.c                              |  58 +-
 mm/internal.h                                 |  88 +++
 mm/memory.c                                   | 143 ++--
 tools/arch/arm64/include/asm/cputype.h        |  10 +
 tools/include/linux/coresight-pmu.h           |  17 +-
 tools/perf/arch/arm/util/cs-etm.c             | 307 ++++-----
 tools/perf/arch/arm64/util/arm-spe.c          |   8 +-
 .../util/arm-spe-decoder/arm-spe-decoder.h    |  18 +-
 tools/perf/util/arm-spe.c                     | 234 ++++++-
 tools/perf/util/arm-spe.h                     |  38 +-
 tools/perf/util/auxtrace.c                    |   9 +-
 tools/perf/util/auxtrace.h                    |   1 +
 .../perf/util/cs-etm-decoder/cs-etm-decoder.c |  36 +-
 .../perf/util/cs-etm-decoder/cs-etm-decoder.h |   2 +-
 tools/perf/util/cs-etm.c                      | 631 +++++++++++-------
 tools/perf/util/cs-etm.h                      |  12 +-
 98 files changed, 3547 insertions(+), 1815 deletions(-)
 create mode 100644 arch/arm64/mm/contpte.c
 delete mode 100644 arch/x86/lib/iomap_copy_64.S
 create mode 100644 debian.gke/control.d/gke-64k.inclusion-list
 create mode 100644 debian.gke/control.d/vars.gke-64k

-- 
2.43.0