ACK: [SRU][canonical-kernel-snap/main][kernel-snaps-uc24.04/pc][PATCH v5 0/1] add nvidia-550 driver components

Juerg Haefliger juerg.haefliger at canonical.com
Thu Feb 13 08:36:54 UTC 2025


I have a few comments and I believe there are still some issues as pointed
out in the other replies. But I'm ack'ing in order to make forward progress.
We can follow up with fix commits once the patches are applied.

Acked-by: Juerg Haefliger <juerg.haefliger at canonical.com>

...Juerg


On Thu, 30 Jan 2025 12:05:33 +1100
Aaron Jauregui <aaron.jauregui at canonical.com> wrote:

> BugLink: https://bugs.launchpad.net/bugs/2088970
> 
> [Changes between v4 and v5]
> - No longer using a workaround script to match ko and userspace
>   versions; the versions are now hardcoded instead, following a discussion
>   with apw. This is a temporary solution, as snapcraft is working on a way
>   to be able to specify deb versions from within an override block.
> 
> - Fixed links to canonical-kernel-snaps repo
> - renamed nouveau component to nouveau-ko
> - Expanded comments regarding lib cleanup and pruning libc from
>   nvidia-user component.
> 
> [Changes between v3 and v4]
> - Patchset no longer RFC, as snapd component support should be landing
>   in latest/stable.
> 
> - Fixed a bug in the build process that could result in version
>   mismatches of nvidia libraries between the ko and user components
>   depending on the status of the deb archives for each. This required an
>   overhaul of the build scripts. This is intended as a temporary solution,
>   and I intend to replace it with an swm template-based approach soon.
> 
> - Added a cleanup stanza for the nvidia-550-user component. Testing
>   discovered issues with conflicts created by mesa libraries that
>   should not be present in the component, so these are pruned.
> 
> [Changes between v2 and v3]
>   - cleaned up scripts and comments
>   - added better summaries/descriptions for components
>   - restructured hooks directory into per-component subdirectories
>   - shifted organize blocks to respective components
>   - replaced post-refresh hook copying with corresponding files in the
>     hooks directory (note: snapcraft did not support using symlinks for
>     this)
>   - updated [Impact] section with more detailed information
> 
> [Changes between v1 and v2]
> 
> kernel-snaps-u24.04:
>   - replaced TODO HACK FOR HOOKS
>   - updated nvidia userspace component type to standard
> 
> hooks:
>   - included install hook for the pc-kernel to install nouveau
>     component by default
> 
> [Impact]
> Snap components are a way to have optional content for snaps available
> for install without resorting to building a completely new snap. It's
> useful to think of them as lazy loading for snaps. Concretely, components
> are themselves snaps with locked-down functionality that are mounted
> within their parent snap's filesystem. Component revisions are tied 1 to
> 1 with their parent snap revision at upload time, meaning that any refresh
> also refreshes the components tied to the snap. This also means that
> components MUST be uploaded alongside the parent snap, or the store will
> reject the upload.
> 
> We use components here to let nvidia drivers be selected for the pc-kernel
> without a rebuild, targeting the nvidia-550 driver as a starting point, with
> the aim of supporting more driver versions in the future. Since nouveau,
> currently included in the pc-kernel, conflicts with nvidia, we replace the
> nouveau .ko with a component compatible with the nvidia component scheme.
> 
> Images are intended to get a graphics component in one of two ways: an nvidia
> graphics component is preloaded by being declared in the model, or, at first
> boot, the pc-kernel's install hook detects that no nvidia graphics component
> exists and downloads nouveau. This should cover the upgrade case for users of
> the existing pc-kernel who rely on nouveau. I am working on further
> functionality allowing snap set to select the desired graphics version and
> either configure it (if the component exists on disk) or fetch it from the
> store.
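
For illustration only, the first-boot decision described above could be
sketched roughly as follows. This is not taken from the patch: the helper
name needs_nouveau_fallback is hypothetical, and the component names are
passed in as arguments because the way the real hook discovers installed
components is not shown in this thread.

```shell
# Hypothetical helper, not the hook from the patch.
# Arguments: names of the graphics components already present.
# Returns 0 (true) when none of them is an nvidia-* component,
# i.e. when the nouveau fallback would be needed.
needs_nouveau_fallback() {
    for component in "$@"; do
        case "$component" in
            nvidia-*) return 1 ;;
        esac
    done
    return 0
}

# Example: nothing was preloaded by the model, so the fallback applies.
if needs_nouveau_fallback; then
    echo "no nvidia component found: fall back to nouveau-ko"
fi
```

Keeping the check as a pure function of the component list makes it easy to
exercise without a real snap environment.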
> 
> The implemented components rely on install, refresh, and remove hooks
> for the respective functionality. These are intended to be placed in
> canonical-kernel-snaps. All 3 implemented components have a post-refresh hook
> that is identical to their install hook.
> 
>   - The nvidia-ko and nouveau hooks copy the corresponding kernel modules to
>     $SNAP_DATA/$(uname -r)/graphics, with the nvidia-ko hook linking the
>     modules and attaching their module signatures before moving them. Both
>     components' remove hooks delete the graphics directory. The nouveau hooks
>     are intended to be generic kernel module component hooks with a special
>     case for nouveau.
> 
>   - For the nvidia-user hooks, a directory corresponding to the kernel-gpu-2404
>     interface is created. A sentinel file is placed in the directory to be able
>     to notify the consumer of the kernel-gpu-2404 interface of file changes
>     (e.g. a refresh). All the libraries in the component are copied into this
>     directory. A mangler script is added for the consuming snap to have the
>     correct environment variables present when using the provided libraries.
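
As a rough sketch of the copy/remove behaviour described in the two items
above (not the hooks from the patch): the function names, the argument
layout, and the sentinel file name .refreshed are all assumptions made for
illustration. The snap data directory and kernel version are passed as
arguments so the functions can be tried outside a snap; the module linking
and signature steps of the nvidia-ko hook and the mangler script are left
out.

```shell
# Hypothetical sketch, not the hooks from the patch.

# Copy every .ko found under $1 into $2/$3/graphics, where $2 stands in
# for $SNAP_DATA and $3 for $(uname -r).
install_graphics_modules() {
    src=$1
    dest="$2/$3/graphics"
    mkdir -p "$dest" || return 1
    find "$src" -type f -name '*.ko' -exec cp {} "$dest/" \;
}

# Counterpart of the remove hooks: delete the graphics directory again.
# $1 stands in for $SNAP_DATA and $2 for $(uname -r).
remove_graphics_modules() {
    rm -rf "$1/$2/graphics"
}

# Copy the userspace libraries from $1 into the interface directory $2,
# then touch a sentinel file so a consumer can notice the change.
# The sentinel name ".refreshed" is an assumption.
publish_user_libs() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    cp -R "$src/." "$dest/" || return 1
    touch "$dest/.refreshed"
}
```

In a real hook the arguments would presumably come from the snap
environment, for example: install_graphics_modules <component dir>
"$SNAP_DATA" "$(uname -r)".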
> 
> Nvidia components are mostly self-contained, but a few changes to the pc-kernel
> snap were required. files/meta/kernel.yaml is required to enable kernel
> module support in snapd. The kernel-gpu-2404 content interface is
> declared for exposing nvidia userspace libraries, and is not intended to
> be accessed directly by users.
> 
> As I understand it, the current test plan on our end includes smoke testing
> for both the Ubuntu Core use case and the hybrid case with TPM-backed FDE
> (emulated through KVM, as hardware testing is currently not functional for
> TPM-backed FDE). Cert should be in charge of testing beyond this, but I don't
> have much information about this yet. I should be able to confirm this and
> explain the test plan in better detail early next week.
> 
> [Test case]
> Nvidia components can be installed as follows:
> 
>  $ snap install pc-kernel+nvidia-550-ko pc-kernel+nvidia-550-user
> 
> The components install their files in $SNAP_DATA/modules/$(uname -r)/graphics
> 
> [Regression potential]
> There is potential for regressions to be introduced by the pc-kernel install
> hook, as it is executed on every install and refresh event. If this
> script fails, the installation or update of the snap will abort.
> 
> Aaron Jauregui (1):
>   snapcraft.yaml: Add nvidia-550 and nouveau component support
> 
>  files/meta/kernel.yaml |   1 +
>  snapcraft.yaml         | 131 ++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 130 insertions(+), 2 deletions(-)
>  create mode 100644 files/meta/kernel.yaml
> 
> Aaron Jauregui (1):
>   nvidia-hooks: add hooks for nvidia kernel components
> 
>  hooks/module/install.module                   | 21 ++++++++++++++++
>  hooks/module/post-refresh.module              | 21 ++++++++++++++++
>  hooks/module/remove.module                    | 13 ++++++++++
>  hooks/nvidia-ko/install.nvidia-ko             | 25 +++++++++++++++++++
>  hooks/nvidia-ko/post-refresh.nvidia-ko        | 25 +++++++++++++++++++
>  hooks/nvidia-ko/remove.nvidia-ko              |  6 +++++
>  hooks/nvidia-user/install.nvidia-user         | 18 +++++++++++++
>  .../kernel-gpu-2404-provider-mangler          | 12 +++++++++
>  hooks/nvidia-user/remove.nvidia-user          | 10 ++++++++
>  hooks/pc-kernel/install.pc-kernel             |  6 +++++
>  hooks/pc-kernel/post-refresh.pc-kernel        |  6 +++++
>  11 files changed, 163 insertions(+)
>  create mode 100644 hooks/module/install.module
>  create mode 100644 hooks/module/post-refresh.module
>  create mode 100644 hooks/module/remove.module
>  create mode 100644 hooks/nvidia-ko/install.nvidia-ko
>  create mode 100644 hooks/nvidia-ko/post-refresh.nvidia-ko
>  create mode 100644 hooks/nvidia-ko/remove.nvidia-ko
>  create mode 100644 hooks/nvidia-user/install.nvidia-user
>  create mode 100644 hooks/nvidia-user/kernel-gpu-2404-provider-mangler
>  create mode 100644 hooks/nvidia-user/remove.nvidia-user
>  create mode 100644 hooks/pc-kernel/install.pc-kernel
>  create mode 100644 hooks/pc-kernel/post-refresh.pc-kernel
> 


