[SRU kernel-snaps-uc24.04/pc][RFC PATCH v2 1/2] snapcraft.yaml: Add nvidia-550 and nouveau component support

Juerg Haefliger juerg.haefliger at canonical.com
Tue Dec 10 12:20:01 UTC 2024


On Thu,  5 Dec 2024 13:50:44 +1100
Aaron Jauregui <aaron.jauregui at canonical.com> wrote:

> BugLink: https://bugs.launchpad.net/bugs/2088970
> 
> We use components here with the aim of providing a way for nvidia
> drivers to be selected for the pc-kernel without having to rebuild,
> targetting the nvidia-550 driver as a starting point with the aim of
> supporting more driver versions in the future. Since nouveau, currently
> included in the pc-kernel, conflicts with nvidia, we replace the nouveau
> .ko with a component compatible with the nvidia component scheme.
> 
> Nvidia components are mostly self-contained, but a few changes to the pc-kernel
> snap were required. files/meta/kernel.yaml is required to enable kernel
> module support in snapd. The kernel-gpu-2404 content interface is
> declared for exposing nvidia userspace libraries, and is not intended to
> be accessed directly by users.

What I'm missing here - and this might be explained elsewhere - is an
architectural overview of how this is supposed to work. How are images built?
With or without these components? What happens at the lower level when
components are installed/refreshed/removed? Is there a comprehensive
testplan that covers upgrade scenarios to ensure current users won't break?
How is CERT going to test this? This is quite an intrusive change and I'm
concerned about regressions.


> Signed-off-by: Aaron Jauregui <aaron.jauregui at canonical.com>
> ---
>  files/meta/kernel.yaml |   1 +
>  snapcraft.yaml         | 109 ++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 109 insertions(+), 1 deletion(-)
>  create mode 100644 files/meta/kernel.yaml
> 
> diff --git a/files/meta/kernel.yaml b/files/meta/kernel.yaml
> new file mode 100644
> index 0000000..aa09f00
> --- /dev/null
> +++ b/files/meta/kernel.yaml
> @@ -0,0 +1 @@
> +dynamic-modules: $SNAP_DATA
> diff --git a/snapcraft.yaml b/snapcraft.yaml
> index c08095e..846e826 100644
> --- a/snapcraft.yaml
> +++ b/snapcraft.yaml
> @@ -14,9 +14,25 @@ platforms:
>    amd64:
>    arm64:
>  
> +components:
> +  nvidia-550-ko:
> +    type: kernel-modules
> +    summary: nvidia 550 kernel objects
> +    description: nvidia 550 kernel objects

Summary and description are user visible so should be properly formatted and
capitalized. The description should provide more information than the
summary and it should be mentioned somehow that this is tied/related to the
Ubuntu kernel snap.


> +
> +  nvidia-550-user:
> +    type: standard
> +    summary: nvidia 550 userspace libraries
> +    description: nvidia 550 userspace libraries
> +
> +  nouveau:

For consistency, this should be nouveau-ko. But why do we need this? Can't we
just move the nouveau driver out of the way when nvidia-550-ko is
installed and move it back on removal?


> +    type: kernel-modules
> +    summary: nouveau kernel module
> +    description: nouveau kernel module
> +
>  parts:
>    kernel:
> -    source: https://git.launchpad.net/canonical-kernel-snaps
> +    source: https://git.launchpad.net/~aaronjauregui/canonical-kernel-snaps
>      source-type: git
>      source-branch: main
>      plugin: nil
> @@ -42,6 +58,39 @@ parts:
>  
>        craftctl default
>  

IMO, all the below shouldn't really be part of the build step, it is staging.

> +      # Move nouveau out of the file tree
> +      find "$CRAFT_PART_INSTALL" -name nouveau.ko.zst -exec mv '{}' "$CRAFT_PART_INSTALL" \;
> +
> +      # move hooks to staging area so they can be picked up by organize

# Move ...


> +      cp hooks/install.pc-kernel hooks/install.module hooks/remove.module \
> +         hooks/install.nvidia-ko hooks/remove.nvidia-ko \
> +         hooks/kernel-gpu-2404-provider-mangler hooks/install.nvidia-user \
> +         hooks/install.nvidia-user hooks/remove.nvidia-user \
> +         $CRAFT_PART_INSTALL/

Hm. cp hools/* "$CRAFT_PART_INSTALL"? Also it's a flat directory. I think I'd
prefer the directory layout to loosely reflect the snap/components structure.
I haven't looked closely but maybe we can come up with a structure that
simplifies the organize steps below.

Also, this is shell, so please double-quote your variables.


> +
> +      # duplicate install module as post-refresh hooks

# Duplicate ...


> +      cp hooks/install.module $CRAFT_PART_INSTALL/post-refresh.module
> +      cp hooks/install.pc-kernel $CRAFT_PART_INSTALL/post-refresh.pc-kernel
> +      cp hooks/install.nvidia-ko $CRAFT_PART_INSTALL/post-refresh.nvidia-ko

Can we use symlinks instead of copy? Is there any reason why you don't have
them already in canonical-kernel-snaps/hooks rather than creating them
dynamically at build time?


Add an empty line here.


> +    # Move nouveau into a dedicated component

This comment seems to be in the wrong place.


> +    organize:
> +      nouveau.ko.zst: (component/nouveau)/
> +      install.module: (component/nouveau)/snap/hooks/install
> +      post-refresh.module: (component/nouveau)/snap/hooks/post-refresh
> +      remove.module: (component/nouveau)/snap/hooks/remove
> +
> +      install.pc-kernel: snap/hooks/install
> +      post-refresh.pc-kernel: snap/hooks/post-refresh
> +
> +      install.nvidia-ko: (component/nvidia-550-ko)/snap/hooks/install
> +      post-refresh.nvidia-ko: (component/nvidia-550-ko)/snap/hooks/post-refresh
> +      remove.nvidia-ko: (component/nvidia-550-ko)/snap/hooks/remove
> +
> +      kernel-gpu-2404-provider-mangler: (component/nvidia-550-user)/kernel-gpu-2404-provider-mangler
> +      install.nvidia-user: (component/nvidia-550-user)/snap/hooks/install
> +      post-refresh.nvidia-user: (component/nvidia-550-user)/snap/hooks/post-refresh
> +      remove.nvidia-user: (component/nvidia-550-user)/snap/hooks/remove
> +

I don't claim I fully understand how the organize step fits into the bigger
picture but it seems wrong that it fiddles with component artifacts when it's
part of the 'kernel' part. It should only do the pc-kernel stuff and the
rest should be handled in the individual part sections, no? E.g. all the
*.nvidia-user files should be organized in nvidia-550-user-comp/organize.


>      override-stage: |
>        echo STAGE
>  
> @@ -78,3 +127,61 @@ parts:
>        mkdir "$CRAFT_PART_INSTALL"/firmware/updates
>  
>        craftctl default
> +
> +  files:
> +    plugin: dump
> +    source: files

Please add a comment what this is doing and why it's needed.


> +
> +  nvidia-550-ko-comp:
> +    plugin: nil
> +
> +    stage-packages:
> +      - binutils
> +      - make
> +
> +    override-build: |
> +      craftctl default
> +      version="$(craftctl get version)"
> +
> +      #clean up unnecessary libs

# Clean ...


> +      rm -f -- $CRAFT_PART_INSTALL/usr/lib/$(uname -m)-linux-gnu/libc.so.6

Why?


> +      apt-get download linux-objects-nvidia-550-server-${version%.*}-generic \
> +          linux-signatures-nvidia-${version%.*}-generic
> +      for i in `find . -name '*.deb'` ; do dpkg-deb -x $i nvidia-objects ;  done

Your searching the whole tree, any reason not to use "for i in *.deb;"?


> +
> +      mkdir -p $CRAFT_PART_INSTALL/bits
> +      mv nvidia-objects/lib/modules/*/kernel/nvidia-550srv/bits/* $CRAFT_PART_INSTALL/bits
> +
> +    organize:
> +      bits/: (component/nvidia-550-ko)/bits
> +      usr/bin: (component/nvidia-550-ko)/bin
> +      usr/lib: (component/nvidia-550-ko)/lib
> +
> +  nvidia-550-user-comp:
> +    plugin: nil
> +    stage-packages:
> +      - libnvidia-egl-wayland1
> +      - libnvidia-cfg1-550-server
> +      - libnvidia-common-550-server
> +      - libnvidia-compute-550-server
> +      - libnvidia-decode-550-server
> +      - libnvidia-encode-550-server
> +      - libnvidia-extra-550-server
> +      - libnvidia-gl-550-server
> +      - libnvidia-fbc1-550-server
> +      - nvidia-utils-550-server
> +      - xserver-xorg-video-nvidia-550
> +      - libnvidia-extra-550
> +
> +    organize:
> +      usr/share: (component/nvidia-550-user)/usr/share
> +      usr/lib: (component/nvidia-550-user)/usr/lib
> +      usr/bin/nvidia-smi: (component/nvidia-550-user)/usr/bin/nvidia-smi
> +
> +
> +
> +slots:
> +  kernel-gpu-2404:
> +    interface: content
> +    read:
> +      - $SNAP_COMMON/kernel-gpu-2404


Finally:
- You're moving and adding kernel modules. Don't you need depmod somewhere?
- Have you tried to build this on arm64?

...Juerg
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 833 bytes
Desc: OpenPGP digital signature
URL: <https://lists.ubuntu.com/archives/kernel-team/attachments/20241210/f2f45228/attachment.sig>


More information about the kernel-team mailing list