NACK: [SRU][kernel-snaps-uc24.04/pc][PATCH v4 1/1] snapcraft.yaml: Add nvidia-550 and nouveau component support

Juerg Haefliger juerg.haefliger at canonical.com
Mon Jan 20 13:21:11 UTC 2025


On Thu,  9 Jan 2025 09:33:02 +1100
Aaron Jauregui <aaron.jauregui at canonical.com> wrote:

> BugLink: https://bugs.launchpad.net/bugs/2088970
> 
> We use components here with the aim of providing a way for nvidia
> drivers to be selected for the pc-kernel without having to rebuild,
> targetting the nvidia-550 driver as a starting point with the aim of
> supporting more driver versions in the future. Since nouveau, currently
> included in the pc-kernel, conflicts with nvidia, we replace the nouveau
> .ko with a component compatible with the nvidia component scheme.
> 
> Nvidia components are mostly self-contained, but a few changes to the pc-kernel
> snap were required. files/meta/kernel.yaml is required to enable kernel
> module support in snapd. The kernel-gpu-2404 content interface is
> declared for exposing nvidia userspace libraries, and is not intended to
> be accessed directly by users.
> 
> Signed-off-by: Aaron Jauregui <aaron.jauregui at canonical.com>
> ---
>  files/meta/kernel.yaml |   1 +
>  nvidia_packages        |  11 ++
>  snapcraft.yaml         | 225 ++++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 236 insertions(+), 1 deletion(-)
>  create mode 100644 files/meta/kernel.yaml
>  create mode 100644 nvidia_packages
> 
> diff --git a/files/meta/kernel.yaml b/files/meta/kernel.yaml
> new file mode 100644
> index 0000000..aa09f00
> --- /dev/null
> +++ b/files/meta/kernel.yaml
> @@ -0,0 +1 @@
> +dynamic-modules: $SNAP_DATA
> diff --git a/nvidia_packages b/nvidia_packages
> new file mode 100644
> index 0000000..abda5d3
> --- /dev/null
> +++ b/nvidia_packages
> @@ -0,0 +1,11 @@
> +libnvidia-cfg1-550-server
> +libnvidia-common-550-server
> +libnvidia-compute-550-server
> +libnvidia-decode-550-server
> +libnvidia-encode-550-server
> +libnvidia-extra-550-server
> +libnvidia-gl-550-server
> +libnvidia-gl-550-server
> +libnvidia-fbc1-550-server
> +nvidia-utils-550-server
> +xserver-xorg-video-nvidia-550-server

We should probably have a meta package that depends on all of these. Maybe.
Further comments below.


> diff --git a/snapcraft.yaml b/snapcraft.yaml
> index c08095e..173e419 100644
> --- a/snapcraft.yaml
> +++ b/snapcraft.yaml
> @@ -14,9 +14,25 @@ platforms:
>    amd64:
>    arm64:
>  
> +components:
> +  nvidia-550-ko:
> +    type: kernel-modules
> +    summary: Nvidia 550 kernel objects
> +    description: Nvidia 550 driver kernel objects for the Ubuntu generic kernel snap
> +
> +  nvidia-550-user:
> +    type: standard
> +    summary: Nvidia 550 userspace libraries
> +    description: Userspace libraries required by the Nvidia 550 driver for the Ubuntu generic kernel snap
> +
> +  nouveau:

I'd still like this to be called nouveau-ko for namespace consistency. But
the bigger issue is that I'm still not convinced that we need this component
at all. depmod can look in different places [1][2] for external modules,
which is IMO what we want, so that we don't have to produce a nouveau
component and shuffle nouveau.ko around.

Assuming this is feasible, the simplest approach is to put the new modules in
/lib/modules/$(uname -r)/updates/

But we should probably add some directory hierarchy to account for potential
additional future kernel components:
/lib/modules/$(uname -r)/updates/<component-name>/
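
As a rough sketch of the depmod side: the `search` directive documented in
depmod.d [1] sets the priority order of module subdirectories. The file name
nvidia-components.conf and the helper write_depmod_conf are made up for
illustration, and whether the kernel snap and snapd would honour such a
fragment is an assumption that has not been verified here.

```shell
#!/bin/sh
# Sketch only: emit a depmod.d fragment that makes depmod prefer modules
# found under updates/ over the kernel's own module tree.
# write_depmod_conf and the file name below are hypothetical.
write_depmod_conf() {
    conf_dir="$1"
    mkdir -p "$conf_dir"
    # "search" lists subdirectories of the module directory in priority
    # order; the keyword "built-in" stands for the modules installed by
    # the kernel itself.
    printf 'search updates built-in\n' > "$conf_dir/nvidia-components.conf"
}

# Demo in a scratch directory so nothing on the host is touched.
demo_dir="$(mktemp -d)"
write_depmod_conf "$demo_dir/depmod.d"
cat "$demo_dir/depmod.d/nvidia-components.conf"
```

With an ordering like this, a module shipped under updates/ would be expected
to take precedence over a same-named in-tree module without moving the
original file.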


> +    type: kernel-modules
> +    summary: Nouveau kernel module
> +    description: The Nouveau kernel module for the Ubuntu generic kernel snap
> +
>  parts:
>    kernel:
> -    source: https://git.launchpad.net/canonical-kernel-snaps
> +    source: https://git.launchpad.net/~aaronjauregui/canonical-kernel-snaps

This is obviously wrong now for a non-RFC patch :-)


>      source-type: git
>      source-branch: main
>      plugin: nil
> @@ -42,6 +58,23 @@ parts:
>  
>        craftctl default

This should probably be the last statement of the build block. Or the first,
to be consistent with the other newly added build blocks. Unless there is a
reason for it to be where it is?


>  
> +      # Move nouveau out of the file tree
> +      find "$CRAFT_PART_INSTALL" -name nouveau.ko.zst -exec mv '{}' "$CRAFT_PART_INSTALL" \;
> +
> +      # Move hooks to staging area so they can be picked up by organize
> +      mv hooks/module/* "$CRAFT_PART_INSTALL"
> +      mv hooks/pc-kernel/* "$CRAFT_PART_INSTALL"
> +
> +    organize:
> +      # Organize nouveau into a dedicated component
> +      nouveau.ko.zst: (component/nouveau)/
> +      install.module: (component/nouveau)/snap/hooks/install
> +      post-refresh.module: (component/nouveau)/snap/hooks/post-refresh
> +      remove.module: (component/nouveau)/snap/hooks/remove
> +
> +      install.pc-kernel: snap/hooks/install
> +      post-refresh.pc-kernel: snap/hooks/post-refresh
> +
>      override-stage: |
>        echo STAGE
>  
> @@ -78,3 +111,193 @@ parts:
>        mkdir "$CRAFT_PART_INSTALL"/firmware/updates
>  
>        craftctl default
> +
> +  # Kernel object component support requires a kernel.yaml file
> +  # configured with dynamic-modules: $SNAP_DATA
> +  files:
> +    plugin: dump
> +    source: files
> +
> +  nvidia-550-ko-comp:
> +    source: https://git.launchpad.net/~aaronjauregui/canonical-kernel-snaps

This URL needs fixing too.


> +    source-type: git
> +    source-branch: main
> +    plugin: nil
> +
> +    stage-packages:
> +      - binutils
> +      - make
> +
> +    override-build: |
> +      craftctl default
> +      version="$(craftctl get version)"
> +
> +      # Clean up unnecessary libs
> +      rm -f -- "$CRAFT_PART_INSTALL/usr/lib/$(uname -m)-linux-gnu/libc.so.6"

Please expand the comment to explain why this removal is necessary.


> +
> +      # Extracting obj package versions
> +      obj_ver="$(apt -a list linux-objects-nvidia-550-server-"${version%.*}"-generic | awk -F'/' '!/^(Listing)/{print $2}' | awk -F' ' '{print $2}')"
> +
> +      # Checking for matching nvidia user packages
> +      tmpdir="$(mktemp -d)"
> +      nvidia_usr_ver=""
> +
> +      while IFS= read -r line; do
> +        if [[ !($line =~ $version) ]]; then
> +          break
> +        fi
> +        rm -rf "$tmpdir"/*
> +        echo extracting nvidia version from "linux-objects-nvidia-550-server-${version%.*}-generic=$line ..."
> +
> +        apt-get download "linux-objects-nvidia-550-server-${version%.*}-generic=$line" \
> +                "linux-signatures-nvidia-${version%.*}-generic=$line"
> +        for i in *.deb; do dpkg-deb -x "$i" "$tmpdir/nvidia-objects" ;  done
> +        mkdir -p "$tmpdir/nvidia-bits"
> +        mv "$tmpdir"/nvidia-objects/lib/modules/"${version%.*}"-generic/kernel/nvidia-550srv/bits/* "$tmpdir/nvidia-bits"
> +
> +        # Extract nvidia driver version
> +        nvidia_obj_ver="$(grep -ao 'firmware=nvidia/.*\.bin' "$tmpdir/nvidia-bits/nvidia/nv.o" | awk -F'/' '{print $2}')"
> +
> +        # Look for matching user packages
> +        echo Selecting nvidia version $nvidia_obj_ver
> +        echo Looking for userspace package candidates ...
> +        for pkg in $(cat $CRAFT_PROJECT_DIR/nvidia_packages); do
> +          nvidia_usr_ver="$(echo $pkg | awk -F'=' '{print $2}')"
> +          echo "$pkg $nvidia_usr_ver"
> +          if [ -z "$nvidia_usr_ver" ]; then
> +          # Look for userspace pkg candidate versions
> +          nvidia_usr_versions="$(apt -a list $pkg 2> /dev/null | awk -F'/' '!/^(Listing)/{print $2}' | awk -F ' ' '{print $2}')"
> +          while IFS= read -r ver_line; do
> +            if [[ "$ver_line" =~ "$nvidia_obj_ver" ]]; then
> +              echo "Found compatible version $ver_line for package $pkg"
> +              nvidia_usr_ver="$ver_line"
> +              continue 2
> +            else
> +              echo "Version $ver_line not compatible with nvidia version $nvidia_obj_ver for package $pkg"
> +              nvidia_usr_ver=""
> +            fi
> +              done <<< "$nvidia_usr_versions"
> +            if [ -z nvidia_usr_ver ]; then
> +              echo "Compatible nvidia version $nvidia_obj_ver not found for package $pkg"
> +              break 2
> +            fi
> +          else
> +            if [[ "$nvidia_usr_ver" =~ "$nvidia_obj_ver" ]]; then
> +              echo "Found compatible version $nvidia_usr_ver for package $pkg"
> +              continue
> +            else
> +              echo "Compatible nvidia version $nvidia_obj_ver not found for package $pkg"
> +              nvidia_usr_ver=""
> +              break
> +            fi
> +          fi
> +        done
> +        if [ ! -z "$nvidia_usr_ver" ]; then
> +          break
> +        fi
> +      done <<< "$obj_ver"

Please explain what the above does. I'm not sure what sort of
package/version matching this is doing. Also, given its complexity, it
should probably be a dedicated script.
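
To illustrate what a dedicated script could look like, here is a minimal
sketch of only the "find a candidate version that contains the driver version"
step. The helper name pick_matching_version and the version strings are
hypothetical; this is not the matching logic of the patch, just one way to
isolate it so it can be tested. A standalone helper also makes slips easier to
spot: the quoted loop tests `[ -z nvidia_usr_ver ]` without a `$`, which checks
a literal string and is therefore never true.

```shell
#!/bin/sh
# Sketch only: read candidate versions on stdin, one per line, and print
# the first one that contains the wanted driver version as a literal
# substring. Returns 1 if no candidate matches.
# pick_matching_version is a hypothetical helper name.
pick_matching_version() {
    want="$1"
    while IFS= read -r candidate; do
        case "$candidate" in
            *"$want"*) printf '%s\n' "$candidate"; return 0 ;;
        esac
    done
    return 1
}

# Made-up version strings, for illustration only.
candidates='535.1.2-0ubuntu1
550.1.2-0ubuntu1'
printf '%s\n' "$candidates" | pick_matching_version "550.1.2"
```

The caller can then rely on the exit status alone to decide whether a
compatible version exists, instead of tracking an empty-string sentinel
across nested loops.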


> +
> +
> +      if [ -z "$nvidia_usr_ver" ]; then
> +        echo "ERROR. Cannot find compatible nvidia object packages compatible with user packages. Exiting."
> +        return 1
> +      else
> +        echo "Compatible nvidia userspace package found: $nvidia_usr_ver"
> +        rm -rf ./parts/nvidia-550-ko-comp/build/nvidia_usr_ver
> +        # Save nvidia userspace lib version for further handling in nvidia-550-user-comp
> +        echo $nvidia_usr_ver > "$CRAFT_PART_BUILD"/nvidia_usr_ver
> +      fi
> +
> +      # Move nvidia kernel objects
> +      mv "$tmpdir"/nvidia-bits "$CRAFT_PART_INSTALL"/bits
> +      rm -rf "$tmpdir"
> +
> +      # Move hooks
> +      mv hooks/nvidia-ko/* "$CRAFT_PART_INSTALL"
> +
> +    organize:
> +      bits/: (component/nvidia-550-ko)/bits
> +      usr/bin: (component/nvidia-550-ko)/bin
> +      usr/lib: (component/nvidia-550-ko)/lib
> +
> +      install.nvidia-ko: (component/nvidia-550-ko)/snap/hooks/install
> +      post-refresh.nvidia-ko: (component/nvidia-550-ko)/snap/hooks/post-refresh
> +      remove.nvidia-ko: (component/nvidia-550-ko)/snap/hooks/remove
> +
> +  nvidia-550-user-comp:
> +    source: https://git.launchpad.net/~aaronjauregui/canonical-kernel-snaps

URL needs updating.


> +    source-type: git
> +    source-branch: main
> +    plugin: nil
> +
> +    after:
> +      - nvidia-550-ko-comp
> +
> +    override-build: |
> +      craftctl default
> +
> +      # Move hooks
> +      mv hooks/nvidia-user/* "$CRAFT_PART_INSTALL"
> +
> +      # Get NVIDIA userspace lib ver
> +      nvidia_usr_ver="$(<$CRAFT_PART_BUILD/../../nvidia-550-ko-comp/build/nvidia_usr_ver)"

That looks brittle. Can you use a temporary path /tmp/nvidia_usr_ver?
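
A minimal sketch of that idea, assuming both parts are built in the same
environment and that the temporary directory persists between them (neither
is confirmed here). NVIDIA_VER_FILE and the two helper names are hypothetical.

```shell
#!/bin/sh
# Sketch only: hand a value from one build step to a later one through a
# fixed temporary file instead of reaching into another part's build dir.
# NVIDIA_VER_FILE and both helper names are hypothetical.
NVIDIA_VER_FILE="${TMPDIR:-/tmp}/nvidia_usr_ver"

save_nvidia_usr_ver() {
    printf '%s\n' "$1" > "$NVIDIA_VER_FILE"
}

load_nvidia_usr_ver() {
    # Fail loudly if the producing step has not run or wrote nothing.
    [ -s "$NVIDIA_VER_FILE" ] || { echo "missing $NVIDIA_VER_FILE" >&2; return 1; }
    cat "$NVIDIA_VER_FILE"
}

save_nvidia_usr_ver "550.1.2-0ubuntu1"   # made-up version string
load_nvidia_usr_ver
```

A fixed path can carry a stale value over from an earlier build, so the
producing step should always overwrite the file before the consumer runs.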


> +
> +      # Stage nvidia libs
> +      apt_cache="$(mktemp -d)"
> +      dpkg_status="$(mktemp)"
> +
> +      apt_get_param="--download-only --assume-yes -o APT::Sandbox::User=root -o Dir::Cache=$apt_cache -o Dir::State::status=$dpkg_status"
> +      stage_packages=""
> +
> +      while read LINE
> +      do
> +        stage_packages="$stage_packages $LINE=$nvidia_usr_ver"
> +        echo "$stage_packages"
> +      done < "$CRAFT_PROJECT_DIR"/nvidia_packages
> +
> +      # Fetch nvidia drivers
> +      apt-get $apt_get_param install $stage_packages
> +
> +      # Unpack nvidia drivers
> +      for file in "$apt_cache"/archives/*.deb; do
> +        dpkg-deb -x $file "$CRAFT_PART_INSTALL"
> +      done
> +
> +      # Cleanup
> +      rm -rf $apt_cache
> +      rm -rf $dpkg_status

Same here, this should probably be a dedicated script. Have a look at chdist
and [3]; they might do what you need.

There's lots of effort put into figuring out what packages and versions to
download. This feels too complicated. Would it help to make nvidia and/or
kernel packaging changes and provide additional meta packages?

What we want is a set of nvidia packages that match a kernel version?
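
On the dedicated-script point, a minimal sketch of the one step that turns the
package list into pinned `name=version` arguments. The helper name
pinned_package_args and the version string are hypothetical. It also drops
repeated names, since the quoted nvidia_packages file lists
libnvidia-gl-550-server twice.

```shell
#!/bin/sh
# Sketch only: turn a package list file into "name=version" lines,
# skipping blank lines and duplicate package names.
# pinned_package_args is a hypothetical helper name.
pinned_package_args() {
    list_file="$1"
    version="$2"
    # awk keeps the first occurrence of each non-empty line, in file order.
    awk -v ver="$version" 'NF && !seen[$1]++ { print $1 "=" ver }' "$list_file"
}

# Demo with a scratch list file; the version string is made up.
list="$(mktemp)"
printf '%s\n' libnvidia-gl-550-server libnvidia-gl-550-server \
    nvidia-utils-550-server > "$list"
pinned_package_args "$list" "550.1.2-0ubuntu1"
```

The output can be passed to a download step as separate arguments, which
keeps the list handling apart from the apt invocation itself.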


> +
> +    stage-packages:
> +      - libnvidia-egl-wayland1
> +
> +    organize:
> +      usr/share: (component/nvidia-550-user)/usr/share
> +      usr/lib: (component/nvidia-550-user)/usr/lib
> +      usr/bin/nvidia-smi: (component/nvidia-550-user)/usr/bin/nvidia-smi
> +      kernel-gpu-2404-provider-mangler: (component/nvidia-550-user)/kernel-gpu-2404-provider-mangler
> +      install.nvidia-user: (component/nvidia-550-user)/snap/hooks/install
> +      post-refresh.nvidia-user: (component/nvidia-550-user)/snap/hooks/post-refresh
> +      remove.nvidia-user: (component/nvidia-550-user)/snap/hooks/remove
> +
> +    override-stage: |
> +      craftctl default
> +      # Clean up leftover libs
> +      rm -rf "$CRAFT_PART_INSTALL"/{_tmp,boot,lib,usr}

The code doesn't match the comment. Is override-stage really needed here?


> +
> +  # Prune duplicate mesa libraries
> +  nvidia-550-user-cleanup:
> +    after: [nvidia-550-user-comp]
> +    source: https://github.com/canonical/gpu-snap.git
> +    plugin: dump
> +    override-prime: |
> +      craftctl default
> +      CRAFT_PRIME="$CRAFT_COMPONENT_NVIDIA_550_USER_PRIME" \
> +        "$CRAFT_PART_SRC"/bin/gpu-2404-cleanup mesa-2404

Ugh. Why do we need this?


> +
> +
> +
> +slots:
> +  kernel-gpu-2404:
> +    interface: content
> +    read:
> +      - $SNAP_COMMON/kernel-gpu-2404


...Juerg

[1] man depmod.d
[2] https://lore.kernel.org/lkml/Y5vvVTwt+FfxTUke@bergen.fjasle.eu/T/
[3] https://git.launchpad.net/~canonical-kernel-snaps/canonical-kernel-snaps/+git/kernel-snaps-u24.10/tree/stage-from-series