NACK: [SRU][kernel-snaps-uc24.04/pc][PATCH v4 1/1] snapcraft.yaml: Add nvidia-550 and nouveau component support
Juerg Haefliger
juerg.haefliger at canonical.com
Mon Jan 20 13:21:11 UTC 2025
On Thu, 9 Jan 2025 09:33:02 +1100
Aaron Jauregui <aaron.jauregui at canonical.com> wrote:
> BugLink: https://bugs.launchpad.net/bugs/2088970
>
> We use components here with the aim of providing a way for nvidia
> drivers to be selected for the pc-kernel without having to rebuild,
> targetting the nvidia-550 driver as a starting point with the aim of
> supporting more driver versions in the future. Since nouveau, currently
> included in the pc-kernel, conflicts with nvidia, we replace the nouveau
> .ko with a component compatible with the nvidia component scheme.
>
> Nvidia components are mostly self-contained, but a few changes to the pc-kernel
> snap were required. files/meta/kernel.yaml is required to enable kernel
> module support in snapd. The kernel-gpu-2404 content interface is
> declared for exposing nvidia userspace libraries, and is not intended to
> be accessed directly by users.
>
> Signed-off-by: Aaron Jauregui <aaron.jauregui at canonical.com>
> ---
> files/meta/kernel.yaml | 1 +
> nvidia_packages | 11 ++
> snapcraft.yaml | 225 ++++++++++++++++++++++++++++++++++++++++-
> 3 files changed, 236 insertions(+), 1 deletion(-)
> create mode 100644 files/meta/kernel.yaml
> create mode 100644 nvidia_packages
>
> diff --git a/files/meta/kernel.yaml b/files/meta/kernel.yaml
> new file mode 100644
> index 0000000..aa09f00
> --- /dev/null
> +++ b/files/meta/kernel.yaml
> @@ -0,0 +1 @@
> +dynamic-modules: $SNAP_DATA
> diff --git a/nvidia_packages b/nvidia_packages
> new file mode 100644
> index 0000000..abda5d3
> --- /dev/null
> +++ b/nvidia_packages
> @@ -0,0 +1,11 @@
> +libnvidia-cfg1-550-server
> +libnvidia-common-550-server
> +libnvidia-compute-550-server
> +libnvidia-decode-550-server
> +libnvidia-encode-550-server
> +libnvidia-extra-550-server
> +libnvidia-gl-550-server
> +libnvidia-gl-550-server
> +libnvidia-fbc1-550-server
> +nvidia-utils-550-server
> +xserver-xorg-video-nvidia-550-server
We should probably have a meta package that depends on all of these. Maybe.
Further comments below.
> diff --git a/snapcraft.yaml b/snapcraft.yaml
> index c08095e..173e419 100644
> --- a/snapcraft.yaml
> +++ b/snapcraft.yaml
> @@ -14,9 +14,25 @@ platforms:
> amd64:
> arm64:
>
> +components:
> + nvidia-550-ko:
> + type: kernel-modules
> + summary: Nvidia 550 kernel objects
> + description: Nvidia 550 driver kernel objects for the Ubuntu generic kernel snap
> +
> + nvidia-550-user:
> + type: standard
> + summary: Nvidia 550 userspace libraries
> + description: Userspace libraries required by the Nvidia 550 driver for the Ubuntu generic kernel snap
> +
> + nouveau:
I'd still like this to be called nouveau-ko for namespace consistency.
But the bigger issue is that I'm still not convinced we need this component
at all. depmod can look in different places [1][2] for external modules,
which is IMO what we want, so that we don't have to produce a nouveau
component and shuffle nouveau.ko around.
Assuming this is feasible, the simplest approach is to put the new modules in
/lib/modules/$(uname -r)/updates/
But we should probably add some directory hierarchy to account for potential
future kernel components:
/lib/modules/$(uname -r)/updates/<component-name>/
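A minimal sketch of what the depmod side of this could look like, assuming
the per-component directories are listed in a depmod.d(5) "search" directive.
The helper name and the component name in the usage line are hypothetical,
and whether depmod honours nested paths such as updates/<component-name> in
"search" should be checked against depmod.d(5) before relying on it.

```shell
#!/bin/sh
# Sketch only: emit a depmod.d(5) "search" line that lists one
# updates/<component> directory per argument ahead of the kernel's own
# module tree. Component names are supplied by the caller (hypothetical).
depmod_search_line() {
    line="search"
    for comp in "$@"; do
        line="$line updates/$comp"
    done
    # "built-in" is the depmod.d keyword for the in-tree modules
    printf '%s built-in\n' "$line"
}
```

For example, "depmod_search_line nvidia-550-ko" prints a line that could be
written to a file under /etc/depmod.d/ before running depmod.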
> + type: kernel-modules
> + summary: Nouveau kernel module
> + description: The Nouveau kernel module for the Ubuntu generic kernel snap
> +
> parts:
> kernel:
> - source: https://git.launchpad.net/canonical-kernel-snaps
> + source: https://git.launchpad.net/~aaronjauregui/canonical-kernel-snaps
This is obviously wrong now for a non-RFC patch :-)
> source-type: git
> source-branch: main
> plugin: nil
> @@ -42,6 +58,23 @@ parts:
>
> craftctl default
This should probably be the last statement of the build block. Or the first,
to be consistent with the other newly added build blocks. Unless there is a
reason for it to be where it is?
>
> + # Move nouveau out of the file tree
> + find "$CRAFT_PART_INSTALL" -name nouveau.ko.zst -exec mv '{}' "$CRAFT_PART_INSTALL" \;
> +
> + # Move hooks to staging area so they can be picked up by organize
> + mv hooks/module/* "$CRAFT_PART_INSTALL"
> + mv hooks/pc-kernel/* "$CRAFT_PART_INSTALL"
> +
> + organize:
> + # Organize nouveau into a dedicated component
> + nouveau.ko.zst: (component/nouveau)/
> + install.module: (component/nouveau)/snap/hooks/install
> + post-refresh.module: (component/nouveau)/snap/hooks/post-refresh
> + remove.module: (component/nouveau)/snap/hooks/remove
> +
> + install.pc-kernel: snap/hooks/install
> + post-refresh.pc-kernel: snap/hooks/post-refresh
> +
> override-stage: |
> echo STAGE
>
> @@ -78,3 +111,193 @@ parts:
> mkdir "$CRAFT_PART_INSTALL"/firmware/updates
>
> craftctl default
> +
> + # Kernel object component support requires a kernel.yaml file
> + # configured with dynamic-modules: $SNAP_DATA
> + files:
> + plugin: dump
> + source: files
> +
> + nvidia-550-ko-comp:
> + source: https://git.launchpad.net/~aaronjauregui/canonical-kernel-snaps
This URL needs fixing too.
> + source-type: git
> + source-branch: main
> + plugin: nil
> +
> + stage-packages:
> + - binutils
> + - make
> +
> + override-build: |
> + craftctl default
> + version="$(craftctl get version)"
> +
> + # Clean up unnecessary libs
> + rm -f -- "$CRAFT_PART_INSTALL/usr/lib/$(uname -m)-linux-gnu/libc.so.6"
Please add more information to the comment why this removal is necessary.
> +
> + # Extracting obj package versions
> + obj_ver="$(apt -a list linux-objects-nvidia-550-server-"${version%.*}"-generic | awk -F'/' '!/^(Listing)/{print $2}' | awk -F' ' '{print $2}')"
> +
> + # Checking for matching nvidia user packages
> + tmpdir="$(mktemp -d)"
> + nvidia_usr_ver=""
> +
> + while IFS= read -r line; do
> + if [[ !($line =~ $version) ]]; then
> + break
> + fi
> + rm -rf "$tmpdir"/*
> + echo extracting nvidia version from "linux-objects-nvidia-550-server-${version%.*}-generic=$line ..."
> +
> + apt-get download "linux-objects-nvidia-550-server-${version%.*}-generic=$line" \
> + "linux-signatures-nvidia-${version%.*}-generic=$line"
> + for i in *.deb; do dpkg-deb -x "$i" "$tmpdir/nvidia-objects" ; done
> + mkdir -p "$tmpdir/nvidia-bits"
> + mv "$tmpdir"/nvidia-objects/lib/modules/"${version%.*}"-generic/kernel/nvidia-550srv/bits/* "$tmpdir/nvidia-bits"
> +
> + # Extract nvidia driver version
> + nvidia_obj_ver="$(grep -ao 'firmware=nvidia/.*\.bin' "$tmpdir/nvidia-bits/nvidia/nv.o" | awk -F'/' '{print $2}')"
> +
> + # Look for matching user packages
> + echo Selecting nvidia version $nvidia_obj_ver
> + echo Looking for userspace package candidates ...
> + for pkg in $(cat $CRAFT_PROJECT_DIR/nvidia_packages); do
> + nvidia_usr_ver="$(echo $pkg | awk -F'=' '{print $2}')"
> + echo "$pkg $nvidia_usr_ver"
> + if [ -z "$nvidia_usr_ver" ]; then
> + # Look for userspace pkg candidate versions
> + nvidia_usr_versions="$(apt -a list $pkg 2> /dev/null | awk -F'/' '!/^(Listing)/{print $2}' | awk -F ' ' '{print $2}')"
> + while IFS= read -r ver_line; do
> + if [[ "$ver_line" =~ "$nvidia_obj_ver" ]]; then
> + echo "Found compatible version $ver_line for package $pkg"
> + nvidia_usr_ver="$ver_line"
> + continue 2
> + else
> + echo "Version $ver_line not compatible with nvidia version $nvidia_obj_ver for package $pkg"
> + nvidia_usr_ver=""
> + fi
> + done <<< "$nvidia_usr_versions"
> + if [ -z nvidia_usr_ver ]; then
> + echo "Compatible nvidia version $nvidia_obj_ver not found for package $pkg"
> + break 2
> + fi
> + else
> + if [[ "$nvidia_usr_ver" =~ "$nvidia_obj_ver" ]]; then
> + echo "Found compatible version $nvidia_usr_ver for package $pkg"
> + continue
> + else
> + echo "Compatible nvidia version $nvidia_obj_ver not found for package $pkg"
> + nvidia_usr_ver=""
> + break
> + fi
> + fi
> + done
> + if [ ! -z "$nvidia_usr_ver" ]; then
> + break
> + fi
> + done <<< "$obj_ver"
Please explain what the above does. I'm not sure what sort of
package/version matching this is doing. Also, given its complexity, it
should probably be a dedicated script.
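As a rough illustration of how the inner matching step might look once
factored out, here is a sketch of a helper that picks the first candidate
version containing the driver version. It assumes that substring matching
(as the =~ tests in the quoted patch appear to do) is the intended
compatibility rule; the function name and the version strings in the usage
note are made up for illustration.

```shell
#!/bin/sh
# Sketch only: read candidate package versions from stdin, one per line,
# and print the first one that contains the wanted driver version.
# Returns 0 on a match, 1 if nothing matches, 2 if no version was given.
first_matching_version() {
    want="$1"
    [ -n "$want" ] || return 2
    while IFS= read -r candidate; do
        case "$candidate" in
            # The quoted pattern is matched literally, so "." is not a wildcard
            *"$want"*)
                printf '%s\n' "$candidate"
                return 0
                ;;
        esac
    done
    return 1
}
```

A caller could pipe the version column of "apt -a list <pkg>" into it, for
example "... | first_matching_version 550.127.05", and treat a non-zero exit
status as "no compatible userspace package".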
> +
> +
> + if [ -z "$nvidia_usr_ver" ]; then
> + echo "ERROR. Cannot find compatible nvidia object packages compatible with user packages. Exiting."
> + return 1
> + else
> + echo "Compatible nvidia userspace package found: $nvidia_usr_ver"
> + rm -rf ./parts/nvidia-550-ko-comp/build/nvidia_usr_ver
> + # Save nvidia userspace lib version for further handling in nvidia-550-user-comp
> + echo $nvidia_usr_ver > "$CRAFT_PART_BUILD"/nvidia_usr_ver
> + fi
> +
> + # Move nvidia kernel objects
> + mv "$tmpdir"/nvidia-bits "$CRAFT_PART_INSTALL"/bits
> + rm -rf "$tmpdir"
> +
> + # Move hooks
> + mv hooks/nvidia-ko/* "$CRAFT_PART_INSTALL"
> +
> + organize:
> + bits/: (component/nvidia-550-ko)/bits
> + usr/bin: (component/nvidia-550-ko)/bin
> + usr/lib: (component/nvidia-550-ko)/lib
> +
> + install.nvidia-ko: (component/nvidia-550-ko)/snap/hooks/install
> + post-refresh.nvidia-ko: (component/nvidia-550-ko)/snap/hooks/post-refresh
> + remove.nvidia-ko: (component/nvidia-550-ko)/snap/hooks/remove
> +
> + nvidia-550-user-comp:
> + source: https://git.launchpad.net/~aaronjauregui/canonical-kernel-snaps
URL needs updating.
> + source-type: git
> + source-branch: main
> + plugin: nil
> +
> + after:
> + - nvidia-550-ko-comp
> +
> + override-build: |
> + craftctl default
> +
> + # Move hooks
> + mv hooks/nvidia-user/* "$CRAFT_PART_INSTALL"
> +
> + # Get NVIDIA userspace lib ver
> + nvidia_usr_ver="$(<$CRAFT_PART_BUILD/../../nvidia-550-ko-comp/build/nvidia_usr_ver)"
That looks brittle. Can you use a temporary path /tmp/nvidia_usr_ver?
> +
> + # Stage nvidia libs
> + apt_cache="$(mktemp -d)"
> + dpkg_status="$(mktemp)"
> +
> + apt_get_param="--download-only --assume-yes -o APT::Sandbox::User=root -o Dir::Cache=$apt_cache -o Dir::State::status=$dpkg_status"
> + stage_packages=""
> +
> + while read LINE
> + do
> + stage_packages="$stage_packages $LINE=$nvidia_usr_ver"
> + echo "$stage_packages"
> + done < "$CRAFT_PROJECT_DIR"/nvidia_packages
> +
> + # Fetch nvidia drivers
> + apt-get $apt_get_param install $stage_packages
> +
> + # Unpack nvidia drivers
> + for file in "$apt_cache"/archives/*.deb; do
> + dpkg-deb -x $file "$CRAFT_PART_INSTALL"
> + done
> +
> + # Cleanup
> + rm -rf $apt_cache
> + rm -rf $dpkg_status
Same here, should probably be a dedicated script. Have a look at chdist and
[3], it might do what you need.
There's lots of effort put into figuring out what packages and versions to
download. This feels too complicated. Would it help to make nvidia and/or
kernel packaging changes and provide additional meta packages?
What we want is a set of nvidia packages that match a kernel version?
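Until such meta packages exist, the list handling itself could at least be
kept small. The sketch below assumes the package list stays in a plain text
file like nvidia_packages and that a single version applies to every entry;
the function name is hypothetical. It also skips repeated names, since the
list in the quoted patch contains libnvidia-gl-550-server twice.

```shell
#!/bin/sh
# Sketch only: turn a package list file into "pkg=version" lines,
# ignoring blank lines and keeping only the first occurrence of each name.
pinned_packages() {
    list_file="$1"
    version="$2"
    awk -v ver="$version" 'NF && !seen[$1]++ { printf "%s=%s\n", $1, ver }' "$list_file"
}
```

The output can then be passed to a single "apt-get download" or
"apt-get install --download-only" call.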
> +
> + stage-packages:
> + - libnvidia-egl-wayland1
> +
> + organize:
> + usr/share: (component/nvidia-550-user)/usr/share
> + usr/lib: (component/nvidia-550-user)/usr/lib
> + usr/bin/nvidia-smi: (component/nvidia-550-user)/usr/bin/nvidia-smi
> + kernel-gpu-2404-provider-mangler: (component/nvidia-550-user)/kernel-gpu-2404-provider-mangler
> + install.nvidia-user: (component/nvidia-550-user)/snap/hooks/install
> + post-refresh.nvidia-user: (component/nvidia-550-user)/snap/hooks/post-refresh
> + remove.nvidia-user: (component/nvidia-550-user)/snap/hooks/remove
> +
> + override-stage: |
> + craftctl default
> + # Clean up leftover libs
> + rm -rf "$CRAFT_PART_INSTALL"/{_tmp,boot,lib,usr}
The code doesn't match the comment. Is override-stage really needed here?
> +
> + # Prune duplicate mesa libraries
> + nvidia-550-user-cleanup:
> + after: [nvidia-550-user-comp]
> + source: https://github.com/canonical/gpu-snap.git
> + plugin: dump
> + override-prime: |
> + craftctl default
> + CRAFT_PRIME="$CRAFT_COMPONENT_NVIDIA_550_USER_PRIME" \
> + "$CRAFT_PART_SRC"/bin/gpu-2404-cleanup mesa-2404
Ugg. Why do we need this?
> +
> +
> +
> +slots:
> + kernel-gpu-2404:
> + interface: content
> + read:
> + - $SNAP_COMMON/kernel-gpu-2404
...Juerg
[1] man depmod.d
[2] https://lore.kernel.org/lkml/Y5vvVTwt+FfxTUke@bergen.fjasle.eu/T/
[3] https://git.launchpad.net/~canonical-kernel-snaps/canonical-kernel-snaps/+git/kernel-snaps-u24.10/tree/stage-from-series