I'm setting up an EC2 g6f.xlarge instance to run a custom FFmpeg build with Vulkan support, but I can't get the NVIDIA driver installed. I followed the official guide for installing GRID drivers on Ubuntu, but when I run the installer with `sudo /bin/sh ./NVIDIA-Linux-x86_64*.run`, it fails with: "ERROR: Unable to load the kernel module 'nvidia-drm.ko'." As far as I can tell, this error can have several causes, including mismatched kernel sources or a conflict with the nouveau driver. The log at `/var/log/nvidia-installer.log` shows many warnings, including a kernel tainting message. I've confirmed that the installed Linux headers match the running kernel, blacklisted the nouveau driver, and edited the grub file to add an rdblacklist entry for nouveau. I've also installed the development tools, including gcc and dkms. I'm still stuck. Any suggestions on how to resolve this?
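The checks described in the question (matching headers, nouveau not loaded) can be re-verified with a short script. This is only a sketch: the helper names `headers_present` and `nouveau_state` are made up for illustration, and it assumes Ubuntu's usual `/usr/src/linux-headers-<release>` layout.

```shell
#!/bin/sh
# Sketch of pre-install checks; helper names are hypothetical.

# Print "ok" if a headers directory exists for the given kernel release.
# $1: kernel release, $2: optional root prefix (only useful for testing)
headers_present() {
    if [ -d "${2:-}/usr/src/linux-headers-$1" ]; then
        echo "ok"
    else
        echo "missing"
    fi
}

# Read lsmod-style output on stdin and report whether nouveau is loaded.
nouveau_state() {
    if awk '$1 == "nouveau" { found = 1 } END { exit !found }'; then
        echo "loaded"
    else
        echo "not loaded"
    fi
}

echo "kernel:  $(uname -r)"
echo "headers: $(headers_present "$(uname -r)")"
# lsmod may be absent in minimal environments, so guard the call.
if command -v lsmod >/dev/null 2>&1; then
    echo "nouveau: $(lsmod | nouveau_state)"
fi
```

If the script reports `headers: missing` or `nouveau: loaded` after a reboot, one of the earlier steps probably did not take effect for the kernel that is actually running.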
3 Answers
Have you tried running the installer with `bash` instead of `sh`? Sometimes this small change can make a difference in how the script executes.
Instead of using the `.run` installer, you might want to use AWS's suggested method. Attach an IAM role for SSM, then run the SSM Automation `AWSEC2-InstallNvidiaGPU`. This method automatically installs the correct driver for your kernel. Also, make sure you have `linux-modules-extra-aws` installed and update the initramfs before rebooting. Doing both helps avoid a mismatch between the driver and the kernel.
It sounds like you might be missing the `linux-modules-extra` package. Try installing it with `sudo apt install linux-modules-extra-$(uname -r)`. Some users have reported success doing this alongside the regular installation steps in the guide. Good luck!
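A rough way to see whether the package is already there before installing it is sketched below. The helper `modules_extra_pkg` is a made-up name, and the script only prints the commands to run rather than running them, so nothing is changed on the system.

```shell
#!/bin/sh
# Sketch: check for linux-modules-extra for the running kernel.
# modules_extra_pkg is a hypothetical helper, not part of any tool.

# Build the package name for a given kernel release.
modules_extra_pkg() {
    echo "linux-modules-extra-$1"
}

pkg=$(modules_extra_pkg "$(uname -r)")

# dpkg-query reports "install ok installed" for an installed package.
if command -v dpkg-query >/dev/null 2>&1 &&
   dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null | grep -q "install ok installed"; then
    echo "$pkg is already installed"
else
    echo "$pkg does not appear to be installed; to install it, run:"
    echo "  sudo apt install $pkg"
    echo "  sudo update-initramfs -u"
fi
```

Run it again after any kernel upgrade, since the package name is tied to the exact output of `uname -r`.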
Man, you're saving lives here. How did you figure this out?

Same error here. I also tried `sudo apt install nvidia-driver-570`, but after rebooting, `nvidia-smi` still says it can't communicate with the driver.
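When `nvidia-smi` can't communicate with the driver, one thing worth checking is whether any nvidia kernel module was actually built for the kernel that is running. A minimal sketch, assuming modules live under `/lib/modules/<release>`; the helper name `find_nvidia_modules` is made up for illustration:

```shell
#!/bin/sh
# Sketch: look for nvidia kernel modules built for the running kernel.
# find_nvidia_modules is a hypothetical helper name.

# Print every nvidia module file found under a modules directory.
# $1: modules directory, normally /lib/modules/$(uname -r)
find_nvidia_modules() {
    find "$1" -name 'nvidia*.ko*' 2>/dev/null
}

moddir="/lib/modules/$(uname -r)"
if [ -n "$(find_nvidia_modules "$moddir")" ]; then
    echo "nvidia modules found for $(uname -r)"
else
    echo "no nvidia modules found for $(uname -r)"
fi
```

If nothing is found, the driver may have been built against a different kernel than the one that booted; `dkms status` shows which kernels the module was built for.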