GPU Compatibility - Operating Systems, Drivers, and CUDA

The GPU type you choose for your workload determines which Nvidia driver and CUDA Toolkit versions your virtual machine needs. Nvidia drivers act as the interface between the operating system (OS) and the GPU, and driver support varies by GPU model; the driver must also be supported by the OS running on the virtual machine. CUDA provides the tools developers use to run computations on Nvidia GPUs, and the required CUDA version depends on the GPU you choose. This article lists the compatible drivers, CUDA versions, and operating systems for each GPU type offered by NexGen Cloud.

OS, driver, and CUDA version compatibility by GPU

The following table outlines the operating system, CUDA version, and Nvidia driver compatibility for the GPUs offered by NexGen Cloud.

| GPU | Operating system | CUDA version | Nvidia drivers |
|---|---|---|---|
| H100 | Ubuntu: 20.04/22.04; Windows: 10/11 Pro, Server 2019/2022 | 11.8 or later* | Linux: R535 or later; Windows: R535 or later |
| A100 | Ubuntu: 20.04/22.04; Windows: 10/11 Pro, Server 2019/2022 | 11.x or later | Linux: R535 or later; Windows: R535 or later |
| RTX (A-4000, A-5000, A-6000, 6000-ada) | Ubuntu: 20.04/22.04; Windows: 10/11 Pro, Server 2019/2022 | 11.x or later | Linux: R535 or later; Windows: R535 U8 or later |
| L40 | Ubuntu: 20.04/22.04; Windows: 10/11 Pro, Server 2019/2022 | 11.x or later | Linux: R525 or later; Windows: R525 or later |
note

*For Linux-based virtual machines using the H100 GPU, CUDA versions 11.8 and later are compatible; however, we recommend version 12.2 for optimal performance. Additionally, while NVIDIA drivers R535 and later support the H100 GPU, only R535 provides production-stable performance. For details on H100 GPU optimization and the recommended driver, CUDA, and kernel versions, see the section below.


Optimizing H100 GPU performance: driver, CUDA, and kernel

Last edited February 6, 2024

When optimized, the H100 GPU can deliver high performance and scalability for your workloads, especially for training and high-performance computing applications. However, optimizing H100 GPU virtual machines can be challenging and time-consuming: based on our testing, the NVIDIA driver, CUDA version, and Ubuntu kernel used for the VM all significantly affect performance.

In our testing, Ubuntu kernel versions that were either too new or too old led to problems, including the H100 GPU not being recognized and the system running slowly. For example, in one test on a virtual machine running Ubuntu 22.04, a PyTorch and Hugging Face workload took 21 seconds to execute with the optimal kernel version (6.5.0-15-generic). With a slightly older kernel version (6.5.0-14), the same task took 21 minutes. The longer runtime was caused by GPU timeouts and the VM's inability to detect the GPU.

We've discovered that achieving optimal performance with the H100 GPU requires a specific combination of the Ubuntu kernel, NVIDIA driver, and CUDA version. Below are the details of the optimal configurations for Ubuntu versions 22.04 and 20.04.

Table 1. Production stable NVIDIA driver, Ubuntu kernel, and CUDA configurations for the H100 GPU.

| Ubuntu distribution | Kernel version | NVIDIA driver | CUDA version |
|---|---|---|---|
| 22.04 | 6.5.0-15-generic | 535 | 12.2 |
| 20.04 | 5.15.0-92-generic | 535 | 12.2 |

Based on our testing, we recommend using kernel version 6.5.0-15-generic for Ubuntu 22.04 and 5.15.0-92-generic for 20.04 for optimal and production-stable performance. Additionally, for both distributions, use NVIDIA driver 535 and CUDA 12.2.
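
As a minimal sketch of how the recommended kernel could be checked on a running VM, the functions below compare a kernel string against the values in Table 1. The function names `recommended_kernel` and `kernel_matches` are hypothetical helpers for illustration, not part of any NexGen Cloud tooling.

```shell
#!/usr/bin/env bash
# Sketch: compare a kernel version with the kernel recommended in Table 1.
# Function names are hypothetical; the version values are taken from Table 1.

# Print the recommended kernel for an Ubuntu release, or fail if unknown.
recommended_kernel() {
  case "$1" in
    22.04) echo "6.5.0-15-generic" ;;
    20.04) echo "5.15.0-92-generic" ;;
    *) return 1 ;;
  esac
}

# Return 0 when the given kernel matches the recommendation for the release,
# 1 when it differs, and 2 when the release is not listed in Table 1.
kernel_matches() {
  local release="$1" running="$2" expected
  expected="$(recommended_kernel "$release")" || return 2
  [ "$running" = "$expected" ]
}
```

On a VM, this could be called as `kernel_matches "$(lsb_release -rs)" "$(uname -r)"`, which passes the Ubuntu release and the running kernel reported by the system.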

Creating a production-ready, stable, and fast H100 virtual machine can be challenging and time-consuming due to testing and troubleshooting. To streamline this process, we have developed VM images pre-installed with optimized configurations of NVIDIA drivers, CUDA, and kernel for both Ubuntu distributions 22.04 and 20.04. Explore more about our pre-installed images here.

Please note that the CUDA version provided in our pre-installed images applies only to applications executed directly within the OS. Code that runs within a container uses the CUDA version from the container rather than the OS. Make sure the container's CUDA version is compatible with the H100 GPU, which requires CUDA 11.8 or newer; for optimal performance, use CUDA 12.2. If containers are part of your workflow, consider using a newer container image that ships CUDA 12.2.
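
To illustrate the container check described above, the following sketch parses the CUDA release from `nvcc --version` output and compares it with a minimum version. It assumes `nvcc` is available inside the container (it is not present in every image) and that GNU `sort -V` is installed; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check which CUDA release a container provides.
# Function names are hypothetical helpers, not an existing tool.

# Read `nvcc --version` output on stdin and print the release as "major.minor".
cuda_release_from_nvcc() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Return 0 when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (GNU coreutils) for version-aware ordering.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

Inside a container, `version_at_least "$(nvcc --version | cuda_release_from_nvcc)" 11.8` would indicate whether the container meets the H100 minimum; replacing `11.8` with `12.2` checks against the recommended version.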


Nvidia GPU driver best practices

When installing Nvidia drivers on first boot, follow the best practices of the operating system (OS) that you are using. Here are some general guidelines to keep in mind:

  1. Always use the latest Nvidia driver version compatible with your VM's operating system. Check the installation instructions by selecting your OS and architecture from https://developer.nvidia.com/cuda-downloads.

  2. Automate the driver installation process to minimize user effort and maintain consistency across VMs.

  3. Before deployment, test the driver installation process in a sandbox environment to identify any potential issues.

  4. Include error checking and logging in the installation process to help diagnose and resolve issues.

  5. Consider pre-installing any dependencies required for the driver installation, such as build tools and kernel headers.

  6. Ensure you have obtained any required licenses for the Nvidia drivers before including them in the VM image.

For more information on configuring virtual machines, see the Initialization Configuration documentation.
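
One possible way to apply the error-checking and logging guideline is a small wrapper that records each installation command and its exit status. This is a sketch only: the `run_logged` name and the default log path are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Sketch: log every installation step and its exit status.
# LOG_FILE is an assumed location; override it to suit your image.
LOG_FILE="${LOG_FILE:-/tmp/nvidia_driver_install.log}"

# Run a command, append its output and exit status to LOG_FILE,
# and return the command's exit status to the caller.
run_logged() {
  local status=0
  echo "[$(date -u +%Y-%m-%dT%H:%M:%SZ)] RUN: $*" >> "$LOG_FILE"
  "$@" >> "$LOG_FILE" 2>&1 || status=$?
  echo "[$(date -u +%Y-%m-%dT%H:%M:%SZ)] EXIT $status: $*" >> "$LOG_FILE"
  return "$status"
}
```

An installation script could then call, for example, `run_logged apt-get update || exit 1`, so that a failed step stops the script and leaves its output in the log for diagnosis.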

Automatically install Nvidia drivers on VM initialization

To automatically install Nvidia drivers on Linux-based virtual machines, include the following cloud-init script in the user_data field of the request body when creating a new virtual machine using the Infrahub API.

Generic Nvidia driver installation script
#cloud-config
write_files:
  - path: /tmp/download_and_run_script_nvidia.sh
    permissions: '0755'
    content: |
      #!/bin/bash
      # Stop at the first failing command
      set -e

      # Download the script
      echo "Downloading the init script..."
      SCRIPT_URL="https://api.nexgencloud.com:8080/public/nvidia/installer_script_main.sh"
      SCRIPT_NAME=$(basename "${SCRIPT_URL}")

      # Use wget if available, otherwise use curl
      # (-f fails on HTTP errors, -L follows redirects)
      if command -v wget >/dev/null 2>&1; then
        wget -O "/tmp/${SCRIPT_NAME}" "${SCRIPT_URL}"
      else
        curl -fsSL -o "/tmp/${SCRIPT_NAME}" "${SCRIPT_URL}"
      fi

      # Make the downloaded script executable
      chmod +x "/tmp/${SCRIPT_NAME}"
      echo "Running the init script..."
      "/tmp/${SCRIPT_NAME}"

runcmd:
  # Run the installer, then reboot so the new driver is loaded
  - /tmp/download_and_run_script_nvidia.sh
  - [ shutdown, -r, now ]
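
After the VM reboots, it can be useful to confirm which driver was installed. The sketch below queries the driver version with `nvidia-smi` and compares its branch with an expected value. The function names are hypothetical, and the default of 535 is an assumption based on the H100 recommendation in Table 1; other GPUs may use a different branch.

```shell
#!/usr/bin/env bash
# Sketch: verify the installed driver branch after the reboot.
# Function names are hypothetical helpers for illustration.

# Print the driver branch (major version) of a full driver version string,
# for example "535.154.05" gives "535".
driver_branch() {
  echo "${1%%.*}"
}

# Query the driver version of the first GPU and compare its branch with the
# expected one (defaults to 535, the branch recommended for the H100).
check_driver_branch() {
  local expected="${1:-535}" version
  version="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
  [ "$(driver_branch "$version")" = "$expected" ]
}
```

Running `cloud-init status --wait` first ensures the initialization has finished before `check_driver_branch` is called.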

CUDA installation by operating system

The cloud-init scripts below install the cuda package, which includes the Nvidia driver, on Linux-based operating systems. For Windows operating systems, use the installer listed below.

Linux

Ubuntu 20.04

To install Nvidia drivers on Ubuntu 20.04, use the following cloud-init script:

#cloud-config
runcmd:
- 'wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb'
- 'sudo dpkg -i cuda-keyring_1.0-1_all.deb'
- 'sudo apt-get update'
- 'sudo apt-get -y install cuda'

Ubuntu 22.04

To install Nvidia drivers on Ubuntu 22.04, use the following cloud-init script:

#cloud-config
runcmd:
- 'wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.0-1_all.deb'
- 'sudo dpkg -i cuda-keyring_1.0-1_all.deb'
- 'sudo apt-get update'
- 'sudo apt-get -y install cuda'
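
The `cuda` package in the scripts above tracks the newest release in NVIDIA's repository. If a specific version is needed, such as the CUDA 12.2 recommended for the H100, NVIDIA's repository also provides versioned meta-packages named after the release. The sketch below builds such a package name; the helper name `cuda_meta_package` is hypothetical, and the package's availability should be confirmed with `apt-cache policy` before relying on it.

```shell
#!/usr/bin/env bash
# Sketch: derive the versioned CUDA meta-package name from "major.minor".
# The helper name is hypothetical; confirm the package exists in the
# configured repository before installing it.
cuda_meta_package() {
  # Replace every dot with a dash, for example "12.2" gives "cuda-12-2".
  echo "cuda-${1//./-}"
}
```

In the cloud-init scripts, the last command could then install `"$(cuda_meta_package 12.2)"` instead of `cuda`.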

Windows

Windows 10, Windows 11, Windows Server 2019, Windows Server 2022

To install Nvidia drivers on Windows 10, Windows 11, Windows Server 2019, or Windows Server 2022, follow these installation instructions:

  1. Download cuda_12.3.1_windows_network.exe.

  2. Run the installer and follow the on-screen prompts.

Additional installation options are detailed here.

info

We recommend rebooting the virtual machine once the driver is installed.
