CUDA, cuDNN and NCCL for Anaconda Python

August 13, 2019

CUDA, cuDNN and NCCL functionality is accessed in a NumPy-like way from CuPy. CuPy also allows lower-level use of the GPU.

Before starting GPU work in any programming language, keep these general caveats in mind:

I/O-heavy workloads can make it harder to realize GPU benefits.
Consumer GPUs (GeForce) can be more than 10x slower than workstation-class GPUs (Tesla, Quadro).

CUDA requires a discrete Nvidia GPU. Check for existence of an Nvidia GPU by:

Linux: a blank response means an Nvidia GPU is not detected.
```
lspci | grep -i nvidia
```
Windows: Look under the “render” tab to see if an Nvidia GPU exists.
```
dxdiag
```
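The Linux check above can also be scripted. The following is a minimal sketch, assuming `lspci` is on the PATH; the function names `mentions_nvidia` and `linux_has_nvidia_gpu` are illustrative and not part of any library.

```python
import shutil
import subprocess


def mentions_nvidia(listing: str) -> bool:
    """Return True if any line of a device listing mentions Nvidia (case-insensitive)."""
    return any("nvidia" in line.lower() for line in listing.splitlines())


def linux_has_nvidia_gpu() -> bool:
    """Run lspci, if it is available, and scan its output for an Nvidia device."""
    if shutil.which("lspci") is None:
        # lspci is not installed, so this check cannot detect anything.
        return False
    result = subprocess.run(["lspci"], capture_output=True, text=True, check=False)
    return mentions_nvidia(result.stdout)


if __name__ == "__main__":
    print("Nvidia GPU detected:", linux_has_nvidia_gpu())
```

A `False` result corresponds to the blank response from `grep`. It can also mean that `lspci` itself is missing, so treat it as a hint and not as proof.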

Determine the Compute Capability of the GPU and install the matching CUDA Toolkit, then reboot. Which CuPy package to install depends on the CUDA Toolkit version installed on your computer.
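To find out which CUDA Toolkit version is installed, one option is to read it from the `nvcc --version` banner, which includes text of the form `release X.Y`. The sketch below assumes `nvcc` is on the PATH; the helper names `parse_cuda_release` and `installed_cuda_release` are made up for illustration.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_cuda_release(nvcc_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from text containing 'release X.Y', or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_release() -> Optional[Tuple[int, int]]:
    """Ask nvcc for its version; return None when nvcc is not on the PATH."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=False
    )
    return parse_cuda_release(result.stdout)


if __name__ == "__main__":
    print("CUDA Toolkit release:", installed_cuda_release())
```

The reported release still has to be matched by hand against the package names listed in the CuPy installation guide; this sketch only reports the version.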

CuPy syntax is very similar to NumPy. There is a large set of CuPy functions relevant to many engineering and scientific computing tasks.
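As a rough illustration of that similarity, the sketch below runs the same array code under either module. The helper `pick_array_module` is hypothetical and not a CuPy function; it falls back to NumPy when CuPy is missing or no CUDA device is reported, so the example also runs on a machine without a GPU.

```python
import numpy as np


def pick_array_module():
    """Return cupy when it imports and reports a CUDA device, else numpy."""
    try:
        import cupy as cp

        if cp.cuda.runtime.getDeviceCount() > 0:
            return cp
    except Exception:
        # CuPy missing, no driver, or no device: fall back to the CPU.
        pass
    return np


xp = pick_array_module()

# The same calls work with either module.
x = xp.arange(5, dtype=xp.float64)  # 0, 1, 2, 3, 4
y = x * x                           # elementwise square (on the GPU under CuPy)
total = float(y.sum())              # float() brings the scalar result to the host

print(xp.__name__, total)
```

For whole arrays, `cupy.asnumpy` copies a CuPy array back to host memory as a NumPy array.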

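Confirm that CuPy can see the GPU:

```
import cupy

# Device() with no argument refers to the current CUDA device
dev = cupy.cuda.Device()
print('Compute Capability', dev.compute_capability)
# mem_info is a (free, total) tuple in bytes
print('GPU Memory', dev.mem_info)
```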

This should return something like the following, along with a GPU Memory line giving the free and total memory in bytes:

Compute Capability 75

If you get an error like:

cupy.cuda.runtime.CUDARuntimeError: cudaErrorInsufficientDriver: CUDA driver version is insufficient for CUDA runtime version

This means the installed CUDA Toolkit expects a newer Nvidia driver. Update the driver with the standard Nvidia update program that came installed on the computer. “Table 1” of the CUDA Toolkit release notes lists the driver version that each CUDA Toolkit release requires.
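The driver comparison can be sketched in a few lines. This assumes `nvidia-smi` is on the PATH and queries it with `--query-gpu=driver_version`. The value of `REQUIRED_DRIVER` is only a placeholder to be replaced with the entry from “Table 1” for your toolkit, and the function names are illustrative.

```python
import shutil
import subprocess
from typing import Optional, Tuple

# Placeholder only: replace with the minimum driver version listed in
# "Table 1" of the release notes for the installed CUDA Toolkit.
REQUIRED_DRIVER = "418.39"


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '430.26' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_is_sufficient(installed: str, required: str) -> bool:
    """Compare dotted versions numerically, so '418.100' counts as newer than '418.39'."""
    return parse_version(installed) >= parse_version(required)


def installed_driver_version() -> Optional[str]:
    """Query nvidia-smi for the driver version; return None if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    lines = result.stdout.strip().splitlines()
    if result.returncode != 0 or not lines:
        return None
    return lines[0].strip()  # first GPU; one driver serves all GPUs in the system


if __name__ == "__main__":
    version = installed_driver_version()
    if version is None:
        print("Could not query the Nvidia driver version")
    else:
        print(version, "sufficient:", driver_is_sufficient(version, REQUIRED_DRIVER))
```

Versions are compared as integer tuples because plain string comparison would order `418.100` before `418.39`.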

Examples:

Python PyCUDA Matrix Multiplication Benchmark matmul_cuda.py
non-Python Graphics Benchmarks

Alternatives to CuPy include Numba.cuda, a lower-level, C-like CUDA interface for Python. CUDA for Julia is provided by JuliaGPU. Anaconda Accelerate was discontinued.