SCIAMA
High Performance Compute Cluster
The GPU Anaconda Environment
This document explains the process behind creating the GPU conda environment for use on nodes gpu01-02. The order is important: CUDA first, then conda packages, then pip packages. It goes through the installation of PyTorch, TensorFlow, and JAX. These can be installed independently: you do not need all three in the same environment if you don't need them all.
Environment Creation
Steps:
- Log onto a login node and run this to get to a GPU node:

srun --pty -n 32 --gres=gpu:1 -J interactive -p gpu.q /bin/bash

- Load the latest anaconda module using:

module load anaconda3/2022.10

- Create a new conda environment with some initial packages and CUDA using the command below. The -n flag sets the environment name, which can be anything; here we have used 'gpuname':

conda create -n gpuname python=3.10 scikit-learn scikit-image pandas matplotlib numpy cuda cudnn -c nvidia

- Activate the environment using this command:

conda activate gpuname
Installing Pytorch
Instructions can also be found at www.pytorch.org.
The PyTorch install will update the cudatoolkit and cudnn packages.
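The install command itself did not survive in this copy; a plausible invocation, based on the selector at pytorch.org, is below. The cu118 index URL is an assumption, so check the site for the URL matching your CUDA version.

```shell
# Install PyTorch with CUDA support into the activated conda env.
# The cu118 index URL is an assumption; pick the correct one at pytorch.org.
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
```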
Test that PyTorch can see the GPU with these Python commands; they should return the name of the GPU:
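The Python test snippet is missing from this copy; a minimal check, assuming PyTorch is installed as above, might look like this:

```python
import torch

# Should print True if PyTorch can see a CUDA device.
print(torch.cuda.is_available())

# Should print the name of the GPU, e.g. an NVIDIA device string.
print(torch.cuda.get_device_name(0))
```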
Installing Tensorflow
Instructions can also be found at www.tensorflow.org/install/pip. TensorFlow needs an environment variable set to see CUDA. TensorFlow recommends editing LD_LIBRARY_PATH, but that breaks many other things (e.g. the ls command stops working). Instead, we set LD_PRELOAD with specific shared object files.
This one also needs to be set:
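The variable referenced here was lost from this copy; on comparable conda-based TensorFlow setups it is usually XLA_FLAGS pointing at the CUDA installation inside the environment, so the following is an assumption:

```shell
# Assumed: point XLA at the CUDA data directory inside the conda env.
export XLA_FLAGS=--xla_gpu_cuda_data_dir=$CONDA_PREFIX
```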
This can be set by conda whenever the environment is activated by running these two commands:
echo 'export LD_PRELOAD=$CONDA_PREFIX/lib/libcudart.so:$CONDA_PREFIX/lib/libcublas.so:$CONDA_PREFIX/lib/libcublasLt.so:$CONDA_PREFIX/lib/libcufft.so:$CONDA_PREFIX/lib/libcurand.so:$CONDA_PREFIX/lib/libcusolver.so:$CONDA_PREFIX/lib/libcusparse.so:$CONDA_PREFIX/lib/libcudnn.so' > $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
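Only one of the two commands survives above; the other is presumably creating the activate.d directory before writing into it. That is an assumption, but conda's documented mechanism for per-environment variables uses exactly this directory:

```shell
# Create the activation-hook directory if it does not already exist.
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
```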
Then tensorflow can be installed using this command:
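The pip command is not shown in this copy; per www.tensorflow.org/install/pip it is typically just the following (any exact version pin is an assumption):

```shell
# Install TensorFlow from PyPI into the activated conda env.
python -m pip install tensorflow
```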
Test that TensorFlow can see the CPU and GPU by running these Python commands:
CPU:
Success:
GPU:
Success:
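The test commands and their output were lost here; the verification checks suggested at www.tensorflow.org/install/pip look like the sketch below, assuming TensorFlow is installed as above:

```python
import tensorflow as tf

# CPU check: should print a tf.Tensor holding a random scalar value.
print(tf.reduce_sum(tf.random.normal([1000, 1000])))

# GPU check: should print a non-empty list of PhysicalDevice entries.
print(tf.config.list_physical_devices('GPU'))
```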
Installing JAX
To install JAX with CUDA support, run this command:
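The command itself is missing from this copy; following the JAX install docs, a CUDA-enabled install is usually done with a cuda extra. The exact extra name depends on the CUDA version, so treat this as an assumption:

```shell
# Install JAX with bundled CUDA libraries (the cuda12 extra is an assumption).
pip install -U "jax[cuda12]"
```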
Test that JAX defaults to using the GPU by running these Python commands:
Success:
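The test snippet is missing here; a minimal check, assumed but consistent with the cpu fallback described below, is:

```python
import jax

# Should print 'gpu' when JAX is using the GPU backend;
# prints 'cpu' if it has fallen back to the CPU.
print(jax.default_backend())
```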
If it still says cpu, you might need to swap out jaxlib for the CUDA version, which you can do by running this command:
This is the pip command that installs all of JAX, but since everything except the CUDA build of jaxlib is already installed, pip should simply see that and swap jaxlib out.
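The reinstall command is not shown in this copy; given the explanation above, a plausible form is simply re-running the full install command (the cuda12 extra is again an assumption):

```shell
# Re-run the JAX install; pip sees everything already present
# except the CUDA build of jaxlib and swaps that out.
pip install -U "jax[cuda12]"
```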