Carlos Aguni

Highly motivated self-taught IT analyst. Always learning and ready to explore new skills. An eternal apprentice.


AWS CUDA

24 Aug 2022

https://aws.amazon.com/ec2/instance-types/p2/

Caveats

X Error of failed request: GLXBadContextTag

X Error of failed request:  GLXBadContextTag
  Major opcode of failed request:  146 (GLX)
  Minor opcode of failed request:  26 (X_GLXMakeContextCurrent)
  Serial number of failed request:  41
  Current serial number in output stream:  41

libGL.so not found (referenced by the CUDA build)

https://www.linuxfixes.com/2022/04/solved-cuda-missing-libglso-libgluso.html

findgllib.mk

ifeq ("$(UBUNTU)","0")
  ifeq ...
  ...
  else
    GLPATH    ?= /usr/lib/$(UBUNTU_PKG_NAME)
    GLLINK    ?= -L/usr/lib/$(UBUNTU_PKG_NAME)
    DFLT_PATH ?= /usr/lib
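
The fragment above only sets GLPATH when it is not already defined, so it can help to know where libGL.so actually lives on the instance. A minimal sketch, assuming a typical Linux layout: the helper name find_gl_dirs and the default search roots are illustrative and do not come from findgllib.mk.

```shell
# Print each directory under the given roots that contains a libGL.so* file.
# find_gl_dirs is a hypothetical helper; the default roots are an assumption.
find_gl_dirs() {
  # Fall back to a few common library roots when no argument is given.
  if [ "$#" -eq 0 ]; then
    set -- /usr/lib /usr/lib64 /usr/local/lib
  fi
  # 2>/dev/null hides errors for roots that do not exist on this machine.
  find "$@" -name 'libGL.so*' 2>/dev/null \
    | while IFS= read -r lib; do dirname "$lib"; done \
    | sort -u
}
```

Calling `find_gl_dirs /usr/lib` lists the candidate directories. Because the makefile assigns GLPATH with `?=`, a value passed on the make command line or exported in the environment takes precedence over the default shown above.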

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver

https://stackoverflow.com/questions/55261785/nvidia-drivers-stopped-working-on-aws-ec2-instance-with-ubuntu-16-04-and-tesla-k

wget http://us.download.nvidia.com/tesla/384.183/NVIDIA-Linux-x86_64-384.183.run
sudo sh ./NVIDIA-Linux-x86_64-384.183.run --no-drm --disable-nouveau --dkms --silent --install-libglvnd
nvidia-smi
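
Before reinstalling, it may be worth checking whether the nvidia kernel module is loaded at all. A rough sketch, assuming the standard /proc/modules format where each line begins with the module name: the function name nvidia_module_loaded is hypothetical, and the optional file argument exists only to make the check easy to try against a sample file.

```shell
# Return 0 if a module named exactly "nvidia" appears in the modules file.
# nvidia_module_loaded is a hypothetical helper, not part of any NVIDIA tool.
nvidia_module_loaded() {
  modules_file="${1:-/proc/modules}"
  # Each line starts with the module name followed by a space, so anchoring
  # on "nvidia " avoids matching related modules such as nvidia_uvm.
  grep -q '^nvidia ' "$modules_file"
}
```

For example, `nvidia_module_loaded || echo "nvidia module is not loaded"` prints the message when the module is missing, in which case `sudo modprobe nvidia` or the reinstall above is the next thing to try.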

https://forums.fast.ai/t/nvidia-smi-has-failed-because-it-couldnt-communicate-with-the-nvidia-driver/41557

cuda_install.sh

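#!/bin/bash
# Download and silently install the NVIDIA Tesla driver version passed as $1.
set -x
version="${1:?usage: $0 <driver-version>}"
wget "http://us.download.nvidia.com/tesla/${version}/NVIDIA-Linux-x86_64-${version}.run"
sudo sh "./NVIDIA-Linux-x86_64-${version}.run" --no-drm --disable-nouveau --dkms --silent --install-libglvnd

# Usage, run from a shell rather than from inside the script itself.
# The original note invokes the script as driver_install.sh; adjust to the saved filename.
sudo ./driver_install.sh 410.104
# Load the kernel module and confirm the driver responds.
sudo modprobe nvidia
nvidia-smi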
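
The script builds the download URL from the version argument alone, so a typo in the version only surfaces as a failed wget. A small sketch of validating the argument first, assuming the same URL pattern as the script: the function name driver_url and the "digits and dots only" rule are illustrative assumptions, not something NVIDIA documents.

```shell
# Print the Tesla driver download URL for a version such as 410.104.
# driver_url is a hypothetical helper; the format check is an assumption.
driver_url() {
  version="$1"
  case "$version" in
    # Reject an empty value or anything containing a non-digit, non-dot character.
    ''|*[!0-9.]*)
      echo "invalid driver version: '$version'" >&2
      return 1
      ;;
  esac
  echo "http://us.download.nvidia.com/tesla/${version}/NVIDIA-Linux-x86_64-${version}.run"
}
```

A caller could then write `url=$(driver_url "$1") || exit 1` before invoking wget. The check only covers the shape of the version string and says nothing about whether that version exists on the server.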

https://www.nvidia.com/Download/driverResults.aspx/190724/en-us/