GPU servers

There are two GPU servers available for the courses of the Department of Cybernetics:

  • cantor.felk.cvut.cz – 16 cores / 32 threads, 256GB RAM, 500GB SSD, 8 x NVIDIA GTX 1080Ti
  • taylor.felk.cvut.cz – 16 cores / 32 threads, 256GB RAM, 500GB SSD, 8 x NVIDIA GTX 1080Ti

The servers run continuously and support several simultaneous users, but each user should work with only one GPU card at a time.

How to login


Our GPU servers are accessible only via the SSH protocol. From Linux or macOS, you can use the command: ssh -X server_address. From MS Windows, you can use e.g. MobaXterm, which has an integrated X server.

The login and password are the same as in the department classrooms E132, E220 and E230. You can set your password here: https://cw.felk.cvut.cz/password.
Warning: It is strictly forbidden to run any programs on the department servers that are not related to education at the Department of Cybernetics.

Every user has the same home directory as in the classrooms. When needed, you can use the local SSD storage /local/temporary, where everybody can create their own folder.

Work with GPUs

There are 8 GPUs in each server. You may work with only one GPU at a time. Do not use a GPU that is already in use by someone else. The following command shows which GPUs are in use:

eb@cantor> nvidia-smi
Wed Oct 10 10:36:42 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.37                 Driver Version: 396.37                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  On   | 00000000:04:00.0 Off |                  N/A |
| 31%   52C    P2    79W / 250W |  10942MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  On   | 00000000:05:00.0 Off |                  N/A |
| 21%   29C    P8    17W / 250W |    982MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 108...  On   | 00000000:08:00.0 Off |                  N/A |
| 21%   28C    P8    16W / 250W |     10MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 108...  On   | 00000000:09:00.0 Off |                  N/A |
| 21%   29C    P8    18W / 250W |     10MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  GeForce GTX 108...  On   | 00000000:84:00.0 Off |                  N/A |
| 21%   28C    P8    17W / 250W |     10MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  GeForce GTX 108...  On   | 00000000:85:00.0 Off |                  N/A |
| 21%   28C    P8    17W / 250W |     10MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  GeForce GTX 108...  On   | 00000000:88:00.0 Off |                  N/A |
| 47%   70C    P2   173W / 250W |  10137MiB / 11178MiB |     75%      Default |
+-------------------------------+----------------------+----------------------+
|   7  GeForce GTX 108...  On   | 00000000:89:00.0 Off |                  N/A |
| 52%   75C    P2   157W / 250W |   6373MiB / 11178MiB |     43%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      8364      C   python                                     10928MiB |
|    1     10343      C   /usr/local/matlab90/bin/glnxa64/MATLAB       243MiB |
|    1     13517      C   /opt/conda/envs/pytorch-py3.6/bin/python     729MiB |
|    6     17529      C   python                                     10125MiB |
|    7     23438      C   python3                                     6363MiB |
+-----------------------------------------------------------------------------+

Almost every GPU program lets you specify which GPU it should use. The GPU ID is shown in the leftmost column of the nvidia-smi output. The PID column shows the ID of the process using that GPU. With the command ps -axu, you can find the username of the process owner.
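Scanning a full process listing for one PID is tedious. As a minimal sketch, the helper below asks ps for just the owner of a given PID, using the standard -o and -p options. The name pid_owner is hypothetical and not something installed on the servers.

```shell
# pid_owner is a hypothetical helper name (an assumption of this sketch).
# Print the username that owns the process with the given PID.
pid_owner() {
    # "-o user=" selects only the user column and suppresses its header line.
    ps -o user= -p "$1"
}
```

For example, pid_owner 8364 would print the owner of the python process shown on GPU 0 in the sample output, provided that process is still running.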

The environment variable CUDA_VISIBLE_DEVICES tells all CUDA-based programs which GPUs are available to them:

export CUDA_VISIBLE_DEVICES=3
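Instead of reading the table by eye, you can ask nvidia-smi for a machine-readable list through its --query-gpu and --format=csv options. The sketch below assumes that a GPU counts as free when it uses less than 100 MiB of memory and shows 0% utilization. Both the threshold and the helper name free_gpus are illustrative choices, not a department rule.

```shell
# free_gpus is a hypothetical helper; the 100 MiB default threshold is an assumption.
# It reads lines of the form "index, memory.used, utilization.gpu" on stdin, as printed by
#   nvidia-smi --query-gpu=index,memory.used,utilization.gpu --format=csv,noheader,nounits
# and prints the index of every GPU that looks idle.
free_gpus() {
    awk -F', *' -v max_mem="${1:-100}" '
        # keep GPUs with little memory in use and no compute load
        ($2 + 0) < (max_mem + 0) && ($3 + 0) == 0 { print $1 }
    '
}
```

One possible use is to select the first idle GPU before starting a program:

export CUDA_VISIBLE_DEVICES=$(nvidia-smi --query-gpu=index,memory.used,utilization.gpu --format=csv,noheader,nounits | free_gpus | head -n 1)

Inside the program, the selected card then appears as device 0, because CUDA renumbers the visible devices starting from zero. Another user may claim the same GPU between the check and the start of your job, so it is worth looking at nvidia-smi again once your program is running.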

Available software

Software on the GPU servers differs from that on the classroom workstations. Most of the software is available as modules. Loading a module sets all required environment variables to the required values. You can switch between different versions of the same software. You work with modules using the command module, or ml for short.

  • module avail – lists all available modules.
  • module load module_name/version – loads the module module_name in the version version. If the version is omitted, the latest version is loaded.
  • module list – lists all loaded modules.
  • module unload module_name – unloads the module module_name. Its dependencies, however, stay loaded.
  • module purge – unloads all loaded modules.

Module examples:

eb@cantor:~$ gcc --version
gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

eb@cantor:~$ ml GCC/7.1.0-2.28
eb@cantor:~$ gcc --version
gcc (GCC) 7.1.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
eb@cantor:~$ module purge
eb@cantor:~$ module load Python
eb@cantor:~$ module list

Currently Loaded Modules:
1) GCCcore/6.4.0 6) numactl/2.0.11-GCCcore-6.4.0 11) FFTW/3.3.6-gompic-2017b 16) ncurses/6.0-GCCcore-6.4.0 21) XZ/5.2.3-GCCcore-6.4.0
2) binutils/2.28-GCCcore-6.4.0 7) hwloc/1.11.7-GCCcore-6.4.0 12) ScaLAPACK/2.0.2-gompic-2017b-OpenBLAS-0.2.20 17) libreadline/7.0-GCCcore-6.4.0 22) libffi/3.2.1-GCCcore-6.4.0
3) GCC/6.4.0-2.28 8) OpenMPI/2.1.1-gcccuda-2017b 13) goolfc/2017b 18) Tcl/8.6.7-GCCcore-6.4.0 23) Python/3.6.4-goolfc-2017b
4) CUDA/9.0.176-GCC-6.4.0-2.28 9) OpenBLAS/0.2.20-GCC-6.4.0-2.28 14) bzip2/1.0.6-GCCcore-6.4.0 19) SQLite/3.20.1-GCCcore-6.4.0
5) gcccuda/2017b 10) gompic/2017b 15) zlib/1.2.11-GCCcore-6.4.0 20) GMP/6.1.2-GCCcore-6.4.0

eb@cantor:~$ module unload Python
eb@cantor:~$ module list

Currently Loaded Modules:
1) GCCcore/6.4.0 6) numactl/2.0.11-GCCcore-6.4.0 11) FFTW/3.3.6-gompic-2017b 16) ncurses/6.0-GCCcore-6.4.0 21) XZ/5.2.3-GCCcore-6.4.0
2) binutils/2.28-GCCcore-6.4.0 7) hwloc/1.11.7-GCCcore-6.4.0 12) ScaLAPACK/2.0.2-gompic-2017b-OpenBLAS-0.2.20 17) libreadline/7.0-GCCcore-6.4.0 22) libffi/3.2.1-GCCcore-6.4.0
3) GCC/6.4.0-2.28 8) OpenMPI/2.1.1-gcccuda-2017b 13) goolfc/2017b 18) Tcl/8.6.7-GCCcore-6.4.0
4) CUDA/9.0.176-GCC-6.4.0-2.28 9) OpenBLAS/0.2.20-GCC-6.4.0-2.28 14) bzip2/1.0.6-GCCcore-6.4.0 19) SQLite/3.20.1-GCCcore-6.4.0
5) gcccuda/2017b 10) gompic/2017b 15) zlib/1.2.11-GCCcore-6.4.0 20) GMP/6.1.2-GCCcore-6.4.0

eb@cantor:~$ module purge
eb@cantor:~$ module list
No modules loaded

GPU modules


Modules with GPU support are marked with (g). GPU modules also have fosscuda or goolfc in their version string. Some modules are available in versions both with and without GPU support. On the GPU servers, use the GPU modules whenever they are available. The available GPU modules are:

Containers


You can use software containers as an alternative to modules. A container is packaged software with all its required libraries. Unlike modules, containers are portable. Singularity is available on our GPU servers. Inside a container you can work with files that live outside it, e.g. in your home directory. You can run standard Docker containers with Singularity:

singularity run docker://godlovedc/lolcow
Docker image path: index.docker.io/godlovedc/lolcow:latest
Cache folder set to /home/user/.singularity/docker
[6/6] |===================================| 100.0%
Creating container runtime...

 ________________________________________
/ You've been leading a dog's life. Stay \
\ off the furniture.                     /
 ----------------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

As you can see, this command pulls the Docker container into the ~/.singularity/docker folder and then starts it. Some containers are large (several GB), so it is a good idea to move this cache folder to the local temporary storage:

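# $USER expands to your login name; -p avoids an error if the folder already exists
mkdir -p "/local/temporary/$USER"
mv ~/.singularity/docker "/local/temporary/$USER/"
# leave a symlink at the old location so Singularity still finds its cache
ln -s "/local/temporary/$USER/docker" ~/.singularity/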
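An alternative to moving the folder and linking it back is to tell Singularity directly where to keep its cache. The sketch below assumes that the installed Singularity version honours the SINGULARITY_CACHEDIR environment variable. The helper name set_singularity_cache and the singularity_cache subfolder are illustrative and do not come from the server setup.

```shell
# set_singularity_cache is a hypothetical helper; the subfolder name is an assumption.
# Create a per-user cache folder under the given base directory
# (default: the local SSD storage) and point Singularity at it.
set_singularity_cache() {
    base="${1:-/local/temporary}"
    cache="$base/$(id -un)/singularity_cache"
    mkdir -p "$cache" || return 1
    # Assumption: this Singularity version reads SINGULARITY_CACHEDIR.
    export SINGULARITY_CACHEDIR="$cache"
}
```

Calling set_singularity_cache before singularity run, for instance from ~/.bashrc, should keep downloaded images out of the home directory without any symlink.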

You can use the prepared containers in the directory /local/singularity_containers/. Almost all of them require GPUs, so the option --nv is needed.

eb@cantor:> singularity run --nv /local/singularity_images/caffe2-18.06-py3.simg

============
== Caffe2 ==
============

NVIDIA Release 18.06 (build 474750)

Container image Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
All contributions by Facebook: Copyright (c) 2016, 2017 Facebook Inc.
All contributions by Google: Copyright (c) 2015 Google Inc. All rights reserved.
All contributions by Yangqing Jia: Copyright (c) 2015 Yangqing Jia All rights reserved.
All contributions from Caffe: Copyright(c) 2013, 2014, 2015, the respective contributors. All rights reserved.
All other contributions: Copyright(c) 2015, 2016, 2017, the respective contributors. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying project or file.
vecerka@boruvka:~$ nvidia-smi
Wed Oct 10 09:09:09 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.37                 Driver Version: 396.37                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:04:00.0 Off |                  N/A |
| 31%   51C    P2    80W / 250W |  10942MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:05:00.0 Off |                  N/A |
| 21%   29C    P8    17W / 250W |    982MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 108...  Off  | 00000000:08:00.0 Off |                  N/A |
| 21%   28C    P8    16W / 250W |     10MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 108...  Off  | 00000000:09:00.0 Off |                  N/A |
| 21%   30C    P8    18W / 250W |     10MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  GeForce GTX 108...  Off  | 00000000:84:00.0 Off |                  N/A |
| 21%   29C    P8    17W / 250W |     10MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  GeForce GTX 108...  Off  | 00000000:85:00.0 Off |                  N/A |
| 21%   30C    P8    17W / 250W |     10MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  GeForce GTX 108...  Off  | 00000000:88:00.0 Off |                  N/A |
| 48%   67C    P2   185W / 250W |  10137MiB / 11178MiB |     47%      Default |
+-------------------------------+----------------------+----------------------+
|   7  GeForce GTX 108...  Off  | 00000000:89:00.0 Off |                  N/A |
| 53%   75C    P2    82W / 250W |   5735MiB / 11178MiB |     26%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      8364      C   python                                     10928MiB |
|    1     10343      C   /usr/local/matlab90/bin/glnxa64/MATLAB       243MiB |
|    1     13517      C   /opt/conda/envs/pytorch-py3.6/bin/python     729MiB |
|    6     17529      C   python                                     10125MiB |
|    7     10439      C   python3                                     5725MiB |
+-----------------------------------------------------------------------------+

Responsible person: Petr Pošík