GPU servers

There are 2 GPU servers available for the courses of the Department of Cybernetics:

cantor.felk.cvut.cz – 16 cores / 32 threads, 256GB RAM, 500GB SSD, 8 x NVIDIA GTX 1080Ti
taylor.felk.cvut.cz – 16 cores / 32 threads, 256GB RAM, 500GB SSD, 8 x NVIDIA GTX 1080Ti

The servers run continuously and support simultaneous work by several users, but each user should use only 1 GPU card at a time.

How to login

Our GPU servers are accessible only via the SSH protocol. From Linux or Mac OS X, you can use the command: ssh -X server_address. From MS Windows, you can use e.g. MobaXterm, which has an integrated X Window server.
You have to use the Faculty VPN for access from outside the CTU network.

The login and password are the same as in the department classrooms E132, E220 and E230. You can set your password here: https://cw.felk.cvut.cz/password.
Warning: It’s strictly forbidden to run any programs on the department servers that are not related to education at the Department of Cybernetics.

Every user has the same home directory as in the classrooms. When needed, it’s possible to use the local SSD storage /local/temporary, where everybody can create their own folder.
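
Creating a personal folder on the local SSD could look roughly like the sketch below. It is only an illustration: the helper name make_scratch_dir, naming the folder after your login, and restricting access with chmod 700 are assumptions, not rules stated by the department.

```shell
# Hypothetical helper: create a personal folder on the local SSD storage.
# The base path defaults to /local/temporary (taken from the text above).
# Naming the folder after the login name and using mode 700 are assumptions.
make_scratch_dir() {
    base="${1:-/local/temporary}"
    dir="$base/$(id -un)"
    # -p: do not fail if the folder already exists
    mkdir -p "$dir" && chmod 700 "$dir" && printf '%s\n' "$dir"
}
```

For example, cd "$(make_scratch_dir)" would create the folder if needed and change into it.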

Work with GPUs

There are 8 GPUs in each server. You may work with only 1 GPU at a time. Do not use a GPU that is being used by anyone else. The following command shows which GPUs are in use:


eb@cantor>nvidia-smi
Wed Oct 10 10:36:42 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.37                 Driver Version: 396.37                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  On   | 00000000:04:00.0 Off |                  N/A |
| 31%   52C    P2    79W / 250W |  10942MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  On   | 00000000:05:00.0 Off |                  N/A |
| 21%   29C    P8    17W / 250W |    982MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 108...  On   | 00000000:08:00.0 Off |                  N/A |
| 21%   28C    P8    16W / 250W |     10MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 108...  On   | 00000000:09:00.0 Off |                  N/A |
| 21%   29C    P8    18W / 250W |     10MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  GeForce GTX 108...  On   | 00000000:84:00.0 Off |                  N/A |
| 21%   28C    P8    17W / 250W |     10MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  GeForce GTX 108...  On   | 00000000:85:00.0 Off |                  N/A |
| 21%   28C    P8    17W / 250W |     10MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  GeForce GTX 108...  On   | 00000000:88:00.0 Off |                  N/A |
| 47%   70C    P2   173W / 250W |  10137MiB / 11178MiB |     75%      Default |
+-------------------------------+----------------------+----------------------+
|   7  GeForce GTX 108...  On   | 00000000:89:00.0 Off |                  N/A |
| 52%   75C    P2   157W / 250W |   6373MiB / 11178MiB |     43%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      8364      C   python                                     10928MiB |
|    1     10343      C   /usr/local/matlab90/bin/glnxa64/MATLAB       243MiB |
|    1     13517      C   /opt/conda/envs/pytorch-py3.6/bin/python     729MiB |
|    6     17529      C   python                                     10125MiB |
|    7     23438      C   python3                                     6363MiB |
+-----------------------------------------------------------------------------+
In almost every GPU program it’s possible to specify which GPU should be used. The GPU ID is shown on the left side of the nvidia-smi output. The PID column shows the ID of the process that is using the GPU. With the command ps -axu, it’s possible to find the username of the process owner.
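
Finding an idle GPU can also be scripted. The sketch below is one possible approach, not an official tool: it relies on the CSV query mode of nvidia-smi (--query-gpu with --format=csv,noheader,nounits), and the helper name free_gpus and the 100 MiB memory threshold are assumptions chosen for illustration.

```shell
# Hypothetical helper: read lines of "index, memory.used, utilization.gpu"
# on stdin and print the indices of GPUs that look idle.
# The default threshold of 100 MiB used memory is an assumption.
free_gpus() {
    awk -F', *' -v max_mem="${1:-100}" \
        '$2 + 0 <= max_mem && $3 + 0 == 0 { print $1 }'
}

# Intended use on the server (not executed here):
#   nvidia-smi --query-gpu=index,memory.used,utilization.gpu \
#              --format=csv,noheader,nounits | free_gpus
```

To look up the owner of a single process from the Processes table, ps -o user= -p PID prints just the username for that PID.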

The environment variable CUDA_VISIBLE_DEVICES tells all CUDA-based programs which GPUs are available:

export CUDA_VISIBLE_DEVICES=3
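
The variable can also be set for a single command instead of the whole session. The following is a minimal sketch; the wrapper name run_on_gpu is hypothetical. Note that CUDA renumbers the visible devices, so inside the program the selected GPU appears as device 0.

```shell
# Hypothetical wrapper: run one command with only the given GPU visible.
# The variable is set for that command only, not for the whole shell session.
run_on_gpu() {
    gpu="$1"
    shift
    CUDA_VISIBLE_DEVICES="$gpu" "$@"
}
```

For example, run_on_gpu 3 python train.py would run a (hypothetical) script train.py so that it sees only GPU 3.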

Available software

The software on the GPU servers differs from that on the workstations in the classrooms. The majority of the software is available as modules. When a module is loaded, all required environment variables are set to the required values. It’s possible to switch between different versions of the same software. You can work with modules using the command module, or ml for short.
module avail – lists all available modules.
module load module_name/version – loads the module module_name in the version version. If the version is omitted, the latest version is loaded.
module list – lists all loaded modules.
module unload module_name – unloads the module module_name. Its dependencies, however, stay loaded.
module purge – unloads all loaded modules.

Module examples:
eb@cantor:~$ gcc --version
gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

eb@cantor:~$ ml GCC/7.1.0-2.28
eb@cantor:~$ gcc --version
gcc (GCC) 7.1.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

eb@cantor:~$ module purge
eb@cantor:~$ module load Python
eb@cantor:~$ module list
Currently Loaded Modules:
  1) GCCcore/6.4.0                 6) numactl/2.0.11-GCCcore-6.4.0    11) FFTW/3.3.6-gompic-2017b                       16) ncurses/6.0-GCCcore-6.4.0      21) XZ/5.2.3-GCCcore-6.4.0
  2) binutils/2.28-GCCcore-6.4.0   7) hwloc/1.11.7-GCCcore-6.4.0      12) ScaLAPACK/2.0.2-gompic-2017b-OpenBLAS-0.2.20  17) libreadline/7.0-GCCcore-6.4.0  22) libffi/3.2.1-GCCcore-6.4.0
  3) GCC/6.4.0-2.28                8) OpenMPI/2.1.1-gcccuda-2017b     13) goolfc/2017b                                  18) Tcl/8.6.7-GCCcore-6.4.0        23) Python/3.6.4-goolfc-2017b
  4) CUDA/9.0.176-GCC-6.4.0-2.28   9) OpenBLAS/0.2.20-GCC-6.4.0-2.28  14) bzip2/1.0.6-GCCcore-6.4.0                     19) SQLite/3.20.1-GCCcore-6.4.0
  5) gcccuda/2017b                10) gompic/2017b                    15) zlib/1.2.11-GCCcore-6.4.0                     20) GMP/6.1.2-GCCcore-6.4.0

eb@cantor:~$ module unload Python
eb@cantor:~$ module list
Currently Loaded Modules:
  1) GCCcore/6.4.0                 6) numactl/2.0.11-GCCcore-6.4.0    11) FFTW/3.3.6-gompic-2017b                       16) ncurses/6.0-GCCcore-6.4.0      21) XZ/5.2.3-GCCcore-6.4.0
  2) binutils/2.28-GCCcore-6.4.0   7) hwloc/1.11.7-GCCcore-6.4.0      12) ScaLAPACK/2.0.2-gompic-2017b-OpenBLAS-0.2.20  17) libreadline/7.0-GCCcore-6.4.0  22) libffi/3.2.1-GCCcore-6.4.0
  3) GCC/6.4.0-2.28                8) OpenMPI/2.1.1-gcccuda-2017b     13) goolfc/2017b                                  18) Tcl/8.6.7-GCCcore-6.4.0
  4) CUDA/9.0.176-GCC-6.4.0-2.28   9) OpenBLAS/0.2.20-GCC-6.4.0-2.28  14) bzip2/1.0.6-GCCcore-6.4.0                     19) SQLite/3.20.1-GCCcore-6.4.0
  5) gcccuda/2017b                10) gompic/2017b                    15) zlib/1.2.11-GCCcore-6.4.0                     20) GMP/6.1.2-GCCcore-6.4.0

eb@cantor:~$ module purge
eb@cantor:~$ module list
No modules loaded

GPU modules

Modules with GPU support are marked with (g). GPU modules also have fosscuda or goolfc in their version string. Some modules are available in versions both with and without GPU support. On the GPU servers, use the GPU modules where available. The following GPU modules are available:

Caffe/1.0-fosscuda-2018b-Python-3.6.4 – https://github.com/BVLC/caffe
NVCaffe/0.17.0-fosscuda-2018b-Python-3.6.4 – https://github.com/NVIDIA/caffe
Pytorch/0.4.1-fosscuda-2018b-Python-3.6.4 – http://pytorch.org/
TensorFlow/1.10.0-fosscuda-2018b-Python-3.6.4 – https://www.tensorflow.org/
Keras/2.2.2-goolfc-2017b-Python-3.6.4 – https://keras.io/
Theano/1.0.2-fosscuda-2018b-Python-3.6.4 – http://deeplearning.net/software/theano
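
The naming rule above (fosscuda or goolfc in the version string) can be used to filter the module list. This is a rough sketch under assumptions: the helper name gpu_modules is hypothetical, and it assumes that the module command accepts the terse flag -t and prints the list to stderr, which is the usual behaviour of Lmod but should be verified on the servers.

```shell
# Hypothetical helper: keep only module names whose version string marks
# a GPU build (fosscuda or goolfc, as described in the text above).
gpu_modules() {
    grep -E 'fosscuda|goolfc'
}

# Intended use on the server (not executed here); 2>&1 is needed because
# the module list is assumed to go to stderr:
#   module -t avail 2>&1 | gpu_modules
```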

Containers

It’s possible to use software containers as an alternative to modules. A container is packaged software with all required libraries. Unlike modules, containers are portable. You can use Singularity on our GPU servers. Inside a container, you can work with files outside the container, e.g. in your home directory. You can use standard Docker containers with Singularity:



singularity run docker://godlovedc/lolcow
Docker image path: index.docker.io/godlovedc/lolcow:latest
Cache folder set to /home/user/.singularity/docker
[6/6] |===================================| 100.0%
Creating container runtime...
/ You've been leading a dog's life. Stay \
\ off the furniture.                     /
    \   ^__^
     \  (oo)\_______
        (__)\       )\/\
            ||----w |
            ||     ||

As you can see, this command pulls the Docker container into the ~/.singularity/docker folder and then starts it. Some containers are big (several GB), so it’s a good idea to move this cache folder to the local temporary storage:
mkdir /local/temporary/username
mv ~/.singularity/docker /local/temporary/username/
ln -s /local/temporary/username/docker ~/.singularity/
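
As an alternative to moving the folder and creating a symlink, the cache location can be redirected with an environment variable. The sketch below assumes that the installed Singularity version honours SINGULARITY_CACHEDIR. The helper name use_local_cache and the folder layout under /local/temporary are hypothetical.

```shell
# Hypothetical helper: point the Singularity cache at the local SSD storage.
# Assumption: the installed Singularity reads SINGULARITY_CACHEDIR.
# The folder layout <base>/<login>/singularity is chosen for illustration.
use_local_cache() {
    base="${1:-/local/temporary}"
    cache="$base/$(id -un)/singularity"
    mkdir -p "$cache" && export SINGULARITY_CACHEDIR="$cache"
}
```

Call use_local_cache in the current shell (not in a subshell) before singularity run, so that the exported variable is visible to the command.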

You can use prepared containers in the directory: /local/singularity_containers/ . Almost all of them require GPUs, so the --nv option is needed.
eb@cantor:> singularity run --nv /local/singularity_images/caffe2-18.06-py3.simg


============
== Caffe2 ==
NVIDIA Release 18.06 (build 474750)
Container image Copyright (c) 2018, NVIDIA CORPORATION.  All rights reserved.

All contributions by Facebook: Copyright (c) 2016, 2017 Facebook Inc.

All contributions by Google: Copyright (c) 2015 Google Inc.  All rights reserved.

All contributions by Yangqing Jia: Copyright (c) 2015 Yangqing Jia All rights reserved.

All contributions from Caffe: Copyright(c) 2013, 2014, 2015, the respective contributors. All rights reserved.

All other contributions: Copyright(c) 2015, 2016, 2017, the respective contributors. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.

NVIDIA modifications are covered by the license terms that apply to the underlying project or file.

vecerka@boruvka:~$ nvidia-smi
Wed Oct 10 09:09:09 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.37                 Driver Version: 396.37                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:04:00.0 Off |                  N/A |
| 31%   51C    P2    80W / 250W |  10942MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:05:00.0 Off |                  N/A |
| 21%   29C    P8    17W / 250W |    982MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 108...  Off  | 00000000:08:00.0 Off |                  N/A |
| 21%   28C    P8    16W / 250W |     10MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 108...  Off  | 00000000:09:00.0 Off |                  N/A |
| 21%   30C    P8    18W / 250W |     10MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  GeForce GTX 108...  Off  | 00000000:84:00.0 Off |                  N/A |
| 21%   29C    P8    17W / 250W |     10MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  GeForce GTX 108...  Off  | 00000000:85:00.0 Off |                  N/A |
| 21%   30C    P8    17W / 250W |     10MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  GeForce GTX 108...  Off  | 00000000:88:00.0 Off |                  N/A |
| 48%   67C    P2   185W / 250W |  10137MiB / 11178MiB |     47%      Default |
+-------------------------------+----------------------+----------------------+
|   7  GeForce GTX 108...  Off  | 00000000:89:00.0 Off |                  N/A |
| 53%   75C    P2    82W / 250W |   5735MiB / 11178MiB |     26%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      8364      C   python                                     10928MiB |
|    1     10343      C   /usr/local/matlab90/bin/glnxa64/MATLAB       243MiB |
|    1     13517      C   /opt/conda/envs/pytorch-py3.6/bin/python     729MiB |
|    6     17529      C   python                                     10125MiB |
|    7     10439      C   python3                                     5725MiB |
+-----------------------------------------------------------------------------+