Horovod adopts the standard Message Passing Interface (MPI) model to send messages and manage communication between nodes in a high-performance distributed computing environment. Horovod's MPI implementation provides a simpler programming model than the standard TensorFlow distributed training model. To train a model across multiple nodes on the Neuron system using a conda environment, install Horovod and the required libraries as described below, then run the model.
- For information on how to use Horovod, refer to [Appendix 8].
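As a quick illustration of the Horovod programming model, every Horovod script follows the same basic pattern: initialize Horovod, then use the process's rank to divide the work. The minimal sketch below uses the TensorFlow binding; the PyTorch binding (horovod.torch) exposes the same calls.

# Minimal sketch of the Horovod programming model.
import horovod.tensorflow as hvd

hvd.init()                              # initialize Horovod (and MPI)
print("size:", hvd.size())              # total number of worker processes
print("rank:", hvd.rank())              # this process's global index (0 .. size-1)
print("local rank:", hvd.local_rank())  # index on this node; used to pin one GPU per process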
A. TensorFlow-Horovod
1. Create a conda environment
$ module load gcc/10.2.0 cuda/11.4 cudampi/openmpi-4.1.1 python/3.7.1 cmake/3.16.9
$ conda create -n my_tensorflow
$ source activate my_tensorflow
(my_tensorflow) $
For detailed information on how to use conda, refer to [Appendix 5].
2. Install TensorFlow and Horovod
(my_tensorflow) $ conda install tensorflow-gpu=2.0.0 tensorboard=2.0.0 tensorflow-estimator=2.0.0 python=3.7 cudnn cudatoolkit=10 nccl=2.8.3
(my_tensorflow) $ HOROVOD_WITH_MPI=1 HOROVOD_GPU_OPERATIONS=NCCL HOROVOD_NCCL_LINK=SHARED HOROVOD_WITH_TENSORFLOW=1 \
pip install --no-cache-dir horovod==0.23.0
3. Check
(my_tensorflow) $ pip list | grep horovod
horovod 0.23.0
(my_tensorflow) $ python
>>> import horovod
>>> horovod.__version__
'0.23.0'
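Horovod can also report which frameworks and collective backends it was compiled with, which is a convenient way to confirm that the MPI and NCCL options requested at install time were actually built in:

(my_tensorflow) $ horovodrun --check-build

The output lists the available frameworks (e.g., TensorFlow), controllers (MPI, Gloo), and tensor operations (NCCL, MPI) detected in the build.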
4. Example
1) Interactive job
$ salloc --partition=cas_v100_4 -J debug --nodes=2 --ntasks-per-node=2 --time=08:00:00 --gres=gpu:2 --comment=tensorflow
$ echo $SLURM_NODELIST
gpu[12-13]
$ module load gcc/10.2.0 cuda/11.4 cudampi/openmpi-4.1.1 python/3.7.1
$ source activate my_tensorflow
(my_tensorflow) $ horovodrun -np 4 -H gpu12:2,gpu13:2 python tensorflow2_mnist.py
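horovodrun is a convenience wrapper around the underlying MPI launcher. With the Open MPI module loaded, a roughly equivalent explicit launch is shown below; the exact mpirun flags (process binding, exported variables) may need adjusting for your environment.

(my_tensorflow) $ mpirun -np 4 -H gpu12:2,gpu13:2 \
    -bind-to none -map-by slot \
    -x LD_LIBRARY_PATH -x PATH \
    python tensorflow2_mnist.py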
2) Batch job script
#!/bin/bash
#SBATCH -J test_job
#SBATCH -p cas_v100_4
#SBATCH -N 2
#SBATCH --ntasks-per-node=2
#SBATCH -o %x.o%j
#SBATCH -e %x.e%j
#SBATCH --time 00:30:00
#SBATCH --gres=gpu:2
#SBATCH --comment tensorflow
module purge
module load gcc/10.2.0 cuda/11.4 cudampi/openmpi-4.1.1 python/3.7.1
source activate my_tensorflow
horovodrun -np 2 python tensorflow2_mnist.py
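The tensorflow2_mnist.py script itself is not reproduced in this guide. The sketch below, modeled on the standard Horovod TensorFlow 2 training pattern, shows what such a script typically contains; the model and dataset details are illustrative assumptions, not the actual script.

import tensorflow as tf
import horovod.tensorflow as hvd

hvd.init()

# Pin each process to a single GPU based on its local rank.
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    tf.config.experimental.set_visible_devices(gpus[hvd.local_rank()], 'GPU')

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
dataset = tf.data.Dataset.from_tensor_slices(
    (tf.cast(x_train[..., tf.newaxis] / 255.0, tf.float32),
     tf.cast(y_train, tf.int64))
).shuffle(10000).batch(128)

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10),
])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
# Scale the learning rate by the number of workers.
opt = tf.optimizers.Adam(0.001 * hvd.size())

@tf.function
def train_step(images, labels, first_batch):
    with tf.GradientTape() as tape:
        loss = loss_fn(labels, model(images, training=True))
    # Average gradients across all workers before applying them.
    tape = hvd.DistributedGradientTape(tape)
    grads = tape.gradient(loss, model.trainable_variables)
    opt.apply_gradients(zip(grads, model.trainable_variables))
    # Broadcast the initial state from rank 0 so all workers start identically.
    if first_batch:
        hvd.broadcast_variables(model.variables, root_rank=0)
        hvd.broadcast_variables(opt.variables(), root_rank=0)
    return loss

for step, (images, labels) in enumerate(dataset.take(1000)):
    loss = train_step(images, labels, step == 0)
    if step % 100 == 0 and hvd.rank() == 0:
        print('step %d, loss %.4f' % (step, loss))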
B. PyTorch-Horovod
1. Create a conda environment
$ module load gcc/10.2.0 cuda/11.4 cudampi/openmpi-4.1.1 python/3.7.1 cmake/3.16.9
$ conda create -n my_pytorch
$ source activate my_pytorch
(my_pytorch) $
2. Install PyTorch and Horovod
(my_pytorch) $ conda install pytorch=1.11.0 python=3.9 torchvision=0.12.0 torchaudio=0.11.0 cudatoolkit=10.2 -c pytorch
(my_pytorch) $ HOROVOD_WITH_MPI=1 HOROVOD_NCCL_LINK=SHARED HOROVOD_GPU_OPERATIONS=NCCL HOROVOD_WITH_PYTORCH=1 \
pip install --no-cache-dir horovod==0.24.0
(my_pytorch) $
3. Check
(my_pytorch) $ pip list | grep horovod
horovod 0.24.0
(my_pytorch) $ python
>>> import horovod
>>> horovod.__version__
'0.24.0'
4. Example
1) Interactive job
$ salloc --partition=cas_v100_4 -J debug --nodes=2 --ntasks-per-node=2 --time=08:00:00 --gres=gpu:2 --comment=pytorch
$ echo $SLURM_NODELIST
gpu[22-23]
$ module load gcc/10.2.0 cuda/11.4 cudampi/openmpi-4.1.1 python/3.7.1
$ source activate my_pytorch
(my_pytorch) $ horovodrun -np 4 -H gpu22:2,gpu23:2 python pytorch_ex.py
2) Batch job script
#!/bin/bash
#SBATCH -J test_job
#SBATCH -p cas_v100_4
#SBATCH -N 2
#SBATCH --ntasks-per-node=2
#SBATCH -o %x.o%j
#SBATCH -e %x.e%j
#SBATCH --time 00:30:00
#SBATCH --gres=gpu:2
#SBATCH --comment pytorch
module purge
module load gcc/10.2.0 cuda/11.4 cudampi/openmpi-4.1.1 python/3.7.1
source activate my_pytorch
horovodrun -np 2 python pytorch_ex.py
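The pytorch_ex.py script is likewise not reproduced here. The sketch below follows the standard Horovod PyTorch training pattern; the model and dataset are illustrative assumptions, not the actual script.

import torch
import torch.nn as nn
import torch.utils.data.distributed
from torchvision import datasets, transforms
import horovod.torch as hvd

hvd.init()
# Pin each process to one GPU based on its local rank.
torch.cuda.set_device(hvd.local_rank())

# Per-rank download directory avoids races when workers share a filesystem.
train_dataset = datasets.MNIST('data-%d' % hvd.rank(), train=True, download=True,
                               transform=transforms.ToTensor())
# Partition the dataset so each worker sees a distinct shard.
train_sampler = torch.utils.data.distributed.DistributedSampler(
    train_dataset, num_replicas=hvd.size(), rank=hvd.rank())
train_loader = torch.utils.data.DataLoader(
    train_dataset, batch_size=64, sampler=train_sampler)

model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128),
                      nn.ReLU(), nn.Linear(128, 10)).cuda()
# Scale the learning rate by the number of workers.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())
# Wrap the optimizer so gradients are averaged across workers.
optimizer = hvd.DistributedOptimizer(
    optimizer, named_parameters=model.named_parameters())
# Broadcast the initial state from rank 0 so all workers start identically.
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)

loss_fn = nn.CrossEntropyLoss()
for epoch in range(2):
    train_sampler.set_epoch(epoch)
    for batch, (data, target) in enumerate(train_loader):
        data, target = data.cuda(), target.cuda()
        optimizer.zero_grad()
        loss = loss_fn(model(data), target)
        loss.backward()
        optimizer.step()
        if batch % 100 == 0 and hvd.rank() == 0:
            print('epoch %d batch %d loss %.4f' % (epoch, batch, loss.item()))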
Last updated: December 2, 2021.