Nurion guide

Deep Learning Framework Parallelization

A. How to Use Horovod in TensorFlow

When using CPUs across multiple nodes, Horovod can be integrated with TensorFlow to parallelize training. Adding the Horovod code shown in the examples below is enough to integrate it, and this works with both TensorFlow and the Keras API within TensorFlow. This section first introduces how to integrate Horovod with TensorFlow. (Example: MNIST dataset and LeNet-5 CNN structure)

※ For detailed instructions on using Horovod with TensorFlow, refer to the official Horovod guide (https://github.com/horovod/horovod#usage).

1. Importing and initializing Horovod in the main function for use with TensorFlow

import horovod.tensorflow as hvd
...
hvd.init()

※ horovod.tensorflow: The module required to integrate Horovod with TensorFlow

※ Initialize Horovod to enable its use.

2. Setting the dataset in the main function for using Horovod

(x_train, y_train), (x_test, y_test) = \
    keras.datasets.mnist.load_data('MNIST-data-%d' % hvd.rank())

※ Set and create the dataset based on the Horovod rank to assign datasets to each task.
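
As a rough illustration of what the rank-based name does, the sketch below reproduces the name expression in plain Python. As an assumption that goes beyond the original example, it also shows one common way to give each task a different slice of the samples. The helper names `cache_name` and `shard_indices` are hypothetical, and `rank` and `size` stand in for `hvd.rank()` and `hvd.size()`.

```python
def cache_name(rank):
    # Same format expression as in the example: one data file name per rank.
    return 'MNIST-data-%d' % rank


def shard_indices(num_samples, rank, size):
    # Assumption (not in the original example): strided sharding, so that
    # rank r takes samples r, r + size, r + 2 * size, ...
    return list(range(rank, num_samples, size))


if __name__ == '__main__':
    size = 4  # stands in for hvd.size()
    for rank in range(size):  # each value stands in for hvd.rank()
        print(cache_name(rank), shard_indices(10, rank, size))
```

Using a separate name per rank keeps tasks from writing to the same file at the same time; whether to also shard the samples depends on how the training loop draws its batches.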

3. Setting Horovod-related options in the optimizer, and broadcast and training steps in the main function

opt = tf.train.AdamOptimizer(0.001 * hvd.size())
opt = hvd.DistributedOptimizer(opt)
global_step = tf.train.get_or_create_global_step()
train_op = opt.minimize(loss, global_step=global_step)
hooks = [hvd.BroadcastGlobalVariablesHook(0),
                tf.train.StopAtStepHook(last_step=20000 // hvd.size()), ... ]

※ Wrap the optimizer with hvd.DistributedOptimizer, and use hvd.BroadcastGlobalVariablesHook(0) to broadcast the initial variable values from rank 0 to every task.

※ Set the training steps for each task according to the number of Horovod tasks.
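
The scaling used in the snippet above can be checked with plain arithmetic. The following minimal sketch assumes the same base values as the example (learning rate 0.001 and 20000 total steps); the function name `scale_for_workers` is hypothetical, and `size` stands in for `hvd.size()`.

```python
def scale_for_workers(base_lr, total_steps, size):
    # Learning rate grows linearly with the number of tasks, mirroring
    # 0.001 * hvd.size() in the example.
    lr = base_lr * size
    # Each task runs a proportionally smaller number of steps, mirroring
    # 20000 // hvd.size() in the example.
    steps_per_task = total_steps // size
    return lr, steps_per_task


if __name__ == '__main__':
    for size in (1, 4, 16):
        lr, steps = scale_for_workers(0.001, 20000, size)
        print('size=%d lr=%.3f steps=%d' % (size, lr, steps))
```

Because the step count uses integer division, a task count that does not divide 20000 evenly rounds the per-task steps down.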

4. Setting parallelism for inter-operation and intra-operation

import os
...
config = tf.ConfigProto()
config.intra_op_parallelism_threads = int(os.environ['OMP_NUM_THREADS'])
config.inter_op_parallelism_threads = 2

※ config.intra_op_parallelism_threads: This is used to set the number of threads for computational tasks, applying the OMP_NUM_THREADS value specified in the job script (in this example, OMP_NUM_THREADS is set to 32).

※ config.inter_op_parallelism_threads: This specifies the number of threads that execute TensorFlow operations concurrently. If set to 2, as in the example, up to two independent operations can run in parallel.
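
Indexing `os.environ` directly raises a `KeyError` when `OMP_NUM_THREADS` is not exported, for instance when the script is tested outside a batch job. The sketch below is one possible way to read the value with a fallback. The helper name `thread_settings` and the fallback value of 1 are assumptions made for illustration; the returned numbers would then be assigned to the two `config` fields.

```python
import os


def thread_settings(environ=os.environ, inter_op=2, default_intra=1):
    # Read OMP_NUM_THREADS as the job script exports it; fall back to a
    # default (an assumption of this sketch) when the variable is unset.
    intra_op = int(environ.get('OMP_NUM_THREADS', default_intra))
    return {'intra_op_parallelism_threads': intra_op,
            'inter_op_parallelism_threads': inter_op}


if __name__ == '__main__':
    # With OMP_NUM_THREADS=32, as in the example job script.
    print(thread_settings({'OMP_NUM_THREADS': '32'}))
```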

5. Checkpoint Settings for Rank 0

checkpoint_dir = './checkpoints' if hvd.rank() == 0 else None
...
with tf.train.MonitoredTrainingSession(checkpoint_dir=checkpoint_dir,
                                       hooks=hooks,
                                       config=config) as mon_sess:
    ...

※ Since checkpoint saving and loading should be performed by a single process, set the checkpoint directory only on rank 0.

B. How to Use Multiple Nodes in Intel Caffe

Horovod does not officially support multi-node parallelism for Caffe, but parallel processing can be achieved using Intel Caffe, which Intel has optimized for KNL. Intel Caffe already integrates everything required for parallel processing, so the deploy.prototxt, solver.prototxt, and train_val.prototxt files developed in standard Caffe can be used without modification.

※ For detailed instructions on using Intel Caffe, refer to the official Intel Caffe guide. (https://github.com/intel/caffe/wiki/Multinode-guide)

To run Caffe code that a deep learning developer has modified in parallel, the corresponding parts of the Intel Caffe source code must be updated and recompiled before execution.

  • Method for Performing Parallel Processing in Intel Caffe (Job Script Example)

#!/bin/sh
#PBS -N test
#PBS -V
#PBS -l select=4:ncpus=68:mpiprocs=1:ompthreads=68
#PBS -q normal
#PBS -l walltime=04:00:00
#PBS -A caffe

cd $PBS_O_WORKDIR

module purge
module load conda/intel_caffe_1.1.5
source /apps/applications/miniconda3/envs/intel_caffe/mlsl_2018.3.008/intel64/bin/mlslvars.sh

export KMP_AFFINITY=verbose,granularity=fine,compact=1
export KMP_BLOCKTIME=1
export KMP_SETTINGS=1

export OMP_NUM_THREADS=60
mpirun -PSM2 -prepend-rank caffe train \
--solver ./models/intel_optimized_models/multinode/alexnet_4nodes/solver.prototxt

# or, instead of the mpirun command above, use the launcher script:

./scripts/run_intelcaffe.sh --hostfile $PBS_NODEFILE \
--caffe_bin /apps/applications/miniconda3/envs/intel_caffe/bin/caffe \
--solver models/intel_optimized_models/multinode/alexnet_4nodes/solver.prototxt \
--network opa --ppn 1 --num_omp_threads 60

exit 0

※ Network option (--network): Set to opa for Intel Omni-Path Architecture (OPA).

※ PPN (--ppn): Abbreviation for processes per node, indicating the number of processes per node (default: 1).

※ It is also possible to launch Caffe directly with mpirun, without using the script, in the same manner as the standard Caffe process.
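
To make the resource request in the job script easier to read, the sketch below parses a `select` string of the form used above and derives the total number of MPI processes. It is only an illustration: the helper names `parse_select` and `total_mpi_processes` are hypothetical, and they handle just the single-chunk `key=value` form shown in this example, not the full PBS syntax.

```python
def parse_select(select):
    # 'select=4:ncpus=68:mpiprocs=1:ompthreads=68' -> dict of integer fields.
    fields = {}
    for part in select.split(':'):
        key, value = part.split('=')
        fields[key] = int(value)
    return fields


def total_mpi_processes(fields):
    # Number of chunks (one per node in this example) times the number of
    # MPI processes requested in each chunk.
    return fields['select'] * fields['mpiprocs']


if __name__ == '__main__':
    req = parse_select('select=4:ncpus=68:mpiprocs=1:ompthreads=68')
    print(req, total_mpi_processes(req))
```

For the request in the job script this gives four MPI processes, one per node, which lines up with `--ppn 1` and the `alexnet_4nodes` solver path used in the example.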

Last updated on November 08, 2024.
