Parallelizing Deep Learning Frameworks with Horovod
A. Using Horovod with TensorFlow
❍ To train on multiple GPUs across multiple nodes, Horovod can be integrated with TensorFlow by adding the Horovod code described in the steps below. Horovod works with both core TensorFlow and the Keras API within TensorFlow; this section covers core TensorFlow first. (Example: MNIST dataset and LeNet-5 CNN structure)
※ For detailed instructions on using Horovod with TensorFlow, refer to the Horovod official guide (https://github.com/horovod/horovod#usage).
Import Horovod and initialize it in the main function for use with TensorFlow.
※ horovod.tensorflow: Module for integrating Horovod with TensorFlow
※ Initialize Horovod to enable its use.
Set up the dataset in the main function for use with Horovod.
※ Create the dataset using the Horovod rank so that each task is assigned its own dataset.
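One common way to make the dataset depend on the rank, seen in Horovod's own MNIST examples, is to give each task its own download location so that tasks sharing a node do not overwrite each other's files. The sketch below builds such a name. Whether the original example used exactly this scheme is an assumption; `per_rank_data_path` and the `MNIST-data` prefix are illustrative names, and `hvd` stands for the already imported and initialized `horovod.tensorflow` module.

```python
def per_rank_data_path(hvd, prefix="MNIST-data"):
    """Return a dataset cache name that is unique to the calling task.

    `hvd` is the Horovod module (or any object exposing rank()).
    The function name and prefix are hypothetical, for illustration only.
    """
    # hvd.rank() is the global index of this task: 0 .. hvd.size() - 1.
    return "%s-%d" % (prefix, hvd.rank())
```

The returned name could then be passed to a loader such as `tf.keras.datasets.mnist.load_data(path=...)`.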
Apply Horovod-related settings to the optimizer in the main function, and configure broadcasting and the number of training steps.
※ Apply Horovod-related settings to the optimizer, and broadcast the initial state from rank 0 so that every task starts from the same values.
※ Set the number of training steps for each task according to the number of Horovod tasks.
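The optimizer and step-count settings can be sketched as follows. `hvd.DistributedOptimizer` and `hvd.size()` are Horovod calls; the helper name `distribute_training` and the idea of dividing one global step budget evenly are assumptions made for illustration. `hvd` is passed in as a parameter so the logic can be tried with a stand-in object on a machine without Horovod.

```python
def distribute_training(optimizer, total_steps, hvd):
    """Wrap the optimizer for Horovod and split the step budget across tasks.

    `distribute_training` is a hypothetical helper; `hvd` is the
    imported horovod.tensorflow module.
    """
    # DistributedOptimizer combines gradients from all tasks with
    # allreduce before the wrapped optimizer applies the update.
    distributed_optimizer = hvd.DistributedOptimizer(optimizer)
    # Each task processes its own batches, so it only needs to run
    # its share of the total number of steps.
    steps_per_task = total_steps // hvd.size()
    return distributed_optimizer, steps_per_task
```

With 4 tasks and a budget of 20000 steps, each task would run 5000 steps. The broadcast is not included here because the call depends on the TensorFlow version; in TensorFlow 2, for example, `hvd.broadcast_variables(variables, root_rank=0)` is one option.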
Assign GPU devices based on the Horovod process rank.
※ Allocate one task per GPU according to Horovod's local rank.
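The local rank counts tasks within a single node, which is why it can be used directly as a GPU index. A minimal sketch, assuming the list of GPUs on the node has already been obtained (in TensorFlow 2, for instance, from `tf.config.list_physical_devices('GPU')`); `select_gpu` is a hypothetical helper name.

```python
def select_gpu(visible_gpus, hvd):
    """Return the single GPU that the calling task should use on its node.

    `select_gpu` is a hypothetical helper; `hvd` is the Horovod module.
    """
    # hvd.local_rank() starts at 0 on every node, unlike hvd.rank(),
    # which is unique across all nodes.
    return visible_gpus[hvd.local_rank()]
```

In TensorFlow 2 the selected device could then be passed to `tf.config.set_visible_devices(gpu, 'GPU')` so that the task sees only that GPU.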
Configure checkpointing on the rank 0 task.
※ Checkpoint saving and loading should be performed by a single process, so configure it only on rank 0.
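A minimal sketch of the rank 0 checkpoint rule. The helper name and default path are illustrative assumptions; only `hvd.rank()` is a Horovod call.

```python
def checkpoint_dir_for(hvd, path="./checkpoints"):
    """Return the checkpoint directory on rank 0 and None on all other tasks.

    `checkpoint_dir_for` and the default path are hypothetical.
    """
    # Only one task may write checkpoints; otherwise several processes
    # would write to the same files at the same time.
    return path if hvd.rank() == 0 else None
```

The result can be handed to whatever component takes a checkpoint location, for example the `checkpoint_dir` argument of `tf.train.MonitoredTrainingSession` in TensorFlow 1 style code, where `None` disables checkpointing.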
B. Using Horovod with Keras
Within TensorFlow, Horovod can also be integrated with the Keras API for parallelization by adding the Horovod code described in the steps below. (Example: MNIST dataset and LeNet-5 CNN structure)
※ For detailed instructions on using Horovod with Keras, refer to the Horovod official guide. (https://github.com/horovod/horovod/blob/master/docs/keras.rst)
Import Horovod and initialize it in the main function for use with Keras.
※ horovod.tensorflow.keras: Module for integrating Horovod with Keras in TensorFlow
※ Initialize Horovod to enable its use.
Assign GPU devices based on the Horovod process rank.
※ Allocate one task per GPU according to Horovod's local rank.
Apply Horovod-related settings to the optimizer in the main function, and configure broadcasting and the number of training steps.
※ Set the training steps for each task according to the number of Horovod tasks.
※ Apply Horovod-related settings to the optimizer, and broadcast the initial state from rank 0 so that every task starts from the same values.
Configure checkpointing on the rank 0 task.
※ Checkpoint saving and loading should be performed by a single process, so configure it only on rank 0.
Display training output only on the rank 0 task.
※ To ensure that training output is displayed only by the Rank 0 task, set the verbose value to 1 only for the Rank 0 task.
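The Keras-specific settings above (steps per task, broadcast, rank 0 checkpointing, and rank 0 output) all end up as arguments to `model.fit()`. The sketch below collects them in one place. `hvd.callbacks.BroadcastGlobalVariablesCallback` is part of `horovod.tensorflow.keras`; the helper name `keras_fit_settings` and its parameters are assumptions for illustration, and the optimizer is assumed to be wrapped separately with `hvd.DistributedOptimizer`.

```python
def keras_fit_settings(hvd, steps_per_epoch, checkpoint_callback=None):
    """Build per-task keyword arguments for model.fit().

    `keras_fit_settings` is a hypothetical helper; `hvd` is the
    imported horovod.tensorflow.keras module.
    """
    # Broadcast initial variables from rank 0 so all tasks start identically.
    callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]
    is_root = hvd.rank() == 0
    if is_root and checkpoint_callback is not None:
        # Only rank 0 writes checkpoints.
        callbacks.append(checkpoint_callback)
    return {
        # Each task runs its share of the steps in an epoch.
        "steps_per_epoch": steps_per_epoch // hvd.size(),
        "callbacks": callbacks,
        # Only rank 0 prints the progress output.
        "verbose": 1 if is_root else 0,
    }
```

A call such as `model.fit(dataset, epochs=..., **keras_fit_settings(hvd, 500, checkpoint))` would then apply the same rules on every task, where `checkpoint` could be a `tf.keras.callbacks.ModelCheckpoint` instance.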
C. Using Horovod with PyTorch
To train on multiple GPUs across multiple nodes, Horovod can be integrated with PyTorch for parallelization by adding the Horovod code described in the steps below. (Example: MNIST dataset and LeNet-5 CNN structure)
※ For detailed instructions on using Horovod with PyTorch, refer to the Horovod official guide. (https://github.com/horovod/horovod/blob/master/docs/pytorch.rst)
Import Horovod and initialize it in the main function for use with PyTorch, then configure the settings.
※ torch.utils.data.distributed: PyTorch module that provides the distributed sampler used to split a dataset across tasks
※ horovod.torch: Module for integrating Horovod with PyTorch
※ Initialize Horovod and select the device to use according to the local rank assigned during initialization.
※ Use torch.set_num_threads(1) to assign one CPU thread per task.
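The device choice can be sketched as a small pure function. `device_for` is a hypothetical helper; in a real script `hvd` would be the `horovod.torch` module after `hvd.init()`, and `cuda_available` would come from `torch.cuda.is_available()`.

```python
def device_for(hvd, cuda_available):
    """Return the device string for the calling task.

    `device_for` is a hypothetical helper, not part of Horovod or PyTorch.
    """
    if cuda_available:
        # One task per GPU: the local rank doubles as the GPU index.
        return "cuda:%d" % hvd.local_rank()
    return "cpu"
```

The string can be passed to `torch.device(...)`. Horovod's PyTorch examples achieve the same effect with `torch.cuda.set_device(hvd.local_rank())`, followed by `torch.set_num_threads(1)` to limit each task to one CPU thread.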
Add Horovod-related content to the training process.
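※ train_sampler.set_epoch(epoch): Sets the epoch for the train sampler so that the data is shuffled differently in each epoch.
※ Since the training dataset is divided among multiple tasks, use len(train_sampler), the number of samples assigned to the current task, instead of the full dataset size when reporting progress.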
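To make the roles of `set_epoch` and `len(train_sampler)` concrete, here is a pure-Python stand-in that mirrors how PyTorch's `DistributedSampler` partitions indices with its default settings. `ShardSampler` is a hypothetical class written only for illustration; real code would use `torch.utils.data.distributed.DistributedSampler(dataset, num_replicas=hvd.size(), rank=hvd.rank())`.

```python
import math
import random


class ShardSampler:
    """Illustrative stand-in for a distributed sampler (not the PyTorch class)."""

    def __init__(self, dataset_len, num_replicas, rank, seed=0):
        self.dataset_len = dataset_len
        self.num_replicas = num_replicas
        self.rank = rank
        self.seed = seed
        self.epoch = 0
        # Every task gets the same number of samples, rounded up.
        self.num_samples = math.ceil(dataset_len / num_replicas)
        self.total_size = self.num_samples * num_replicas

    def set_epoch(self, epoch):
        # The epoch feeds the shuffle seed, so each epoch has a new order
        # while all tasks still agree on that order.
        self.epoch = epoch

    def __iter__(self):
        indices = list(range(self.dataset_len))
        random.Random(self.seed + self.epoch).shuffle(indices)
        # Pad with leading indices so the list divides evenly among tasks.
        indices += indices[: self.total_size - len(indices)]
        # Task `rank` takes every num_replicas-th index, starting at `rank`.
        return iter(indices[self.rank : self.total_size : self.num_replicas])

    def __len__(self):
        # Per-task sample count, not the size of the whole dataset.
        return self.num_samples
```

For a dataset of 60000 samples split over 4 tasks, `len(ShardSampler(60000, 4, rank))` is 15000 on every rank, which is the figure a per-task progress message should be measured against.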
Define a metric_average function that calculates the average value using Horovod.
※ To calculate the average across multiple nodes, use Horovod's Allreduce communication.
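What an averaging Allreduce does can be simulated without any communication: every task contributes one value, and every task receives the same mean. The function below is only such a simulation, with a hypothetical name; in a Horovod script the equivalent step is `hvd.allreduce(tensor, name=name)`, which averages across tasks by default.

```python
def allreduce_average(values_per_rank):
    """Simulate an averaging allreduce over one value per task.

    Returns a list with one entry per task; all entries are the global mean.
    This is a hypothetical local simulation, not a Horovod call.
    """
    mean = sum(values_per_rank) / len(values_per_rank)
    # After the collective completes, every task holds the same result.
    return [mean] * len(values_per_rank)
```

Because all tasks end up with an identical value, it does not matter which task reports it.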
❍ Add Horovod-related content to the testing process.
※ Since the average needs to be calculated across multiple nodes, use the metric_average function declared above.
※ After performing Allreduce communication across nodes, each node has the same calculated values for loss and accuracy, so the print function is executed only on rank 0.
Set up the dataset in the main function for use with Horovod.
※ Set and create the dataset based on the Horovod rank to assign datasets to each task.
※ Set up PyTorch's distributed sampler and assign it to the data loader.
Apply Horovod-related settings to the optimizer in the main function and add the sampler to the training and testing processes.
※ Apply Horovod-related settings to the optimizer, and broadcast the initial model parameters and optimizer state from rank 0 so that every task starts from the same values.
※ Add the sampler to the training and testing processes and pass it to each function.
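The optimizer step can be sketched with the three `horovod.torch` calls usually involved: `broadcast_parameters`, `broadcast_optimizer_state`, and `DistributedOptimizer`. The wrapper function `distribute` is a hypothetical name, and `hvd` is passed in as a parameter so the sketch can be exercised with a stand-in object.

```python
def distribute(model, optimizer, hvd):
    """Synchronize initial state from rank 0 and wrap the optimizer.

    `distribute` is a hypothetical helper; `hvd` is the imported
    horovod.torch module, `model` a torch.nn.Module, and `optimizer`
    a torch.optim optimizer.
    """
    # Copy rank 0's initial weights and optimizer state to every task.
    hvd.broadcast_parameters(model.state_dict(), root_rank=0)
    hvd.broadcast_optimizer_state(optimizer, root_rank=0)
    # Wrap the optimizer so gradients are averaged across tasks
    # before each parameter update.
    return hvd.DistributedOptimizer(
        optimizer, named_parameters=model.named_parameters()
    )
```

The returned optimizer is used in the training loop exactly like the original one (`zero_grad()`, `backward()`, `step()`).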
Last updated on November 11, 2024.