Singularity Container
Singularity is a container platform suited to HPC environments that, like Docker, provides OS-level virtualization. You can build a container image containing the Linux distribution, compilers, libraries, and applications that your work environment requires, and then run that image to execute your programs.
Pre-built container images are provided for deep learning frameworks such as TensorFlow, Caffe, and PyTorch, as well as for applications such as Quantum Espresso, LAMMPS, GROMACS, and ParaView. They are available in the /apps/applications/singularity_images/ngc directory.
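For example, the provided image files can be listed as follows:

$ ls /apps/applications/singularity_images/ngc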


※ Virtual machines run applications through a hypervisor and a guest OS, whereas containers run closer to the physical hardware and share the host OS instead of using a separate guest OS, resulting in lower overhead. Containers are increasingly used in cloud services.
(Video guide) How to Build and Run Singularity Container Images : https://youtu.be/5PYAE0fuvBk
A. Building a Container Image
1. Load the Singularity Module or Set the Path
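A minimal sketch, assuming a module name of the form singularity/<version> (check the versions actually installed on Neuron with module avail):

$ module avail singularity
$ module load singularity/3.9.0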
2. Local build
To build a container image locally on the Neuron system's login node, you must first apply for fakeroot usage by submitting a request through the KISTI website > Technical Support > Consultation Request with the following details.
System Name : Neuron
User ID : a000abc
Request : Singularity fakeroot usage setting
From the Docker containers distributed by NGC (NVIDIA GPU Cloud), you can build Singularity container images for the Neuron system that are optimized for NVIDIA GPUs, including deep learning frameworks and HPC applications.
[image build command]
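A sketch of the general build syntax (see singularity help build for all options):

$ singularity build [build options...] <image file(.sif)> <build spec: definition file | docker://... | library://... | shub://...>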
[example]
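For example, building an image from an NGC Docker container or from a definition file with fakeroot (the image tag and file names are illustrative):

$ singularity build --fakeroot pytorch.sif docker://nvcr.io/nvidia/pytorch:22.03-py3
$ singularity build --fakeroot mycontainer.sif mycontainer.def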
3. Building with cotainr
cotainr is a tool that helps users more easily build Singularity container images that include the Conda packages they use on Neuron or on their own systems.
By exporting your Conda environment to a yml file, you can build a Singularity container image for the Neuron system that includes your Conda packages.
The method to export an existing Conda environment to a yml file on both Neuron and your own system is as follows:
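For example, assuming an environment named py39 (the name is illustrative):

$ conda env export -n py39 > py39_env.yml

To export only the packages that were explicitly installed, add the --from-history option:

$ conda env export -n py39 --from-history > py39_env.yml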
To use cotainr, you must first load the singularity and cotainr modules using the module command.
When building a container image with cotainr build, you can either specify a base image for the container directly (using the --base-image option) or use the --system option to select the recommended base image for the Neuron system.
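A minimal sketch, assuming the module names and --system identifier shown below (the actual names configured on Neuron may differ; check module avail and the cotainr documentation):

$ module load singularity cotainr
$ cotainr build my_conda.sif --base-image=/apps/applications/singularity_images/ngc/<base image>.sif --conda-env=py39_env.yml
or
$ cotainr build my_conda.sif --system=neuron --conda-env=py39_env.yml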
You can run the built container image using the singularity exec command and check the list of conda environments created within the container, as shown in the example below.
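For example:

$ singularity exec my_conda.sif conda env list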
4. Remote build
※ To use the remote build service provided by Sylabs Cloud (https://cloud.sylabs.io), you must generate an access token and register it on the Neuron system. [Reference 1]
※ Additionally, you can create and manage Singularity container images through web browser access to Sylabs Cloud. [Reference 2]
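For example, after registering the access token (see [Reference 1]), an image can be built remotely from a definition file (file names are illustrative):

$ singularity build --remote mycontainer.sif mycontainer.def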
5. Import/export container image
※ To export (upload) a container image to the Sylabs Cloud library, you must first generate an access token and register it on the Neuron system. [Reference 1]
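For example (the library path, collection, and tag are illustrative):

$ singularity push -U mycontainer.sif library://<user ID>/<collection>/mycontainer:latest   # export (upload) to the library
$ singularity pull library://<user ID>/<collection>/mycontainer:latest                      # import (download) from the library
$ singularity pull tensorflow.sif docker://tensorflow/tensorflow:latest                     # import from Docker Hub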
6. How to install Python packages that are not provided in the container image into the user home directory
※ However, if you use multiple container images, conflicts may arise when running user programs: packages installed in the user's home directory are found first, and they may conflict with the packages required by other container images.
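For example, packages can be installed into the home directory with pip's --user option from inside the container (the image and package names are illustrative):

$ singularity exec /apps/applications/singularity_images/ngc/<container image> pip install --user <package name>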
B. Running a User Program in a Singularity Container
1. Loading the Singularity module or setting the path
2. Program execution command in Singularity container
[example]
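A sketch of the basic usage (the image file and program names are illustrative):

$ singularity exec [--nv] <container image> [user program execution command]
$ singularity exec --nv /apps/applications/singularity_images/ngc/pytorch_22.03-py3.sif python train.py
$ singularity shell --nv /apps/applications/singularity_images/ngc/pytorch_22.03-py3.sif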
※ To view the help documentation for Singularity commands [shell | exec | run | pull ...], run “singularity help [command].”
※ To use the Nvidia GPU on a compute/login node, you must use the --nv option.
3. Execute user program using NGC container module
If you load one of the NGC Singularity container image modules with the module command, the container image is launched automatically without entering any Singularity command, making it easier to run user programs in the Singularity container.
Load the NGC container module and run user programs within the container
※ After loading the container image module, simply entering the execution command automatically runs “singularity run --nv [execution command].”
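A minimal sketch, assuming an NGC module name such as ngc/pytorch:22.03 (check the actual module names with the module avail command):

$ module load ngc/pytorch:22.03
$ python train.py     # equivalent to: singularity run --nv <PyTorch container image> python train.py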
NGC container module list
※ Docker container images optimized and built for Nvidia GPUs by NGC (https://ngc.nvidia.com) have been converted to Singularity.
※ Container image file path: /apps/applications/singularity_images/ngc
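For example, the NGC container modules can be listed as follows (the module name pattern is an assumption; adjust the search string as needed):

$ module avail ngc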
4. How to run containers through the scheduler (SLURM)
Executing GPU Singularity container jobs
1) Execute batch jobs by writing a job script
Execution command : sbatch <job script file>
※ For detailed instructions on using the scheduler (SLURM), refer to "Neuron Guide - Executing Jobs through the Scheduler (SLURM)."
※ You can follow parallel training execution example programs through [Reference 3].
2) Execute interactive jobs on compute nodes allocated by the scheduler
After the scheduler allocates compute nodes, open a shell on the first allocated compute node and run the user program in interactive mode.
※ Example of occupying 1 node, using 2 tasks per node, 10 CPUs per task, and 2 GPUs per node
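A minimal sketch matching the resource example above (the partition name, --comment value, module version, and image name are illustrative):

$ salloc --partition=cas_v100_4 --nodes=1 --ntasks-per-node=2 --cpus-per-task=10 --gres=gpu:2 --comment=pytorch
$ ssh <first allocated compute node>
$ module load singularity/3.9.0
$ singularity run --nv /apps/applications/singularity_images/ngc/<container image> [user program execution command]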
Example of a GPU Singularity container job script
1) Single node
Run command : singularity run --nv <container> [user program execution command]
※ Example of occupying 1 node, using 2 tasks per node, 10 CPUs per task, and 2 GPUs per node
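A minimal sketch of such a job script (the partition name, --comment value, module version, image, and program names are illustrative):

#!/bin/bash
#SBATCH -J singularity_single_node
#SBATCH --time=12:00:00
#SBATCH -p cas_v100_4
#SBATCH --comment=pytorch
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=10
#SBATCH --gres=gpu:2

module load singularity/3.9.0

singularity run --nv /apps/applications/singularity_images/ngc/<container image> python train.py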
2) Multi node-1
Run command : srun singularity run --nv <container> [user program execution command]
※ Example of occupying 2 nodes, using 2 tasks per node (a total of 4 MPI processes with horovod), 10 CPUs per task, and 2 GPUs per node
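A minimal sketch of such a job script (the partition name, --comment value, module version, image, and program names are illustrative):

#!/bin/bash
#SBATCH -J singularity_multi_node
#SBATCH --time=12:00:00
#SBATCH -p cas_v100_4
#SBATCH --comment=pytorch
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=10
#SBATCH --gres=gpu:2

module load singularity/3.9.0

srun singularity run --nv /apps/applications/singularity_images/ngc/<container image> python train_horovod.py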
3) Multi node-2
When the NGC container module is loaded, the specified Singularity container launches automatically as soon as the user program is run.
Run command : mpirun_wrapper [user program execution command]
※ Example of occupying 2 nodes, using 2 tasks per node (a total of 4 MPI processes with Horovod), 10 CPUs per task, and 2 GPUs per node
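A minimal sketch of such a job script (the partition name, --comment value, and NGC module and program names are illustrative):

#!/bin/bash
#SBATCH -J ngc_module_multi_node
#SBATCH --time=12:00:00
#SBATCH -p cas_v100_4
#SBATCH --comment=tensorflow
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=10
#SBATCH --gres=gpu:2

module load ngc/tensorflow:22.03

mpirun_wrapper python train_horovod.py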
C. References
[Reference 1]
Generating a Sylabs Cloud access token and registering on Neuron
[Shortcut to Sylabs Cloud]
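A sketch of the typical procedure (the remote endpoint defaults to Sylabs Cloud):

1. Sign in to https://cloud.sylabs.io and generate a new token under Access Tokens.
2. Register the token on Neuron:

$ singularity remote login
  (paste the generated access token when prompted)

3. Verify the registration:

$ singularity remote list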




[Reference 2]
Building a Singularity container image using the remote builder in a web browser


[Reference 3]
Parallel Training Program Execution Example
The following example is set up so that users can follow along with running parallel training in a Singularity container, using a ResNet50 model written in PyTorch or Keras (TensorFlow) for ImageNet image classification; the corresponding commands are sketched after step 7 below.
1) Copy the job script file from the /apps/applications/singularity_images/examples directory to your job directory
2) Check for partitions that have compute nodes in the idle state (STATE = idle)
In the example below, available compute nodes exist in partitions such as cas_v100nv_8, cas_v100nv_4, cas_v100_4, and cas_v100_2
3) Modify scheduler options such as job name (-J), wall time (--time), job queue (-p), application name (--comment), and compute node resource requirements (--nodes, --ntasks-per-node, --gres) as well as parameters for the training program in the job script file
4) Submit the job to the scheduler
5) Check the compute nodes allocated by the scheduler
6) Monitor the log file generated by the scheduler

7) Monitor the training process and GPU utilization on the compute nodes allocated by the scheduler
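A sketch of the commands corresponding to steps 1) through 7) above (the job directory, log file, and node names are illustrative):

$ cp /apps/applications/singularity_images/examples/*.sh /scratch/$USER/work/   # 1) copy the job scripts
$ sinfo                                                                         # 2) check partitions with idle nodes
$ vi 01.pytorch.sh                                                              # 3) edit scheduler options and training parameters
$ sbatch 01.pytorch.sh                                                          # 4) submit the job
$ squeue -u $USER                                                               # 5) check the allocated compute nodes
$ tail -f <log file generated by the scheduler>                                 # 6) monitor the log file
$ ssh <allocated compute node>                                                  # 7) monitor training and GPU utilization
$ nvidia-smi -l 5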

[job script]
1) PyTorch single-node parallel training (01.pytorch.sh)
※ Occupying 1 node, using 2 tasks per node, 10 CPUs per task, and 2 GPUs per node
2) pytorch_horovod multi-node parallel training (02.pytorch_horovod.sh)
※ Occupying 2 nodes, using 2 MPI tasks per node, 10 CPUs per task, and 2 GPUs per node
3) keras(tensorflow)_horovod multi-node parallel training (03.keras_horovod.sh)
※ Occupying 2 nodes, using 2 MPI tasks per node, 10 CPUs per task, and 2 GPUs per node