Running Jobs Through Scheduler (SLURM)
The Neuron system uses SLURM as its job scheduler. This chapter introduces how to submit jobs through SLURM and the related commands. For information on how to write a job script for submission to SLURM, refer to [Appendix 1] and the job script examples in this chapter.
※ User jobs can only be submitted from /scratch/$USER.
A. Queue Configuration
Wall clock time limit : 2 days (48 hours)
In all queues (partitions), jobs are allocated under the shared-node policy (multiple jobs can run simultaneously on a single node). (Updated 2022-03-17: to improve resource efficiency, the policy was changed from exclusive node usage to shared node usage.)
Job queue (partitions)
The partitions available to general users include jupyter, cas_v100nv_8, cas_v100nv_4, cas_v100_4, cas_v100_2, amd_a100nv_8, skl, and bigmem. (You can check the number of nodes, maximum job runtime, and node list using the sinfo command.)
Job Submission Quantity Limitations
Maximum number of jobs per user : An error occurs if jobs exceed the submission limit.
Maximum number of concurrent jobs per user : Jobs wait if they exceed the running job limit.
Resource Occupancy Limit
Maximum number of nodes per job : Jobs will not execute if they exceed the node limit. This limit is independent of the total number of nodes occupied by running jobs.
Maximum number of GPUs per user : Limits the total number of GPUs that a user's running jobs can occupy at any given time; if the limit is exceeded, additional jobs wait until earlier jobs complete.
※ Node configuration and partitioning may be adjusted during system operation based on system usage.
B. Job Submission and Monitoring
1. Summary of the basic commands
Command : Description
$ sbatch [option..] script : Submit a job
$ scancel [Job ID] : Delete a job
$ squeue : Check job status
$ smap : Check job status and node status graphically
$ sinfo [option..] : Check node information
※ You can use the "sinfo --help" command to check the "sinfo" command options.
※ To improve support for Neuron system users, you are required to state which program a job runs. Before submitting a job, fill in the SBATCH --comment option with the application name listed in the table below.
※ If you are using an application for deep learning or machine learning, please specify it clearly as TensorFlow, Caffe, R, PyTorch, etc.
※ The classification of applications will be updated periodically based on collected user requests. If you wish to add an application, please submit a request to consult@ksc.re.kr.
[Table of SBATCH option name per application]
Application type : SBATCH option name
Charmm : charmm
LAMMPS : lammps
Gaussian : gaussian
NAMD : namd
OpenFoam : openfoam
Quantum Espresso : qe
WRF : wrf
SIESTA : siesta
in-house code : inhouse
Tensorflow : tensorflow
PYTHON : python
Caffe : caffe
R : R
Pytorch : pytorch
VASP : vasp
Sklearn : sklearn
Gromacs : gromacs
Other applications : etc
2. Batch job submission
Submit a job by using the sbatch command as in “sbatch {script file}”.
Check job progress
You can check the status of the job by accessing the allocated node.
1) Use the squeue command to check the node name (NODELIST) to which the running job is assigned.
2) Access the corresponding node using the ssh command.
3) Once on the compute node, check the job's progress using the top or nvidia-smi commands.
※ Example of monitoring GPU usage at 2-second intervals
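For example (the node name gpu25 below is illustrative), GPU usage on the allocated node can be refreshed every 2 seconds with watch or with the nvidia-smi loop option:
$ squeue -u $USER                 # find the node name in the NODELIST column
$ ssh gpu25                       # log in to the allocated node (illustrative node name)
$ watch -n 2 nvidia-smi           # refresh GPU usage every 2 seconds (or: nvidia-smi -l 2)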
Example of writing a job script file
To perform batch jobs in SLURM, you need to create a job script file using SLURM keywords.
※ Refer to “[Appendix 1] Main Keywords for Job Scripts”.
※ Refer to the KISTI supercomputing blog (http://blog.ksc.re.kr/127) for information on how to use the machine-learning framework conda.
SLURM keywords
#SBATCH -J : Specify the job name
#SBATCH --time : Specify the maximum time to run the job
#SBATCH -o : Specify the file name of the standard output log
#SBATCH -e : Specify the file name of the error log
#SBATCH -p : Specify the partition to be used
#SBATCH --comment : Specify the application to be used
#SBATCH --nodelist=(node list) : Specify the nodes on which the job will run
#SBATCH --nodes=(number of nodes) : Specify the number of nodes to use for the job
#SBATCH --ntasks-per-node=(number of processes) : Specify the number of processes to run per node
#SBATCH --cpus-per-task=(number of CPU cores) : Specify the number of CPU cores allocated per process
#SBATCH --cpus-per-gpu=(number of CPU cores) : Specify the number of CPU cores allocated per GPU
#SBATCH --exclusive : Request exclusive use of nodes
Set memory allocation under the Neuron shared node policy
To ensure efficient resource use and stable job execution on the Neuron system, memory is allocated automatically in proportion to the number of CPU cores requested; the formula is given in the large-memory execution example below.
※ When using the '--exclusive' option, 95% of the available memory on a single node is allocated to the job, allowing exclusive use of the node. However, the wait time may increase until a node becomes available for exclusive use.
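As a purely illustrative calculation (the 384 GB of node memory and 40 cores per node below are assumptions; check the queue configuration for the actual values): since memory-per-node = ntasks-per-node * cpus-per-task * (95% of node memory / total cores per node), a job requesting --ntasks-per-node=2 and --cpus-per-task=10 would be allocated roughly 2 * 10 * (0.95 * 384 GB / 40) ≈ 182 GB on that node.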
Set the number of CPU cores allocated per GPU under the Neuron shared node policy
To ensure stable execution of GPU applications, the CPU cores of a node are allocated to jobs in proportion to the number of GPUs requested. Memory allocation is also set automatically according to the Neuron shared-node memory policy described above.
※ If additional memory is required, you can request more resources than the default cpus-per-gpu allocation to secure the necessary memory allocation.
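For instance, the following directives (a sketch; the values are illustrative) request 2 GPUs and raise the number of CPU cores per GPU above the default in order to obtain a larger automatic memory allocation:
#SBATCH --gres=gpu:2           # number of GPUs per node
#SBATCH --cpus-per-gpu=10      # CPU cores allocated per GPU (a larger value yields a larger memory allocation)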
CPU Serial Program
※ Example of occupying 1 node and running a sequential job
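A minimal job script sketch for this case (the partition, time limit, and executable name ./serial.exe are illustrative):
#!/bin/sh
#SBATCH -J serial_job              # job name
#SBATCH -p skl                     # CPU partition (illustrative)
#SBATCH --time=01:00:00            # wall clock time limit
#SBATCH -o %x_%j.out               # standard output log
#SBATCH -e %x_%j.err               # error log
#SBATCH --comment=inhouse          # application name (see the SBATCH option table)
#SBATCH --nodes=1                  # occupy 1 node
#SBATCH --ntasks-per-node=1        # 1 process

./serial.exe                       # serial executable (illustrative)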
CPU OpenMP Program
※ Example of occupying 1 node and using 10 threads per node
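A sketch for the OpenMP case with 10 threads (partition and executable name are illustrative):
#!/bin/sh
#SBATCH -J openmp_job
#SBATCH -p skl                     # CPU partition (illustrative)
#SBATCH --time=01:00:00
#SBATCH -o %x_%j.out
#SBATCH -e %x_%j.err
#SBATCH --comment=inhouse          # application name (see the SBATCH option table)
#SBATCH --nodes=1                  # 1 node
#SBATCH --ntasks-per-node=1        # 1 process per node
#SBATCH --cpus-per-task=10         # 10 CPU cores (threads) per process

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./openmp.exe                       # OpenMP executable (illustrative)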
CPU MPI Program
※ Example of occupying 2 nodes and using 4 processes per node (total of 8 MPI processes)
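A sketch for the MPI case (2 nodes x 4 processes = 8 MPI processes; partition and executable name are illustrative):
#!/bin/sh
#SBATCH -J mpi_job
#SBATCH -p skl                     # CPU partition (illustrative)
#SBATCH --time=01:00:00
#SBATCH -o %x_%j.out
#SBATCH -e %x_%j.err
#SBATCH --comment=inhouse          # application name (see the SBATCH option table)
#SBATCH --nodes=2                  # 2 nodes
#SBATCH --ntasks-per-node=4        # 4 MPI processes per node

# module load <compiler> <mpi>     # load the modules used to build the program (check 'module avail')
srun ./mpi.exe                     # or: mpirun -np 8 ./mpi.exe (illustrative executable)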
CPU Hybrid (OpenMP+MPI) Program
※ Example of occupying 1 node and using 2 processes per node, with 10 threads per process (total of 2 MPI processes and 20 OpenMP threads)
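A sketch for the hybrid case (2 MPI processes, each running 10 OpenMP threads; partition and executable name are illustrative):
#!/bin/sh
#SBATCH -J hybrid_job
#SBATCH -p skl                     # CPU partition (illustrative)
#SBATCH --time=01:00:00
#SBATCH -o %x_%j.out
#SBATCH -e %x_%j.err
#SBATCH --comment=inhouse          # application name (see the SBATCH option table)
#SBATCH --nodes=1                  # 1 node
#SBATCH --ntasks-per-node=2        # 2 MPI processes per node
#SBATCH --cpus-per-task=10         # 10 OpenMP threads per process

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun ./hybrid.exe                  # illustrative executable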
GPU Serial Program
※ Example of occupying 1 node and running a sequential job
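A sketch for a serial job that uses one GPU (the GPU partition and executable name are illustrative):
#!/bin/sh
#SBATCH -J gpu_serial_job
#SBATCH -p cas_v100_4              # GPU partition (illustrative)
#SBATCH --time=01:00:00
#SBATCH -o %x_%j.out
#SBATCH -e %x_%j.err
#SBATCH --comment=etc              # application name (see the SBATCH option table)
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:1               # 1 GPU per node

./gpu_serial.exe                   # illustrative executable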
GPU OpenMP Program
※ Example of occupying 1 node, using 10 threads and 2 GPUs per node
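A sketch for the GPU OpenMP case (10 threads and 2 GPUs on one node; partition and executable name are illustrative):
#!/bin/sh
#SBATCH -J gpu_openmp_job
#SBATCH -p cas_v100_4              # GPU partition (illustrative)
#SBATCH --time=01:00:00
#SBATCH -o %x_%j.out
#SBATCH -e %x_%j.err
#SBATCH --comment=etc              # application name (see the SBATCH option table)
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1        # 1 process per node
#SBATCH --cpus-per-task=10         # 10 CPU cores (threads) per process
#SBATCH --gres=gpu:2               # 2 GPUs per node

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./gpu_openmp.exe                   # illustrative executable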
GPU MPI Program
※ Example of occupying 2 nodes, using 4 processes and 2 GPUs per node (total of 8 MPI processes)
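A sketch for the GPU MPI case (2 nodes x 4 processes = 8 MPI processes, 2 GPUs per node; partition and executable name are illustrative):
#!/bin/sh
#SBATCH -J gpu_mpi_job
#SBATCH -p cas_v100_4              # GPU partition (illustrative)
#SBATCH --time=01:00:00
#SBATCH -o %x_%j.out
#SBATCH -e %x_%j.err
#SBATCH --comment=etc              # application name (see the SBATCH option table)
#SBATCH --nodes=2                  # 2 nodes
#SBATCH --ntasks-per-node=4        # 4 MPI processes per node
#SBATCH --gres=gpu:2               # 2 GPUs per node

srun ./gpu_mpi.exe                 # or: mpirun ./gpu_mpi.exe (illustrative executable)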
GPU MPI Program - Execution example of utilizing all CPUs on 1 node
※ Example of occupying all cores and using 2 GPUs on 1 cas_v100_4 node
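A sketch for this case (the value 40 below assumes a cas_v100_4 node has 40 CPU cores in total; check A. Queue Configuration for the actual count):
#!/bin/sh
#SBATCH -J gpu_mpi_full_node
#SBATCH -p cas_v100_4
#SBATCH --time=01:00:00
#SBATCH -o %x_%j.out
#SBATCH -e %x_%j.err
#SBATCH --comment=etc              # application name (see the SBATCH option table)
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40       # all CPU cores of the node (assumed 40; see the queue configuration)
#SBATCH --gres=gpu:2               # 2 GPUs

srun ./gpu_mpi.exe                 # illustrative executable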
GPU MPI Program - Execution example of utilizing only half of the CPUs on 1 node
※ Example of occupying half the cores and using 4 GPUs on 1 cas_v100nv_8 node
※ The total number of CPU cores per node for each partition can be found under Running Jobs Through Scheduler (SLURM) > A. Queue Configuration > Total number of CPU cores.
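A sketch for this case (the value 20 below assumes a cas_v100nv_8 node has 40 CPU cores in total, so half is 20; check A. Queue Configuration for the actual count):
#!/bin/sh
#SBATCH -J gpu_mpi_half_node
#SBATCH -p cas_v100nv_8
#SBATCH --time=01:00:00
#SBATCH -o %x_%j.out
#SBATCH -e %x_%j.err
#SBATCH --comment=etc              # application name (see the SBATCH option table)
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=20       # half of the node's CPU cores (assumed 40 in total)
#SBATCH --gres=gpu:4               # 4 GPUs

srun ./gpu_mpi.exe                 # illustrative executable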
Execution example of a program that requires a large memory allocation
※ Example of running a program that uses only a few CPU cores but requires a large amount of memory, by adjusting the number of processes (and CPU cores) requested per node to secure a sufficient memory allocation.
※ The '--mem' option (memory allocation per node) is not available. When you specify the number of processes per node (ntasks-per-node) and the number of CPU cores per process (cpus-per-task), the memory allocation is calculated automatically with the following formula: memory-per-node = ntasks-per-node * cpus-per-task * (95% of the available memory on a single node / total number of CPU cores on a single node).
※ When using the '--exclusive' option, 95% of the available memory on a single node is allocated to the job, allowing exclusive use of the node. However, the wait time may increase until a node becomes available for exclusive use.
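A sketch for this case, where only 2 processes actually run but extra CPU cores are requested purely to enlarge the automatic memory allocation (partition, values, and executable name are illustrative):
#!/bin/sh
#SBATCH -J big_memory_job
#SBATCH -p cas_v100_4              # partition (illustrative)
#SBATCH --time=01:00:00
#SBATCH -o %x_%j.out
#SBATCH -e %x_%j.err
#SBATCH --comment=etc              # application name (see the SBATCH option table)
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2        # only 2 processes actually run
#SBATCH --cpus-per-task=10         # extra cores requested to increase memory-per-node

srun ./big_memory.exe              # illustrative executable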
An example of using GPU singularity
※ Container images that support deep learning frameworks, such as TensorFlow, Caffe, and PyTorch, can be accessed in the /apps/applications/singularity_images and /apps/applications/singularity_images/ngc directories.
※ Refer to "[Appendix 3] Method for Using Singularity Container Images - Method for running the user program from the module-based NGC container" for the list of deep learning and HPC application modules that support the automatic execution of the singularity container.
[Example]
You can use the container images that support deep learning frameworks, such as TensorFlow, Caffe, and PyTorch, by copying them from the /apps/applications/singularity_images directory to the user work directory.
Use the following example if the image file is “/home01/userID/tensorflow-1.13.0-py3.simg”.
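A job script sketch for this image (the GPU partition, resource values, and the Python script name train.py are illustrative; the --nv flag exposes the node's GPUs inside the container):
#!/bin/sh
#SBATCH -J singularity_job
#SBATCH -p cas_v100_4              # GPU partition (illustrative)
#SBATCH --time=04:00:00
#SBATCH -o %x_%j.out
#SBATCH -e %x_%j.err
#SBATCH --comment=tensorflow       # application name (see the SBATCH option table)
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:1

# module load singularity          # if Singularity is provided as a module (check 'module avail')
singularity exec --nv /home01/userID/tensorflow-1.13.0-py3.simg python train.py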
3. Interactive job submission
Resource allocation
For interactive use, request, for example, 2 nodes in the cas_v100_4 partition with 2 CPU cores and 2 GPUs per node.
※ Refer to the [Table of SBATCH option name per application] above for the appropriate {SBATCH option name}.
※ The walltime for interactive jobs is fixed at 8 hours.
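A hedged example using salloc with the resources described above (the --comment value tensorflow is illustrative; use the option name for your application):
$ salloc --partition=cas_v100_4 --nodes=2 --ntasks-per-node=2 --gres=gpu:2 --comment=tensorflow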
Job execution
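Within the allocation, the program can be launched with srun (the executable name is illustrative):
$ srun ./my_program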
Exit from the connected node, or cancel the resource allocation.
Delete the job using the scancel command.
※ Job ID can be checked using the squeue command.
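For example (the Job ID 12345 is illustrative):
$ exit                             # leave the connected node / release the interactive shell
$ scancel 12345                    # or cancel the allocation by Job ID (found with squeue)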
4. Job monitoring
Check the partition status
Use the sinfo command to check the status
※ The node configuration may be adjusted during system operation according to the system load.
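For example:
$ sinfo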
PARTITION : the name of the partition set in the current SLURM
AVAIL : partition status (up or down)
TIMELIMIT : wall clock time
NODES : the number of nodes
STATE : node status (alloc: resources in use / idle: resources available)
NODELIST : node list
Detailed information per node
Adding the "-Nel" option to the sinfo command provides detailed information.
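For example:
$ sinfo -Nel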
Check job status
Use the squeue command to view the job list and status
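For example (the -u option, standard in squeue, restricts the list to your own jobs):
$ squeue -u $USER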
Check job status and node status through graphical displays
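For example:
$ smap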
View detailed information on submitted jobs
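In standard SLURM this is done with scontrol (the Job ID 12345 is illustrative):
$ scontrol show job 12345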
C. Controlling Jobs
Delete a job (cancel)
Use the scancel command to delete a job by entering “scancel [Job_ID]”.
The Job_ID can be found using the squeue command.
D. Compile, Debugging, and Job Submission Location
Debugging nodes, which can be directly accessed via SSH from the login node, are available.
Compilation, debugging, and job submission to all partitions are possible from the login/debugging nodes.
The CPU time limit for debugging nodes is 120 minutes.
If needed, you can use the SLURM Interactive Job feature to compile and debug in any partition.
Last updated on November 08, 2024.