Running Jobs Through the Scheduler (SLURM)
The Neuron system uses SLURM as its job scheduler. This chapter describes how to submit jobs through SLURM and the relevant commands. For information on how to write a job script for submission to SLURM, refer to [Appendix 1] and the job script examples in this chapter.
※ User jobs can only be submitted from /scratch.
A. Queue Configuration
Wall clock time: 2 days (48 h)
An exclusive node policy is applied to all queues (partitions).
Job queue (partitions)
◦ The partitions general users can use are as follows (October 2020):
Limit on the number of submitted jobs
Maximum number of jobs submitted per user: If a job is submitted after this maximum number has been reached, an error occurs during submission.
Maximum number of running jobs per user: If a job is submitted after this maximum number has been reached, the newly submitted job waits until a previous job is finished.
Limit on resource usage
Maximum number of nodes used per job: A job that requests more nodes than this maximum is not executed. This limit applies to each job individually and is independent of the total number of nodes used by a user's running jobs at any given time.
Maximum number of GPUs per user: This limits the total number of GPUs used by all jobs a user is running at any given time. If this maximum is reached, a newly submitted job waits until a previous job is finished.
※ The node configuration may be adjusted during system operation according to the system load.
Node configuration
B. Submitting and Monitoring Jobs
1. Summary of the basic commands
※ You can use the "sinfo --help" command to check the "sinfo" command options.
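The basic commands are summarized below. This is a general SLURM command summary, not a list specific to Neuron; the options of each command can be checked with "--help".
$ sbatch {script file}   : submit a batch job
$ scancel {Job_ID}       : delete (cancel) a job
$ squeue                 : check the status of submitted jobs
$ sinfo                  : check the status of nodes and partitions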
※ To collect data for improving the convenience and benefits offered to Neuron users, users are required to provide information about the program they are running through the SBATCH option shown below. That is, before submitting a job, set the --comment option of the SBATCH command according to the application you are using, referring to the table below.
※ If you are using an application for deep learning or machine learning, please specify the application type, such as TensorFlow, Caffe, R, and PyTorch.
※ Application categories are added based on user requests, which are collected periodically. If you would like an application to be added, please send a request by email to consult@ksc.re.kr.
[Table of SBATCH option name per application]
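For example, assuming the table above assigns the option name tensorflow to TensorFlow jobs (the actual names are defined in the table), the option would appear in the job script as follows:
#SBATCH --comment=tensorflow    # application name taken from the table above (illustrative value)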
2. Batch job submission
Submit a job by using the sbatch command as in “sbatch {script file}”.
※ In this example, a job is submitted using mpi.sh, a sample job script file.
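For example (the job ID in the response is illustrative):
$ sbatch mpi.sh
Submitted batch job 123456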
Check job progress
You can connect to the allocated node to check the job progress.
1) Use the squeue command to check the node name (NODELIST) to which the running job is assigned.
2) Use the ssh command to connect to the allocated node.
3) Connect to the compute node and use either the top or nvidia-smi command to check the job progress.
※ Example of monitoring the GPU utilization every 2 s
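A minimal sketch of this procedure is shown below; the node name gpu25 is illustrative.
$ squeue -u $USER            # 1) check the NODELIST of the running job
$ ssh gpu25                  # 2) connect to the allocated node (illustrative node name)
$ top                        # 3) check the job progress on the compute node
$ nvidia-smi -l 2            # monitor the GPU utilization every 2 s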
Example of writing a job script file
To run a batch job in SLURM, a job script file must be written using SLURM keywords.
※ Refer to “[Appendix 1] Main Keywords for Job Scripts”.
※ Refer to the KISTI supercomputing blog (http://blog.ksc.re.kr/127) for information on how to use the machine-learning framework conda.
SLURM keywords
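Commonly used SBATCH keywords include the following; refer to [Appendix 1] for the full list.
#SBATCH --job-name=[job name]
#SBATCH --partition=[partition name]
#SBATCH --nodes=[number of nodes]
#SBATCH --ntasks-per-node=[number of processes per node]
#SBATCH --cpus-per-task=[number of threads per process]
#SBATCH --gres=gpu:[number of GPUs per node]
#SBATCH --time=[wall clock time, e.g. 12:00:00]
#SBATCH --comment=[application name]
#SBATCH --output=[standard output file]
#SBATCH --error=[standard error file]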
※ Example of the SBATCH --nodelist keyword
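For example, to request specific nodes (the node names gpu[10-11] are illustrative):
#SBATCH --nodelist=gpu[10-11]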
CPU Serial Program
※ Example of using 1 node sequentially
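A minimal sketch; the partition name ivy_v100_2 (mentioned later in this chapter) and the executable name ./serial.exe are illustrative.
#!/bin/sh
#SBATCH -J serial_job            # job name
#SBATCH -p ivy_v100_2            # partition name (illustrative; check available partitions with sinfo)
#SBATCH --nodes=1                # 1 node
#SBATCH --ntasks-per-node=1      # 1 process
#SBATCH --time=01:00:00          # wall clock time
#SBATCH --comment=etc            # application name (see the table in Section B)

./serial.exe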
CPU OpenMP Program
※ Example of using 1 node with 10 threads per node
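A minimal sketch; the partition name and the executable name ./openmp.exe are illustrative.
#!/bin/sh
#SBATCH -J openmp_job
#SBATCH -p ivy_v100_2            # illustrative partition
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=10       # 10 threads per node
#SBATCH --time=01:00:00
#SBATCH --comment=etc

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./openmp.exe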
CPU MPI Program
※ Example of using 2 nodes and 4 processes per node (total of 8 MPI processes)
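A minimal sketch; the partition name and the executable name ./mpi.exe are illustrative.
#!/bin/sh
#SBATCH -J mpi_job
#SBATCH -p ivy_v100_2            # illustrative partition
#SBATCH --nodes=2                # 2 nodes
#SBATCH --ntasks-per-node=4      # 4 processes per node (8 MPI processes in total)
#SBATCH --time=01:00:00
#SBATCH --comment=etc

srun ./mpi.exe                   # or mpirun, depending on the MPI stack in use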
CPU Hybrid (OpenMP+MPI) Program
※ Example of using 1 node, 2 processes per node, and 10 threads per process (total of 2 MPI processes, 20 OpenMP threads)
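A minimal sketch; the partition name and the executable name ./hybrid.exe are illustrative.
#!/bin/sh
#SBATCH -J hybrid_job
#SBATCH -p ivy_v100_2            # illustrative partition
#SBATCH --nodes=1                # 1 node
#SBATCH --ntasks-per-node=2      # 2 MPI processes
#SBATCH --cpus-per-task=10       # 10 OpenMP threads per process
#SBATCH --time=01:00:00
#SBATCH --comment=etc

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun ./hybrid.exe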
GPU Serial Program
※ Example of using 1 node sequentially
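A minimal sketch; the partition name and the executable name ./gpu_serial.exe are illustrative.
#!/bin/sh
#SBATCH -J gpu_serial_job
#SBATCH -p ivy_v100_2            # illustrative partition
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:1             # 1 GPU
#SBATCH --time=01:00:00
#SBATCH --comment=etc

./gpu_serial.exe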
GPU OpenMP Program
※ Example of using 1 node, 10 threads per node, and 2 GPUs
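A minimal sketch; the partition name and the executable name ./gpu_openmp.exe are illustrative.
#!/bin/sh
#SBATCH -J gpu_openmp_job
#SBATCH -p ivy_v100_2            # illustrative partition
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=10       # 10 threads per node
#SBATCH --gres=gpu:2             # 2 GPUs
#SBATCH --time=01:00:00
#SBATCH --comment=etc

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./gpu_openmp.exe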
GPU MPI Program
※ Example of using 2 nodes, 4 processes per node (total of 8 MPI processes), and 2 GPUs
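A minimal sketch; the partition name and the executable name ./gpu_mpi.exe are illustrative.
#!/bin/sh
#SBATCH -J gpu_mpi_job
#SBATCH -p ivy_v100_2            # illustrative partition
#SBATCH --nodes=2                # 2 nodes
#SBATCH --ntasks-per-node=4      # 4 processes per node (8 MPI processes in total)
#SBATCH --gres=gpu:2             # 2 GPUs per node
#SBATCH --time=01:00:00
#SBATCH --comment=etc

srun ./gpu_mpi.exe               # or mpirun, depending on the MPI stack in use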
Example of using a GPU Singularity container
※ Container images that support deep learning frameworks, such as TensorFlow, Caffe, and PyTorch, can be accessed in the /apps/applications/singularity_images and /apps/applications/singularity_images/ngc directories.
※ Refer to "[Appendix 3] Method for Using Singularity Container Images - Method for running the user program from the module-based NGC container" for the list of deep learning and HPC application modules that support the automatic execution of the singularity container.
[Example]
You can use the container images that support deep learning frameworks, such as TensorFlow, Caffe, and PyTorch, by copying them from the /apps/applications/singularity_images directory to the user work directory.
Use the following example if the image file is “/home01/userID/tensorflow-1.13.0-py3.simg”.
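A minimal sketch; the partition name, --comment value, and the training script train.py are illustrative.
#!/bin/sh
#SBATCH -J singularity_job
#SBATCH -p ivy_v100_2            # illustrative partition
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:2             # 2 GPUs
#SBATCH --time=01:00:00
#SBATCH --comment=tensorflow     # application name (see the table in Section B)

singularity exec --nv /home01/userID/tensorflow-1.13.0-py3.simg python train.py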
3. Interactive job submission
Resource allocation
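A sketch using the standard SLURM salloc command; the option values follow the explanation below, and the --comment value is illustrative.
$ salloc --partition=ivy_v100_2 --nodes=2 --ntasks-per-node=2 --gres=gpu:2 --comment=etc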
Explanation: two nodes of the ivy_v100_2 partition (2 cores and 2 GPUs per node) are allocated for interactive use.
※ If there is no keyboard input for more than 2 h, the job is terminated owing to a timeout, and resources are released. The walltime of the interactive job is set to 12 h.
Job execution
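Within the allocation, a program can be launched with srun, for example (the executable name is a placeholder):
$ srun ./gpu_mpi.exe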
Connect to the first node (head node) among the allocated compute nodes
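For example (the node name gpu10 is illustrative):
$ squeue -u $USER        # check the NODELIST of the allocated job
$ ssh gpu10              # connect to the first allocated node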
※ If there is no keyboard input for more than 2 h, the job is terminated owing to a timeout, and resources are released.
※ After connecting to the head node, it is not possible to submit jobs using the srun or mpirun command. However, it is possible to submit jobs after exiting from the head node (exit).
Exit from the connected node, or cancel the resource allocation.
Delete a job using a command.
※ Job ID can be checked using the squeue command.
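For example (the job ID 123456 is illustrative):
$ exit                   # exit from the connected node
$ scancel 123456         # cancel the resource allocation / delete the job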
4. Job monitoring
Check the partition status
Use the sinfo command to check the partition status. (As of August 2019)
※ The node configuration may be adjusted during system operation according to the system load.
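For example (the default sinfo output header is shown; the values themselves depend on the current system state):
$ sinfo
PARTITION  AVAIL  TIMELIMIT  NODES  STATE  NODELIST
The meaning of each column is as follows.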
PARTITION : the name of the partition set in the current SLURM
AVAIL : partition status (up or down)
TIMELIMIT : wall clock time
NODES : the number of nodes
STATE : node status (alloc: resources in use / idle: resources available)
NODELIST : node list
Detailed information per node
Check the detailed information by using the "-Nel" option after the sinfo command.
Check job status
Check the list and status of the jobs by using the squeue command.
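For example, to list only your own jobs:
$ squeue -u $USER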
Check the job status and node status graphically
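One possibility is the curses-based smap command shipped with SLURM (availability depends on the installed SLURM version):
$ smap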
Check the detailed information on the submitted job
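Detailed information on a submitted job can be checked with the scontrol command (the job ID 123456 is illustrative):
$ scontrol show job 123456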
C. Controlling Jobs
Delete a job (cancel)
Use the scancel command, as in "scancel [Job_ID]", to delete a job.
Job_ID can be checked by using the squeue command.
D. Compilation, Debugging, and Job Submission Location per Partition
With a user account, ssh connections are possible only to the nodes specified below.
Because the same home and scratch directories are mounted on all nodes, it is possible to submit jobs for all partitions from the login node (glogin[01-02]).
It is possible to perform compilation and debugging in all partitions using the SLURM Interactive job function when necessary.
Last updated: December 1, 2021.