Job Execution through Scheduler (PBS)
The job scheduler of the 5th-generation supercomputer Nurion is the Portable Batch System (PBS). This chapter introduces how to submit jobs through the scheduler and the related commands. The queues available for job submission are predefined, and the maximum number of jobs that a user can submit per queue is limited; this limit can be adjusted depending on the system load.
Nurion employs an exclusive node allocation policy by default, so that only one user's jobs run on a given node. This prevents the significant performance degradation of user applications that can occur under a shared node allocation policy. However, the shared node policy is applied to the queues in which commercial SW is used, because their node scale is relatively small and the resources must be used efficiently.
Jobs can only be submitted from the login nodes; general users cannot directly access the compute nodes.
Moreover, user jobs can only be submitted from the /scratch directory.
A. Queue Configuration
The commercial queue (for running commercial SW) and the debug queue (for debugging) use the shared node policy, under which multiple jobs are assigned to a node within the limits of the available resources (CPU cores). All remaining queues follow the exclusive node policy, which assigns only one job to a node.
Job queue
The queues that can be used by general users and the number of jobs that can be submitted per user are presented in the following table. (as of April 2021)
※ The node configuration can be adjusted during operation depending on the system load. (The current node configuration and the maximum number of jobs allowed can be checked at any time through the showq command and the motd.)
1. Queue description
2. Limited number of job submissions
Maximum no. of job submissions per user: an error occurs at submission time if this limit is exceeded.
Maximum no. of job executions per user: if this limit is reached, a previously submitted job must complete before the next one starts.
3. Limited resource occupancy
No. of occupied nodes per job (max|min): an error occurs at submission time if the number of nodes requested by a single job falls outside the min/max range. This limit is independent of the number of nodes occupied by jobs that are running or queued.
4. Distinguishing queues according to the KNL memory mode (the cluster mode is quadrant in all cases)
The exclusive, normal, long, and debug queues are set to cache mode (MCDRAM is used as an L3 cache), while the flat queue is set to flat mode (MCDRAM is used as RAM alongside DDR4).
To protect the system, the maximum available memory is limited to 82 GB in cache mode and to 102 GB in flat mode.
5. With hyper-threading turned off, up to 68 threads per node can be used on KNL nodes and up to 40 threads per node on SKL nodes.
B. Job Submission and Monitoring
1. Batch job submission
Writing a job script and an example
Refer to the required keywords, the job script samples shown below, and Annex 1 (Job Script Keywords) when writing and submitting a job script. Sample job script files can be found in /apps/shell/home/job_examples.
Required option for PBS job scheduler
| Required option | Description |
| --- | --- |
| #PBS -V | Maintains the current environment variables |
| #PBS -N | Sets the job name |
| #PBS -q | Queue in which to execute the job |
| #PBS -l | Sets the resources to be used by the job |
| #PBS -A | Information on the program being used (for statistical purposes) |
Variable keywords for resource allocation
Designate the resources to be used with keywords such as select, ncpus, mpiprocs, ompthreads, and walltime.
| Keyword | Description |
| --- | --- |
| select | No. of nodes to be used |
| ncpus | No. of CPUs to be used (≥ no. of processes per node × no. of threads) |
| mpiprocs | No. of processes per node to be used |
| ompthreads | No. of OMP threads to be used |
| walltime | Job execution time |
※ To collect data for improving the convenience of Nurion users, recording the program in use through a PBS option has been mandatory since April 2019. That is, the -A option of PBS must be included in the job script when submitting a job, using the option names in the table below.
※ Applications are added to the list according to user demand, which is collected periodically. If necessary, please request the addition of an application by contacting consult@ksc.re.kr.
[PBS option names per application]
| Application type | PBS option name | Application type | PBS option name |
| --- | --- | --- | --- |
| ANSYS (CFX, Fluent) | ansys | VASP | vasp |
| Abaqus | abaqus | Gromacs | gromacs |
| Nastran | nastran | Amber | amber |
| Gaussian | gaussian | LAMMPS | lammps |
| OpenFoam | openfoam | NAMD | namd |
| WRF | wrf | Quantum Espresso | qe |
| CESM (including CAM) | cesm | QMCpack | qmc |
| MPAS | mpas | BWA | bwa |
| ROMs | roms | SIESTA | siesta |
| MOM | mom | in-house code | inhouse |
| TensorFlow | tf | Caffe | caffe |
| PyTorch | pytorch | Qchem | qchem |
| grims | grims | RAMSES | ramses |
| cp2k | cp2k | Charmm | charmm |
| Other applications | etc. |  |  |
(e.g., VASP users add #PBS -A vasp to their PBS job scripts)
Environment variables
| Environment variable | Description |
| --- | --- |
| PBS_JOBID | Identifier assigned to the job |
| PBS_JOBNAME | Job name provided by the user |
| PBS_NODEFILE | Name of the file containing the list of compute nodes allocated to the job |
| PBS_O_PATH | PATH value of the submission environment |
| PBS_O_WORKDIR | Absolute path of the directory where qsub was executed |
| TMPDIR | Temporary directory designated for the job |
To execute a batch job in PBS, the PBS keywords explained above must be used when writing a job script file.
※ A job submission script sample file can be copied from /apps/shell/home/job_examples to be used.
Example of serial program job script (serial.sh)
※ Example of 1 node allocated and serial use
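A minimal sketch of such a script, assuming a serial executable named test.exe; the queue name, -A value, and walltime are illustrative and should be adjusted:

```bash
#!/bin/sh
#PBS -V                       # inherit the current environment variables
#PBS -N serial_job            # job name
#PBS -q normal                # queue name (illustrative)
#PBS -A etc                   # PBS option name of the application in use (illustrative)
#PBS -l select=1:ncpus=1      # 1 node, 1 CPU core
#PBS -l walltime=04:00:00     # maximum execution time (illustrative)
#PBS -m abe                   # send email on abort, begin, and end
#PBS -M abc@def.com           # notification address

cd $PBS_O_WORKDIR             # move to the directory where qsub was executed

./test.exe                    # serial executable (placeholder name)

exit 0
```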
※ When the #PBS -m and #PBS -M options are used as shown in the example above, emails are sent to abc@def.com when the job begins, ends, or is aborted.
Example of an OpenMP program job script (openmp.sh)
※ Example of 1 node occupied and 64 threads (total of 64 OpenMP threads) being used per node
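A minimal sketch, assuming an OpenMP executable named openmp_app.x; queue, -A value, and walltime are illustrative:

```bash
#!/bin/sh
#PBS -V
#PBS -N openmp_job
#PBS -q normal                             # illustrative
#PBS -A etc                                # illustrative
#PBS -l select=1:ncpus=64:ompthreads=64    # 1 node, 64 cores, 64 OpenMP threads
#PBS -l walltime=04:00:00

cd $PBS_O_WORKDIR

./openmp_app.x    # placeholder name; PBS sets OMP_NUM_THREADS from ompthreads

exit 0
```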
Example of an MPI (IntelMPI) program job script (mpi.sh)
※ Example of four nodes occupied and 64 processes (total of 256 MPI processes) being used per node
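A minimal sketch; the module names/versions and the executable name mpi_app.x are illustrative (check module avail for the actual modules):

```bash
#!/bin/sh
#PBS -V
#PBS -N IntelMPI_job
#PBS -q normal                            # illustrative
#PBS -A etc                               # illustrative
#PBS -l select=4:ncpus=64:mpiprocs=64     # 4 nodes x 64 processes = 256 MPI processes
#PBS -l walltime=04:00:00

cd $PBS_O_WORKDIR

module purge
module load craype-mic-knl intel/18.0.3 impi/18.0.3   # illustrative versions

mpirun ./mpi_app.x

exit 0
```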
Example of an MPI (OpenMPI) program job script (mpi_openmpi.sh)
※ Example of four nodes occupied and 64 processes (total of 256 MPI processes) being used per node
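The script is the same as the IntelMPI sketch above except for the modules loaded (names/versions illustrative; check module avail):

```bash
module purge
module load gcc/8.3.0 openmpi/3.1.0   # illustrative versions

mpirun ./mpi_app.x
```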
Example of an MPI (Mvapich2) program job script (mpi_mvapich2.sh)
※ Example of four nodes occupied and 64 processes (total of 256 MPI processes) being used per node
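Again, only the loaded modules differ from the IntelMPI sketch (names/versions illustrative):

```bash
module purge
module load craype-mic-knl intel/18.0.3 mvapich2/2.3.1   # illustrative versions

mpirun ./mpi_app.x
```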
Example of a hybrid (IntelMPI + OpenMP) program job script (hybrid_intel.sh)
※ Example of four nodes occupied, two processes per node, and 32 threads per process (total of eight MPI processes and 256 OpenMP threads) being used
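A minimal sketch; module names/versions and the executable name hybrid_app.x are illustrative:

```bash
#!/bin/sh
#PBS -V
#PBS -N hybrid_intel_job
#PBS -q normal                                         # illustrative
#PBS -A etc                                            # illustrative
#PBS -l select=4:ncpus=64:mpiprocs=2:ompthreads=32     # 4 nodes x 2 ranks x 32 threads
#PBS -l walltime=04:00:00

cd $PBS_O_WORKDIR

module purge
module load craype-mic-knl intel/18.0.3 impi/18.0.3    # illustrative versions

mpirun ./hybrid_app.x    # 8 MPI ranks in total, 32 OpenMP threads each

exit 0
```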
Example of a hybrid (openMPI + OpenMP) program job script (hybrid_openmpi.sh)
※ Example of four nodes occupied, two processes per node and 32 threads per process (total of eight MPI processes and 256 OpenMP threads) being used
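Identical to the hybrid (IntelMPI + OpenMP) sketch except for the modules loaded (names/versions illustrative):

```bash
module purge
module load gcc/8.3.0 openmpi/3.1.0   # illustrative versions

mpirun ./hybrid_app.x
```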
Example of a hybrid (Mvapich2 + OpenMP) program job script (hybrid_mvapich2.sh)
※ Example of four nodes occupied, two processes per node, and 32 threads per process (total of eight MPI processes and 256 OpenMP threads) being used
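Identical to the hybrid (IntelMPI + OpenMP) sketch except for the modules loaded (names/versions illustrative):

```bash
module purge
module load craype-mic-knl intel/18.0.3 mvapich2/2.3.1   # illustrative versions

mpirun ./hybrid_app.x
```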
Example of submitting a written job script
※ The following example submits a job using the mpi.sh script written above.
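A sketch of the submission and the scheduler's response (the returned job ID is illustrative):

```bash
$ qsub mpi.sh
1234.pbs
```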
When a PBS batch job is executed, the STDOUT (standard output) and STDERR (standard error) produced during the run are saved in a system directory and then copied to the user's job submission directory once the job is completed. By default, job output cannot be checked until the job has finished, but it can be checked during execution if the following keyword is added.
Keyword for checking the STDOUT/STDERR generated by PBS during job execution (the files are created under the user's home directory, /home01)
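A sketch assuming PBS's standard -k (keep) option, which keeps the output files in the user's home directory during execution; verify the exact keyword on Nurion:

```bash
#PBS -k oe    # keep STDOUT (o) and STDERR (e) in the home directory during execution
```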
Checking job execution using the Redirection feature of Linux
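Alternatively, output can be redirected to a file inside the job script and followed in real time; a sketch (file name illustrative):

```bash
# inside the job script: merge STDERR into STDOUT and write to the submission directory
mpirun ./mpi_app.x > $PBS_O_WORKDIR/result.log 2>&1

# from a login node, follow the progress
tail -f result.log
```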
Designating email notifications for a job
| Option | Description |
| --- | --- |
| a | When a job is aborted (default) |
| b | When a job begins |
| e | When a job ends |
| n | Do not send email notifications |
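For example, combining the options above with a recipient address (the placeholder address used earlier):

```bash
#PBS -m abe            # notify on abort (a), begin (b), and end (e)
#PBS -M abc@def.com    # recipient address
```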
2. Submitting interactive jobs
※ Unlike in a job script, the #PBS prefix is omitted, and options such as -I and -A are given directly on the command line when submitting an interactive job
※ If idle for two hours or more, the job is terminated by timeout and its resources are reclaimed; the walltime of an interactive job is fixed at a maximum of 12 hours
Using the “-I” option instead of a batch script
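A minimal sketch (resource values, queue name, and -A name are illustrative):

```bash
$ qsub -I -l select=1:ncpus=68:mpiprocs=68 -l walltime=12:00:00 -q normal -A etc
```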
Using graphic environment when submitting interactive jobs (-X)
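Adding -X forwards the X11 display into the interactive session (this assumes an X-enabled login, e.g., ssh -X); a sketch:

```bash
$ qsub -I -X -l select=1:ncpus=68:mpiprocs=68 -l walltime=12:00:00 -q normal -A etc
```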
※ The -l select values and everything after them can be changed according to user needs, but the items shown above (resource request, queue name, and PBS option name) must be included when submitting a job
Inheriting existing environment variables when submitting interactive jobs (-V)
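Adding -V passes the environment variables of the submission shell to the interactive job; a sketch:

```bash
$ qsub -I -V -l select=1:ncpus=68:mpiprocs=68 -l walltime=12:00:00 -q normal -A etc
```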
※ Since a separate debug node is not provided, it is recommended to perform debugging by submitting interactive jobs.
※ Pay attention to the case of the options in the examples above: -I is an uppercase "i", whereas -l (resource request) is a lowercase "L"
Example of executing a TensorFlow program in a computing node in interactive mode
※ Location of example singularity image file: /apps/applications/tensorflow/1.12.0/tensorflow-1.12.0-py3.simg
※ Location of example convolutional.py: /apps/applications/tensorflow/1.12.0/examples/convolutional.py
It is recommended to copy the above files to the user's own directory for testing.
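A sketch of such an interactive session, using the image and example paths above; the resource request, module name, and -A value are illustrative (check module avail):

```bash
$ qsub -I -V -l select=1:ncpus=68 -l walltime=02:00:00 -q normal -A tf   # illustrative request
$ cd /scratch/$USER                                                      # user test directory
$ cp /apps/applications/tensorflow/1.12.0/examples/convolutional.py .
$ module load singularity                                                # illustrative module name
$ singularity exec /apps/applications/tensorflow/1.12.0/tensorflow-1.12.0-py3.simg python convolutional.py
```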
3. Job monitoring
Commands related to job monitoring can only be used in the login node.
Queue inquiry
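For example, the showq command mentioned earlier summarizes the job and queue status; a usage sketch (the exact output format may differ on Nurion):

```bash
$ showq    # lists running and waiting jobs with a summary of used vs. available resources
```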
Inquiring the unused node resources per queue
Inquiring the list of queues available to the current account
Checking the job status
| Option | Description |
| --- | --- |
| qstat -u | Inquire only the user's own jobs |
| qstat -T | Inquire the remaining queue time of jobs in the Q state |
| qstat -i | Inquire only jobs in the Q and H states |
| qstat -f | Inquire job details |
| qstat -x | Inquire completed jobs |
※ Job Id: job number followed by .pbs (e.g., 1234.pbs)
※ Name: the #PBS -N value from the job script
※ S: shows the status of the job (R: running / Q: queued / H: held / E: exiting)
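Usage sketches of the options above (the job ID is illustrative):

```bash
$ qstat -u $USER       # show only my jobs
$ qstat -x 1234.pbs    # check a job that has already completed
```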
Job attribute inquiry
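Job attributes can be inspected with qstat -f; a sketch (job ID illustrative):

```bash
$ qstat -f 1234.pbs    # full attribute list: requested/used resources, queue, comment, etc.
```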
Job queue time inquiry (Estimated Start Time)
※ Here, -i is a flag that lists jobs in the H or Q state, whereas -w is a flag that prints the detailed information in a wide format (when using the -w flag, the information is easier to read if the terminal window is widened)
※ The estimated start time is calculated based on the walltime information collected from users' job scripts
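A sketch combining the flags described above (assuming they are combined exactly as described; verify on the system):

```bash
$ qstat -i -w -T    # estimated start times of jobs in the Q or H state, in wide format
```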
C. Job Control
Deleting a job
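A job is deleted with the qdel command (job ID illustrative):

```bash
$ qdel 1234.pbs
```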
Suspending/resuming a job
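Jobs are held and released with the standard PBS commands (job ID illustrative):

```bash
$ qhold 1234.pbs   # suspend (hold) the job
$ qrls 1234.pbs    # resume (release) the held job
```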
Last updated: March 24, 2023