Job Execution through Scheduler (PBS)
The Nurion system (the 5th supercomputer) uses the Portable Batch System (PBS) as its job scheduler. This section introduces how to submit jobs through the scheduler and the related commands. The queues available for job submission are predefined, and the maximum number of jobs a user can submit per queue is limited; this limit is subject to change depending on the system load.
The Nurion system applies an exclusive node allocation policy by default, ensuring that only one user's job can run on a node at a time. This policy is intended to prevent the severe performance degradation of user applications that can sometimes occur under a shared node policy. However, for queues that allow the use of commercial software, where the node scale is smaller, a shared node policy is applied to utilize resources more efficiently.
User jobs can only be submitted through the login node, and regular users cannot directly access the compute nodes.
Moreover, user jobs can only be submitted from within /scratch/$USER.
The commercial queue (for running commercial software) and the debug queue (for debugging) apply a shared node policy, where multiple jobs can be assigned per node within the available resources (CPU cores). In contrast, the other queues apply an exclusive node policy, allowing only one job per node.
Job queue
The queues that can be used by general users and the number of jobs that can be submitted per user are presented in the following table. (as of April 2021)
※ The node configuration can be adjusted while the system is in operation, depending on the system load. (The current node configuration and the maximum number of jobs allowed can be checked at any time through the showq command and the motd.)
Maximum no. of job submissions per user: An error occurs at the time of submission if the number of allowed submissions is exceeded.
Maximum no. of job executions per user: If jobs are submitted beyond the limit, you must wait until previous jobs are completed.
No. of occupied nodes per job (max|min): An error will occur at submission if the number of nodes occupied by a single job falls outside the minimum or maximum range. This is independent of the number of nodes occupied by the user's queued and running jobs.
The exclusive, normal, long, and debug queues are set to Cache mode (using MCDRAM as L3 cache), while the flat queue is set to Flat mode (using MCDRAM as RAM along with DDR4).
To protect the system, Cache mode is limited to a maximum usable memory of 82GB, and Flat mode is limited to 102GB.
Writing a job script and an example
When writing a job script, refer to the mandatory keywords below, the Job Script Example below, and Key Job Script Keywords in [Appendix 1] before submitting. Additionally, job script example files can be found in /apps/shell/home/job_examples.
Required options for the PBS job scheduler
| Required option | Description |
|---|---|
| #PBS -V | Maintain current environment variables |
| #PBS -N | Set the job name |
| #PBS -q | Specify the queue in which the job will run |
| #PBS -l | Configure the resources to be used for the job |
| #PBS -A | Provide information about the program being used (for statistical purposes) |
Variable keywords for resource allocation
Specify the resources you wish to use with keywords such as select, ncpus, ompthreads, mpiprocs, and walltime.
| Keyword | Description |
|---|---|
| select | No. of nodes to be used |
| ncpus | No. of CPUs to be used (≥ no. of processes per node × no. of threads) |
| mpiprocs | No. of processes per node to be used |
| ompthreads | No. of OMP threads to be used |
| walltime | Job execution time |
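As a rough illustration of how these keywords combine, the sketch below assembles a select resource string and checks the constraint from the table (ncpus ≥ processes per node × threads). The function name build_select is hypothetical and is not part of PBS; the numbers are only illustrative.

```shell
#!/bin/sh
# build_select NODES NCPUS MPIPROCS OMPTHREADS
# Hypothetical helper (not a PBS command): prints a select string, or fails
# if ncpus is smaller than mpiprocs * ompthreads.
build_select() {
    nodes=$1
    ncpus=$2
    mpiprocs=$3
    ompthreads=$4
    needed=$((mpiprocs * ompthreads))
    if [ "$ncpus" -lt "$needed" ]; then
        echo "ncpus=$ncpus is less than mpiprocs*ompthreads=$needed" >&2
        return 1
    fi
    echo "select=${nodes}:ncpus=${ncpus}:mpiprocs=${mpiprocs}:ompthreads=${ompthreads}"
}

# Four nodes, 2 MPI processes per node, 32 OpenMP threads per process.
build_select 4 64 2 32
```

The printed string can be placed after "#PBS -l" in a job script, with walltime given on a separate "#PBS -l walltime=HH:MM:SS" line.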
※ To improve user convenience on the Nurion system, it is mandatory to provide program information through a PBS option, as outlined below. Specifically, you must set the #PBS -A option according to the application you are using, as listed in the table below, before submitting your job. (Effective from April 2019)
※ The classification of applications will be updated periodically based on collected user requests. If you wish to add an application, please submit a request to consult@ksc.re.kr.
[PBS option names per application]
| Application type | PBS option name | Application type | PBS option name |
|---|---|---|---|
| ANSYS (CFX, Fluent) | ansys | VASP | vasp |
| Abaqus | abaqus | Gromacs | gromacs |
| Nastran | nastran | Amber | amber |
| Gaussian | gaussian | LAMMPS | lammps |
| OpenFoam | openfoam | NAMD | namd |
| WRF | wrf | Quantum Espresso | qe |
| CESM (including CAM) | cesm | QMCpack | qmc |
| MPAS | mpas | BWA | bwa |
| ROMs | roms | SIESTA | siesta |
| MOM | mom | in-house code | inhouse |
| TensorFlow | tf | Caffe | caffe |
| PyTorch | pytorch | Qchem | qchem |
| grims | grims | RAMSES | ramses |
| cp2k | cp2k | Charmm | charmm |
| Other applications | etc | | |
(e.g., VASP users add #PBS -A vasp to their PBS job script)
Environment variables
| Environment variable | Description |
|---|---|
| PBS_JOBID | Identifier allocated to a job |
| PBS_JOBNAME | Job name provided by the user |
| PBS_NODEFILE | Name of the file containing the list of compute nodes allocated to a job |
| PBS_O_PATH | PATH value of the submission environment |
| PBS_O_WORKDIR | Absolute path of the directory where qsub was executed |
| TMPDIR | Temporary directory designated for a job |
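The variables above can be read inside a job script like any other shell variable. The following is a minimal sketch, not taken from the Nurion examples; the function name job_summary is hypothetical, and fallbacks are used so that it also runs outside a PBS job.

```shell
#!/bin/sh
# Print a short summary of the PBS job environment.
# Outside a PBS job the variables are unset, so fallbacks are shown instead.
job_summary() {
    echo "job id:   ${PBS_JOBID:-none}"
    echo "job name: ${PBS_JOBNAME:-none}"
    echo "workdir:  ${PBS_O_WORKDIR:-$PWD}"
    if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
        # The node file may list a node more than once; count unique names.
        echo "nodes:    $(sort -u "$PBS_NODEFILE" | wc -l | tr -d ' ')"
    fi
}

job_summary
```

Calling job_summary near the top of a job script records in the job output which nodes and directory the run used.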
To execute a batch job in PBS, the PBS keywords explained above must be used when writing a job script file.
※ A job submission script sample file can be copied from /apps/shell/home/job_examples to be used.
Example of serial program job script (serial.sh)
※ Single-node occupancy sequential usage example
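A minimal sketch of such a script follows, written to serial.sh through a here-document so that it can be copied as is. The job name, walltime, and the executable ./test.exe are placeholders chosen for illustration, and the reference scripts in /apps/shell/home/job_examples may differ.

```shell
# Write a serial job script; the quoted 'EOF' keeps $PBS_O_WORKDIR unexpanded.
cat > serial.sh <<'EOF'
#!/bin/sh
#PBS -V
#PBS -N serial_job
#PBS -q normal
#PBS -A etc
#PBS -l select=1:ncpus=1:mpiprocs=1:ompthreads=1
#PBS -l walltime=04:00:00
#PBS -m abe
#PBS -M abc@def.com

# Run from the directory where qsub was invoked.
cd "$PBS_O_WORKDIR" || exit 1

# Placeholder executable; replace with your own program.
./test.exe
EOF
```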
※ As shown in the above example, when a job is submitted with the #PBS -m and #PBS -M options, an email is sent to abc@def.com when the job starts, completes, or is interrupted.
Example of an OpenMP program job script (openmp.sh)
※ Single-node occupancy, using 64 threads per node (a total of 64 OpenMP threads) example
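The following sketch shows one way such a script might look. The job name, walltime, and the executable ./test_omp.exe are assumptions for illustration only; the thread count is requested through the ompthreads keyword.

```shell
# Write an OpenMP job script: one node, 64 OpenMP threads.
cat > openmp.sh <<'EOF'
#!/bin/sh
#PBS -V
#PBS -N openmp_job
#PBS -q normal
#PBS -A etc
#PBS -l select=1:ncpus=64:ompthreads=64
#PBS -l walltime=04:00:00

cd "$PBS_O_WORKDIR" || exit 1

# Placeholder executable built with OpenMP support.
./test_omp.exe
EOF
```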
Example of an MPI (IntelMPI) program job script (mpi.sh)
※ Four-node occupancy, using 64 processes per node (a total of 256 MPI processes) example
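A sketch of an IntelMPI job script under these settings is given below. It assumes that mpirun from the loaded IntelMPI environment picks up the allocated nodes from PBS; the job name, walltime, and ./test_mpi.exe are placeholders.

```shell
# Write an MPI job script: 4 nodes x 64 processes per node = 256 processes.
cat > mpi.sh <<'EOF'
#!/bin/sh
#PBS -V
#PBS -N mpi_job
#PBS -q normal
#PBS -A etc
#PBS -l select=4:ncpus=64:mpiprocs=64:ompthreads=1
#PBS -l walltime=04:00:00

cd "$PBS_O_WORKDIR" || exit 1

# Placeholder MPI executable; the process count follows from select/mpiprocs.
mpirun ./test_mpi.exe
EOF
```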
Example of an MPI (OpenMPI) program job script (mpi.sh)
※ Four-node occupancy, using 64 processes per node (a total of 256 MPI processes) example
Example of an MPI (Mvapich2) program job script (mpi_mvapich2.sh)
※ Four-node occupancy, using 64 processes per node (a total of 256 MPI processes) example
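The sketch below illustrates the Mvapich2 case with the mpirun_rsh launcher, passing the total process count and the node list from PBS_NODEFILE explicitly. The job name, walltime, and ./test_mpi.exe are placeholders, not values from the reference scripts.

```shell
# Write a Mvapich2 job script that launches 256 processes with mpirun_rsh.
cat > mpi_mvapich2.sh <<'EOF'
#!/bin/sh
#PBS -V
#PBS -N mvapich2_job
#PBS -q normal
#PBS -A etc
#PBS -l select=4:ncpus=64:mpiprocs=64:ompthreads=1
#PBS -l walltime=04:00:00

cd "$PBS_O_WORKDIR" || exit 1

# 4 nodes x 64 processes per node; hosts come from the PBS node file.
mpirun_rsh -np 256 -hostfile "$PBS_NODEFILE" ./test_mpi.exe
EOF
```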
※ mpirun can be used on a small number of nodes, but for jobs that use a large number of nodes, mpirun_rsh is recommended as shown in the example above, because job deployment with mpirun may not proceed properly at that scale.
Example of a hybrid (IntelMPI + OpenMP) program job script (hybrid_intel.sh)
※ Four-node occupancy, using 2 processes per node, with 32 threads per process (a total of 8 MPI processes and 256 OpenMP threads) example
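A possible hybrid script for this layout is sketched below. The select line requests 2 MPI processes per node with 32 OpenMP threads each, so ncpus=64 satisfies ncpus ≥ mpiprocs × ompthreads. The job name, walltime, and ./test_mpi.exe are illustrative assumptions.

```shell
# Write a hybrid job script: 4 nodes x 2 processes x 32 threads.
cat > hybrid_intel.sh <<'EOF'
#!/bin/sh
#PBS -V
#PBS -N hybrid_job
#PBS -q normal
#PBS -A etc
#PBS -l select=4:ncpus=64:mpiprocs=2:ompthreads=32
#PBS -l walltime=04:00:00

cd "$PBS_O_WORKDIR" || exit 1

# Placeholder executable built with both MPI and OpenMP.
mpirun ./test_mpi.exe
EOF
```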
Example of a hybrid (OpenMPI + OpenMP) program job script (hybrid_openmpi.sh)
※ Four-node occupancy, using 2 processes per node, with 32 threads per process (a total of 8 MPI processes and 256 OpenMP threads) example
Example of a hybrid (Mvapich2 + OpenMP) program job script (hybrid_mvapich2.sh)
※ Four-node occupancy, using 2 processes per node, with 32 threads per process (a total of 8 MPI processes and 256 OpenMP threads) example
※ mpirun can be used on a small number of nodes, but for jobs that use a large number of nodes, mpirun_rsh is recommended as shown in the example above, because job deployment with mpirun may not proceed properly at that scale.
Example of submitting a written job script
※ The mpi.sh file is an example; submit the job using your custom job script file
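Since jobs can only be submitted from within /scratch/$USER, a small wrapper around qsub can catch a wrong working directory before submission. This is only a sketch: submit_job is a hypothetical helper and not a PBS or Nurion command.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of PBS): refuse to submit unless the current
# directory is under /scratch/$USER, then hand the script to qsub.
submit_job() {
    script=$1
    user=${USER:-$(id -un)}
    case "$PWD/" in
        "/scratch/$user/"*) ;;
        *) echo "submit from /scratch/$user, not $PWD" >&2; return 1 ;;
    esac
    if [ ! -f "$script" ]; then
        echo "no such job script: $script" >&2
        return 1
    fi
    qsub "$script"
}

# Usage on a login node:
#   submit_job mpi.sh
```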
When a PBS batch job runs, STDOUT (standard output) and STDERR (standard error) are written to a system directory and copied to the user's job submission directory only after the job completes. By default, a job's output therefore cannot be checked until the job is complete, but the following methods allow you to monitor progress.
Keyword for checking the STDOUT/STDERR generated by PBS during job execution (the files are created under /home01)
Checking job execution using the Redirection feature of Linux
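The redirection approach can be sketched as follows. Here run_app is a stand-in for the real application and the log file names are arbitrary; in a job script, the redirection would be attached to the actual command line.

```shell
#!/bin/sh
# Stand-in for the real application: writes one line to each stream.
# In a job script, replace run_app with the actual command, e.g. ./test.exe.
run_app() {
    echo "step 1 done"
    echo "warning: example message" >&2
}

# Send STDOUT and STDERR to separate files in the current directory.
run_app 1> stdout.log 2> stderr.log

# While the job is running, the files can be followed from the login node:
#   tail -f stdout.log
```

Because the files are written directly by the shell into the job's working directory, they are readable while the job is still running.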
Specify email notifications for job status (#PBS -m)
| Option | Description |
|---|---|
| a | When a job is aborted (default value) |
| b | When a job has started |
| e | When a job has been completed |
| n | Do not receive email notifications |
For interactive job submission, no job script is written: omit the #PBS prefix and pass options such as -I and -A directly on the qsub command line.
※ If inactive for more than 2 hours, the job will time out, and resources will be reclaimed. The maximum walltime for an interactive job is fixed at 12 hours.
Using the “-I” option instead of a batch script
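As a hedged illustration, the helper below prints (rather than runs) an interactive qsub command line containing the resource request, queue name, and -A application name. The helper interactive_cmd is hypothetical, and the resource values are placeholders to adapt.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) an interactive qsub command line.
# Arguments: queue, application name for -A, then any extra qsub flags.
interactive_cmd() {
    queue=$1
    app=$2
    shift 2
    extra=""
    if [ $# -gt 0 ]; then
        extra=" $*"
    fi
    echo "qsub -I${extra} -l select=1:ncpus=64:ompthreads=1 -l walltime=12:00:00 -q $queue -A $app"
}

interactive_cmd normal etc        # plain interactive session
interactive_cmd normal etc -X     # with the graphical environment (-X)
interactive_cmd normal etc -V     # inheriting environment variables (-V)
```

Printing the command first makes it easy to confirm the uppercase -I and lowercase -l before running it on a login node.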
Using graphic environment when submitting interactive jobs (-X)
※ The content of the -l select statement can be modified according to user requirements, but the above statements (resource occupancy, queue name, PBS option name) must be included when submitting a job
Inheriting existing environment variables when submitting interactive jobs (-V)
※ Pay attention to letter case in the above example: -I is an uppercase “I” (interactive), whereas -l is a lowercase “L” (resource list)
Commands related to job monitoring can only be used on the login node.
Queue inquiry
Check idle resources for each queue
View the list of queues available for the current user account
Checking the top job
Checking the job status
| Option | Description |
|---|---|
| qstat -u | View only the jobs owned by the specified user |
| qstat -T | Check the estimated start time for jobs in the Q status |
| qstat -i | View jobs only in the Q and H statuses |
| qstat -f | View detailed job information |
| qstat -x | View completed jobs as well |
※ Job Id: Job number.pbs
※ Name: #PBS -N value of a job script
※ S: Shows the state of a job (R: running, Q: queued, H: held, E: exiting)
Job attribute inquiry
Check the estimated start time of a job.
$ qstat -i -w -T -u user01
※ Here, -i is a flag that lists the jobs in the H or Q status, and -w is a flag that prints detailed information in a wider format (when using the -w flag, widening the terminal window horizontally helps keep the columns aligned).
※ The estimate is calculated from the walltime information collected from users' job scripts.
Deleting a job
Suspending/resuming a job
Last updated on November 08, 2024.