Job Execution through Scheduler (PBS)
The job scheduler of the 5th-generation supercomputer Nurion is the Portable Batch System (PBS). This chapter introduces how to submit jobs through the scheduler and the related commands. The queues available for job submission are predefined, and the maximum number of jobs that a user can submit per queue is limited; this limit can be adjusted depending on the system load.
Nurion employs an exclusive node allocation policy by default, so that only one user's jobs run on a given node. This prevents the significant performance degradation of user applications that can occur under a shared node allocation policy. However, the shared node policy is applied to the queues in which commercial SW is used, because their node scale is relatively small and the resources must be used efficiently.
Jobs can only be submitted from the login nodes; general users cannot directly access the compute nodes.
Moreover, user jobs can only be submitted from the /scratch directory.
A. Queue Configuration
The commercial queue (for running commercial SW) and the debug queue (for debugging) use the shared node policy, under which multiple jobs are assigned to a node within the limits of the available resources (CPU cores). All remaining queues follow the exclusive node policy, which assigns only one job to a node.
Job queue
The queues that can be used by general users and the number of jobs that can be submitted per user are presented in the following table. (as of April 2021)
※ The node configuration can be adjusted during operation depending on the system load. (The current node configuration and the maximum number of jobs allowed can be checked at any time through the showq command and the motd.)
1. Queue description
2. Limited number of job submissions
Maximum no. of job submissions per user: an error occurs at submission time if this limit is exceeded.
Maximum no. of job executions per user: if this limit is reached, a previously submitted job must complete before the next one starts.
3. Limited resource occupancy
No. of occupied nodes per job (max|min): an error occurs at submission time if the number of nodes requested by a single job falls outside the min/max range. This limit is independent of the number of nodes occupied by jobs that are running or queued.
4. Distinguishing queues according to the KNL memory mode (the cluster mode is quadrant in all cases)
The exclusive, normal, long, and debug queues are set to cache mode (MCDRAM is used as an L3 cache), while the flat queue is set to flat mode (MCDRAM is used as RAM alongside DDR4).
To protect the system, the maximum available memory is limited to 82 GB in cache mode and to 102 GB in flat mode.
5. With hyper-threading turned off, up to 68 threads per node can be used on KNL nodes and up to 40 threads per node on SKL nodes.
B. Job Submission and Monitoring
1. Batch job submission
Writing a job script and an example
Refer to the required keywords, the job script samples shown below, and Annex 1 (Job Script Keywords) when writing and submitting a job script. Sample job script files can be found in /apps/shell/home/job_examples.
Required option for PBS job scheduler
| Required option | Description |
| --- | --- |
| #PBS -V | Maintains the current environment variables |
| #PBS -N | Sets the job name |
| #PBS -q | Queue in which to execute the job |
| #PBS -l | Sets the resources to be used by the job |
| #PBS -A | Information on the program being used (for statistical purposes) |
Variable keywords for resource allocation
Designate the resources to be used with keywords such as select, ncpus, mpiprocs, ompthreads, and walltime.
| Keyword | Description |
| --- | --- |
| select | No. of nodes to be used |
| ncpus | No. of CPUs to be used (≥ no. of processes per node × no. of threads) |
| mpiprocs | No. of processes per node to be used |
| ompthreads | No. of OMP threads to be used |
| walltime | Job execution time |
※ To collect data for improving the convenience of Nurion users, recording the program in use through a PBS option has been mandatory since April 2019. That is, the -A option of PBS must be included in the job script when submitting a job, using the option names in the table below.
※ Applications are added to the list according to user demand, which is collected periodically. If necessary, please request the addition of an application by contacting consult@ksc.re.kr.
[PBS option names per application]
| Application type | PBS option name | Application type | PBS option name |
| --- | --- | --- | --- |
| ANSYS (CFX, Fluent) | ansys | VASP | vasp |
| Abaqus | abaqus | Gromacs | gromacs |
| Nastran | nastran | Amber | amber |
| Gaussian | gaussian | LAMMPS | lammps |
| OpenFoam | openfoam | NAMD | namd |
| WRF | wrf | Quantum Espresso | qe |
| CESM (including CAM) | cesm | QMCpack | qmc |
| MPAS | mpas | BWA | bwa |
| ROMs | roms | SIESTA | siesta |
| MOM | mom | in-house code | inhouse |
| TensorFlow | tf | Caffe | caffe |
| PyTorch | pytorch | Qchem | qchem |
| grims | grims | RAMSES | ramses |
| cp2k | cp2k | Charmm | charmm |
| Other applications | etc. |  |  |
(e.g., VASP users add #PBS -A vasp to their PBS job scripts)
Environment variables
| Environment variable | Description |
| --- | --- |
| PBS_JOBID | Identifier assigned to the job |
| PBS_JOBNAME | Job name provided by the user |
| PBS_NODEFILE | Name of the file containing the list of compute nodes allocated to the job |
| PBS_O_PATH | PATH value of the submission environment |
| PBS_O_WORKDIR | Absolute path of the directory where qsub was executed |
| TMPDIR | Temporary directory designated for the job |
To execute a batch job in PBS, the PBS keywords explained above must be used when writing a job script file.
※ A job submission script sample file can be copied from /apps/shell/home/job_examples to be used.
Example of serial program job script (serial.sh)
※ Example of 1 node allocated and serial use
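A minimal sketch of such a script, assuming a serial executable named test.exe; the queue name, -A value, and walltime are illustrative and should be adjusted:

```bash
#!/bin/sh
#PBS -V                       # inherit the current environment variables
#PBS -N serial_job            # job name
#PBS -q normal                # queue name (illustrative)
#PBS -A etc                   # PBS option name of the application in use (illustrative)
#PBS -l select=1:ncpus=1      # 1 node, 1 CPU core
#PBS -l walltime=04:00:00     # maximum execution time (illustrative)
#PBS -m abe                   # send email on abort, begin, and end
#PBS -M abc@def.com           # notification address

cd $PBS_O_WORKDIR             # move to the directory where qsub was executed

./test.exe                    # serial executable (placeholder name)

exit 0
```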
※ When the #PBS -m and #PBS -M options are used as shown in the example above, emails are sent to abc@def.com when the job begins, ends, or is aborted.
Example of an OpenMP program job script (openmp.sh)
※ Example of 1 node occupied and 64 threads (total of 64 OpenMP threads) being used per node
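A minimal sketch, assuming an OpenMP executable named openmp_app.x; queue, -A value, and walltime are illustrative:

```bash
#!/bin/sh
#PBS -V
#PBS -N openmp_job
#PBS -q normal                             # illustrative
#PBS -A etc                                # illustrative
#PBS -l select=1:ncpus=64:ompthreads=64    # 1 node, 64 cores, 64 OpenMP threads
#PBS -l walltime=04:00:00

cd $PBS_O_WORKDIR

./openmp_app.x    # placeholder name; PBS sets OMP_NUM_THREADS from ompthreads

exit 0
```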
Example of an MPI (IntelMPI) program job script (mpi.sh)
※ Example of four nodes occupied and 64 processes (total of 256 MPI processes) being used per node
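A minimal sketch; the module names/versions and the executable name mpi_app.x are illustrative (check module avail for the actual modules):

```bash
#!/bin/sh
#PBS -V
#PBS -N IntelMPI_job
#PBS -q normal                            # illustrative
#PBS -A etc                               # illustrative
#PBS -l select=4:ncpus=64:mpiprocs=64     # 4 nodes x 64 processes = 256 MPI processes
#PBS -l walltime=04:00:00

cd $PBS_O_WORKDIR

module purge
module load craype-mic-knl intel/18.0.3 impi/18.0.3   # illustrative versions

mpirun ./mpi_app.x

exit 0
```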
Example of an MPI (OpenMPI) program job script (mpi_openmpi.sh)
※ Example of four nodes occupied and 64 processes (total of 256 MPI processes) being used per node
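The script is the same as the IntelMPI sketch above except for the modules loaded (names/versions illustrative; check module avail):

```bash
module purge
module load gcc/8.3.0 openmpi/3.1.0   # illustrative versions

mpirun ./mpi_app.x
```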
Example of an MPI (Mvapich2) program job script (mpi_mvapich2.sh)
※ Example of four nodes occupied and 64 processes (total of 256 MPI processes) being used per node
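Again, only the loaded modules differ from the IntelMPI sketch (names/versions illustrative):

```bash
module purge
module load craype-mic-knl intel/18.0.3 mvapich2/2.3.1   # illustrative versions

mpirun ./mpi_app.x
```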
Example of a hybrid (IntelMPI + OpenMP) program job script (hybrid_intel.sh)
※ Example of four nodes occupied, two processes per node, and 32 threads per process (total of eight MPI processes and 256 OpenMP threads) being used
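A minimal sketch; module names/versions and the executable name hybrid_app.x are illustrative:

```bash
#!/bin/sh
#PBS -V
#PBS -N hybrid_intel_job
#PBS -q normal                                         # illustrative
#PBS -A etc                                            # illustrative
#PBS -l select=4:ncpus=64:mpiprocs=2:ompthreads=32     # 4 nodes x 2 ranks x 32 threads
#PBS -l walltime=04:00:00

cd $PBS_O_WORKDIR

module purge
module load craype-mic-knl intel/18.0.3 impi/18.0.3    # illustrative versions

mpirun ./hybrid_app.x    # 8 MPI ranks in total, 32 OpenMP threads each

exit 0
```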
Example of a hybrid (openMPI + OpenMP) program job script (hybrid_openmpi.sh)
※ Example of four nodes occupied, two processes per node and 32 threads per process (total of eight MPI processes and 256 OpenMP threads) being used
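Identical to the hybrid (IntelMPI + OpenMP) sketch except for the modules loaded (names/versions illustrative):

```bash
module purge
module load gcc/8.3.0 openmpi/3.1.0   # illustrative versions

mpirun ./hybrid_app.x
```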
Example of a hybrid (Mvapich2 + OpenMP) program job script (hybrid_mvapich2.sh)
※ Example of four nodes occupied, two processes per node, and 32 threads per process (total of eight MPI processes and 256 OpenMP threads) being used
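Identical to the hybrid (IntelMPI + OpenMP) sketch except for the modules loaded (names/versions illustrative):

```bash
module purge
module load craype-mic-knl intel/18.0.3 mvapich2/2.3.1   # illustrative versions

mpirun ./hybrid_app.x
```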
Example of submitting a written job script
※ The following example submits a job using the mpi.sh script written above.
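A sketch of the submission and the scheduler's response (the returned job ID is illustrative):

```bash
$ qsub mpi.sh
1234.pbs
```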
When a PBS batch job is executed, the STDOUT (standard output) and STDERR (standard error) produced during the run are saved in a system directory and then copied to the user's job submission directory once the job is completed. By default, job output cannot be checked until the job has finished, but it can be checked during execution if the following keyword is added.
Keyword for checking the STDOUT/STDERR generated by PBS during job execution (the files are created under the user's home directory, /home01)
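A sketch assuming PBS's standard -k (keep) option, which keeps the output files in the user's home directory during execution; verify the exact keyword on Nurion:

```bash
#PBS -k oe    # keep STDOUT (o) and STDERR (e) in the home directory during execution
```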
Checking job execution using the Redirection feature of Linux
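Alternatively, output can be redirected to a file inside the job script and followed in real time; a sketch (file name illustrative):

```bash
# inside the job script: merge STDERR into STDOUT and write to the submission directory
mpirun ./mpi_app.x > $PBS_O_WORKDIR/result.log 2>&1

# from a login node, follow the progress
tail -f result.log
```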
Designating email notifications for a job
| Option | Description |
| --- | --- |
| a | When a job is aborted (default) |
| b | When a job begins |
| e | When a job ends |
| n | Do not send email notifications |
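For example, combining the options above with a recipient address (the placeholder address used earlier):

```bash
#PBS -m abe            # notify on abort (a), begin (b), and end (e)
#PBS -M abc@def.com    # recipient address
```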
2. Submitting interactive jobs
※ Unlike in a job script, the #PBS prefix is omitted, and options such as -I and -A are given directly on the command line when submitting an interactive job
※ If idle for two hours or more, the job is terminated by timeout and its resources are reclaimed; the walltime of an interactive job is fixed at a maximum of 12 hours
Using the “-I” option instead of a batch script
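A minimal sketch (resource values, queue name, and -A name are illustrative):

```bash
$ qsub -I -l select=1:ncpus=68:mpiprocs=68 -l walltime=12:00:00 -q normal -A etc
```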
Using graphic environment when submitting interactive jobs (-X)
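Adding -X forwards the X11 display into the interactive session (this assumes an X-enabled login, e.g., ssh -X); a sketch:

```bash
$ qsub -I -X -l select=1:ncpus=68:mpiprocs=68 -l walltime=12:00:00 -q normal -A etc
```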
※ The -l select values and everything after them can be changed according to user needs, but the items shown above (resource request, queue name, and PBS option name) must be included when submitting a job
Inheriting existing environment variables when submitting interactive jobs (-V)
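Adding -V passes the environment variables of the submission shell to the interactive job; a sketch:

```bash
$ qsub -I -V -l select=1:ncpus=68:mpiprocs=68 -l walltime=12:00:00 -q normal -A etc
```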
※ Since a separate debug node is not provided, it is recommended to perform debugging by submitting interactive jobs.
※ Pay attention to the case of the options in the examples above: -I is an uppercase "i", whereas -l (resource request) is a lowercase "L"
Example of executing a TensorFlow program in a computing node in interactive mode
※ Location of example singularity image file: /apps/applications/tensorflow/1.12.0/tensorflow-1.12.0-py3.simg
※ Location of example convolutional.py: /apps/applications/tensorflow/1.12.0/examples/convolutional.py
It is recommended to copy the above files to the user's own directory for testing.
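A sketch of such an interactive session, using the image and example paths above; the resource request, module name, and -A value are illustrative (check module avail):

```bash
$ qsub -I -V -l select=1:ncpus=68 -l walltime=02:00:00 -q normal -A tf   # illustrative request
$ cd /scratch/$USER                                                      # user test directory
$ cp /apps/applications/tensorflow/1.12.0/examples/convolutional.py .
$ module load singularity                                                # illustrative module name
$ singularity exec /apps/applications/tensorflow/1.12.0/tensorflow-1.12.0-py3.simg python convolutional.py
```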
3. Job monitoring
Commands related to job monitoring can only be used in the login node.
Queue inquiry
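For example, the showq command mentioned earlier summarizes the job and queue status; a usage sketch (the exact output format may differ on Nurion):

```bash
$ showq    # lists running and waiting jobs with a summary of used vs. available resources
```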
Inquiring the unused node resources per queue
Inquiring the list of queues available to the current account
Checking the job status
| Option | Description |
| --- | --- |
| qstat -u | Inquire only the user's own jobs |
| qstat -T | Inquire the remaining queue time of jobs in the Q state |
| qstat -i | Inquire only jobs in the Q and H states |
| qstat -f | Inquire job details |
| qstat -x | Inquire completed jobs |
※ Job Id: job number followed by .pbs (e.g., 1234.pbs)
※ Name: the #PBS -N value from the job script
※ S: shows the status of the job (R: running / Q: queued / H: held / E: exiting)
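Usage sketches of the options above (the job ID is illustrative):

```bash
$ qstat -u $USER       # show only my jobs
$ qstat -x 1234.pbs    # check a job that has already completed
```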
Job attribute inquiry
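Job attributes can be inspected with qstat -f; a sketch (job ID illustrative):

```bash
$ qstat -f 1234.pbs    # full attribute list: requested/used resources, queue, comment, etc.
```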
Job queue time inquiry (Estimated Start Time)
※ Here, -i is a flag that lists jobs in the H or Q state, whereas -w is a flag that prints the detailed information in a wide format (when using the -w flag, the information is easier to read if the terminal window is widened)
※ The estimated start time is calculated based on the walltime information collected from users' job scripts
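A sketch combining the flags described above (assuming they are combined exactly as described; verify on the system):

```bash
$ qstat -i -w -T    # estimated start times of jobs in the Q or H state, in wide format
```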
C. Job Control
Deleting a job
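A job is deleted with the qdel command (job ID illustrative):

```bash
$ qdel 1234.pbs
```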
Suspending/resuming a job
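Jobs are held and released with the standard PBS commands (job ID illustrative):

```bash
$ qhold 1234.pbs   # suspend (hold) the job
$ qrls 1234.pbs    # resume (release) the held job
```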
Last updated: March 24, 2023