PBS Important Commands & Attributes



1) Get the details of the server attributes.
# qstat -Bf
Server: master
server_state = Active
server_host = master.hcl.com
scheduling = True
total_jobs = 0
state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0 Begun:0
default_queue = workq
log_events = 511
mail_from = adm
query_other_jobs = True
resources_default.ncpus = 1
default_chunk.ncpus = 1
resources_assigned.mpiprocs = 0
resources_assigned.ncpus = 0
resources_assigned.nodect = 0
scheduler_iteration = 600
FLicenses = 0
resv_enable = True
node_fail_requeue = 310
max_array_size = 10000
pbs_license_info = /mnt_nfs/repo_OSS_complete_rsync/ALUS_ALIN_20130522_195848.dat
pbs_license_min = 1
pbs_license_max = 2147483647
pbs_license_linger_time = 31536000
license_count = Avail_Global:0 Avail_Local:0 Used:0 High_Use:0 Avail_Sockets:2 Unused_Sockets:0
pbs_version = PBSPro_12.0.1.130184
eligible_time_enable = False
max_concurrent_provision = 5
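
When a script needs only one of these attributes, the output can be filtered. The following is a minimal sketch, not part of PBS: `pbs_attr` is a hypothetical helper name, and it assumes the `name = value` layout shown above, where a wrapped value continues on the following line and that line contains no ` = `.

```shell
#!/bin/sh
# pbs_attr NAME: read "qstat -Bf"-style text on stdin and print the value of NAME.
# Hypothetical helper; assumes "name = value" lines, with wrapped values
# continued on following lines that contain no " = ".
pbs_attr() {
    awk -v want="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)          # drop leading indentation
            pos = index(line, " = ")
            if (pos > 0) {
                if (found) exit               # next attribute reached: value is complete
                if (substr(line, 1, pos - 1) == want) {
                    value = substr(line, pos + 3)
                    found = 1
                }
            } else if (found) {
                value = value line            # continuation of a wrapped value
            }
        }
        END { if (found) print value }
    '
}

# Usage on a live system (assumes qstat is on PATH):
#   qstat -Bf | pbs_attr default_queue
```

The helper prints nothing when the attribute is absent, so a caller can test for an empty result.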
2) List detailed information about all the nodes.
# pbsnodes -a
master
Mom = master.hcl.com
Port = 15002
pbs_version = PBSPro_12.0.1.130184
ntype = PBS
state = free
pcpus = 1
resources_available.arch = linux
resources_available.host = master
resources_available.mem = 502348kb
resources_available.ncpus = 1
resources_available.vnode = master
resources_assigned.accelerator_memory = 0kb
resources_assigned.mem = 0kb
resources_assigned.naccelerators = 0
resources_assigned.ncpus = 0
resources_assigned.netwins = 0
resources_assigned.vmem = 0kb
resv_enable = True
sharing = default_shared
license = l
node1
Mom = node1.hcl.com
Port = 15002
pbs_version = PBSPro_12.0.1.130184
ntype = PBS
state = free
pcpus = 2
resources_available.arch = linux
resources_available.host = node1
resources_available.mem = 3958120kb
resources_available.ncpus = 2
resources_available.vnode = node1
resources_assigned.accelerator_memory = 0kb
resources_assigned.mem = 0kb
resources_assigned.naccelerators = 0
resources_assigned.ncpus = 0
resources_assigned.netwins = 0
resources_assigned.vmem = 0kb
resv_enable = True
sharing = default_shared
license = l
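
For a quick overview of many nodes, the listing above can be condensed to one line per node. This is a rough sketch under stated assumptions: `node_summary` is a hypothetical helper name, every line without ` = ` is taken to be a node name, and only `state` and `resources_available.ncpus` are picked out.

```shell
#!/bin/sh
# node_summary: read "pbsnodes -a"-style text on stdin and print
# "<node> <state> <ncpus>" for each node.
# Limitation: a wrapped value line without " = " would be mistaken for a node name.
node_summary() {
    awk '
        function emit() { if (node != "") print node, state, ncpus }
        {
            line = $0
            sub(/^[ \t]+/, "", line)
            sub(/[ \t]+$/, "", line)
            if (line == "") next               # skip blank lines
            pos = index(line, " = ")
            if (pos == 0) {                    # a bare name starts a new node record
                emit()
                node = line; state = "?"; ncpus = "?"
            } else {
                key = substr(line, 1, pos - 1)
                val = substr(line, pos + 3)
                if (key == "state") state = val
                if (key == "resources_available.ncpus") ncpus = val
            }
        }
        END { emit() }
    '
}

# Usage on a live system (assumes pbsnodes is on PATH):
#   pbsnodes -a | node_summary
```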


3) List all jobs with their assigned nodes and scheduler comments.
# qstat -answ
master:
                                                                                                   Req'd  Req'd   Elap
Job ID                         Username        Queue           Jobname         SessID   NDS  TSK   Memory Time  S Time
------------------------------ --------------- --------------- --------------- -------- ---- ----- ------ ----- - -----
138.master                     jkumar          long            JOb_batch_Q        11561    2     3     -- 72:00 E 00:00:00
    node1/0*2+master/0
    --
139.master                     jkumar          workq           STDIN                 --    2     3     --    -- Q    --
    --
    Not Running: Not enough free nodes available
4) Get the attributes of a specific queue.
# qstat -Qf long
Queue: long
queue_type = Execution
total_jobs = 0
state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0 Begun:0
resources_default.ncpus = 3
resources_default.nodes = 2
resources_default.walltime = 72:00:00
resources_assigned.mpiprocs = 0
resources_assigned.ncpus = 0
resources_assigned.nodect = 0
enabled = True
started = True

A job can be submitted in two ways:
1) Interactive (command-line) method.
2) Script method.

1) Interactive (command-line) method
# qsub -l select=1:ncpus=2+1:ncpus=1 -- <command>
(or)
# qsub -l select=1:ncpus=2+1:ncpus=1 <job-script>
2) Script method
$ cat script.sh
#!/bin/bash
# Directives must start with uppercase #PBS; lines starting with lowercase #pbs are treated as plain comments.
#PBS -l select=1:ncpus=2+1:mpiprocs=3
#PBS -N JOb_batch_Q
#PBS -q long
#PBS -j oe
#PBS -o output_job_batch
#PBS -e Error_job_batch
/usr/local/bin/mpirun -np 3 --hostfile /home/jkumar/Job_test/machinefile hostname

$ qsub script.sh
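Alternative (older) syntax
#PBS -l nodes=1:ppn=2+1 (via script)
How to read select=1:ncpus=2+1:ncpus=1:
select : a list of chunks separated by "+". Each chunk is written as <count>:<resource>=<value>.
1:ncpus=2 : one chunk with 2 CPUs.
1:ncpus=1 : one more chunk with 1 CPU.
Likewise, select=1:ncpus=3+1:ncpus=2 requests one chunk with 3 CPUs and one chunk with 2 CPUs.
NOTE: The advantage is that a single job can request a different number of CPUs on each node according to its requirements, for example 2 CPUs on one node and 1 CPU on another (see node1/0*2+master/0 in the qstat output above).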
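
To check what a select specification adds up to before submitting, the chunks can be counted with a small script. This is only an illustrative sketch: `select_totals` is a hypothetical helper name, it understands only the chunk count and `ncpus`, and it assumes a chunk without `ncpus` counts as 1 CPU (matching default_chunk.ncpus = 1 in the server attributes above).

```shell
#!/bin/sh
# select_totals SPEC: print "chunks=<n> ncpus=<n>" for a select specification
# such as "select=1:ncpus=2+1:ncpus=1". Other resources (mem, mpiprocs, ...) are ignored.
select_totals() {
    printf '%s\n' "$1" | awk '
        {
            sub(/^select=/, "")                 # accept the spec with or without "select="
            n = split($0, chunk, "[+]")         # chunks are separated by "+"
            chunks = 0; cpus = 0
            for (i = 1; i <= n; i++) {
                m = split(chunk[i], part, ":")
                count = 1; per = 1; first = 1   # assumed defaults: 1 chunk, 1 CPU
                if (part[1] ~ /^[0-9]+$/) { count = part[1]; first = 2 }
                for (j = first; j <= m; j++)
                    if (part[j] ~ /^ncpus=[0-9]+$/) per = substr(part[j], 7)
                chunks += count
                cpus += count * per
            }
            print "chunks=" chunks " ncpus=" cpus
        }
    '
}

# Usage:
#   select_totals "select=1:ncpus=2+1:ncpus=1"
```

For the specification used in this post, the sketch reports two chunks and three CPUs in total, which agrees with the NDS and TSK columns in the qstat output.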

Queue Attributes
Link: http://dcwww.camd.dtu.dk/pbs.html (refer to point 6)
Syntax (qmgr directives):
create queue <Queue-Name>
set queue <Queue-Name> queue_type = Execution | Route
set queue <Queue-Name> Priority = 40
set queue <Queue-Name> resources_default.walltime = 72:00:00
set queue <Queue-Name> resources_default.nodes = 1
set queue <Queue-Name> resources_default.ncpus = 1
set queue <Queue-Name> enabled = True
set queue <Queue-Name> started = True
Note: The attribute name is resources_default (plural), as shown in the qstat -Qf output above. The resource keywords used here are walltime, nodes, and ncpus.

PBS ERROR MESSAGE
qmgr -c "set server pbs_license_info=<path of license file>"
Qmgr: set queue new queue_type = Execution
Qmgr: set queue new resource_default.walltime = 72:00:00
qmgr obj=new svr=default: Undefined attribute
qmgr: Error (15002) returned from server
Qmgr: set queue new Priority = 40
Qmgr: set queue new resource_default.walltime = 72:00:00
qmgr obj=new svr=default: Undefined attribute
qmgr: Error (15002) returned from server
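SOLUTION: Use the exact attribute keyword. The attribute is resources_default (plural), not resource_default, so the failing command becomes: set queue new resources_default.walltime = 72:00:00
To resolve the issue, the queue was created with correctly spelled attributes:
create queue new
set queue new queue_type = Execution
set queue new enabled = True
set queue new started = True
set queue new resources_max.ncpus = 24
set queue new resources_min.ncpus = 1
set queue new resources_min.walltime=00:20:00 (20 minutes)
With these attribute names the "Undefined attribute" error no longer appears.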
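
To avoid retyping attribute names (and repeating the spelling mistake), the directives can be generated by a small script and fed to qmgr. This is a sketch only: `queue_commands` is a hypothetical helper name, the queue name and walltime are example values, and it assumes qmgr is run by a PBS manager and reads directives from standard input.

```shell
#!/bin/sh
# queue_commands NAME WALLTIME: print qmgr directives for an execution queue,
# using the resources_default spelling.
queue_commands() {
    name=$1
    walltime=$2
    cat <<EOF
create queue $name
set queue $name queue_type = Execution
set queue $name resources_default.walltime = $walltime
set queue $name enabled = True
set queue $name started = True
EOF
}

# Usage: review the directives first, then pipe them to qmgr.
#   queue_commands new 72:00:00
#   queue_commands new 72:00:00 | qmgr
```

Printing the directives before applying them makes it easy to spot a misspelled keyword without touching the server.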
Useful Links:
http://www.hpc.cineca.it/content/batch-scheduler-pbs
http://www.cines.fr/spip.php?article593&lang=fr
http://hpc.sissa.it/pbs/pbs-4.html (Related to Job Scheduler Configuration.)

PBS Unresolved Questions
1) What is a checkpoint in PBS?
2) What is the PBS dataservice?
3) If a node goes down, how does the job keep running without interruption?
