PBS has several variants: OpenPBS, PBS Pro and TORQUE. Here we mainly describe the installation of TORQUE.

1. Download TORQUE from: http://www.adaptivecomputing.com/resources/downloads/torque/

2. tar zxvf torque-2.5.5.tar.gz

3. cd torque-2.5.5

4. ./configure --prefix=/opt/pbs

5. make

6. make install

7. ./torque.setup liumh (sets up the server with liumh as the TORQUE administrator)

This first attempt failed:

[root@rocks4 torque-2.5.5]# ./torque.setup liumh

initializing TORQUE (admin: liumh@)

./torque.setup: line 31: pbs_server: command not found

./torque.setup: line 33: qmgr: command not found

ERROR: cannot set TORQUE admins

./torque.setup: line 37: qterm: command not found

7.1 Fix: add the TORQUE bin and sbin directories to PATH, and its man pages to MANPATH:

[root@rocks4 torque-2.5.5]# PATH=$PATH:/opt/pbs/bin:/opt/pbs/sbin

[root@rocks4 torque-2.5.5]# export PATH

[root@rocks4 torque-2.5.5]# MANPATH=$MANPATH:/opt/pbs/man

[root@rocks4 torque-2.5.5]# export MANPATH
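The PATH fix above can be wrapped in a small function. It appends the TORQUE directories only once and then checks that the commands torque.setup calls can now be found. This is a minimal sketch that assumes the --prefix=/opt/pbs layout used here, with pbs_server under sbin and qmgr/qterm under bin. The function name torque_add_path is hypothetical.

```shell
# Hypothetical helper: add PREFIX/bin and PREFIX/sbin to PATH (once each) and
# PREFIX/man to MANPATH, then check the commands torque.setup needs.
torque_add_path() {
    prefix=${1:-/opt/pbs}
    for dir in "$prefix/bin" "$prefix/sbin"; do
        case ":$PATH:" in
            *":$dir:"*) ;;                      # already on PATH, skip
            *) PATH="$PATH:$dir" ;;
        esac
    done
    # Same form as the original: an empty MANPATH yields ":PREFIX/man",
    # which keeps man's default search path as well.
    case ":$MANPATH:" in
        *":$prefix/man:"*) ;;
        *) MANPATH="$MANPATH:$prefix/man" ;;
    esac
    export PATH MANPATH
    # Fail loudly if any command used by torque.setup is still missing
    for cmd in pbs_server qmgr qterm; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "torque_add_path: $cmd not found under $prefix" >&2
            return 1
        fi
    done
}
```

Usage: source the function in the current root shell and run torque_add_path /opt/pbs before retrying ./torque.setup liumh.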

7.2 [root@rocks4 torque-2.5.5]# ./torque.setup liumh

This failed again, because the hostname rocks4 could not be resolved:

initializing TORQUE (admin: liumh@)

PBS_Server: LOG_ERROR::pbsd_main, unable to determine local server hostname - gethostbyname(rocks4) failed, h_errno=1

Cannot resolve default server host 'rocks4' - check server_name file.

qmgr: cannot connect to server (errno=15010) Access from host not allowed, or unknown host

ERROR: cannot set TORQUE admins

Cannot resolve default server host 'rocks4' - check server_name file.

qterm: could not connect to server '' (15010) Access from host not allowed, or unknown host

7.3 Fix: make the hostname resolvable by adding it to /etc/hosts:

[root@rocks4 torque-2.5.5]# vi /etc/hosts

Add the following line:

210.45.78.9 rocks4.lcg.ustc.edu.cn rocks4
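Instead of editing /etc/hosts by hand, the entry can be appended with a function that does nothing when the name is already mapped. That makes the step safe to repeat. This is a sketch under the assumption of a standard hosts(5) file; the function name add_hosts_entry is hypothetical.

```shell
# Hypothetical helper: append "IP FQDN SHORT" to a hosts-format FILE unless a
# non-comment line already lists FQDN. Pass /etc/hosts as FILE (as root).
add_hosts_entry() {
    file=$1 ip=$2 fqdn=$3 short=$4
    # awk exits 0 if some non-comment line already contains the FQDN
    if awk -v name="$fqdn" '
        /^[ \t]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit (found ? 0 : 1) }' "$file"; then
        return 0
    fi
    printf '%s\t%s %s\n' "$ip" "$fqdn" "$short" >> "$file"
}
```

On rocks4 this would be called as add_hosts_entry /etc/hosts 210.45.78.9 rocks4.lcg.ustc.edu.cn rocks4.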

7.4  [root@rocks4 torque-2.5.5]# ./torque.setup liumh

initializing TORQUE (admin: liumh@rocks4.lcg.ustc.edu.cn)

Max open servers: 4

Max open servers: 4

8. [root@rocks4 torque-2.5.5]# make packages

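9. Install the packages. The master machine needs the server package, the compute nodes need the mom package, and every machine that submits PBS jobs needs the clients package. On the master (rocks4), which also runs a pbs_mom and submits jobs, all three were installed: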

./torque-package-server-linux-x86_64.sh --install

./torque-package-mom-linux-x86_64.sh --install

./torque-package-clients-linux-x86_64.sh --install
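The role-to-package rule in step 9 can be written as a lookup function. This is a sketch: the role names server, mom and client are hypothetical labels, and the linux-x86_64 suffix is assumed from the package names that `make packages` produced here.

```shell
# Hypothetical helper: print the self-extracting package for a machine role,
# following step 9. ARCH defaults to the x86_64 names shown above.
torque_packages_for_role() {
    arch=${2:-linux-x86_64}
    case $1 in
        server) echo "torque-package-server-$arch.sh" ;;   # master (rocks4)
        mom)    echo "torque-package-mom-$arch.sh" ;;      # compute node
        client) echo "torque-package-clients-$arch.sh" ;;  # job submission host
        *) echo "unknown role: $1" >&2; return 1 ;;
    esac
}
```

A compute node needs both the client and mom packages, as in step 11. From the directory holding the packages, this could be done with: for role in client mom; do ./"$(torque_packages_for_role "$role")" --install; done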

Next, install the work nodes:

10. Copy torque-package-mom-linux-x86_64.sh and torque-package-clients-linux-x86_64.sh to all work nodes.
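With many work nodes, the copy in step 10 is easier as a loop. This is a minimal sketch, assuming root can already scp to each node and that /root on the node is an acceptable destination. The function name copy_node_packages and the SCP/DEST override variables are hypothetical. Setting SCP=echo turns the loop into a dry run.

```shell
# Hypothetical helper: copy the mom and clients packages to every node given.
# SCP overrides the copy command (e.g. SCP=echo for a dry run);
# DEST is the target directory on the node (default /root).
copy_node_packages() {
    scp_cmd=${SCP:-scp}
    dest=${DEST:-/root}
    for node in "$@"; do
        $scp_cmd torque-package-mom-linux-x86_64.sh \
                 torque-package-clients-linux-x86_64.sh \
                 "root@$node:$dest/" || return 1
    done
}
```

Run it from the torque-2.5.5 build directory on rocks4, listing bl-3-1.local followed by the remaining work nodes.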

11. Using bl-3-1.local as an example:

11.1  ./torque-package-clients-linux-x86_64.sh --install

11.2  ./torque-package-mom-linux-x86_64.sh --install

11.3  libtool --finish /opt/pbs/lib

11.4 Edit /etc/rc.local and add the following lines:

PATH=$PATH:/opt/pbs/bin:/opt/pbs/sbin

export PATH

MANPATH=$MANPATH:/opt/pbs/man

export MANPATH

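11.5 Also run the four commands from 11.4 (PATH=..., export PATH, MANPATH=..., export MANPATH) in the current shell.

11.6 Edit /var/spool/torque/server_name and make sure it contains "rocks4".

11.7 Create /var/spool/torque/mom_priv/config with the following lines (MOM configuration parameters start with "$"):

$pbsserver rocks4

$logevent 255

11.8 Start pbs_mom: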

pbs_mom -c /var/spool/torque/mom_priv/config
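Steps 11.6 and 11.7 write two small files, so they can be scripted. This is a sketch that assumes TORQUE_HOME is /var/spool/torque, as on these nodes. Note that MOM parameters in mom_priv/config are written with a leading "$" ($pbsserver, $logevent). The function name configure_mom is hypothetical.

```shell
# Hypothetical helper: write server_name and mom_priv/config under
# TORQUE_HOME (default /var/spool/torque), as in steps 11.6 and 11.7.
configure_mom() {
    server=$1
    home=${2:-/var/spool/torque}
    mkdir -p "$home/mom_priv" || return 1
    printf '%s\n' "$server" > "$home/server_name"
    # $pbsserver names the pbs_server host; $logevent is the event bitmask
    # to log (255 is the value used in this guide)
    printf '$pbsserver %s\n$logevent 255\n' "$server" > "$home/mom_priv/config"
}
```

On bl-3-1.local, as root: configure_mom rocks4 && pbs_mom -c /var/spool/torque/mom_priv/config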

12. Configure the server node, rocks4:

12.1 Edit /var/spool/torque/server_priv/nodes and add one line per work node (np is the number of processors on that node):

bl-3-1.local np=8

...
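When every work node has the same processor count, the nodes file can be generated rather than typed. This sketch rewrites the whole file, so any existing entries must be passed in again. The function name write_nodes_file is hypothetical, and a uniform np per call is an assumption.

```shell
# Hypothetical helper: (re)write a server_priv/nodes file with one
# "hostname np=N" line per node, all with the same processor count.
write_nodes_file() {
    file=$1 np=$2
    shift 2
    : > "$file" || return 1              # create or truncate the file
    for host in "$@"; do
        printf '%s np=%s\n' "$host" "$np" >> "$file"
    done
}
```

On rocks4: write_nodes_file /var/spool/torque/server_priv/nodes 8 bl-3-1.local, followed by the other work nodes. pbs_server reads this file when it starts, which step 12.3 does.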

12.2 Create /var/spool/torque/mom_priv/config (rocks4 also runs a pbs_mom) with the following lines:

$pbsserver rocks4

$logevent 255

12.3 Start the services (qterm -t quick first shuts down any pbs_server that is still running):

pbs_mom -c /var/spool/torque/mom_priv/config

qterm -t quick

pbs_server

pbs_sched

12.4 Start the same services at boot by adding them to /etc/rc.local:

vi /etc/rc.local

pbs_mom -c /var/spool/torque/mom_priv/config

pbs_server

pbs_sched

13. Because pbs_mom returns job output to the server with scp (over ssh), passwordless ssh has to be configured on all the work nodes:

13.1 ssh-keygen -P "" -t rsa

13.2 eval `ssh-agent`

13.3 ssh-add /root/.ssh/id_rsa

13.4 Append the contents of /root/.ssh/id_rsa.pub (on the work node) to /home/liumh/.ssh/authorized_keys on rocks4.
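Step 13.4 can be made repeatable with a function that adds the key only if it is missing. It also sets the permissions sshd expects: 700 on .ssh and 600 on authorized_keys. This sketch assumes the public key has already been copied to rocks4, and the function name authorize_key is hypothetical. If root creates the files, ownership still has to be handed back to the user separately, e.g. with chown -R liumh /home/liumh/.ssh.

```shell
# Hypothetical helper: append the key in PUBKEY_FILE to SSH_DIR/authorized_keys
# unless an identical line is already present, then tighten permissions.
authorize_key() {
    pubkey_file=$1 ssh_dir=$2
    key=$(cat "$pubkey_file") || return 1
    mkdir -p "$ssh_dir" || return 1
    auth="$ssh_dir/authorized_keys"
    touch "$auth" || return 1
    # -x: whole-line match, -F: fixed string, so the key is compared literally
    grep -qxF -- "$key" "$auth" || printf '%s\n' "$key" >> "$auth"
    chmod 700 "$ssh_dir" && chmod 600 "$auth"
}
```

For example, after copying bl-3-1's /root/.ssh/id_rsa.pub to rocks4 as a temporary file, run authorize_key with that file and /home/liumh/.ssh.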

13.5 Test it:

scp file liumh@rocks4.lcg.ustc.edu.cn:/tmp

If the file is copied without a password prompt, the setup works.

13.6 Note: if bl-3-1.local cannot resolve rocks4.lcg.ustc.edu.cn, the copy in 13.5 fails, and so does the return of job output. Errors like these then appear in /var/log/messages on bl-3-1:

Feb 6 16:22:32 bl-3-1 pbs_mom: LOG_ERROR::req_cpyfile, Unable to copy file /var/spool/torque/spool/10.rocks4.lcg.ustc.edu.cn.OU to liumh@rocks4.lcg.ustc.edu.cn:/home/liumh/test/pbs/pbsjob.o10

Feb 6 16:22:36 bl-3-1 pbs_mom: LOG_ERROR::sys_copy, command '/usr/bin/scp -rpB /var/spool/torque/spool/10.rocks4.lcg.ustc.edu.cn.ER liumh@rocks4.lcg.ustc.edu.cn:/home/liumh/test/pbs/pbsjob.e10' failed with status=1, giving up after 4 attempts

To fix this, add the following line to /etc/hosts on bl-3-1.local:

10.1.1.11 rocks4.lcg.ustc.edu.cn rocks4
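Before submitting jobs, you can check each node's hosts file for the server names: the full name used in the scp target and the short name. This sketch only inspects a hosts-format file and does not query DNS, so it is an assumption that /etc/hosts is where these names are resolved on the nodes. The function name hosts_lookup is hypothetical.

```shell
# Hypothetical helper: print the address that hosts-format FILE gives for
# NAME (first matching non-comment line); return 1 if NAME is not listed.
hosts_lookup() {
    awk -v name="$2" '
        /^[ \t]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; found = 1; exit } }
        END { exit (found ? 0 : 1) }' "$1"
}
```

On bl-3-1.local, hosts_lookup /etc/hosts rocks4.lcg.ustc.edu.cn should print 10.1.1.11 once the line above has been added.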

14.  Done

Troubleshooting:

Q1. In step 13, after appending id_rsa.pub (from bl-3-1.local) to a user's authorized_keys, scp still asks for a password. Why?

A1. With its default StrictModes setting, sshd ignores authorized_keys when /home/user or /home/user/.ssh has overly permissive permissions (for example, group- or world-writable). Removing that write permission fixes it, e.g. chmod 755 /home/user/.ssh (and the same for /home/user if needed).
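To find which path triggers Q1, you can list any directory or file on the login path that is group- or world-writable. This is a sketch that reads the mode string from ls -ld, where characters 6 and 9 are group-write and other-write. The function name writable_by_others is hypothetical.

```shell
# Hypothetical helper: print each given path that is group- or world-writable.
# Returns 0 if at least one such path was found, 1 otherwise (like grep).
writable_by_others() {
    status=1
    for p in "$@"; do
        [ -e "$p" ] || continue                 # skip paths that do not exist
        mode=$(ls -ld "$p" | cut -c1-10)        # e.g. drwxrwxr-x
        case $mode in
            ?????w*|????????w*) echo "$p ($mode)"; status=0 ;;
        esac
    done
    return $status
}
```

For example: writable_by_others /home/user /home/user/.ssh /home/user/.ssh/authorized_keys. Any path it prints can be fixed with chmod go-w.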

-- MinghuiLiu - 07-Feb-2012

Topic revision: r1 - 2012-02-07 - MinghuiLiu
 