TORQUE Resource Manager and Maui

-- JoshuaWyattSmith - 19 Jan 2014

TORQUE

This page describes how to install the TORQUE resource manager on a Raspberry Pi and Wandboard cluster. The following instructions are for the Master Node in the cluster.

First get a tar.gz package from Adaptive Computing and unpack it in a suitable location, or download torque-4.2.0-snap.201302040907.tar.gz (attached to this topic).

As the root user, run:

  • ./configure --with-default-server=your_server_name --with-server-home=/var/spool/pbs --with-rcp=scp
  • make
  • make install

Next, create the initial TORQUE server configuration:

  • pbs_server -t create

The trqauthd daemon needs to be running. Its init script is in contrib/init.d in the torque folder that was originally unpacked. Copy the appropriate version (debian.trqauthd for Ubuntu) into /etc/init.d/, rename it trqauthd and start it. Further configuration is required:

  • qmgr -c "set server scheduling=true"
  • qmgr -c "create queue batch queue_type=execution"
  • qmgr -c "set queue batch started=true"
  • qmgr -c "set queue batch enabled=true"
  • qmgr -c "set queue batch resources_default.nodes=1"
  • qmgr -c "set queue batch resources_default.walltime=3600"
  • qmgr -c "set server default_queue=batch"
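
The queue settings above can also be applied in one pass from a small script. What follows is only a minimal sketch, not part of the original setup: the function name configure_batch_queue and the QMGR variable are hypothetical, introduced so that the directives can be previewed with a stand-in command before running them for real as root.

```shell
# Sketch: apply the queue settings listed above in a single pass.
# QMGR is a hypothetical override; it defaults to the real qmgr client.
QMGR="${QMGR:-qmgr}"

configure_batch_queue() {
    # One qmgr directive per line, passed to "$QMGR -c" in order.
    # Stops at the first directive that fails.
    while IFS= read -r directive; do
        [ -n "$directive" ] || continue
        "$QMGR" -c "$directive" </dev/null || return 1
    done <<'EOF'
set server scheduling=true
create queue batch queue_type=execution
set queue batch started=true
set queue batch enabled=true
set queue batch resources_default.nodes=1
set queue batch resources_default.walltime=3600
set server default_queue=batch
EOF
}
```

Calling configure_batch_queue as root applies all seven directives. If the batch queue already exists, the create queue step is likely to fail, so this is meant for a fresh server.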

Further configuration is possible; see the manual for details. As a test (still as root):

  • qstat -q

should give something like

server: <server name>

Queue            Memory CPU Time Walltime Node  Run Que Lm  State
---------------- ------ -------- -------- ----  --- --- --  -----
batch              --      --       --      --    0   0 --   E R
                                               ----- -----
                                                   0     0

As a test you can submit a job to the queue:

  • echo "sleep 30" | qsub

Running qstat afterwards should give something like


Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
0.<user>                  STDIN            <user>                 0 Q batch

The job will not complete but will stay queued ("Q"), because TORQUE still needs a scheduler. This is where Maui comes in (see the Maui section below).
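
To check job states from a script, the S column of the qstat listing can be picked out with awk. This is a rough sketch that assumes the six-column layout shown above (state in the second-to-last field); the function name job_states is made up for illustration.

```shell
# Sketch: print "<job id> <state>" for each job in a qstat listing read
# from stdin. Assumes the default layout shown above, where the state
# letter is the second-to-last column.
job_states() {
    # Skip the "Job id ..." header and the dashed separator line.
    awk '$1 != "Job" && $1 !~ /^-+$/ && NF >= 6 { print $1, $(NF-1) }'
}
```

It would be used as qstat | job_states. A job that is still waiting for a scheduler shows up with state Q.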

Now add the worker nodes to the server/Master Node. Create a file called "nodes" in /var/spool/pbs/server_priv and list the nodes' hostnames in it, one per line. If a node has more than one processor, append np=X to its line, where X is the number of processors. An example looks like this:

rpi1 np=4
rpi2 np=4
rpi3 np=4
rpi4 np=4
rpi5 np=4
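
For a larger cluster the nodes file can be generated rather than typed by hand. The following is a small sketch under the assumption that every worker has the same number of processors; write_nodes_file is a hypothetical helper name, not a TORQUE command.

```shell
# Sketch: write a TORQUE nodes file with one "hostname np=N" line per worker.
# Usage: write_nodes_file OUTPUT_FILE NP HOST [HOST...]
write_nodes_file() {
    out=$1
    np=$2
    shift 2
    : > "$out" || return 1          # create or truncate the file
    for host in "$@"; do
        printf '%s np=%s\n' "$host" "$np" >> "$out"
    done
}
```

Run as root, write_nodes_file /var/spool/pbs/server_priv/nodes 4 rpi1 rpi2 rpi3 rpi4 rpi5 reproduces the example above.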

Now install TORQUE on the worker nodes. In the unpacked source directory, run:

  • make packages

This produces several self-installing scripts with names of the form torque-package-mom-linux-i686.sh. The linux-i686 part reflects the platform and architecture of the build machine and will vary.

Copy the "…mom…" executable to each of the nodes and run it with:

  • ./torque-package-mom-linux-i686.sh --install
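
Copying and installing the package on every worker can be done in a loop. The sketch below makes several assumptions that are not in the original notes: root login over SSH is possible on each worker, /tmp is used as the staging directory, and the function name install_mom_on_nodes and the RUN dry-run variable are hypothetical.

```shell
# Sketch: copy the mom package to each worker and run its installer there.
# RUN is a hypothetical dry-run hook: set RUN=echo to print the commands
# instead of executing them.
# Assumes root SSH access to each worker and /tmp as staging directory.
# Usage: install_mom_on_nodes PACKAGE HOST [HOST...]
install_mom_on_nodes() {
    pkg=$1
    shift
    for host in "$@"; do
        ${RUN:-} scp "$pkg" "root@$host:/tmp/" || return 1
        ${RUN:-} ssh "root@$host" "cd /tmp && ./$(basename "$pkg") --install" || return 1
    done
}
```

With RUN=echo set, install_mom_on_nodes ./torque-package-mom-linux-i686.sh rpi1 rpi2 only prints the scp and ssh commands, which is a cheap way to check the host list first.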

The mom executable is the only one that is really needed so far. I'm not sure what the others are for yet.

Now create the file /var/spool/pbs/server_name on each worker node, containing the hostname of the head (server) node.
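
As a minimal sketch, the file can be written with a one-line helper. The name set_server_name and its optional second argument are hypothetical; the default directory is the /var/spool/pbs server home used in the configure step.

```shell
# Sketch: record the head node's hostname in a worker's server_name file.
# Usage: set_server_name HEAD_HOSTNAME [PBS_HOME]
set_server_name() {
    head_node=$1
    pbs_home=${2:-/var/spool/pbs}   # default matches --with-server-home
    printf '%s\n' "$head_node" > "$pbs_home/server_name"
}
```

On each worker, running set_server_name your_server_name as root writes the file in the default location.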

Start the pbs_mom daemon on each worker node (you will probably need root privileges):

  • sudo pbs_mom

The equivalent for the server node is:

  • sudo pbs_server

A very useful tool for managing the cluster is "Cluster SSH" (it needs the GUI):

  • sudo apt-get install clusterssh

"screen" is also useful if you disable the GUI:

  • sudo apt-get install screen

Maui

I have installed Maui on the Raspberry Pi cluster, but not on the Wandboard cluster. You can download it from the website or use the attached maui-3.3.1.tar.gz.

I don't remember anything difficult about this. A simple ./configure, make and make install as the root user should work; this is only needed on the Master (server) node. To run it, do:

  • sudo /usr/local/maui/sbin/maui

Remember that TORQUE must already be running.

Repeating echo "sleep 30" | qsub should now work, with the job running instead of staying queued. Running qstat and showq gives more information about the jobs that are running.

Topic attachments

  • maui-3.3.1.tar.gz (880.1 K, 2014-01-20, JoshuaWyattSmith): Maui-3.3.1
  • torque-4.2.0-snap.201302040907.tar.gz (6068.1 K, 2014-01-20, JoshuaWyattSmith): Torque-4.2.0

Topic revision: r2 - 2014-01-20 - JoshuaWyattSmith
 