TORQUE Resource Manager and Maui
--
JoshuaWyattSmith - 19 Jan 2014
TORQUE
This page describes how to install the TORQUE resource manager on a Raspberry Pi
AND Wandboard cluster. The following instructions are for the Master Node in the cluster.
First get a tar.gz package from Adaptive Computing and unzip it in the required place, or download
torque-4.2.0-snap.201302040907.tar.gz.
As the root user, do
-
./configure --with-default-server=your_server_name --with-server-home=/var/spool/pbs --with-rcp=scp
-
make
-
make install
You need to configure TORQUE:
The trqauthd daemon needs to be running. It is in contrib/init.d in the TORQUE folder that was originally unzipped. Copy the required version (debian.trqauthd for Ubuntu) into /etc/init.d/ and rename it trqauthd.
Further configuring is required:
-
qmgr -c "set server scheduling=true"
-
qmgr -c "create queue batch queue_type=execution"
-
qmgr -c "set queue batch started=true"
-
qmgr -c "set queue batch enabled=true"
-
qmgr -c "set queue batch resources_default.nodes=1"
-
qmgr -c "set queue batch resources_default.walltime=3600"
-
qmgr -c "set server default_queue=batch"
There are further configurations you can make; see the manual for details.
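The qmgr commands above can be collected into one helper script so the server setup is repeatable. A minimal sketch (the name setup_queue.sh and the heredoc wrapper are my additions; the qmgr lines themselves are exactly the ones listed above, and the result should be run as root):

```shell
#!/bin/sh
# Write a small helper script that replays the qmgr queue/server configuration.
# setup_queue.sh is a hypothetical name; execute it as root on the Master Node.
cat > setup_queue.sh <<'EOF'
#!/bin/sh
qmgr -c "set server scheduling=true"
qmgr -c "create queue batch queue_type=execution"
qmgr -c "set queue batch started=true"
qmgr -c "set queue batch enabled=true"
qmgr -c "set queue batch resources_default.nodes=1"
qmgr -c "set queue batch resources_default.walltime=3600"
qmgr -c "set server default_queue=batch"
EOF
chmod +x setup_queue.sh
```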
As a test (still as root), run
-
qstat -q
-
which should give something like
server: <server name>
Queue Memory CPU Time Walltime Node Run Que Lm State
---------------- ------ -------- -------- ---- --- --- -- -----
batch -- -- -- -- 0 0 -- E R
----- -----
0 0
As a test you can submit a job to the queue:
-
echo "sleep 30" | qsub
-
Running qstat should then give something like
Job id Name User Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
0.<user> STDIN <user> 0 Q batch
The job will not complete but will stay queued ("Q"). You need a scheduler, and this is where Maui comes in (next section).
Now add the worker nodes to the server/Master Node.
Create a file called "nodes" in /var/spool/pbs/server_priv and add the nodes' hostnames. If a node has more than one processor, append np=X to its line. An example looks like this:
rpi1 np=4
rpi2 np=4
rpi3 np=4
rpi4 np=4
rpi5 np=4
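For many identical workers the nodes file can be generated with a short loop rather than typed by hand. A sketch (it writes to ./nodes for illustration; on the real server the file lives at /var/spool/pbs/server_priv/nodes):

```shell
#!/bin/sh
# Generate a TORQUE nodes file for five quad-core workers named rpi1..rpi5.
# Writing to ./nodes here; copy it to /var/spool/pbs/server_priv/ as root.
: > nodes
for i in 1 2 3 4 5; do
    echo "rpi$i np=4" >> nodes
done
cat nodes
```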
Now we need to install TORQUE onto the worker nodes. In the initially unpacked tarball, run
-
make packages
-
This gives a couple of executables with names of the form torque-package-mom-linux-i686.sh; the linux-i686 part is just the platform and will obviously vary.
Copy the "…mom…" executable to each of the nodes and execute with
-
./torque-package-mom-linux-i686.sh --install
The mom executable is the only one that is really needed so far. I'm not sure what the others are for yet.
Now create a file on each worker node in /var/spool/pbs/server_name which contains the hostname of the head (server) node.
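Creating the server_name file can also be scripted. A sketch that writes it locally ("head" is a placeholder hostname, and the commented scp loop is illustrative; on a real cluster you would use the Master Node's actual hostname and push the file to each worker with scp or Cluster SSH):

```shell
#!/bin/sh
# Write the head node's hostname into a server_name file.
# "head" is a placeholder; on the real master use the output of `hostname`.
# The file's real location on each worker is /var/spool/pbs/server_name.
echo "head" > server_name
# On a real cluster, something like (illustrative, not run here):
#   for n in rpi1 rpi2 rpi3 rpi4 rpi5; do
#       scp server_name root@$n:/var/spool/pbs/server_name
#   done
cat server_name
```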
Start the pbs_mom daemon on each worker node (you'll probably need root):
-
sudo pbs_mom
-
The equivalent for the server node is
-
sudo pbs_server
-
A very useful tool for cluster management is "Cluster SSH" (needs the GUI).
-
sudo apt-get install clusterssh
So is "screen", if you disable the GUI:
-
sudo apt-get install screen
Maui
I have installed Maui on the Raspberry Pi cluster, not on the Wandboard cluster. You can download it from the website or here: maui-3.3.1.tar.gz.
I don't remember anything being difficult here. A simple
-
./configure
-
make
-
make install
-
should work, as root and only on the Master (server) node. To run it do
-
sudo /usr/local/maui/sbin/maui
Remember that TORQUE must already be running.
Repeating the
-
echo "sleep 30" | qsub
-
should work now. If you run
qstat
and
showq
you will find more information about the jobs that are running.
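Beyond piping a command into qsub, jobs are usually described in a small script with #PBS directives, which replaces flags you would otherwise pass on the command line. A minimal sketch (the name job.sh, the job name, and the resource values are illustrative; the batch queue is the one created earlier):

```shell
#!/bin/sh
# Write a minimal PBS job script; submit it on the Master Node with: qsub job.sh
cat > job.sh <<'EOF'
#!/bin/sh
# Job name shown by qstat/showq
#PBS -N sleep_test
# Use the batch queue created earlier
#PBS -q batch
# One node, four processors per node
#PBS -l nodes=1:ppn=4
# One-minute walltime limit
#PBS -l walltime=00:01:00
sleep 30
EOF
```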