CU eScience: Instructions for administrators
Preparing host
OS
- CERN SLC6 (based on RHEL6): Link
- CentOS 7: Link
- We will migrate to CERN CentOS 7 as soon as an image is available.
Network
We normally connect the host server to two networks:
- 10.42.43.X: This is a private network for CUniverse/PPRL project.
- 10.42.44.X: This is a private network for national eScience project.
Steps:
(1) Disable NetworkManager
service NetworkManager stop
chkconfig NetworkManager off
(2) Edit the network configuration scripts, and ensure that
- the NM_CONTROLLED configuration key is set to no
- the ONBOOT configuration key is set to yes.
- For 10.42.43.X, we use DHCP.
- /etc/sysconfig/network-scripts/ifcfg-eth3
DEVICE=eth3
TYPE=Ethernet
ONBOOT=yes
NM_CONTROLLED=no
BOOTPROTO=dhcp
- For 10.42.44.X, we use a network bridge.
- /etc/sysconfig/network-scripts/ifcfg-br0
DEVICE=br0
ONBOOT=yes
BOOTPROTO=static
IPADDR=10.42.44.9
NETMASK=255.255.255.0
TYPE=Bridge
- /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
TYPE=Ethernet
ONBOOT=yes
NM_CONTROLLED=no
BOOTPROTO=none
BRIDGE=br0
- Edit /etc/hosts on 10.42.44.1 and add:
10.42.44.9 host-wn07
(3) Ensure that the network service is enabled, then start (or restart) the network service:
chkconfig network on
service network start
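To verify that the interfaces and the bridge came up as expected (a quick check; not part of the original procedure):
ip addr show br0
brctl show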
Preparing images & creating a VM
(0) Worker-node images can be found in /programs/storevm/.
- CentOS 6: cu-wnescience-centos6.img (300 GB)
(1) Copy the image to /var/lib/libvirt/images/ and rename it as desired.
(2) Create VM
virt-install -n cu-0-7.local --arch=x86_64 --machine=rhel6.6.0 --os-type=linux --os-variant=rhel6 --ram=61440 --vcpus=20 --disk path=/var/lib/libvirt/images/cu-0-7.img --graphics vnc --network bridge:br0 --boot=hd
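If the VM does not boot immediately, a minimal sketch for starting it and checking its state (using the VM name cu-0-7.local from the command above):
virsh start cu-0-7.local
virsh list --all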
(3) You may need to fix the network settings.
- I am not sure how to connect to the VM from the command line. I currently sit in front of the machine and access the VM using virt-manager.
- Fix the network configurations:
- /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=static
IPADDR=10.226.63.8
NETMASK=255.255.255.0
TYPE=Ethernet
MTU=1500
- /etc/sysconfig/network-scripts/ifcfg-eth0:0
DEVICE=eth0:0
ONBOOT=yes
BOOTPROTO=static
IPADDR=10.42.44.58
NETMASK=255.255.255.0
TYPE=Ethernet
MTU=1500
- Add the VM to /etc/hosts:
10.226.63.8 cu-0-7.local cu-0-7
(4) Edit the hostname in /etc/sysconfig/network
(5) Delete /etc/udev/rules.d/70-persistent-net.rules
- Then restart the VM; the file will be recreated with eth0 only.
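A minimal sketch of step (5), run inside the VM (note the dash in the file name):
rm -f /etc/udev/rules.d/70-persistent-net.rules
reboot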
How to control/modify a VM
Connect to the host machine and use the virsh command.
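Some common subcommands, for example (cu-0-7.local is just an example VM name; virsh console works only if a serial console is configured in the guest):
virsh list --all # list all VMs and their states
virsh shutdown cu-0-7.local # gracefully shut down a VM
virsh edit cu-0-7.local # edit the VM definition (vCPUs, memory, ...)
virsh console cu-0-7.local # attach to the guest serial console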
| Host (10.42.44.X) | No. of CPUs | Memory (GB) | VM IP (10.226.63.X) | VM IP (10.42.44.X) | Note |
| 221 | 16 | 30 | 1 | 50 | Mathematica, base |
| 222 | 16 | 30 | 2 | 51 | Mathematica, base |
| 223 | 16 | 30 | 3 | 52 | base |
| 224 | 16 | 30 | 4 | 53 | base |
| 225 | 16 | 30 | 5 | 54 | base |
| 238 | 20 | 64 | 6 | 55 | base |
| 237 | 20 | 64 | 7 | 56 | base |
| 9 | 32 | 32 | 8 | 58 | base |
| 241 | 2 | 7 | 9 | 57 | base |
How to add a worker node to the frontend
(1) Edit /etc/hosts on the frontend and add:
10.226.63.8 cu-0-7.local cu-0-7
(2) Edit /etc/c3.conf on the frontend, e.g.
cluster esci-cu {
esci-cu #head node
cu-0-0 #compute nodes
cu-0-1 #compute nodes
cu-0-2 #compute nodes
cu-0-3 #compute nodes
cu-0-4 #compute nodes
cu-0-5 #compute nodes
cu-0-6 #compute nodes
cu-0-7 #compute nodes
cu-0-8 #compute nodes
}
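Once the node is listed in /etc/c3.conf, the C3 cluster tools on the frontend should reach it. A quick check, assuming the C3 tools are installed:
cexec hostname # runs hostname on every compute node in the cluster definition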
(3) Add new compute node to Torque/PBS,
qmgr -c "create node cu-0-7.local np=32"
qmgr -c "set node cu-0-7.local properties=base"
One can play with node/queue properties using qmgr.
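For example (a sketch; the property name mathematica is only illustrative):
qmgr -c "print server" # dump the current server and queue configuration
qmgr -c "set node cu-0-7.local properties += mathematica" # append a property to a node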
(4) Use the pbsnodes command, e.g.
- pbsnodes -a to list all attributes of all nodes.
- pbsnodes -o cu-0-7.local to add the OFFLINE state to the listed nodes.
- pbsnodes -c cu-0-7.local to clear the OFFLINE state from the listed nodes.
Queue management
List job details
Use qstat -a or pbsnodes.
Change the walltime
qalter -l walltime=500:00:00 [jobid]
Kill job
Use qdel [jobid], or qdel -p [jobid] to force-purge a job that qdel alone does not kill.
Queue and machine
Edit the nodes file in /opt/torque-2.5.13/server_priv/, then restart the pbs_server service:
/sbin/service pbs_server restart
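For reference, each line of the nodes file gives a node name, its processor count, and its properties; an entry matching the qmgr commands above would look like this (a sketch):
cu-0-7.local np=32 base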
Power
UPS (up-1)
| Order (top to bottom) | Machine | Note |
| 1 | - | - |
| 2 | - | - |
| 3 | - | - |
| 4 | IBM TS2900 | Tape backup, single power supply |
| 5 | Storwize V3700 (main) | Main storage |
| 6 | Cisco Catalyst 2960-S | eScience 161.200.116.X network, single power supply |
| 7 | Black outlet extension | - |
UPS (down-2)
Broken; the bypass option is currently in use.
APC UPS
Black outlet extension
UPS (up-1) electricity
| Order (top to bottom) | Machine | Note |
| 1 | - | - |
| 2 | - | - |
| 3 | - | - |
| 4 | - | - |
| 5 | IBM x3550 M4 Machine-1 | VMs of the eScience UI |
| 6 | Juniper EX2200 | eScience internal network |
| 7 | Dell PowerEdge R630 (down) | - |
| 8 | Dell PowerEdge R630 (up) | - |
| 9 | Monitor | Used with the KVM |
| 10 | Zyxel GS1910-24 (down) | CUniverse network |
| 11 | Zyxel GS1910-24 (up) | CUniverse network |
| 12 | HP 1410-16G | Main switch of the department network |
White outlet extension
MHMK electricity
Tips
User information
finger [username]
--
PhatSrimanobhas - 2016-09-08