CU eScience: Instructions for administrators

Preparing host

OS

  • CERN SLC6 (based on RHEL6): Link
  • CentOS 7: Link
    • We will migrate to CERN CentOS 7 as soon as the image is available.

Network

We normally connect the host server to two networks:
  • 10.42.43.X: a private network for the CUniverse/PPRL project.
  • 10.42.44.X: a private network for the national eScience project.
Steps:

(1) Disable NetworkManager

service NetworkManager stop
chkconfig NetworkManager off

(2) Edit the network configuration scripts, and ensure that

  • the NM_CONTROLLED configuration key is set to no
  • the ONBOOT configuration key is set to yes
  • For 10.42.43.X, we use DHCP.
    • /etc/sysconfig/network-scripts/ifcfg-eth3
DEVICE=eth3
TYPE=Ethernet
ONBOOT=yes
NM_CONTROLLED=no
BOOTPROTO=dhcp
  • For 10.42.44.X, we use a network bridge
    • /etc/sysconfig/network-scripts/ifcfg-br0
DEVICE=br0
ONBOOT=yes
BOOTPROTO=static
IPADDR=10.42.44.9
NETMASK=255.255.255.0
TYPE=Bridge
    • /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
TYPE=Ethernet
ONBOOT=yes
NM_CONTROLLED=no
BOOTPROTO=none
BRIDGE=br0
  • Edit /etc/hosts of 10.42.44.1
    • Add an entry, e.g.
10.42.44.9      host-wn07

(3) Ensure that the network service is enabled, then restart (or start) the network service

chkconfig network on
service network start
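
To verify the setup, you can check the interfaces afterwards (a quick sanity check; the interface and bridge names follow the examples above and may differ on your host, and brctl requires the bridge-utils package):

# show addresses on the DHCP interface and the bridge
ip addr show eth3
ip addr show br0
# confirm that eth0 is enslaved to br0
brctl show br0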

Preparing images & creating a VM

(0) Worker-node images can be found at /programs/storevm/.
  • CentOS 6: cu-wnescience-centos6.img (300 GB)

(1) Copy the image to /var/lib/libvirt/images/ and rename it as desired.
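
A minimal example, assuming the CentOS 6 image above and the VM name cu-0-7 used in the next step:

# copy the base image and name it after the new VM
cp /programs/storevm/cu-wnescience-centos6.img /var/lib/libvirt/images/cu-0-7.img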

(2) Create VM

virt-install -n cu-0-7.local --arch=x86_64 --machine=rhel6.6.0 --os-type=linux --os-variant=rhel6 --ram=61440 --vcpus=20 --disk path=/var/lib/libvirt/images/cu-0-7.img --graphics vnc --network bridge:br0 --boot=hd

(3) You may need to fix the network settings

  • I am not sure how to connect to the VM from the command line; I currently sit in front of the machine and access the VM using virt-manager (a possible alternative is sketched after this list).
  • Fix the network configurations
    • /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=static
IPADDR=10.226.63.8
NETMASK=255.255.255.0
TYPE=Ethernet
MTU=1500
    • /etc/sysconfig/network-scripts/ifcfg-eth0:0
DEVICE=eth0:0
ONBOOT=yes
BOOTPROTO=static
IPADDR=10.42.44.58
NETMASK=255.255.255.0
TYPE=Ethernet
MTU=1500
  • /etc/hosts, e.g. add
10.226.63.8     cu-0-7.local    cu-0-7
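
As a possible command-line alternative (not verified here, and virsh console assumes a serial console is enabled in the guest), the following can be run on the host:

# attach to the guest's serial console, if one is configured
virsh console cu-0-7.local
# or open the graphical (VNC) console
virt-viewer cu-0-7.local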

(4) Edit the hostname in /etc/sysconfig/network
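
For example, /etc/sysconfig/network should contain something like the following (hostname taken from the example VM on this page):

NETWORKING=yes
HOSTNAME=cu-0-7.local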

(5) Delete /etc/udev/rules.d/70-persistent-net.rules

  • Then restart the VM; the file will be recreated with eth0 only
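
A minimal sketch of this step, run inside the VM:

# remove the cached MAC-to-interface mapping; it is regenerated on the next boot
rm -f /etc/udev/rules.d/70-persistent-net.rules
reboot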

How to control/modify VM

Connect to the host machine and use the virsh command, e.g. as shown below.
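
Commonly used virsh subcommands (the domain name is the example VM from this page):

# list all defined VMs, running or not
virsh list --all
# start / gracefully shut down / force off a VM
virsh start cu-0-7.local
virsh shutdown cu-0-7.local
virsh destroy cu-0-7.local
# edit the VM's XML definition (e.g. memory or vCPUs)
virsh edit cu-0-7.local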

Network

| Host (10.42.44.X) | No. of CPUs | Memory (GB) | VM IP (10.226.63.X) | VM IP (10.42.44.X) | Note |
| 221 | 16 | 30 | 1 | 50 | Mathematica, base |
| 222 | 16 | 30 | 2 | 51 | Mathematica, base |
| 223 | 16 | 30 | 3 | 52 | base |
| 224 | 16 | 30 | 4 | 53 | base |
| 225 | 16 | 30 | 5 | 54 | base |
| 238 | 20 | 64 | 6 | 55 | base |
| 237 | 20 | 64 | 7 | 56 | base |
| 9 | 32 | 32 | 8 | 58 | base |
| 241 | 2 | 7 | 9 | 57 | base |

How to add a worker node to the frontend

(1) Edit /etc/hosts of the frontend
    • Add an entry, e.g.
10.226.63.8     cu-0-7.local    cu-0-7

(2) Edit /etc/c3.conf of the frontend, e.g.

cluster esci-cu {
   esci-cu #head node
   cu-0-0 #compute nodes
   cu-0-1 #compute nodes
   cu-0-2 #compute nodes
   cu-0-3 #compute nodes 
   cu-0-4 #compute nodes
   cu-0-5 #compute nodes
   cu-0-6 #compute nodes
   cu-0-7 #compute nodes
   cu-0-8 #compute nodes
}
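
After updating /etc/c3.conf, a quick check (assuming the C3 tools are installed on the frontend) is to run a command on all listed nodes:

# run hostname on every compute node in the cluster definition
cexec hostname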

(3) Add the new compute node to Torque/PBS:

qmgr -c "create node cu-0-7.local np = 32"
qmgr -c "set node cu-0-7.local properties=base"
One can adjust node and queue properties further with qmgr, e.g. as shown below.
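
For example (the extra property name below is only illustrative):

# inspect the node's current attributes
qmgr -c "list node cu-0-7.local"
# append an additional property to the node
qmgr -c "set node cu-0-7.local properties += mathematica"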

(4) Use the pbsnodes command, e.g.

  • pbsnodes -a to list all attributes of all nodes.
  • pbsnodes -o cu-0-7.local to add the OFFLINE state to the listed nodes.
  • pbsnodes -c cu-0-7.local to clear the OFFLINE state from the listed nodes.
  • See the pbsnodes man page for more options.

Queue management

List job detail

Use qstat -a or pbsnodes; for the details of a single job, see the example below.
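
To print the full attributes of one job:

# full details of a single job
qstat -f Job_ID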

Change the walltime

qalter -l walltime=500:00:00 Job_ID

Kill job

Use qdel [jobid]; if the job is not killed, use qdel -p [jobid] to force-purge it.

Queue and machine

Edit the nodes file in /opt/torque-2.5.13/server_priv/, then restart the pbs_server service:
/sbin/service pbs_server restart
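
A line in the nodes file typically looks like the following (hostname, CPU count, then node properties; shown here for the example node above):

cu-0-7.local np=32 base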

Power

UPS (up-1)

| Order (up to down) | Machine | Note |
| 1 | - | - |
| 2 | - | - |
| 3 | - | - |
| 4 | IBM TS2900 | Tape backup, single power supply |
| 5 | Storwize v3700 (main) | Main storage |
| 6 | Cisco Catalyst 2960-S | eScience 161.200.116.X network, single power supply |
| 7 | Black outlet extension | - |

UPS (down-2)

Broken; the bypass option is currently used.
| Order (up to down) | Machine | Note |
| 1 | - | - |
| 2 | Storwize v3700 (main) | Main storage |
| 3 | - | - |
| 4 | - | - |
| 5 | - | - |
| 6 | KVM | KVM without monitor |
| 7 | HP ProLiant DL360 Gen9 | CUniverse main server |

APC UPS

| Order (left to right) | 1 | 2 | 3 | 4 |
| Machine | Finger scan | QNAP | - | Fan |

Black outlet extension

Powered from UPS (up-1)
| Order (up to down) | Machine | Note |
| 1 | - | - |
| 2 | - | - |
| 3 | - | - |
| 4 | - | - |
| 5 | IBM x3550 M4 Machine-1 | VMs of eScience UI |
| 6 | Juniper EX2200 | eScience internal network |
| 7 | Dell PowerEdge R630 (down) | - |
| 8 | Dell PowerEdge R630 (up) | - |
| 9 | Monitor | Used with KVM |
| 10 | Zyxel GS1910-24 (down) | CUniverse network |
| 11 | Zyxel GS1910-24 (up) | CUniverse network |
| 12 | HP 1410-16G | Main switch of department network |

White outlet extension

Powered from MHMK electricity
| Order (up to down) | Machine | Note |
| 1 | - | - |
| 2 | - | - |
| 3 | - | - |
| 4 | - | - |
| 5 | Dell PowerEdge R630 (down) | - |
| 6 | Dell PowerEdge R630 (up) | - |
| 7 | - | - |
| 8 | IBM x3550 M4 Machine-1 | - |
| 9 | - | - |
| 10 | - | - |
| 11 | - | - |
| 12 | APC UPS | - |

Tips

User information

finger [username]

-- PhatSrimanobhas - 2016-09-08
