ARC site installation for ALICE (and non-ALICE) and WLCG
Requirements
- Standard requirements for WLCG and ALICE site
- CentOS 7 for head node and worker nodes
- Shared disk space between worker nodes and main server
- Torque batch system (or another LRMS; Torque is used as the example here)
Links to ARC manuals:
Welcome to ARC Version 6!
ARC Data Services Technical Description
ARC Configuration Reference Document
Packages
NorduGrid ARC 6.5.0
Package installation
For ARC-CE
# yum -y install epel-release https://centos7.iuscommunity.org/ius-release.rpm https://download.nordugrid.org/packages/nordugrid-release/releases/6/centos/el7/x86_64/nordugrid-release-6-1.el7.noarch.rpm http://repository.egi.eu/sw/production/umd/4/centos7/x86_64/updates/umd-release-4.1.3-1.el7.centos.noarch.rpm
# yum install http://linuxsoft.cern.ch/wlcg/centos7/x86_64/wlcg-repo-1.0.0-1.el7.noarch.rpm -y
# yum -y install nordugrid-arc6-compute-element nordugrid-arc6-arex nordugrid-arc6-plugins-gridftpjob ca-policy-egi-core wlcg-voms-alice wlcg-voms-atlas wlcg-voms-ops nordugrid-arc6-plugins-lcas-lcmaps nordugrid-arc6-plugins-globus nordugrid-arc6-gridftpd nordugrid-arc6-arcctl
For Site-BDII
# yum install bdii-config-site.noarch bdii
Additional patches
- Memory request in /usr/share/arc/submit-pbs-job: use mem instead of vmem
# diff -u /usr/share/arc/submit-pbs-job.save /usr/share/arc/submit-pbs-job
--- /usr/share/arc/submit-pbs-job.save 2020-04-28 01:20:43.012725899 +0300
+++ /usr/share/arc/submit-pbs-job 2020-04-28 01:21:44.942806477 +0300
@@ -217,7 +217,7 @@
fi
if [ ! -z "$memreq" ] ; then
- echo "#PBS -l vmem=${memreq}mb" >> $LRMS_JOB_SCRIPT
+ echo "#PBS -l mem=${memreq}mb" >> $LRMS_JOB_SCRIPT
fi
gate_host=`uname -n`
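The one-line change in the diff above can also be applied with sed instead of editing the file by hand. The following is only a sketch: patch_memreq is a hypothetical helper name (not part of ARC), and it assumes GNU sed and that the vmem line in your installed version reads exactly as in the diff.

```shell
# Sketch: apply the vmem -> mem change shown in the diff.
# patch_memreq is a hypothetical helper, not an ARC tool.
patch_memreq() {
    script="$1"
    # Keep a backup named like the .save file in the diff, but never
    # overwrite an existing backup with an already patched file.
    [ -e "$script.save" ] || cp -p "$script" "$script.save"
    # Rewrite only the "#PBS -l vmem=" directive; all other lines are untouched.
    sed -i 's/#PBS -l vmem=\${memreq}mb/#PBS -l mem=${memreq}mb/' "$script"
}

# Usage (as root, on the ARC-CE):
#   patch_memreq /usr/share/arc/submit-pbs-job
#   diff -u /usr/share/arc/submit-pbs-job.save /usr/share/arc/submit-pbs-job
```

A package update may restore the original file, so it is worth re-checking the diff after upgrading ARC.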
- Torque log scan filter (pull request submitted upstream)
In the file /usr/share/arc/scan-pbs-job find this line:
exited_killed_jobs=`egrep '^[^;]*;0010;[^;]*;Job;|^[^;]*;0008;[^;]*;Job;[^;]*;Exit_status=|^[^;]*;0008;[^;]*;Job;[^;]*;Job deleted' ${lname} | tail -n+$(( $lines_skip + 1 ))`
And change it to:
exited_killed_jobs=`egrep '^[^;]*;(0010|16);[^;]*;Job;|^[^;]*;(00)?08;[^;]*;Job;[^;]*;Exit_status=|^[^;]*;(00)?08;[^;]*;Job;[^;]*;Job deleted' ${lname} | tail -n+$(( $lines_skip + 1 ))`
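To see what the modified expression accepts, it can be run over a few test lines before touching the real script. This is a sketch only: filter_exited_killed is a hypothetical helper, and the input lines are placeholders shaped to fit the semicolon-separated fields the expression expects. They are not real Torque log output.

```shell
# Sketch: exercise the patched expression on placeholder input.
# filter_exited_killed is a hypothetical helper; the lines below are
# NOT real Torque logs, only strings with the same field layout.
filter_exited_killed() {
    grep -E '^[^;]*;(0010|16);[^;]*;Job;|^[^;]*;(00)?08;[^;]*;Job;[^;]*;Exit_status=|^[^;]*;(00)?08;[^;]*;Job;[^;]*;Job deleted'
}

cat <<'EOF' | filter_exited_killed
t1;0010;srv;Job;1.ce;any text
t2;16;srv;Job;2.ce;any text
t3;08;srv;Job;3.ce;Job deleted
t4;0008;srv;Job;4.ce;Exit_status=0
t5;0008;srv;Job;5.ce;other text
t6;0002;srv;Svr;srv;other text
EOF
```

Only the first four lines are printed: event codes 0010 and 16 match on the code alone, while 0008 and 08 match only when the message starts with Exit_status= or Job deleted.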
- Patch for the Glue Schema generator, needed for correct publishing of job slots and VO views.
Patch for /usr/share/arc/glue-generator.pl
ARC.conf
example /etc/arc.conf
Shared folders
- shared_scratch
- sessiondir
- cachedir
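A quick way to confirm that the shared folders listed above are really visible and writable is to run the same check on the head node and on a worker node. The sketch below assumes nothing about the actual paths: check_shared_dir is a hypothetical helper, and the directories to pass are whatever shared_scratch, sessiondir and cachedir are set to in your /etc/arc.conf.

```shell
# Sketch: verify that a directory exists and is writable.
# check_shared_dir is a hypothetical helper, not an ARC tool.
check_shared_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "MISSING $dir"
        return 1
    fi
    # Create and remove a probe file to confirm write access.
    probe="$dir/.arc_probe_$$"
    if touch "$probe" 2>/dev/null; then
        rm -f "$probe"
        echo "OK $dir"
    else
        echo "NOT-WRITABLE $dir"
        return 1
    fi
}

# Usage (replace the placeholders with the paths from your arc.conf):
#   for d in <shared_scratch> <sessiondir> <cachedir>; do check_shared_dir "$d"; done
```

The check runs as the calling user, so running it as root does not prove that the pool accounts can write there.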
Define the ARC account pool:
vo=alice
mkdir -p /etc/grid-security/pool/$vo
for u in ${vo}{001..50}; do echo $u >> /etc/grid-security/pool/$vo/pool; done
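The loop above appends to the pool file, so running it a second time duplicates every entry. A re-runnable variant could look like the sketch below. make_pool is a hypothetical helper name, GNU seq is assumed, and the output is intended to give the same alice001 ... alice050 names.

```shell
# Sketch: write the pool file in one step so that re-running is safe.
# make_pool is a hypothetical helper, not an ARC tool.
make_pool() {
    vo="$1"
    count="$2"
    pooldir="$3"
    mkdir -p "$pooldir/$vo"
    # ">" truncates the file, so repeated runs do not append duplicates.
    # %03g zero-pads the account number to three digits.
    seq -f "${vo}%03g" 1 "$count" > "$pooldir/$vo/pool"
}

# Usage (as root):
#   make_pool alice 50 /etc/grid-security/pool
```

The file only lists account names; the corresponding Unix accounts still have to exist on the head node and worker nodes.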
Configure and start services (ARC-CE)
arcctl service enable -a
arcctl rte enable ENV/PROXY
arcctl rte enable -d ENV/GLITE
arcctl service start -a
systemctl start fetch-crl-cron
For ATLAS only:
arcctl rte enable -d APPS/HEP/ATLAS-SITE-LCG
Validate your configuration:
arcctl config verify
NOTE: You MUST define benchmark values in both the [queue:*] and [infosys/cluster] sections; otherwise they will not be published properly. Please ignore the validator warning "['benchmark' is not a valid option in [infosys/cluster]]".
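Because the validator does not help with the benchmark requirement, a small script can list which sections of the configuration actually carry a benchmark line. This is a sketch under simple assumptions: benchmark_sections is a hypothetical helper, and it assumes the usual layout of [section] headers followed by option = value lines. It only looks for the option name and does not check the values.

```shell
# Sketch: print the name of every section that contains a "benchmark" option.
# benchmark_sections is a hypothetical helper, not an ARC tool.
benchmark_sections() {
    awk '
        # A line such as "[queue:grid]" starts a new section.
        /^[ \t]*\[.*\][ \t]*$/ {
            gsub(/^[ \t]*\[|\][ \t]*$/, "")
            section = $0
            next
        }
        # Report the current section for each benchmark option found.
        /^[ \t]*benchmark[ \t]*=/ { print section }
    ' "$1"
}

# Usage:
#   benchmark_sections /etc/arc.conf
```

The output should contain infosys/cluster and one line for each queue you publish.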
Configure site-BDII (example files)
/etc/bdii/gip/glite-info-site-defaults.conf
/etc/bdii/gip/site-urls.conf
/etc/glite-info-static/site/site.cfg
Start BDII service
systemctl enable bdii
systemctl restart bdii
Firewall
- Main server
- 2811/tcp, 2170/tcp, 2135/tcp, 6443/tcp, 9000-12000/tcp
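If the main server uses firewalld (an assumption; this page does not say which firewall is in use), the ports listed above could be opened as in the sketch below. print_firewall_cmds is a hypothetical helper that only prints the commands, so they can be reviewed before anything is changed.

```shell
# Sketch: generate firewall-cmd calls for the ports listed above.
# Assumes firewalld; print_firewall_cmds is a hypothetical helper.
print_firewall_cmds() {
    for port in 2811 2170 2135 6443 9000-12000; do
        # --permanent stores the rule; it becomes active after --reload.
        echo "firewall-cmd --permanent --add-port=${port}/tcp"
    done
    echo "firewall-cmd --reload"
}

# Print the commands for review.
print_firewall_cmds

# To apply them (as root):
#   print_firewall_cmds | sh
```

If you change the GridFTP port range in arc.conf, the 9000-12000 entry has to be adjusted to match.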
Troubleshooting
Quick checks:
- RTE: check that the RTEs needed by your VO are enabled
- LRMS limits: check the limits (CPU time, memory, etc.) set in your LRMS
- CVMFS: check that CVMFS works on the worker nodes
- Permissions and access to the working directories (on the worker nodes and the server)
- Open ports for GridFTP (9000-12000/tcp, for example)
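For the port item in the list above, reachability can be tested from another machine without extra tools. The sketch below relies on bash's /dev/tcp redirection and the coreutils timeout command. port_open is a hypothetical helper, and ce.example.org stands in for your own CE host name.

```shell
# Sketch: test whether a TCP port accepts connections.
# port_open is a hypothetical helper; ce.example.org is a placeholder host.
port_open() {
    host="$1"
    port="$2"
    # bash opens a TCP connection when redirecting to /dev/tcp/HOST/PORT;
    # timeout bounds the wait if packets are silently dropped.
    timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Usage:
#   for p in 2811 2170 2135 6443; do
#       if port_open ce.example.org "$p"; then echo "$p open"; else echo "$p closed"; fi
#   done
```

A port reported as closed can mean either a firewall rule or a service that is not running, so check the service status on the server as well.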
--
AndreyZarochentsev - 2020-07-06