baobab

partition   time limit    max cores
debug       15:00         32
shared      12:00:00      400
bigmem      4-00:00:00    16
parallel    4-00:00:00    400

  • bigmem provides 256 GB of RAM
  • mono-shared is the same as shared, but resources are allocated by core
  • mono is the same as parallel, but resources are allocated by core

slurm

Partitions

partition         time limit
rhel6-veryshort   20:00
rhel6-short       3:00:00
rhel6-medium      9:00:00
rhel6-long        1-00:00:00
rhel6-verylong    9-14:00:00

"Time limit" is the maximum time limit for any user job in days-hours:minutes:seconds.

Commands

1.- display information

  • sinfo: shows information about Slurm nodes and partitions.
  • sinfo -l, --long: prints more detailed information; this is ignored if the --format option is specified (see the example below).
  • sinfo --partition=rhel6-medium: shows detailed information for a specific partition.
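
For instance, --format can print one compact line per partition (the specifiers below are standard sinfo format fields):

sinfo --format="%P %l %D %t"   # partition, time limit, node count, node state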

2.- show job status

show pending jobs

squeue -u sevilla -t PD
Other job state options are:
  • all (all states)
  • PD (pending), R (running), CA (cancelled), CF(configuring), CG (completing), CD (completed), F (failed), TO (timeout), NF (node failure), RV (revoked) and SE (special exit state)

show completed jobs

squeue -u sevilla -t CD
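
For more context than the default columns, squeue also accepts a custom output format; a sketch using standard specifiers:

squeue -u sevilla -o "%.10i %.30j %.2t %.10M %.6D %R"   # ID, name, state, elapsed time, nodes, reason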

show detailed information about job

scontrol show jobid -dd <JOBID>

sevilla@atlas074.unige.ch :/atlas/users/sevilla/pixel/dq/batch/out/340973/merged$ scontrol show jobid -dd 75496
JobId=75496 JobName=scripts/reco_slurm_340973_cbadModulesAll_180
   UserId=sevilla(16607) GroupId=atlas(1307) MCS_label=N/A
   Priority=4294842851 Nice=0 Account=(null) QOS=normal
   JobState=PENDING Reason=Resources Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   DerivedExitCode=0:0
   RunTime=00:00:00 TimeLimit=09:00:00 TimeMin=N/A
   SubmitTime=2018-02-19T20:50:11 EligibleTime=2018-02-19T20:50:11
   StartTime=2018-02-20T02:20:00 EndTime=2018-02-20T11:20:00 Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   LastSchedEval=2018-02-19T23:02:22
   Partition=rhel6-medium AllocNode:Sid=atlas074:54976
   ReqNodeList=(null) ExcNodeList=atlas008
   NodeList=(null) SchedNodeList=atlas050
   NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=1,mem=2000M,node=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryCPU=2000M MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   Gres=(null) Reservation=(null)
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/atlas/users/sevilla/pixel/dq/batch/scripts/reco_slurm.sh
   WorkDir=/atlas/users/sevilla/pixel/dq/batch
   StdErr=/atlas/users/sevilla/pixel/dq/batch/logs/340973/340973_cbadModulesAll_180.err
   StdIn=/dev/null
   StdOut=/atlas/users/sevilla/pixel/dq/batch/logs/340973/340973_cbadModulesAll_180.log
   Power=

3.- cancel jobs

  • using jobId
scancel <jobid>
  • cancel all jobs of a user
scancel -u <username>
  • cancel all pending jobs for a user
scancel -t PENDING -u <username>
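
These filters can be combined; for example, to cancel only the pending jobs a user has queued on one partition:

scancel -u <username> -p rhel6-medium -t PENDING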

User interfaces (login nodes)

Node       Architecture   Processors   Cores   RAM [GB]   CPU time limit   Usage
atlas013   Intel          8            4       15         30 min           editing, sending jobs to batch nodes
atlas014   Intel          8            4       15         30 min           editing, sending jobs to batch nodes
atlas074   AMD            64           8       189        none             interactive jobs

TIP less /proc/cpuinfo provides details about individual CPU cores. Every processor or core is listed separately, and details such as speed, cache size, and model name are included in each entry.

  • To count the number of processing units use grep with wc: cat /proc/cpuinfo | grep processor | wc -l.

ALERT! The number of processors shown by /proc/cpuinfo might not be the actual number of physical cores. For example, a processor with 2 cores and hyperthreading is reported as 4 processors.

  • To get the actual number of cores, count the unique core id values, e.g. cat /proc/cpuinfo | grep 'core id' | sort -u | wc -l.

TIP To check the amount of RAM, one can use less /proc/meminfo or free -g.
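
The checks above can be combined into a quick summary, using only the files and commands already mentioned:

# logical processing units (hyperthreaded siblings count separately)
grep -c '^processor' /proc/cpuinfo
# unique physical core IDs
grep 'core id' /proc/cpuinfo | sort -u | wc -l
# total RAM in GB
free -g | awk '/^Mem:/ {print $2}'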

NFS file servers, file systems

Summary:
  • /atlas/users/
    • 2 TB total space
    • 100 GB limit / user
    • backed-up every night
  • /atlas/datax
    • /atlas/data1 and /atlas/data2: 16 TB total space
    • /atlas/data3 and /atlas/data4: 32 TB total space
    • not backed-up

  • /atlas/data2/userdata/sevilla
    • trigger/L2CaloCalibration stuff
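
Given the 100 GB per-user limit on /atlas/users, it is worth checking one's usage from time to time:

du -sh /atlas/users/$USER   # total size of the user directory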

Useful

using XRootD

setupATLAS
lsetup xrootd

  • To list the contents of /eos/user/s/sevilla (CERNBox storage): xrd eosuser.cern.ch dirlist /eos/user/s/sevilla
  • To list the contents of /eos/atlas/user/s/sevilla (ATLAS storage): xrd eosatlas.cern.ch dirlist /eos/atlas/user/s/sevilla

  • To copy a file from CERNBox to a local directory: xrdcp root://eosuser.cern.ch/{remote_file_path} {local_directory_path}
  • To copy a file from eosatlas to a local directory: xrdcp root://eosatlas.cern.ch/{remote_file_path} {local_directory_path}
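
On newer XRootD versions the legacy xrd client is replaced by xrdfs, which provides the same listing functionality (same servers and paths as above):

xrdfs eosuser.cern.ch ls /eos/user/s/sevilla
xrdfs eosatlas.cern.ch ls /eos/atlas/user/s/sevilla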

Download dataset

setupATLAS
localSetupRucioClients
voms-proxy-init -voms atlas
rucio list-dids data17_13TeV:*333650.express_express*RAW*
rucio download data17_13TeV:data17_13TeV.00333650.express_express.merge.RAW 

TIP alias sr='source /atlas/users/sevilla/scripts/setup_rucio.sh'

List files in a dataset (after having set up rucio):

rucio list-files data17_13TeV:data17_13TeV.00339590.express_express.merge.RAW
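
For a quick test it is often enough to fetch a single file; rucio download supports this with the --nrandom option:

rucio download --nrandom 1 data17_13TeV:data17_13TeV.00339590.express_express.merge.RAW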

Remove all jobs:

for job in $(qstat | grep $USER | grep -v " C " | tr ' ' '\n' | grep grid); do qdel $job; done
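
The loop above uses the older qstat/qdel (Torque/PBS) commands. A Slurm sketch of the same selective cleanup, keeping the "grid" job-name filter:

squeue -u $USER -h -o "%i %j" | awk '/grid/ {print $1}' | xargs -r scancel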

Change file / dir permissions

chmod -R g+r dir
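
To also let group members descend into subdirectories, chmod's capital X is handy: it adds execute permission on directories (and on files already executable for someone), but not on plain files:

chmod -R g+rX dir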

Useful scripts

  • /atlas/software/batch-tools/get-job-info.py
  • /atlas/software/batch-tools/get-job-meminfo.py

Deprecated

Read data with RFIO

rfdir /dpm/unige.ch/home/atlas/atlaslocalgroupdisk/mc11_7TeV/NTUP_TOP/

Setup ROOT

# Setup the gcc version. 
source /afs/cern.ch/sw/lcg/external/gcc/4.3.3/i686-slc5-gcc43-opt/setup.sh

# Set up ROOT (5.30, 32 bit)
cd /afs/cern.ch/sw/lcg/app/releases/ROOT/5.30.06/i686-slc5-gcc43-opt/root
source bin/thisroot.sh
cd - >/dev/null && echo "# ROOTSYS = ${ROOTSYS}"

# 32 bit python for 32 bit ROOT
python_dir="/afs/cern.ch/sw/lcg/external/Python/2.6.5/i686-slc5-gcc43-opt"
if [[ $PATH != *"$python_dir"* ]]; then
  export PATH="$python_dir/bin:$PATH"
fi

if [[ *"$LD_LIBRARY_PATH"* != $python_dir ]]; then
  export LD_LIBRARY_PATH="$python_dir/lib:$LD_LIBRARY_PATH"
fi

# my Python Modules search path
if [ $PYTHONPATH ]; then
    export PYTHONPATH=/atlas/users/sevilla/scripts:${PYTHONPATH}
else
    export PYTHONPATH=/atlas/users/sevilla/scripts
fi

# Setup the gcc version.
source /afs/cern.ch/sw/lcg/external/gcc/4.3.3/x86_64-slc5-gcc43-opt/setup.sh

# Set up ROOT (5.30, 64 bit)
cd /afs/cern.ch/sw/lcg/app/releases/ROOT/5.30.00/x86_64-slc5-gcc43-opt/root
source bin/thisroot.sh
cd - >/dev/null && echo "# ROOTSYS = ${ROOTSYS}"

# 64 bit python for 64 bit ROOT
python_dir="/afs/cern.ch/sw/lcg/external/Python/2.6.5/x86_64-slc5-gcc43-opt"
if [[ $PATH != *"$python_dir"* ]]; then
  export PATH="$python_dir/bin:$PATH"
fi

if [[ *"$LD_LIBRARY_PATH"* != $python_dir ]]; then
  export LD_LIBRARY_PATH="$python_dir/lib:$LD_LIBRARY_PATH"
fi

# my Python Modules search path
if [ $PYTHONPATH ]; then
    export PYTHONPATH=/atlas/users/sevilla/scripts:${PYTHONPATH}
else
    export PYTHONPATH=/atlas/users/sevilla/scripts
fi
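
After sourcing either variant, a quick sanity check that ROOT and PyROOT agree (Python 2 syntax, matching the 2.6.5 installation above):

root-config --version --arch                             # ROOT version and target architecture
python -c "import ROOT; print ROOT.gROOT.GetVersion()"   # PyROOT should report the same version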

Setup Storage Element

# This script should be sourced rather than run.

# setup dq2
echo "source /atlas/software/dpm-test/env-setup.sh"
source /atlas/software/dpm-test/env-setup.sh
export LD_LIBRARY_PATH=/atlas/software/dpm/3.2.10-1/lib64:${LD_LIBRARY_PATH} # for libdpm.so

# check for a valid proxy
echo "voms-proxy-info"
voms-proxy-info

# create a proxy if one does not exist
if [[ $? != 0 ]]; then
  good_proxy=0

  while [[ $good_proxy == 0 ]]; do
    echo "voms-proxy-init -voms atlas -valid 90:00"
    voms-proxy-init -voms atlas -valid 90:00
    if [[ $? == 0 ]]; then
      good_proxy=1
    fi
  done
fi

# Variant of the same script using the 32-bit library path (lib instead of lib64).
# This script should be sourced rather than run.

# setup dq2
echo "source /atlas/software/dpm-test/env-setup.sh"
source /atlas/software/dpm-test/env-setup.sh
export LD_LIBRARY_PATH=/atlas/software/dpm/3.2.10-1/lib:${LD_LIBRARY_PATH} # for libdpm.so

# check for a valid proxy
echo "voms-proxy-info"
voms-proxy-info

# create a proxy if one does not exist
if [[ $? != 0 ]]; then
  good_proxy=0

  while [[ $good_proxy == 0 ]]; do
    echo "voms-proxy-init -voms atlas -valid 90:00"
    voms-proxy-init -voms atlas -valid 90:00
    if [[ $? == 0 ]]; then
      good_proxy=1
    fi
  done
fi
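
Once a proxy exists, its remaining lifetime can be checked at any time:

voms-proxy-info --timeleft   # seconds of validity left on the current proxy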

-- SergioGonzalez - 09-Feb-2012
