Run CMSSW code using the Condor batch queue of the LPCCAF at FNAL
Introduction
You can use the LPCCAF at FNAL to parallelize the processing of samples available at FNAL. You send jobs to the LPCCAF Condor queue using the code from your CMSSW project directory and write your output into your execution directory.
- Batch processing on the LPCCAF consists of the following components:
- A user CMSSW project directory
- A user execution directory where all the output will be stored
- A tested parameter-set
- A script executed on each worker node (WN), which:
- sets up the CMS software environment
- sets up the user CMSSW project directory environment
- replaces the necessary entries in the parameter-set to make each job unique and stores the new parameter-set in the execution directory
- executes cmsRun in the execution directory using the new parameter-set and stores stdout in a log file
- A Condor steering file (JDL) to submit jobs to the LPCCAF batch queues
Prerequisites
- All LPCCAF worker nodes (WN) have access to /uscms and /uscms_data/d1. Your CMSSW project directory and your execution directory have to reside in one of these two areas.
- Make sure that your execution directory has enough free space. Check this with the quota command.
Prepare directories
- Use the CMSSW project directory from this tutorial as your CMSSW project directory
- Create your execution directory, replacing <user> with your username:
mkdir /uscms_data/d1/<user>/batch_tutorial
Prepare parameter-set
- Using the input dataset file dataset.cff from the DBS/DLS discovery page, create a parameter-set named batch.cfg in the src directory of your local user project area and schedule the tutorial EDProducer in it. The following template CMSSW parameter-set is valid for <= CMSSW_1_4_X:
process P =
{
    #
    # load input file
    #
    source = PoolSource
    {
        untracked vstring fileNames = {"file:test.root"}
        untracked int32 maxEvents = CONDOR_MAXEVENTS
        untracked uint32 skipEvents = CONDOR_SKIPEVENTS
    }
    include "dataset.cff"
    # include the MyTrackUtility producer
    module producer = MyTrackUtility
    {
        InputTag TrackProducerTag = ctfWithMaterialTracks
    }
    #
    # write results out to file
    #
    module Out = PoolOutputModule
    {
        untracked string fileName = 'CONDOR_OUTPUTFILENAME'
    }
    path p = { producer }
    endpath e = { Out }
}
while the following template CMSSW parameter-set is valid for >= CMSSW_1_5_X:
process P =
{
    #
    # max events steering
    #
    untracked PSet maxEvents =
    {
        untracked int32 input = CONDOR_MAXEVENTS
    }
    #
    # load input file
    #
    source = PoolSource
    {
        untracked vstring fileNames = {"file:test.root"}
        untracked uint32 skipEvents = CONDOR_SKIPEVENTS
    }
    include "dataset.cff"
    # include the MyTrackUtility producer
    module producer = MyTrackUtility
    {
        InputTag TrackProducerTag = ctfWithMaterialTracks
    }
    #
    # write results out to file
    #
    module Out = PoolOutputModule
    {
        untracked string fileName = 'CONDOR_OUTPUTFILENAME'
    }
    path p = { producer }
    endpath e = { Out }
}
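The three CONDOR_* keywords in the templates are plain-text placeholders; the WN script described below substitutes per-job values for them with sed before calling cmsRun. A minimal sketch of that substitution, using made-up cluster/process numbers:

```shell
#!/bin/bash
# Hypothetical per-job values (in production they come from the JDL arguments)
NUM_EVENTS_PER_JOB=10
CONDOR_PROCESS=3
FINAL_FILENAME=batch_12345_3.root

# Each job skips the events already covered by lower-numbered processes
let "skip = CONDOR_PROCESS * NUM_EVENTS_PER_JOB"

# Replace the placeholders, as the WN script does
sed -e "s/CONDOR_MAXEVENTS/$NUM_EVENTS_PER_JOB/" \
    -e "s/CONDOR_SKIPEVENTS/$skip/" \
    -e "s/CONDOR_OUTPUTFILENAME/$FINAL_FILENAME/" <<'EOF'
untracked int32 input = CONDOR_MAXEVENTS
untracked uint32 skipEvents = CONDOR_SKIPEVENTS
untracked string fileName = 'CONDOR_OUTPUTFILENAME'
EOF
```

For job number 3 this yields maxEvents 10, skipEvents 30, and output file batch_12345_3.root, so every queued job reads a disjoint slice of the input.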
Prepare WN script
- Prepare the WN script, named condor.sh, in the src directory of your local user project area (the same directory as the JDL described below), with the following content:
#!/bin/bash
#
# variables from arguments string in jdl
#
# format:
#
# 1: condor cluster number
# 2: condor process number
# 3: CMSSW_DIR
# 4: RUN_DIR
# 5: PARAMETER_SET (full path, has to contain all needed files in PoolSource and filled following variables with keywords: maxEvents = CONDOR_MAXEVENTS, skipEvents = CONDOR_SKIPEVENTS, output fileName = CONDOR_OUTPUTFILENAME)
# 6: NUM_EVENTS_PER_JOB
#
CONDOR_CLUSTER=$1
CONDOR_PROCESS=$2
CMSSW_DIR=$3
RUN_DIR=$4
PARAMETER_SET=$5
NUM_EVENTS_PER_JOB=$6
#
# header
#
echo ""
echo "CMSSW on Condor"
echo ""
START_TIME=`/bin/date`
echo "started at $START_TIME"
echo ""
echo "parameter set:"
echo "CONDOR_CLUSTER: $CONDOR_CLUSTER"
echo "CONDOR_PROCESS: $CONDOR_PROCESS"
echo "CMSSW_DIR: $CMSSW_DIR"
echo "RUN_DIR: $RUN_DIR"
echo "PARAMETER_SET: $PARAMETER_SET"
echo "NUM_EVENTS_PER_JOB: $NUM_EVENTS_PER_JOB"
#
# setup software environment at FNAL for the given CMSSW release
#
source /uscmst1/prod/sw/cms/shrc uaf
export SCRAM_ARCH=slc4_ia32_gcc345
cd $CMSSW_DIR
eval `scramv1 runtime -sh`
#
# change to output directory
#
cd $RUN_DIR
#
# modify parameter-set
#
FINAL_PARAMETER_SET_NAME=batch_${CONDOR_CLUSTER}_${CONDOR_PROCESS}
FINAL_PARAMETER_SET=$FINAL_PARAMETER_SET_NAME.cfg
FINAL_LOG=$FINAL_PARAMETER_SET_NAME.log
FINAL_FILENAME=$FINAL_PARAMETER_SET_NAME.root
echo ""
echo "Writing final parameter-set: $FINAL_PARAMETER_SET to RUN_DIR: $RUN_DIR"
echo ""
let "skip = $CONDOR_PROCESS * NUM_EVENTS_PER_JOB"
sed -e "s/CONDOR_MAXEVENTS/$NUM_EVENTS_PER_JOB/" -e "s/CONDOR_SKIPEVENTS/$skip/" -e "s/CONDOR_OUTPUTFILENAME/$FINAL_FILENAME/" $PARAMETER_SET > $FINAL_PARAMETER_SET
#
# run cmssw
#
echo "run: cmsRun $FINAL_PARAMETER_SET > $FINAL_LOG 2>&1"
cmsRun $FINAL_PARAMETER_SET > $FINAL_LOG 2>&1
exitcode=$?
#
# end run
#
echo ""
END_TIME=`/bin/date`
echo "finished at $END_TIME"
exit $exitcode
Attention: this script is set up for SL4 releases (>= CMSSW_1_5_0). If you would like to use <= CMSSW_1_4_X, please change the line:
export SCRAM_ARCH=slc4_ia32_gcc345
to
export SCRAM_ARCH=slc3_ia32_gcc323
- Make the script executable:
chmod 755 condor.sh
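Every file a job writes is derived from the Condor cluster and process numbers, so concurrent jobs never overwrite each other. A sketch of the naming scheme used by condor.sh (the cluster number 12345 is a made-up example):

```shell
#!/bin/bash
CONDOR_CLUSTER=12345   # assigned by Condor at submission (example value)
CONDOR_PROCESS=0       # runs from 0 to Queue-1
FINAL_PARAMETER_SET_NAME=batch_${CONDOR_CLUSTER}_${CONDOR_PROCESS}
echo "$FINAL_PARAMETER_SET_NAME.cfg"   # per-job parameter-set, written to RUN_DIR
echo "$FINAL_PARAMETER_SET_NAME.log"   # cmsRun stdout/stderr
echo "$FINAL_PARAMETER_SET_NAME.root"  # PoolOutputModule output file
# prints batch_12345_0.cfg, batch_12345_0.log, batch_12345_0.root
```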
Prepare JDL
- Prepare the JDL, named batch.jdl, in the local user project directory
- Change the directory names to your setup
- Set the number of events per job by changing the last value on the Arguments line
- Set the number of jobs by changing the number after Queue
universe = vanilla
Executable = condor.sh
should_transfer_files = NO
Output = <execution directory>/batch_$(cluster)_$(process).stdout
Error = <execution directory>/batch_$(cluster)_$(process).stderr
Log = <execution directory>/batch_$(cluster)_$(process).condor
Requirements = Memory >= 199 && OpSys == "LINUX" && (Arch != "DUMMY")
Arguments = $(cluster) $(process) <CMSSW project directory> <execution directory> <CMSSW project directory>/src/batch.cfg 10
Queue 10
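With the template values above, each of the 10 queued jobs processes 10 events, so the jobs cover 100 events in total; job N starts after skipping N × 10 events. A quick sanity check of that bookkeeping:

```shell
#!/bin/bash
EVENTS_PER_JOB=10   # last value on the Arguments line
QUEUE=10            # number after Queue
for ((p = 0; p < QUEUE; p++)); do
  echo "job $p skips $((p * EVENTS_PER_JOB)) events"
done
echo "total events: $((EVENTS_PER_JOB * QUEUE))"
# last line printed: total events: 100
```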
Submission and status query
- Submit the jobs from your local user project area:
condor_submit batch.jdl
- Query the status of your jobs:
condor_q -submitter $USER
Check your output
Finally, check the output of your jobs in your execution directory.
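A quick way to verify the results is to count the ROOT files and scan the logs for errors; a minimal sketch, assuming the execution directory created earlier in this tutorial (the grep pattern is only a common-case guess, not an exhaustive failure check):

```shell
#!/bin/bash
RUN_DIR=/uscms_data/d1/$USER/batch_tutorial
# one .root file per queued job is expected
ls "$RUN_DIR"/batch_*.root 2>/dev/null | wc -l
# list any logs that recorded an exception (pattern is a guess)
grep -l "Exception" "$RUN_DIR"/batch_*.log 2>/dev/null
```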