Introduction
CoEPPGridTools is a set of Python tools that provides D3PD skimming and facilitates the submission of grid jobs and the manipulation of dq2 datasets.
Layout:
- src - classes that take care of grid submission
- share - python scripts designed to be modified for custom use (skimmers live here)
- scripts - command line tools (added to the path in setup.sh)
- Selectors - selector classes used in higher-level skims (e.g. skim_Ztautau)
Installation
Follow these steps to install:
- Follow the instructions to set up the CoEPP suite here
- In the CoEPPDir, check out the LATEST tag of CoEPPGridTools (check which tag is latest in SVN):
svn co svn+ssh://svn.cern.ch/reps/atlasusr/wdavey/CoEPPGridTools/tags/CoEPPGridTools-XX-XX-XX CoEPPGridTools
- Set up the package (you should do this every time you start a new shell)
Skimming Example
- Open a fresh shell
- Set up athena (the release is not too important; you just need a version of ROOT compiled against python) (if you haven't done this before, look here)
- Set up your voms proxy and the panda tools (if you don't know how to do this, check 'Starting on the Grid', PandaRun and Panda Setup)
- Go to the CoEPPGridTools directory
- Source the setup, make a run directory and copy over a skimmer script
source setup.sh
mkdir run
cd run
cp ../share/skim_r17.py .
- You can edit the configuration in the header of the skimmer
- Launch the skimmer with the following command:
launchSkimmer -s Ztautau -t r17default -u <user name> --test 1 --postfix="TestSkim" skim_r17.py
This will launch the 'skim_r17.py' skimmer over the Ztautau sample for the default r17 production (configured in src/Samples.py). The --test 1 flag will send a single-file job to the express stream, which should come back quickly (and will append 't1' to the dataset name). The -u argument should be replaced with your user name registered in the Panda database. Finally, --postfix will be appended to the name of the dataset. A full set of instructions can be obtained using launchSkimmer --help. There are two main modes of configuration: either with the sample (-s) and tag (-t) flags, which load samples from the preconfigured database in Samples.py, or manually with --inDS and --outDS. Note that the first option will take care of most of the naming of the output dataset for you. To save yourself from specifying your user name each time, you can set the default user name gUSER='wdavey' in src/BaseModule.py.
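To make the sample/tag mode concrete, here is a minimal sketch of how a preconfigured lookup and the output-dataset naming might work. The dictionary contents, function, and dataset names below are invented for illustration; they do not reproduce the actual code in src/Samples.py.

```python
# Hypothetical sketch of the (sample, tag) lookup and output-dataset naming.
# Nothing here is taken from the real src/Samples.py.
SAMPLES = {
    ("Ztautau", "r17default"): "group.perf-tau.mc11.Ztautau.D3PD.r17/",
}

def build_out_ds(user, key, postfix="", test=False):
    """Assemble an output dataset name from a (sample, tag) key and options."""
    sample, tag = key
    name = "user.%s.%s.%s" % (user, sample, tag)
    if postfix:
        name += "." + postfix   # --postfix is appended to the dataset name
    if test:
        name += ".t1"           # --test 1 appends 't1'
    return name + "/"

in_ds = SAMPLES[("Ztautau", "r17default")]   # what -s/-t would resolve to
print(build_out_ds("wdavey", ("Ztautau", "r17default"),
                   postfix="TestSkim", test=True))
```

The point is only that the -s/-t pair selects a preconfigured input dataset, while the user name, postfix, and test flag together determine the output dataset name.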
Checking the output
The output can be checked in a number of ways:
- Using pbook - just type pbook in the shell
- Check the Panda Monitor:
- Get your panda job id (printed as 'PandaID=XXXX' when you launch, or found in pbook)
- Go to the Panda Production Dashboard
- Put your job ID into the 'Panda Job ID' field on the left tab.
Big Skimming Jobs
Organisation is key when performing a large-scale skimming job. Typically the input data D3PDs are stored in separate datasets for each run, so you will probably want to create your own containers for each period. The script share/createContainers.py is designed to sort the runs into periods and create new containers. Before running, it is very important that you set the configuration correctly for your personal use! You can find specific instructions regarding container creation here. You should also minimise the number of sites that you send your skimming jobs to; if you don't specify a site, you will spend weeks trying to track down every last file that got split over the grid. Luckily, the tau data D3PDs should be stored at one of four sites: DESY-HH_PERF-TAU, UNI-FREIBURG_PERF-TAU, MWT2_UC_PERF-TAU, TRIUMF-LCG2_PERF-TAU. You can check the location of a dataset using dq2-ls -r, then use the --site flag to specify an appropriate site.
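As a rough illustration of the grouping step that createContainers.py performs, the sketch below sorts run numbers into periods. The period boundaries and function names are invented for illustration and are not taken from the package, which ships its own configuration.

```python
# Hypothetical sketch of sorting runs into periods for container creation.
# The period names and run ranges below are invented for illustration.
PERIODS = [
    ("B", 178044, 178109),
    ("D", 179710, 180481),
]

def period_of(run):
    """Return the period name containing this run, or None if unknown."""
    for name, first, last in PERIODS:
        if first <= run <= last:
            return name
    return None

def group_runs(runs):
    """Group run numbers into {period: [runs]}, one group per container."""
    groups = {}
    for run in sorted(runs):
        period = period_of(run)
        if period is not None:
            groups.setdefault(period, []).append(run)
    return groups

print(group_runs([180481, 178044, 179710]))
```

One container per resulting group then collects all per-run datasets for that period.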
Command Line Tools
CoEPPGridTools has a set of command line tools in the CoEPPGridTools/scripts directory. You can get usage info on each tool with the --help flag.
mergeLumiTarballs - for calculating the lumi of skimmed data D3PDs
Skims produced with the CoEPPGridTools skimmers save the lumi info about each input D3PD in an xml file lumi_N.xml. All lumi xml files are put into a tarball at the end of the job (named something like user.wdavey.002781._00163.lumi_XYZ.xml.tgz), and will be downloaded with the skimmed ntuples when you download the dataset. To calculate the lumi of your skimmed dataset, follow these instructions:
- Merge all the xml files for the dataset into one xml file. To do this execute the following command:
- Take the overlap of this merged xml with your GRL, using the default GoodRunsLists shipped with recent athena releases. To do this, make sure you have set up athena, then execute the following command:
- Calculate the lumi:
- Go to the lumi wiki here
- Upload the overlapped xml
- Set your EF-level trigger (needed in case it is prescaled)
- If you want to retrieve the distribution for reweighting, specify the option --plots in the options field.
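As a rough illustration of the merge step, the sketch below concatenates the LumiBlockCollection entries of several GRL-style xml files into one. It assumes the lumi_N.xml files follow the standard LumiRangeCollection layout; it is not the actual mergeLumiTarballs implementation, and the file names are synthetic.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def merge_lumi_xml(paths, out_path):
    """Concatenate LumiBlockCollection entries from several GRL-style xml
    files into a single LumiRangeCollection (illustrative sketch only)."""
    merged = ET.Element("LumiRangeCollection")
    named = ET.SubElement(merged, "NamedLumiRange")
    for path in paths:
        root = ET.parse(path).getroot()
        for lbc in root.iter("LumiBlockCollection"):
            named.append(lbc)   # carry each run's lumi blocks over unchanged
    ET.ElementTree(merged).write(out_path)

# Demo with two tiny synthetic inputs standing in for lumi_N.xml files.
tmp = tempfile.mkdtemp()
for i, run in enumerate((180164, 180225)):
    with open(os.path.join(tmp, "lumi_%d.xml" % i), "w") as f:
        f.write('<LumiRangeCollection><NamedLumiRange><LumiBlockCollection>'
                '<Run>%d</Run><LBRange Start="1" End="5"/>'
                '</LumiBlockCollection></NamedLumiRange></LumiRangeCollection>' % run)

out = os.path.join(tmp, "merged.xml")
merge_lumi_xml([os.path.join(tmp, "lumi_%d.xml" % i) for i in (0, 1)], out)
print(len(ET.parse(out).getroot().findall(".//LumiBlockCollection")))
```

The merged file can then be overlapped with a GRL as described above.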
Note: if you want to repeat this procedure separately for each period B-M in your analysis, the script CoEPPNtupGen/share/genLumi.sh in CoEPPNtupGen does this for you.
Ztautau Skimmer
The Ztautau skimmers are higher-level skimmers that apply event selection on top of object selection. The idea is to write out only events that pass the standard Ztautau selection. Two versions of the Ztautau skimmer exist:
- skimZtautau_r17.py - writes out the standard set of branches
- eventCounterZtautau_r17.py - writes out just the RunNumber and EventNumber branches.
eventCounterZtautau_r17.py is designed to be used to obtain event lists that can then be input to (e.g.) athena jobs. Two additional branches are added to the output tree.
Running Ztautau skimmers
The Ztautau skimmers must be run from the head directory of the package. From there, execute these commands:
- Copy over the desired skimmer:
- Run the skimmer:
Note: the location of the setup.sh script must be specified so that the python classes can be used on the grid.
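As an illustration of how an event list produced by eventCounterZtautau_r17.py might be consumed downstream, here is a minimal sketch that keeps only events whose (RunNumber, EventNumber) pair appears in the list. All names and values in it are invented for illustration; a real athena job would apply the same lookup inside its event loop.

```python
# Hypothetical sketch: filter events against a (RunNumber, EventNumber) list.
def load_event_list(pairs):
    """Build a fast membership set from (RunNumber, EventNumber) pairs."""
    return set(pairs)

def select(events, event_list):
    """Keep only events whose (run, event) pair is in the event list."""
    return [ev for ev in events
            if (ev["RunNumber"], ev["EventNumber"]) in event_list]

event_list = load_event_list([(180164, 1001), (180164, 1003)])
events = [{"RunNumber": 180164, "EventNumber": n} for n in (1001, 1002, 1003)]
print([ev["EventNumber"] for ev in select(events, event_list)])
```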
--
WillDavey - 20-Nov-2011