CoEPPGridTools

Introduction

CoEPPGridTools is a set of Python tools that provides D3PD skimming and facilitates the submission of grid jobs and the manipulation of dq2 datasets.

Layout:

  • src - classes that take care of grid submission
  • share - python scripts designed to be modified for custom use (skimmers live here)
  • scripts - command line tools (added to path in setup.sh)
  • Selectors - selector classes used in higher level skims (e.g. skim_Ztautau)

Latest News

Installation

Follow these steps to install:
  1. Follow the instructions to set up the CoEPP suite here
  2. In the CoEPPDir, check out the latest tag of CoEPPGridTools (check SVN to see which tag is the latest):
    • svn co svn+ssh://svn.cern.ch/reps/atlasusr/wdavey/CoEPPGridTools/tags/CoEPPGridTools-XX-XX-XX CoEPPGridTools
  3. Set up the package (do this every time you start a new shell)

Skimming Example

  • Open a fresh shell
  • Set up athena (the release is not too important, you just need a version of ROOT compiled against Python; if you haven't done this before, look here)
  • Set up your voms proxy and the panda tools (if you don't know how to do this, check 'Starting on the Grid', PandaRun and Panda Setup)
  • Go to the CoEPPGridTools directory

  • Source the setup, make a run directory and copy over a skimmer script:
    source setup.sh
    mkdir run
    cd run
    cp ../share/skim_r17.py .
  • You can edit the configuration in the header of the skimmer
  • Launch the skimmer with the following command:
    launchSkimmer -s Ztautau -t r17default -u <user name> --test 1 --postfix="TestSkim" skim_r17.py

This launches the skim_r17.py skimmer over the Ztautau sample for the default r17 production (configured in src/Samples.py). The --test 1 flag sends a single-file job to the express stream, which should come back quickly, and appends 't1' to the dataset name. Replace <user name> after -u with your user name as registered in the Panda database. The --postfix value is added to the name of the dataset. Run launchSkimmer --help for the full set of instructions.

There are two main modes of configuration: either use the sample (-s) and tag (-t) flags, which load samples from the preconfigured database in Samples.py, or specify the datasets manually with --inDS and --outDS. The first mode takes care of much of the output dataset naming for you. To avoid specifying your user name each time, set the default user name (e.g. gUSER='wdavey') in src/BaseModule.py.
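
The two configuration modes can be sketched as a small Python helper that assembles the launchSkimmer argument list. This is only an illustration: build_launch_command is a hypothetical name and not part of CoEPPGridTools, only flags mentioned on this page are used, and it assumes that -s/-t (and likewise --inDS/--outDS) are always given as a pair and that the two modes are not mixed.

```python
import shlex


def build_launch_command(script, sample=None, tag=None, in_ds=None, out_ds=None,
                         user=None, test=None, postfix=None):
    """Assemble a launchSkimmer argument list (hypothetical helper, not part of the package)."""
    use_samples = sample is not None and tag is not None
    use_manual = in_ds is not None and out_ds is not None
    mixed = (sample is not None or tag is not None) and \
            (in_ds is not None or out_ds is not None)
    # Assumption: exactly one complete mode must be chosen.
    if mixed or not (use_samples or use_manual):
        raise ValueError("give either sample and tag, or in_ds and out_ds")

    args = ["launchSkimmer"]
    if use_samples:
        args += ["-s", sample, "-t", tag]          # preconfigured samples (Samples.py)
    else:
        args += ["--inDS", in_ds, "--outDS", out_ds]  # manual dataset names
    if user is not None:
        args += ["-u", user]
    if test is not None:
        args += ["--test", str(test)]
    if postfix is not None:
        args.append("--postfix=" + postfix)
    args.append(script)
    return args


# Reproduce the test-job command from the example ("your_user_name" is a placeholder).
cmd = build_launch_command("skim_r17.py", sample="Ztautau", tag="r17default",
                           user="your_user_name", test=1, postfix="TestSkim")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The returned list could be passed to subprocess.call once the shell environment has been set up with setup.sh.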

Checking the output

The output can be checked in a number of ways:
  1. Using pbook
    • just type pbook in the shell
  2. Check the Panda Monitor
    • Get your Panda job ID (printed as 'PandaID=XXXX' when you launch, or found in pbook)
    • Go to the Panda Production Dashboard
    • Put your job ID into the 'Panda Job ID' field on the left tab.

Big Skimming Jobs

Organisation is key when performing a large-scale skimming job. Typically the input data D3PDs are stored in separate datasets for each run, so you will probably want to create your own containers for each period. The script share/createContainers.py is designed to sort the runs into periods and create new containers. Before running it, it is very important that you set its configuration correctly for your personal use! You can find specific instructions regarding container creation here.

You should also minimise the number of sites that you send your skimming jobs to. If you don't specify a site, you will spend weeks trying to track down every last file that got split over the grid. Luckily, the tau data D3PDs should be stored at one of four sites: DESY-HH_PERF-TAU, UNI-FREIBURG_PERF-TAU, MWT2_UC_PERF-TAU, TRIUMF-LCG2_PERF-TAU. You can check the location of a dataset using dq2-ls -r <dataset name>, then use the --site flag to specify an appropriate site.
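
One way to keep the number of sites small is to note down, for each container, the sites reported by dq2-ls -r and then prefer the site that holds the most containers. The Python sketch below shows that bookkeeping only. It is a simple greedy heuristic, not something CoEPPGridTools provides: assign_sites is a hypothetical name, the container names are placeholders, and the replica lists are assumed to be typed in by hand (no dq2 output is parsed).

```python
from collections import Counter

# The four tau performance sites listed above.
TAU_PERF_SITES = (
    "DESY-HH_PERF-TAU",
    "UNI-FREIBURG_PERF-TAU",
    "MWT2_UC_PERF-TAU",
    "TRIUMF-LCG2_PERF-TAU",
)


def assign_sites(replicas, preferred=TAU_PERF_SITES):
    """Map each dataset to one preferred site, favouring sites that hold many datasets.

    replicas: dict of dataset name -> list of sites holding a replica.
    Datasets with no replica at a preferred site are mapped to None.
    """
    counts = Counter(site
                     for sites in replicas.values()
                     for site in set(sites)
                     if site in preferred)
    # Most widely used site first; ties keep the order of `preferred`.
    ranked = sorted(preferred, key=lambda site: -counts[site])
    return {dataset: next((site for site in ranked if site in sites), None)
            for dataset, sites in replicas.items()}


# Placeholder container names and replica lists, for illustration only.
replicas = {
    "placeholder.periodB": ["DESY-HH_PERF-TAU", "MWT2_UC_PERF-TAU"],
    "placeholder.periodD": ["MWT2_UC_PERF-TAU"],
    "placeholder.periodE": ["TRIUMF-LCG2_PERF-TAU", "MWT2_UC_PERF-TAU"],
}
for dataset, site in sorted(assign_sites(replicas).items()):
    print(dataset, "->", site)
```

The chosen site for each container is then what you would pass to the --site flag.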

Command Line Tools

CoEPPGridTools has a set of command line tools in the CoEPPGridTools/scripts directory. You can get usage info on each tool with the --help flag.

mergeLumiTarballs - for calculating the lumi of skimmed data D3PDs

Skims produced with the CoEPPGridTools skimmers save the lumi info about each input D3PD in an xml file lumi_N.xml. All lumi xml files are put into a tarball at the end of the job (should be something like user.wdavey.002781._00163.lumi_XYZ.xml.tgz), and will be downloaded with the skimmed ntuples when you download the dataset. To calculate the lumi of your skimmed dataset, follow these instructions:
  1. Merge all the xml files for the dataset into one xml file. To do this execute the following command:
    •  mergeLumiTarballs -o <output xml file> <input tarballs> 
      where input tarballs is a list of input tarballs (and you can use standard command line wildcards like user.wdavey*lumi*.tgz)
  2. Take the overlap of this merged xml with your GRL using the default GoodRunsLists shipped with recent athena releases. To do this, make sure you have set up athena, then execute the following command:
    •  overlap_goodrunslists <GRL> <merged xml> 
  3. Calculate the lumi.
    • Go to the lumi wiki here
    • upload the overlapped xml
    • set your EF level trigger (needed in case it is prescaled)
    • if you want to retrieve the distribution for reweighting, specify --plots in the options field

Note: If you want to carry out this procedure separately for each period B-M in your analysis, the script CoEPPNtupGen/share/genLumi.sh (in the CoEPPNtupGen package) does this.
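
Before merging, it can be useful to check what the downloaded lumi tarballs actually contain. The following Python sketch lists and reads the xml members of every tarball matching a wildcard, using only the standard library. It is not the implementation of mergeLumiTarballs, and collect_lumi_xml is a hypothetical name; it assumes the tarballs are gzip-compressed, as the .tgz extension suggests.

```python
import glob
import tarfile


def collect_lumi_xml(tarball_pattern):
    """Return {(tarball path, member name): xml bytes} for all .xml files found.

    tarball_pattern is a shell-style wildcard, e.g. "user.wdavey*lumi*.tgz".
    """
    documents = {}
    for path in sorted(glob.glob(tarball_pattern)):
        # Assumption: the tarballs are gzip-compressed (.tgz).
        with tarfile.open(path, "r:gz") as tar:
            for member in tar.getmembers():
                if member.isfile() and member.name.endswith(".xml"):
                    documents[(path, member.name)] = tar.extractfile(member).read()
    return documents


# List the xml files found in the tarballs in the current directory.
for (tarball, name), xml in sorted(collect_lumi_xml("user.wdavey*lumi*.tgz").items()):
    print(tarball, name, len(xml), "bytes")
```

If no tarball matches the wildcard, the function returns an empty dictionary and nothing is printed.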

Ztautau Skimmer

The Ztautau skimmers are higher level skimmers that apply event selection on top of object selection. The idea is to write out only those events that pass the standard Ztautau selection. Two versions of the Ztautau skimmer exist:
  1. skimZtautau_r17.py - writes out the standard set of branches
  2. eventCounterZtautau_r17.py - writes out just the RunNumber and EventNumber branches

eventCounterZtautau_r17.py is designed to produce event lists that can then be used as input to, for example, athena jobs.

Two additional branches are added to the output tree:

  • TagIndex
  • ProbeIndex

Running Ztautau skimmers

The Ztautau skimmers must be run from the head directory of the package. From there, execute these commands:
  1. copy over the desired skimmer:
    •  cp share/skimZtautau_r17.py . 
  2. run the skimmer:
    •  launchSkimmer --setup=setup.sh -s MuData-periodL -t r17p795 --test 1 --postfix="TestZtautauSkimmer" skimZtautau_r17.py  

Note: the location of the setup.sh script must be specified so that the python classes can be used on the grid.

-- WillDavey - 20-Nov-2011

Topic revision: r3 - 2012-01-15 - WillDavey
 