MpiVHbbQuickStart
Registration
Max-Planck-Institut für Physik (MPP)
The registration is done locally. You will get a user account to log in to the (centrally managed) desktop PCs.
Rechenzentrum Garching (RZG)
The RZG provides a web interface for registration:
https://www.rzg.mpg.de/secure/registrieren/antrag.php?inst=MPP&lang=en
Choose Dr. Stefan Kluth as institute representative and fill in your data, using
| position | guest |
| mentor | Hubert Kroha |
| systems | MPP Linux Cluster |
| login name | same as MPP user name |
| shell | bash |
Usually, the registration is processed and approved within 1-2 days.
ATLAS / lxplus
The ATLAS registration is available from
http://atlassec.web.cern.ch/atlassec/Registration.htm
Choose the form in the section "external participation". Give the completed form to Anja Schielke, who will take care of the signature by Siegfried Bethke and send it to the ATLAS secretary. As the registration takes a few days, start the process well in advance.
Set up your working environment
Up-to-date instructions on how to set up the most essential software on your local machine, at the Rechenzentrum Garching (RZG) and on the CERN computing cluster (lxplus) can be found here:
https://twiki.cern.ch/twiki/bin/view/Main/MpiQuickStart
As we need
RootCore
Getting started with ROOT
The main software utility used in high energy physics analysis is the ROOT framework:
https://root.cern.ch/drupal/
https://root.cern.ch/drupal/content/howtos
A good tutorial about the very basics has been prepared by Mike:
http://fmueller.web.cern.ch/fmueller/ROOT/Flowerdew_ROOT.tgz
ROOT is also available for Python. A PyROOT tutorial can be found here:
http://www.atlas.uni-wuppertal.de/~fleischm/lehre/ROOT2013/tutorial.html
The data format we are using is xAOD. It is the standard format for ATLAS Run II analyses. In contrast to the ntuples in the tutorials above, xAODs contain not only flat branches but also complex objects and the functions to access them. In order to use them, you need a special setup, which is only available on RZG and lxplus, not on your local machine.
Working environment for CxAOD analysis
Log in to RZG as described in the MpiQuickStart tutorial. Then create your working environment.
# setup the ATLAS environment
setupATLASUI
# prepare current version of RootCore
mkdir -p rc/2.0.26
cd rc/2.0.26
# setup RootCore
rcSetup Base,2.0.26
Now you should have set up your environment to use xAODs.
When you log in next time, you don't have to create the RootCore base again:
setupATLASUI
cd rc/2.0.26
rcSetup
Location of CxAOD files
The signal samples are located in the ptmp directory. Our current
best datasets can be found here:
/ptmp/mpp/fmueller/grid/CxAOD/r229566_substr
/ptmp/mpp/fmueller/grid/CxAOD/r228889_substr/
The samples of interest are primarily the Higgs signal and the backgrounds from ttbar and W+jets.
| DSID | dataset | location |
| 161805 | VHbb signal at 13 TeV | /ptmp/mpp/fmueller/grid/CxAOD/r229566_substr/user.fmueller.mc14_13TeV.161805.Pythia8_AU2CTEQ6L1_WH125_lnubb.CAOD_HIGG5D2.r229566_substr_outputLabel.root.16998665/ |
| 110401 | ttbar (semi-leptonic) at 13 TeV | /ptmp/mpp/fmueller/grid/CxAOD/r228889_substr/user.fmueller.mc14_13TeV.110401.PowhegPythia_P2012_ttbar_nonallhad.CAOD_HIGG5D2.r228889_substr_outputLabel.root.*/ |
| 167740 - 167745 | W+jet inclusive at 13 TeV | user.fmueller.mc14_13TeV.*.Sherpa_CT10_W*MassiveCBPt0_*.CAOD_HIGG5D2.r228889_substr_outputLabel.root.*/ |
Start with the signal sample and the ttbar sample. The W+jet sample consists of several separate channels, which have to be mixed according to their cross-sections.
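The mixing according to cross-sections can be sketched as follows. A common approach, assumed here, is to scale each channel by cross-section times luminosity divided by the number of generated events (or the sum of MC weights). The function name, the channel names and all numbers below are placeholders for illustration; they are not the real values for DSIDs 167740 - 167745.

```python
def sample_weights(samples, luminosity_pb):
    """Return one scale factor per sample: xsec * lumi / N_generated.

    samples: dict mapping a sample name to a tuple
             (cross-section in pb, number of generated events or sum of MC weights).
    luminosity_pb: target integrated luminosity in pb^-1.
    """
    weights = {}
    for name, (xsec_pb, n_generated) in samples.items():
        if n_generated <= 0:
            raise ValueError("sample %s has no generated events" % name)
        weights[name] = xsec_pb * luminosity_pb / n_generated
    return weights


# Placeholder inputs (NOT the real cross-sections of the W+jet samples)
example_samples = {
    "channel_A": (100.0, 50000.0),
    "channel_B": (10.0, 20000.0),
}
example_weights = sample_weights(example_samples, luminosity_pb=1000.0)
for name in sorted(example_weights):
    print("%s: weight per event = %.3f" % (name, example_weights[name]))
```

Each event of a channel is then filled into the histograms with this factor (multiplied by its MC event weight), so that the sum over channels gives the inclusive W+jet prediction.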
For convenience, place a symbolic link of the datasets you want to use in your home directory, e.g.
mkdir -p ~/data/CxAOD
cd ~/data/CxAOD
ln -s /ptmp/mpp/fmueller/grid/CxAOD/r229566_substr/user.fmueller.mc14_13TeV.161805.Pythia8_AU2CTEQ6L1_WH125_lnubb.CAOD_HIGG5D2.r229566_substr_outputLabel.root.16998665 mc14_13TeV.161805.Pythia8_AU2CTEQ6L1_WH125_lnubb.CAOD_HIGG5D2
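If you want to link several datasets at once, the same step can be scripted. The sketch below is only an illustration: the helper names are hypothetical, and it assumes that all dataset directories follow the naming pattern of the example above, i.e. a user prefix followed by mc14_..., and ending the short name after CAOD_HIGG5D2. It runs in a scratch directory so that nothing in /ptmp or your home directory is touched.

```python
import os
import re
import tempfile


def short_name(dataset_dir):
    """Reduce a grid dataset directory name to 'mc14_...CAOD_<derivation>'."""
    base = os.path.basename(dataset_dir.rstrip("/"))
    match = re.search(r"mc\d+_\d+TeV\..*?\.CAOD_[A-Za-z0-9]+", base)
    if match is None:
        raise ValueError("unexpected dataset name: %s" % base)
    return match.group(0)


def link_datasets(dataset_dirs, link_dir):
    """Create one symbolic link per dataset inside link_dir; return the links."""
    if not os.path.isdir(link_dir):
        os.makedirs(link_dir)
    links = []
    for dataset in dataset_dirs:
        link = os.path.join(link_dir, short_name(dataset))
        if not os.path.lexists(link):  # keep existing links untouched
            os.symlink(dataset.rstrip("/"), link)
        links.append(link)
    return links


# Demonstration in a scratch directory instead of /ptmp and ~/data
scratch = tempfile.mkdtemp()
fake_dataset = os.path.join(
    scratch,
    "user.fmueller.mc14_13TeV.161805.Pythia8_AU2CTEQ6L1_WH125_lnubb"
    ".CAOD_HIGG5D2.r229566_substr_outputLabel.root.16998665")
os.makedirs(fake_dataset)
demo_links = link_datasets([fake_dataset], os.path.join(scratch, "data", "CxAOD"))
print(os.path.basename(demo_links[0]))
```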
Looking at xAODs in the TBrowser
If you want to examine CxAOD files in the TBrowser, set up RootCore and start ROOT:
setupATLASUI # on RZG worker node
cd rc/2.0.26 # your current RootCore work dir
rcSetup # root is included in RootCore
root -l
And then in CINT:
gROOT->Macro("$ROOTCOREDIR/scripts/load_packages.C")
xAOD::Init().ignore();
f = TFile::Open("<path-to-your-xAOD-file>")
t = xAOD::MakeTransientTree( f );
b = TBrowser()
Several warnings appear when making the transient tree. You can ignore those for the moment. When you get the TBrowser opened, all the variables appear as branches. There are also corresponding folders, but you don't need to worry about them. The naming policy we have adopted is to keep the original name of the branch, but then add e.g. "__Nominal". The "AuxDyn." part in the names is an xAOD thing. The variable name is at the end, e.g. "pt" or "eta".
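The naming policy can be made explicit with a small helper that takes a branch name apart. This is a sketch under the assumption that branch names look like Collection___NominalAuxDyn.variable (three underscores, as in the Python example further down); the function name is hypothetical and the example branch names are illustrative.

```python
def split_branch_name(branch):
    """Split e.g. 'Muons___NominalAuxDyn.pt' into (collection, variation, variable)."""
    # everything after the first dot is the variable name (may be empty)
    container, _, variable = branch.partition(".")
    # the collection name and the variation are separated by three underscores
    collection, _, variation = container.partition("___")
    # strip the xAOD auxiliary-store suffix, if present
    for suffix in ("AuxDyn", "Aux"):
        if variation.endswith(suffix):
            variation = variation[: -len(suffix)]
            break
    return collection, variation, variable


for example in ("Muons___NominalAuxDyn.pt", "EventInfo___Nominal"):
    print("%s -> %s" % (example, split_branch_name(example)))
```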
Analysis code
The simplest approach to analysing CxAOD files is to use PyROOT. A basic example is given below.
To find out which functions are available for the individual objects inside the CxAOD, there are two approaches:
1) Code browser
The entire code of ATLAS can be accessed using
http://acode-browser.usatlas.bnl.gov/lxr/search. Unfortunately, some experience is needed to find the correct classes. You can try searching for "Jet_v1.h", "Electron_v1.h" etc.; most of the time this leads you to the right class.
2) python interactively
When running python, you can stop the execution of your code by adding the line
ROOT.TPython.Prompt()
This gives you a python prompt, where you should be able to access the variables in the current scope. The command
dir()
should give you the list of objects. Using
help(<object_you_want_to_study>)
you can see the list of available functions, as in standard python.
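Since dir() also lists many internal names, it can help to filter its output. The helper below is a hypothetical convenience function, not part of ROOT; FakeJet is only a stand-in so that the example runs without ROOT and does not reproduce the real Jet_v1 interface.

```python
def public_methods(obj):
    """Names of callable attributes of obj that do not start with '_'."""
    names = []
    for name in dir(obj):
        if name.startswith("_"):
            continue  # skip private and special attributes
        if callable(getattr(obj, name)):
            names.append(name)
    return names


class FakeJet(object):
    """Stand-in for an xAOD object; NOT the real Jet_v1 interface."""

    def pt(self):
        return 45.0

    def eta(self):
        return 0.7


print(public_methods(FakeJet()))
```

At the ROOT.TPython.Prompt() you would call it on a real object from the current scope instead, e.g. on one of the truth particles.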
Python code
Create a working directory on RZG:
mkdir -p ~/analysis/CxAOD
cd ~/analysis/CxAOD
Create a file "CxAODExample.py" and add the code given below.
#!/usr/bin/env python
import ROOT
import sys
from optparse import OptionParser

def getAux(auxObject, auxName, auxType = 'float'):
    return auxObject.auxdataConst(auxType)(auxName)

def main(opts, args):
    output = ROOT.TFile(opts.output, "RECREATE")
    h_pdgId = ROOT.TH1D("h_pdgId", "ID according to Particle Data Booklet", 60, -30, 30)
    # Set up RootCore and initialize the xAOD infrastructure
    ROOT.gROOT.Macro( '$ROOTCOREDIR/scripts/load_packages.C' )
    if(not ROOT.xAOD.Init().isSuccess()): print "Failed xAOD.Init()"
    # processing each input file individually in order to avoid problems with MetaData
    # for different TTree content
    for filename in args:
        print "Processing %s" % filename
        tree = ROOT.xAOD.MakeTransientTree( ROOT.TFile(filename, "READ"), opts.treename)
        nevents = tree.GetEntries()
        print "TTree contains %i events." % nevents
        if opts.nevents > 0: nevents = min(opts.nevents, nevents)
        for i in xrange(nevents):
            # simple progress status
            if i % 100 == 0 : print "Processing event %i." % i
            if i >= nevents: break
            # event initialisation
            tree.GetEntry(i)
            # some examples how to access information from CxAOD
            # 1) event information using the function provided by xAOD
            mc_channel_number = tree.EventInfo___Nominal.mcChannelNumber()
            # 2) looping over objects
            for truth in tree.TruthParticle___Nominal:
                pdgid = truth.pdgId()
                # fill pdgid to hist
                h_pdgId.Fill(pdgid)
            # 3) direct access to aux variable (when no accessor available)
            weight = getAux(tree.EventInfo___Nominal, "MCEventWeight", "float")
    output.Write()

if __name__ == "__main__":
    # parse command line input
    parser = OptionParser("usage: %prog [options] file1 [file2 file3 ...]")
    parser.add_option("-o", "--output", dest="output", default="output/histograms.root", help="Output file. Default=%default")
    parser.add_option("", "--treename", dest="treename", default="CollectionTree", help="Tree name. Default=%default")
    parser.add_option("-n", "--nevents", dest="nevents", type="int", default=-1, help="Number of events. Default=%default")
    opts, args = parser.parse_args()
    main(opts, args)
Now make the file executable and run it on one input file:
chmod +x CxAODExample.py
./CxAODExample.py --nevents 100 --output pdgid.root ~/data/CxAOD/mc14_13TeV.161805.Pythia8_AU2CTEQ6L1_WH125_lnubb.CAOD_HIGG5D2/user.fmueller.4774112._000001.outputLabel.root
If everything worked, you get an output file with a histogram showing the PDG IDs for 100 events.
Implementing the event selection
The event selection is given in the paper
https://atlas.web.cern.ch/Atlas/GROUPS/PHYSICS/PAPERS/HIGG-2013-23/. Try to understand the individual object definitions and cuts from the paper.
Reconstruction level
The event selection is done on the reconstructed quantities. The following code snippets should help you to find the correct variables and the quality criteria (e.g. loose electrons, tight b-tag) you need to implement the event selection. Not all of them correspond 1-to-1 to what is stated in the paper. Check the differences!
| Type | Collection | Quality | Comment | Decorator |
| Electrons | ElectronCollection___Nominal | loose | isVeryLooseLH && pt > 7 GeV && abs(eta) < 2.47 && isGoodOQ | isVHLooseElectron |
| | | signal | isVHLooseElectron && isVeryTightLH && pt > 25 GeV | isWHSignalElectron |
| Muons | Muons___Nominal | loose | (muonType == Combined || muonType == SegmentTagged) && d0 < 0.1 && z0 < 10 && abs(eta) < 2.7 && pt > 7 GeV && trackIso (ptcone20) && acceptedMuonTool | isVHLooseMuon |
| | | signal | isVHLooseMuon && trackIso (ptcone20) && caloIso(etcone30) && abs(eta) < 2.5 && pt > 25 GeV | isWHSignalMuon |
| Jets | AntiKt4LCTopoJets___Nominal | signal | !isVetoJet && abs(eta) < 2.5 && pt > 20 GeV | isSignalJet |
| | | loose | 80% efficiency point | SV1_IP3D > -0.85 && signal |
| | | medium | 70% efficiency point | SV1_IP3D > 1.55 && signal |
| | | tight | 50% efficiency point | SV1_IP3D > 7.60 && signal |
| MET | MET_RefFinal___Nominal | | | e.g. MET_RefFinal___Nominal.at(0).mpy() |
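The criteria in the table translate into small predicate functions. The sketch below is an illustration only: the dictionaries are stand-ins for the CxAOD objects, the function names are hypothetical, and pt is assumed to be given in GeV (in the real files you have to check the unit, and you would read the flags from the objects, e.g. via the getAux helper of the example code). Only the electron and b-tagging rows are shown.

```python
GEV = 1.0  # assumption of this sketch: the stand-in objects store pt in GeV


def is_loose_electron(el):
    """Loose electron definition from the table above."""
    return (el["isVeryLooseLH"] and el["isGoodOQ"]
            and el["pt"] > 7.0 * GEV and abs(el["eta"]) < 2.47)


def is_signal_electron(el):
    """Signal electron: loose electron with tighter identification and pt."""
    return (is_loose_electron(el) and el["isVeryTightLH"]
            and el["pt"] > 25.0 * GEV)


# b-tagging working points from the table: (name, lower cut on SV1_IP3D),
# ordered from the tightest to the loosest
BTAG_WORKING_POINTS = [("tight", 7.60), ("medium", 1.55), ("loose", -0.85)]


def btag_category(jet):
    """Tightest working point passed by a signal jet, or None."""
    if not jet["isSignalJet"]:
        return None
    for name, cut in BTAG_WORKING_POINTS:
        if jet["SV1_IP3D"] > cut:
            return name
    return None


# Stand-in objects with made-up values
electron = {"isVeryLooseLH": True, "isVeryTightLH": False, "isGoodOQ": True,
            "pt": 30.0, "eta": -1.2}
jet = {"isSignalJet": True, "SV1_IP3D": 2.0}
print("loose: %s, signal: %s" % (is_loose_electron(electron),
                                 is_signal_electron(electron)))
print("b-tag category: %s" % btag_category(jet))
```

Comparing the result of such functions with the decorators stored in the CxAOD (isVHLooseElectron etc.) is a quick way to find the differences with respect to the paper.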
Event selection
Implement the event selection in the following order and keep track of the so-called cut-flow using a dedicated histogram. The histogram should contain the number of events after each step of the event selection.
- Selection of channel
  - 1-lep selection
  - 2-jet category
  - 2 loose b-tags (LL)
- Event selection
  - common selection (pt of vector boson, dR of dijet system)
  - Transverse mass of W (wmt)
  - Summed energy (HT)
  - Missing transverse energy (MET)
Making your code ready for batch submission
As you will probably want to run over large datasets in the long run, you need a way to submit your jobs to the batch system. First, we set up the submission script that allows you to submit jobs to the RZG computing cluster. Then we implement a simple, custom-made configuration manager.
Preparing the submission script
The submission to the RZG batch cluster is handled by a customized submission script, which does most of the work for you:
- Making a tarball of your submission area (i.e. your program, config files etc.)
- Taking care of collecting all input files (as specified in a config file)
- Placing the output and log files in a given directory
First, get the script:
mkdir -p ~/analysis/util
cp /afs/ipp-garching.mpg.de/home/f/fmueller/svn/util/submit/* ~/analysis/util
Add the script to your path environment variable within your ~/.bashrc, in order to make it accessible from everywhere:
export PATH=$PATH:~/analysis/util
After the next login, you can try
submit_tarball.py --help
If that worked, you have to submit your password to the batch cluster once, so the batch cluster can access your home folder. Simply use the command below and type in your password when asked:
save-password
You just have to do that once.
Now copy the exclude.txt to your CxAOD folder. If you are in your CxAOD folder, use:
cp ../util/exclude.txt .
Modifications of your code
At the top of your code, import the ConfigManager (see below):
from ConfigManager import ConfigManager
When calling the main function, replace
main(opts, args)
by
# run output of option parser through config manager
cfg = ConfigManager(opts, args)
cfg.print_config()
print cfg.args
main(cfg.opts, cfg.args)
Create a file "ConfigManager.py" and place it in your python directory.
#!/usr/bin/env python
from glob import glob

class ConfigManager(object):
    def __init__(self, opts, args):
        self.args = args
        self.opts = opts
        self.data = {}
        # a single argument that is not a ROOT file is interpreted as a config file
        if len(args) == 1 and not isroot(args[0]):
            self.data = parse(args[0])
            # values from the config file override the command line options
            for key in self.data:
                if opts.ensure_value(key, self.data[key]) != self.data[key]:
                    setattr(opts, key, self.data[key])
            # dedicated functionality here
            if "InFiles" in self.data: # special string to replace args with config file input
                self.args = []
                for f in self.data["InFiles"].split(","):
                    self.args += glob(f)
            if "OutFile" in self.data: # special string to replace opts.output with config file output name
                self.opts.output = self.data["OutFile"]

    #def __str__(self):
        #return "channel = %s\tprocess = %s\txsec = %4.2f [pb]" % (self.chan, self.proc, self.xsec)

    def print_config(self):
        print "args: ", self.args
        print "opts:"
        for key in vars(self.opts): print "%20s:\t%s" % (key, getattr(self.opts, key))

def parse(filename):
    data = {}
    for l in file(filename):
        r = parseline(l)
        if r:
            key, val = r
            if val[:1] in ("\"", "\'"): # quoted values stay strings
                data[key] = val.strip("\"\'")
                continue
            # unquoted values: try int, then float, else keep the string
            try: data[key] = int(val)
            except ValueError:
                try: data[key] = float(val)
                except ValueError: data[key] = val
    return data

def parseline(line):
    s = line.replace(" ", "").replace("\n", "") # cleanup line
    if "#" in s: s = s[:s.find("#")] # remove comments
    if len(s) == 0: return None # check if comment
    assert s.count("=") == 1, "Syntax error in line \"%s\"" % s # exactly one key=value pair per line
    return s.rstrip(";").split("=") # key/value pair

def isroot(filename):
    # ROOT files start with the magic bytes "root"
    return file(filename).readline()[0:4] == "root"
This should be completely transparent for the standard usage of your program.
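The way config file entries are merged into the command line options can be tried out in isolation. The sketch below repeats the merging loop of the ConfigManager in a standalone function (apply_config is a hypothetical name) and applies it to an optparse Values object; the config content and the key "channel" are made up for illustration.

```python
from optparse import Values


def apply_config(opts, config):
    """Copy every config entry onto opts; config wins over existing values."""
    for key, value in config.items():
        # ensure_value sets the attribute only if it is missing or None
        if opts.ensure_value(key, value) != value:
            setattr(opts, key, value)
    return opts


# Options as the OptionParser would return them for a run without flags
opts = Values({"output": "output/histograms.root", "nevents": -1})
# Hypothetical config content; "channel" is not a known command line option
apply_config(opts, {"output": "output/test.root", "channel": "1lep"})
print("%s %s %s" % (opts.output, opts.nevents, opts.channel))
```

Options that do not appear in the config file (here nevents) keep their command line or default value, which is why running without a config file behaves as before.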
Now, you can steer your program using config files. Create a directory
mkdir cfg
and place a config file in there:
# 1-lep VHbb test file @ 8 TeV
InFiles="/ptmp/mpp/fmueller/grid/CxAOD/r655063_sub2/user.fmueller.mc14_8TeV.189421.PowhegPythia8_AU2CT10_WpH125J_MINLO_munubb_VpT_Weighted.CAOD_HIGG5D2.r655063_sub2_outputLabel.root.22649133/user.fmueller.5173297._000001.outputLabel.root";
OutFile="output/test.root";
Make sure you specify the full path (check with pwd), so that the batch submission does not fail. To begin with, one input file is enough; later, you can give several files using an asterisk (*) and/or a comma-separated list. Each sample, however, must have its own output file. Hence, create one config file for each sample you want to run over.
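Writing one config file per sample by hand gets tedious, so the step can be scripted. The sketch below is an illustration under the assumption that only the two keys InFiles and OutFile are needed; write_config is a hypothetical helper, and the sample name and paths in the example are placeholders, not real dataset locations. It refuses input paths that are not absolute.

```python
import os
import tempfile


def write_config(cfg_dir, sample_name, input_patterns, output_dir):
    """Write cfg_dir/<sample_name>.cfg with InFiles/OutFile keys; return its path."""
    for pattern in input_patterns:
        if not os.path.isabs(pattern):
            raise ValueError("not a full path: %s" % pattern)
    lines = [
        "# config for sample %s" % sample_name,
        # several inputs are given as a comma-separated list, wildcards allowed
        'InFiles="%s";' % ",".join(input_patterns),
        # one output file per sample
        'OutFile="%s";' % os.path.join(output_dir, sample_name + ".root"),
    ]
    path = os.path.join(cfg_dir, sample_name + ".cfg")
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    return path


# Demonstration with placeholder paths in a scratch directory
demo_cfg_dir = tempfile.mkdtemp()
demo_cfg = write_config(demo_cfg_dir, "signal",
                        ["/path/to/signal/*.root", "/path/to/extra/file.root"],
                        "/path/to/output")
with open(demo_cfg) as handle:
    print(handle.read())
```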
The configuration can be used either locally or in a batch submission.
Locally:
python/CxAODExample.py cfg/test.cfg
Batch submission:
cd ~/analysis/CxAOD
mkdir output # only necessary the first time
submit_tarball.py --exec python/CxAODExample.py --name CxAOD.150327 --output output/ --queue short --nJobs 10 --copy cfg/test.cfg
| option | comment |
| --exec | your executable |
| --output | output directory |
| --name | name of your job; the submission script will create a subdirectory with this name |
| --queue | "short" should be sufficient for your needs |
| --nJobs | depends on how many jobs you want to place in parallel |
| --copy | flag to copy the input data to the node |
| cfg/test.cfg | configuration file with the input files and output file |
C++ code
The analysis using C++ would be analogous to the Python code. A possible example is given with the CxAODReader:
to be added
A more general tutorial for xAOD analysis using RootCore in C++ is given here:
https://twiki.cern.ch/twiki/bin/view/AtlasComputing/SoftwareTutorialxAODAnalysisInROOT
However, the analysis of CxAOD is much simpler, as all complicated steps such as calibration and systematic variations have already been taken care of in the preceding steps.
--
FelixMueller - 2015-03-01