MpiVHbbQuickStart

Registration

Max-Planck-Institut für Physik (MPP)

The registration is done locally. You will get a user account to log in to the (centrally managed) desktop PCs.

Rechenzentrum Garching (RZG)

The RZG provides a web interface for registration:

https://www.rzg.mpg.de/secure/registrieren/antrag.php?inst=MPP&lang=en

Choose Dr. Stefan Kluth as institute representative and fill in your data, using

position guest
mentor Hubert Kroha
systems MPP Linux Cluster
login name same as MPP user name
shell bash

Usually, the registration is processed and approved within 1-2 days.

ATLAS / lxplus

The ATLAS registration is available from http://atlassec.web.cern.ch/atlassec/Registration.htm

Choose the form in the section "external participation". Give the completed form to Anja Schielke, who will take care of the signature by Siegfried Bethke and send it to the ATLAS secretary. As the registration process takes a few days, start it well in advance.

Set up your working environments

Up-to-date instructions on how to set up the most essential software on your local machine, at the Rechenzentrum Garching (RZG) and on the CERN computing cluster (lxplus) can be found here: https://twiki.cern.ch/twiki/bin/view/Main/MpiQuickStart

As we need RootCore

Getting started with ROOT

The main software utility used in high energy physics analysis is the ROOT framework: https://root.cern.ch/drupal/ https://root.cern.ch/drupal/content/howtos

A good tutorial about the very basics has been prepared by Mike: http://fmueller.web.cern.ch/fmueller/ROOT/Flowerdew_ROOT.tgz

ROOT is also available for Python. A PyROOT tutorial can be found here: http://www.atlas.uni-wuppertal.de/~fleischm/lehre/ROOT2013/tutorial.html

VHbb CxAOD

The data format we are using is xAOD. It is the standard format for ATLAS Run II analyses. In contrast to the ntuples in the tutorials above, xAODs contain not only flat branches, but also complex objects and the functions to access them. In order to use them, you need a special setup, which is only available on RZG and lxplus, but not on your local machine.

Working environment for CxAOD analysis

Log in to RZG as described in the MpiQuickStart tutorial. Then, create your working environment.

# setup the ATLAS environment
setupATLASUI  

# prepare current version of RootCore
mkdir -p rc/2.0.26    
cd rc/2.0.26

# setup RootCore
rcSetup Base,2.0.26

Now your environment should be set up to use xAODs. When you log in next time, you don't have to create the RootCore base again:

setupATLASUI  
cd rc/2.0.26
rcSetup

Location of CxAOD files

The signal samples are located in the ptmp directory. Our current best datasets can be found here:

/ptmp/mpp/fmueller/grid/CxAOD/r229566_substr
/ptmp/mpp/fmueller/grid/CxAOD/r228889_substr/

The samples of interest are primarily the Higgs signal and the backgrounds from ttbar and W+jets.

DSID | dataset | location
161805 | VHbb signal at 13 TeV | /ptmp/mpp/fmueller/grid/CxAOD/r229566_substr/user.fmueller.mc14_13TeV.161805.Pythia8_AU2CTEQ6L1_WH125_lnubb.CAOD_HIGG5D2.r229566_substr_outputLabel.root.16998665/
110401 | ttbar (semi-leptonic) at 13 TeV | /ptmp/mpp/fmueller/grid/CxAOD/r228889_substr/user.fmueller.mc14_13TeV.110401.PowhegPythia_P2012_ttbar_nonallhad.CAOD_HIGG5D2.r228889_substr_outputLabel.root.*/
167740 - 167745 | W+jet inclusive at 13 TeV | user.fmueller.mc14_13TeV.*.Sherpa_CT10_W*MassiveCBPt0_*.CAOD_HIGG5D2.r228889_substr_outputLabel.root.*/

Start with the signal sample and the ttbar sample. The W+jet sample consists of several separate channels, which have to be mixed according to their cross-sections. For convenience, place a symbolic link to the datasets you want to use in your home directory, e.g.

mkdir -p ~/data/CxAOD
cd ~/data/CxAOD
ln -s /ptmp/mpp/fmueller/grid/CxAOD/r229566_substr/user.fmueller.mc14_13TeV.161805.Pythia8_AU2CTEQ6L1_WH125_lnubb.CAOD_HIGG5D2.r229566_substr_outputLabel.root.16998665 mc14_13TeV.161805.Pythia8_AU2CTEQ6L1_WH125_lnubb.CAOD_HIGG5D2
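
Mixing the W+jet channels according to their cross-sections, as mentioned above, comes down to one weight per sample. The following is a minimal sketch, assuming the usual normalisation weight = cross-section x luminosity / number of generated events. The function name, the cross-sections, the event counts and the luminosity are placeholders for illustration only; they are not the real values for the DSIDs 167740 to 167745.

```python
# Sketch: per-sample normalisation weights for mixing several channels.
# All numbers are placeholders; look up the real cross-sections and the
# number of generated events for each DSID before using this.

def sample_weight(xsec_pb, n_generated, lumi_ipb):
    """Weight that scales one simulated sample to the target luminosity."""
    if n_generated <= 0:
        raise ValueError("n_generated must be positive")
    return xsec_pb * lumi_ipb / float(n_generated)

# hypothetical inputs: DSID -> (cross-section in pb, number of generated events)
samples = {
    167740: (100.0, 1000000),
    167741: (50.0, 500000),
    167742: (10.0, 200000),
}
lumi_ipb = 1000.0  # hypothetical target luminosity in pb^-1

weights = dict((dsid, sample_weight(xsec, n, lumi_ipb))
               for dsid, (xsec, n) in samples.items())

for dsid in sorted(weights):
    print("DSID %i: weight = %.4f" % (dsid, weights[dsid]))
```

Each event of a sample is then filled into the histograms with this weight (multiplied by the MC event weight, if you use it), so that the channels add up in the correct proportions.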

Looking at xAODs in the TBrowser

If you want to examine a CxAOD in the TBrowser, set up RootCore and start ROOT:

setupATLASUI     # on RZG worker node
cd rc/2.0.26     # your current RootCore work dir
rcSetup    # root is included in RootCore
root -l

And then in CINT:

gROOT->Macro("$ROOTCOREDIR/scripts/load_packages.C")
xAOD::Init().ignore();
f = TFile::Open("<path-to-your-xAOD-file>")
t = xAOD::MakeTransientTree( f );
b = TBrowser() 

Several warnings appear when making the transient tree. You can ignore those for the moment. Once the TBrowser is open, all the variables appear as branches. There are also corresponding folders, but you don't need to worry about them. The naming policy we have adopted is to keep the original name of the branch and append a suffix, e.g. "___Nominal". The "AuxDyn." part in the names is an xAOD thing. The variable name is at the end, e.g. "pt" or "eta".
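
To make the naming policy concrete, here is a small sketch that splits a branch name into its parts. It assumes the pattern <collection>___<variation>AuxDyn.<variable>, inferred from the description above; the helper name split_branch_name and the example branch names are illustrative, so check the actual names in the TBrowser.

```python
import re

# Assumed pattern: <collection>___<variation>AuxDyn.<variable>
BRANCH_RE = re.compile(
    r"^(?P<collection>.+?)___(?P<variation>.+?)AuxDyn\.(?P<variable>.+)$")

def split_branch_name(name):
    """Return (collection, variation, variable), or None if the name does not match."""
    match = BRANCH_RE.match(name)
    if match is None:
        return None
    return (match.group("collection"),
            match.group("variation"),
            match.group("variable"))

# illustrative branch names, following the assumed pattern
for branch in ("AntiKt4LCTopoJets___NominalAuxDyn.pt",
               "MET_RefFinal___NominalAuxDyn.mpy",
               "SomeOtherBranch"):
    print("%s -> %s" % (branch, split_branch_name(branch)))
```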

Analysis code

The simplest approach to analysing CxAODs is to use PyROOT. A basic example is given below. There are two ways to find out which functions are available for the individual objects inside the CxAOD:

1) Code browser

The entire ATLAS code can be browsed at http://acode-browser.usatlas.bnl.gov/lxr/search. Unfortunately, quite some experience is needed to find the correct classes. You can try searching for "Jet_v1.h", "Electron_v1.h" etc. Most of the time, this works.

2) Python interactively

When running Python, you can stop the execution of your code by adding the line

ROOT.TPython.Prompt()

This gives you a Python prompt, where you should be able to access the variables in the current scope. The command

dir()

should give you the list of objects. Using

help(<object_you_want_to_study>)

you can see the list of available functions, as in standard Python.
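
At the prompt, the output of dir() can be long. The sketch below filters it down to the public methods of an object. The helper public_methods and the class FakeParticle are made up for this illustration; FakeParticle only stands in for a real xAOD object so that the example runs without ROOT.

```python
def public_methods(obj):
    """Names of the callable attributes of obj that do not start with an underscore."""
    return sorted(name for name in dir(obj)
                  if not name.startswith("_") and callable(getattr(obj, name)))

class FakeParticle(object):
    """Hypothetical stand-in for an xAOD object, only for this illustration."""
    def pt(self):
        return 25.0
    def eta(self):
        return 0.5
    def pdgId(self):
        return 11

print(public_methods(FakeParticle()))
```

In the real prompt you would pass an object from the current scope, for example the loop variable of a loop over a collection.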

Python code

Create a working directory on RZG:

mkdir -p ~/analysis/CxAOD
cd ~/analysis/CxAOD

Create a file "CxAODExample.py" and add the code given below.

#!/usr/bin/env python
import ROOT
import sys
from optparse import OptionParser

def getAux(auxObject, auxName, auxType = 'float'):
    return auxObject.auxdataConst(auxType)(auxName)

def main(opts, args):

    output = ROOT.TFile(opts.output, "RECREATE")
    h_pdgId = ROOT.TH1D("h_pdgId", "ID according to Particle Data Booklet", 60, -30, 30)

    # Set up RootCore and initialize the xAOD infrastructure
    ROOT.gROOT.Macro( '$ROOTCOREDIR/scripts/load_packages.C' )
    if(not ROOT.xAOD.Init().isSuccess()): print "Failed xAOD.Init()"

    # processing each input file individually in order to avoid problems with MetaData
    # for different TTree content
    for filename in args:
        print "Processing %s" % filename
        # keep a reference to the input file, so that it stays open while the tree is used
        infile = ROOT.TFile(filename, "READ")
        tree = ROOT.xAOD.MakeTransientTree(infile, opts.treename)
        nevents = tree.GetEntries()

        print "TTree contains %i events." % nevents
        if opts.nevents > 0: nevents = min(opts.nevents, nevents)
        
        for i in xrange(nevents):

            # simple progress status
            if i % 100 == 0 : print "Processing event %i." % i
            if i >= nevents: break

            # event initialisation
            tree.GetEntry(i)

            # some examples how to access information from CxAOD

            # 1) event information using the function provided by xAOD
            mc_channel_number = tree.EventInfo___Nominal.mcChannelNumber()

            # 2) looping over objects
            for truth in tree.TruthParticle___Nominal:
                 pdgid = truth.pdgId()

                 # fill pdgid to hist
                 h_pdgId.Fill(pdgid)

            # 3) direct access to aux variable (when no accessor available)
            weight = getAux(tree.EventInfo___Nominal, "MCEventWeight", "float")

    output.Write()

if __name__ == "__main__":

    # parse command line input
    parser = OptionParser("usage: %prog [options] file1 [file2 file3 ...]")
    parser.add_option("-o", "--output",    dest="output",              default="output/histograms.root", help="Output file. Default=%default")
    parser.add_option("",   "--treename",  dest="treename",            default="CollectionTree",         help="Tree name. Default=%default")
    parser.add_option("-n", "--nevents",   dest="nevents", type="int", default=-1,                       help="Number of events. Default=%default")
    opts, args = parser.parse_args()

    main(opts, args)

Now make the file executable and run it on one input file:

chmod +x CxAODExample.py
./CxAODExample.py --nevents 100 --output pdgid.root ~/data/CxAOD/mc14_13TeV.161805.Pythia8_AU2CTEQ6L1_WH125_lnubb.CAOD_HIGG5D2/user.fmueller.4774112._000001.outputLabel.root

If all goes well, you get an output file with a histogram showing the PDG IDs of the truth particles in 100 events.
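
To interpret the bins of the PDG ID histogram, a lookup table helps. The sketch below lists the standard PDG numbering for quarks, leptons, gauge bosons and the Higgs boson, which all fall inside the histogram range from -30 to 30. The helper pdg_name and its output format are illustrative choices. Hadrons (e.g. the proton, 2212) have larger IDs and end up in the underflow and overflow bins.

```python
# Standard PDG particle numbering for the IDs inside the histogram range
PDG_NAMES = {
    1: "d", 2: "u", 3: "s", 4: "c", 5: "b", 6: "t",
    11: "e-", 12: "nu_e", 13: "mu-", 14: "nu_mu", 15: "tau-", 16: "nu_tau",
    21: "gluon", 22: "photon", 23: "Z", 24: "W+", 25: "H",
}

def pdg_name(pdgid):
    """Readable name for a PDG ID; negative IDs denote antiparticles."""
    name = PDG_NAMES.get(abs(pdgid))
    if name is None:
        return "unknown (%i)" % pdgid
    if pdgid < 0:
        return "anti(%s)" % name
    return name

for pdgid in (5, -5, 11, -13, 24, 25, 2212):
    print("%6i  %s" % (pdgid, pdg_name(pdgid)))
```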

Implementing the event selection

The event selection is given in the paper https://atlas.web.cern.ch/Atlas/GROUPS/PHYSICS/PAPERS/HIGG-2013-23/. Try to understand the individual object definitions and cuts from the paper.

Reconstruction level

The event selection is done on the reconstructed quantities. The following code snippets should help you to find the correct variables and the quality criteria (e.g. loose electrons, tight b-tag) you need to implement the event selection. Not all of them correspond 1-to-1 to what is stated in the paper. Check the differences!

Type | Collection | Comment | Decorator
Electrons | ElectronCollection___Nominal | loose: isVeryLooseLH && pt > 7 GeV && abs(eta) < 2.47 && isGoodOQ | isVHLooseElectron
Electrons | ElectronCollection___Nominal | signal: isVHLooseElectron && isVeryTightLH && pt > 25 GeV | isWHSignalElectron
Muons | Muons___Nominal | loose: (muonType == Combined || muonType == SegmentTagged) && d0 < 0.1 && z0 < 10 && abs(eta) < 2.7 && pt > 7 && trackIso (ptcone20) && acceptedMuonTool | isVHLooseMuon
Muons | Muons___Nominal | signal: isVHLooseMuon && trackIso (ptcone20) && caloIso(etcone30) && abs(eta) < 2.5 && pt > 25 GeV | isWHSignalMuon
Jets | AntiKt4LCTopoJets___Nominal | signal: !isVetoJet && abs(eta) < 2.5 && pt > 20 GeV | isSignalJet
Jets | AntiKt4LCTopoJets___Nominal | loose (80% efficiency point): SV1_IP3D > -0.85 && signal |
Jets | AntiKt4LCTopoJets___Nominal | medium (70% efficiency point): SV1_IP3D > 1.55 && signal |
Jets | AntiKt4LCTopoJets___Nominal | tight (50% efficiency point): SV1_IP3D > 7.60 && signal |
MET | MET_RefFinal___Nominal | e.g. MET_RefFinal___Nominal.at(0).mpy() |
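
As an illustration of how the criteria in the table translate into code, here is a sketch of the loose electron definition and the b-tag working points. It uses plain dictionaries as stand-ins for the xAOD objects; the dictionary keys and function names are chosen for this example only. It also assumes that pt is given in GeV as in the table; xAOD objects usually return momenta in MeV, so check the units and convert before comparing.

```python
def is_loose_electron(el):
    """Loose electron as in the table above (pt in GeV)."""
    return bool(el["isVeryLooseLH"] and el["pt"] > 7.0
                and abs(el["eta"]) < 2.47 and el["isGoodOQ"])

# b-tag working points from the table: (name, minimum SV1_IP3D), tightest first
BTAG_WORKING_POINTS = [("tight", 7.60), ("medium", 1.55), ("loose", -0.85)]

def btag_category(sv1_ip3d, is_signal_jet):
    """Tightest working point passed by a signal jet, or None."""
    if not is_signal_jet:
        return None
    for name, cut in BTAG_WORKING_POINTS:
        if sv1_ip3d > cut:
            return name
    return None

# stand-in objects with made-up values
electron = {"isVeryLooseLH": True, "pt": 30.0, "eta": -1.2, "isGoodOQ": True}
print("loose electron: %s" % is_loose_electron(electron))
for weight in (8.0, 2.0, 0.0, -1.0):
    print("SV1_IP3D = %5.2f -> %s" % (weight, btag_category(weight, True)))
```

Since a tight b-tag also passes the medium and loose cuts, the function returns only the tightest category; a selection of "2 loose b-tags" would accept any jet for which the result is not None.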

Event selection

Implement the event selection in the following order and keep track of the so-called cut-flow using a dedicated histogram. The histogram should contain the number of events after each step of the event selection.

  • Selection of channel
    • 1-lep selection
    • 2-jet category
    • 2 loose b-tags (LL)
  • Event selection
    • common selection (pt of vector boson, dR of dijet system)
    • Transverse mass of W (wmt)
    • Summed energy (HT)
    • Missing transverse energy (MET)
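
The cut-flow bookkeeping described above can be sketched as follows. This is a minimal example in plain Python, assuming that each cut is a function returning True or False for an event. The event content (dictionary keys such as n_signal_leptons) and the toy events are made up for illustration; in the real analysis these quantities come from the CxAOD objects, and the counts would be copied into a histogram with one bin per step.

```python
from collections import OrderedDict

def run_cutflow(events, cuts):
    """Apply the cuts in order and count the events surviving each step."""
    cutflow = OrderedDict([("all", 0)] + [(name, 0) for name, _ in cuts])
    for event in events:
        cutflow["all"] += 1
        for name, passes in cuts:
            if not passes(event):
                break  # event fails here; later cuts are not counted
            cutflow[name] += 1
    return cutflow

# first steps of the selection; the dictionary keys are hypothetical
cuts = [
    ("1-lep", lambda ev: ev["n_signal_leptons"] == 1),
    ("2-jet", lambda ev: ev["n_signal_jets"] == 2),
    ("2 loose b-tags", lambda ev: ev["n_loose_btags"] == 2),
]

# toy events, for illustration only
events = [
    {"n_signal_leptons": 1, "n_signal_jets": 2, "n_loose_btags": 2},
    {"n_signal_leptons": 1, "n_signal_jets": 2, "n_loose_btags": 1},
    {"n_signal_leptons": 1, "n_signal_jets": 3, "n_loose_btags": 2},
    {"n_signal_leptons": 0, "n_signal_jets": 2, "n_loose_btags": 2},
]

cutflow = run_cutflow(events, cuts)
for name in cutflow:
    print("%-15s %i" % (name, cutflow[name]))
```

Keeping the cuts in an ordered list makes it easy to append the remaining steps (vector boson pt, dR, wmt, HT, MET) without touching the counting logic.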

Making your code ready for batch submission

As you will probably want to run over large data sets in the long term, we need a way to submit your jobs to the batch system. First, we set up the submission script that allows you to submit jobs on the RZG computing cluster. Then we implement a simple, custom-made configuration manager.

Preparing the submission script

The submission to the RZG batch cluster is handled by a customized submission script, which does most of the work for you:

  • Making a tarball of your submission area (i.e. your program, config files etc.)
  • Taking care of collecting all input files (as specified in a config file)
  • Placing the output and log files in a given directory

First, get the script:

mkdir -p ~/analysis/util
cp /afs/ipp-garching.mpg.de/home/f/fmueller/svn/util/submit/* ~/analysis/util

Add the script directory to the PATH environment variable in your ~/.bashrc, in order to make it accessible from everywhere:

export PATH=$PATH:~/analysis/util

After the next login, you can try

submit_tarball.py --help

If that worked, you have to submit your password to the batch cluster once, so the batch cluster can access your home folder. Simply use the command below and type in your password when asked:

save-password

You just have to do that once.

Now copy the exclude.txt to your CxAOD folder. If you are in your CxAOD folder, use:

cp ../util/exclude.txt .

Modifications of your code

At the top of your code, import the ConfigManager (see below):

from ConfigManager import ConfigManager

When calling the main function, replace

    main(opts, args)

by

    # run output of option parser through config manager
    cfg = ConfigManager(opts, args)
    cfg.print_config()

    print cfg.args
    
    main(cfg.opts, cfg.args)

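Create a file "ConfigManager.py" and place it in the same directory as CxAODExample.py (the commands further down assume that both files are in a python/ subdirectory of ~/analysis/CxAOD).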

#!/usr/bin/env python
from glob import glob

class ConfigManager(object):
    def __init__(self, opts, args):
        self.args = args
        self.opts = opts
        self.data = {}

        if len(args) == 1 and not isroot(args[0]):
            self.data = parse(args[0])

        for key in self.data:
            if opts.ensure_value(key, self.data[key]) != self.data[key]:
                setattr(opts, key, self.data[key])

        # dedicated functionality here
        if "InFiles" in self.data: # special string to replace args with config file input
            self.args = []
            for f in self.data["InFiles"].split(","):
                self.args += glob(f)
            
        if "OutFile" in self.data: # special string to replace opts.output with config file output name
            self.opts.output = self.data["OutFile"]

    #def __str__(self):
        #return "channel = %s\tprocess = %s\txsec = %4.2f [pb]" % (self.chan, self.proc, self.xsec)

    def print_config(self):
        print "args: ", self.args
        print "opts:"
        for key in vars(self.opts): print "%20s:\t%s" % (key, getattr(self.opts, key))

def parse(filename):
    """Read the key/value pairs of a config file into a dictionary."""
    data = {}
    for l in open(filename):
        r = parseline(l)
        if r:
            key, val = r
            data[key] = convert(val)
    return data

def convert(val):
    """Convert a string to int or float if possible, otherwise strip the quotes."""
    for cast in (int, float):
        try: return cast(val)
        except ValueError: pass
    return val.strip("\"\'")

def parseline(line):
    s = line.replace(" ", "").replace("\n", "") # cleanup line
    if "#" in s: s = s[:s.find("#")] # remove comments
    if len(s) == 0: return None # skip empty lines and pure comment lines

    s = s.rstrip(";") # the trailing semicolon is optional
    assert s.count("=") == 1, "Syntax error in line \"%s\"" % s # exactly one "=" expected
    return s.split("=") # key/value pair

def isroot(filename):
    # ROOT files start with the four characters "root"
    return open(filename, "rb").read(4) == b"root"

This should be completely transparent for the standard usage of your program. Now, you can steer your program using config files. Create a directory

mkdir cfg

and place a config file (e.g. cfg/test.cfg) in there:

# 1-lep VHbb test file @ 8 TeV
InFiles="/ptmp/mpp/fmueller/grid/CxAOD/r655063_sub2/user.fmueller.mc14_8TeV.189421.PowhegPythia8_AU2CT10_WpH125J_MINLO_munubb_VpT_Weighted.CAOD_HIGG5D2.r655063_sub2_outputLabel.root.22649133/user.fmueller.5173297._000001.outputLabel.root";
OutFile="output/test.root";

Make sure you specify the full path (use pwd to find it), so that the batch submission does not fail. To begin with, one input file is enough; later, you can give several files using an asterisk (*) and/or a comma-separated list. Each sample, however, must have an individual output file. Hence, create one config file for each sample you want to run over.
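
How a comma-separated InFiles value with wildcards expands into a file list can be sketched as follows. The helper expand_infiles mirrors the split(",") and glob steps of the ConfigManager above, with sorting added for a reproducible order. The dummy files are created in a temporary directory only so that the example runs anywhere; their names are made up.

```python
import os
import tempfile
from glob import glob

def expand_infiles(infiles):
    """Expand a comma-separated list of paths and wildcards into a file list."""
    files = []
    for pattern in infiles.split(","):
        files += sorted(glob(pattern))
    return files

# create a few empty dummy files with made-up names
tmpdir = tempfile.mkdtemp()
for name in ("sampleA._000001.root", "sampleA._000002.root", "sampleB._000001.root"):
    open(os.path.join(tmpdir, name), "w").close()

infiles = ",".join([os.path.join(tmpdir, "sampleA.*.root"),
                    os.path.join(tmpdir, "sampleB._000001.root")])

expanded = expand_infiles(infiles)
for path in expanded:
    print(os.path.basename(path))
```

Note that a pattern which matches nothing contributes no files and raises no error, so it is worth printing the resulting list (as print_config does) before submitting.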

The configuration can be used either locally, or on the batch submission.

Locally:

python/CxAODExample.py cfg/test.cfg

Batch submission:

cd ~/analysis/CxAOD
mkdir output # only necessary the first time
submit_tarball.py --exec python/CxAODExample.py --name CxAOD.150327 --output output/ --queue short --nJobs 10 --copy cfg/test.cfg

option | comment
--exec | your executable
--output | output directory
--name | name of your job; the submission script will create a subdirectory with this name
--queue | "short" should be sufficient for your needs
--nJobs | depends on how many jobs you want to place in parallel
--copy | flag to copy the input data to the node
cfg/test.cfg | configuration file with the input files and output file
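
With one config file per sample, the submission commands differ only in the job name and the config file. The sketch below builds the command lines from the options listed above and only prints them; it does not submit anything. The helper submit_command, the config file names and the job names are hypothetical.

```python
def submit_command(config, name, njobs=10, queue="short", output="output/",
                   executable="python/CxAODExample.py"):
    """Return the submit_tarball.py command for one config file as a list of arguments."""
    return ["submit_tarball.py",
            "--exec", executable,
            "--name", name,
            "--output", output,
            "--queue", queue,
            "--nJobs", str(njobs),
            "--copy", config]

# hypothetical config files, one per sample
configs = ["cfg/signal.cfg", "cfg/ttbar.cfg"]

commands = []
for config in configs:
    sample = config.split("/")[-1].replace(".cfg", "")
    commands.append(" ".join(submit_command(config, "CxAOD.%s" % sample)))

for command in commands:
    print(command)
```

The printed lines can be checked by eye and then pasted into the shell from ~/analysis/CxAOD.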

C++ code

The analysis in C++ would be analogous to the Python code. A possible example is given by the CxAODReader:

to be added

A more general tutorial for xAOD analysis using RootCore in C++ is given here:

https://twiki.cern.ch/twiki/bin/view/AtlasComputing/SoftwareTutorialxAODAnalysisInROOT

However, the analysis of CxAODs is much simpler, as all complicated steps such as calibration and systematic variations have already been carried out in the preceding steps.

-- FelixMueller - 2015-03-01

Topic revision: r11 - 2015-07-13 - FelixMueller
 