EPF data analysis - a tutorial and link base

The following is a brief but linkful tutorial on how to get from ATLAS AOD files on the grid to a real physics analysis on your desktop. It covers a lot of ground, so be warned... The goal is to

  • create easy-to-use ntuples from AOD files using the package HighPtView and the gird submission tool GANGA
  • create your own analysis tool for reading these ntuples
  • make a simple test analysis

Based on this, you should be able to go on to building a more detailed analysis - e.g. suitable for a Master thesis smile

This document will be brief and to the point, but there will be many links to further information for you to study off-line.

Let's go, and I apolgoise for all the spleling misateks.

0 - Motto

Problems are unavoidable. They are also irrelevant, if you know how to solve them.

1 - Set up ATLAS software release 12.0.7.1

You may well already have this set up. If so, skip to part 2.

Brief setup instructions, from scratch:

  1. Make an analysis directory (or a cmthome directory, if you prefer). In my case: /home/scratch/bjornhs/analysis
  2. Make a directory under the analysis dir for working with 12.0.7 files. In my case: /home/scratch/bjornhs/analysis/12.0.7
  3. Create a requirements file. My file (/home/scratch/bjornhs/analysis/requirements): requirements
  4. Make sure you change the references in the file to ones matching the directories you made wink
  5. Source the ATLAS environment setup:
     source /shared/SOFTWARE/runtime/APPS/HEP/ATLAS-12.0.7.1 
  6. If you changed the requirements file, go to where it is and say
     cmt config 
  7. Source the setup file made by cmt config with a few options:
     source setup.sh -tag=12.0.7,gcc323 

That's it - you should now have athena & Co. set up, and be able to run HighPtView, SUSYView and TopView.

2 - Make a test ntuple using HighPtView and a local file

The next step is to push on and make an ntuple. If you want info on what HighPtView or an ntuple is, read Appendix A below.

2.1 Run HighPtView

Links:

HightPtView (and SUSYView and TopView) is included with athena, so there's no need to check anything out. The only thing we need is to provide an input file. Here we go:

  1. cd into your analysis directory (this next step can actually be done from antwhere...)
  2. Make a file called MyInput.py and write the following in it:
     EventSelector.InputCollections =["/mn/guts/data/atlas/katarzp/data/trig1_misal1_csc11.005200.T1_McAtNlo_Jimmy.recon.AOD.v12000502/trig1_misal1_csc11.005200.T1_McAtNlo_Jimmy.recon.AOD.v12000502_tid005443._00001.pool.root"] 
  3. Run HighPtView on 100 events from this file like this:
      athena.py -c "theApp.EvtMax=100" HighPtView/HighPtViewNtuple_topOptions.py MyInput.py 
  4. Sit back and watch two million lines of text scroll by.
  5. This should produce four files: MuidTau1p3p_HighPtAANtuple.root, MuidTauRec_HighPtAANtuple.root, StacoTau1p3p_HighPtAANtuple.root, StacoTauRec_HighPtAANtuple.root. If so, and the output of the job ends with 'INFO Application Manager Terminated successfully', then all is OK.

Don't start looking at the files yet - we'll do that once we have a grid job to wait for. (Or if you can't wait, look at section 6 below wink

2.2 Check out and compile the latest HighPtView

To send proper grid jobs we need to be a bit more fancy, and check out the HighPtView package to work with. Instructions:

  1. First make sure you have the right CVS variables:
     export CVS_RSH=ssh
    export CVSROOT=:kserver:atlas-sw.cern.ch:/atlascvs
    export CMTCVSOFFSET=offline
       
  2. Get a CVS token like this:
     klog.krb -principal CernUserName 
  3. Go to your analysis/12.0.7 directory (the one which is named under ATLAS_TEST_AREA in the requirements file)
  4. Say
     
    cmt co -r HighPtView-00-00-46 PhysicsAnalysis/HighPtPhys/HighPtView
    cmt co -r EventViewConfiguration-00-00-59 PhysicsAnalysis/EventViewBuilder/EventViewConfiguration 
    
  5. Compile the packages like this:
     
    cd PhysicsAnalysis/HighPtPhys/HighPtView/cmt
    gmake
    cd -
    cd PhysicsAnalysis/EventViewBuilder/EventViewConfiguration/cmt
    gmake
    cd -
    
  6. Set up for running:
     cd PhysicsAnalysis/HighPtPhys/HighPtView/cmt
    cmt config
    source setup.sh
    cd ../run/
    get_files HighPtView/HighPtViewNtuple_topOptions.py
    
  7. Fix a small bug in the present version - edit HighPtViewNtuple_topOptions.py and comment out these lines (i.e. put a # in front of them):
    #doTauRecEtCorrection=True
    #include("TauAlgs/TauRecEtCorrection_jobOptions.py")
    

2.3 Run HighPtView with a config file

Many properties of HighPtView can be set either from the command line or from a simple setup file. Let's make one - create a file called HPTV_prod.py and fill it like this:


# Config file for HighPtView
# Instructions: https://twiki.cern.ch/twiki/bin/view/Atlas/HighPtViewInstructions

# Here are all the setters:
Mode=["FullReco", "Truth", "Atlfast"] # You can add "Performance"
DetailLevel=["BasicAOD"] # Or "FullStandardAOD", "Kinematics", ...
Branches= ["MuidTau1p3p","MuidTauRec","StacoTau1p3p","StacoTauRec"] 
KinematicsOnly=False # This overrides DetailLevel setting it to Kinematics
NtupleName="HighPtAANtuple.root"
DoScreenDump=False
ShowUserData=False
DoNtupleDump=True
OutputLevel=3
DoTrigger=True
OutputDirectory="."
SingleHPTVFile=False # Write all branches into signle file
CorrectAOD=False  # Apply 1mm/30micron corrections
ExtraMETKeys=[] # Keys of additional MET calculation available on input
TriggerView=False # Create addition Trigger ntuple with all Trigger objects
StoreGateDump=False 
#InserterConfiguration=None # Mechanism for reconfiguring Inserters. See instructions.
#TruthInserter=HighPtTruthInserter   # You may supply your own truth inserter module here
#RecoInserter=HighPtSingleInserters  # You may supply your own reco inserter module here
#AnalysisModules={}                          # You may supply your own analysis modules here 

SaveHLTMET=True # Our test AOD samples did not have this info... and some other samples have problems. 

# Want to set a max number of events? -1 means all events.
theApp.EvtMax=-1

# Include the main HighPtView options:
include('HighPtViewNtuple_topOptions.py')

Edit this file, change it so that

Branches= ["MuidTau1p3p"]
DetailLevel=["FullStandardAOD"]

and run HighPtView again using the config file:

athena HPTV_prod.py MyInput.py

This should produce a single file called MuidTau1p3p_HighPtAANtuple.root.

3 - Download and install GANGA and ARC

OK, you can produce ntuples. Great. We'll use them for something in section 6 below. But all the data files you want to make ntuples from are on the grid, and it's not good practice to copy all of them to Oslo. This means you have to learn to send ntuple making jobs to the grid. Let's do so. We will

  • use GANGA, a python tool, to configure HighPtView jobs for the grid
  • use Adrians nice AOD browser to find datasets (section 4)
  • submit a bigger job, and then look at the ntuples you made above while it runs

3.1 Download GANGA

To download and install GANGA and ARC 0.6, do the following:

(For the regular, updated GANGA installation instructions, see http://ganga.web.cern.ch/ganga/download/)

  1. Make a drectory where you want to install ganga (/home/scratch/bjornhs/ganga/4.4.1 in my case)
  2. Get the script ganga-install from the GANGA webpage.
  3. Make it executable: > chmod u+x ./ganga-install
  4. Run the following command:
    ./ganga-install --prefix=/where/you/want/it/installed --extern=GangaAtlas,GangaNG 4.4.1
    

That's it - this gives you both GANGA with the NorduGrid backend and the nordugrid-arc-standalone middleware. You can also add 'GangaGUI' to the --extern list if you want a GUI interface to GANGA, but we will not use this for todays exercise.

3.2 Starting and setting up GANGA

Three more things are needed to do to get GANGA working properly:

  • On first startup, you will be prompted to have GANGA auto-create a .gangarc file. Let the nice program do this.
  • Edit the file ${HOME}/.gangarc, and under [Configuration] set
RUNTIME_PATH = GangaAtlas:GangaNG
  • A bit further down, set the local repository option to point to your scratch disk (this is where your grid files will go):
    local_root = /home/scratch/USERNAME/gangadir/repository
    
  • ...and the same for the workdir:
    topdir = /home/scratch/USERNAME/gangadir/workspace/
    
  • To set up ARC, go to /where/you/installed/ganga/external/nordugrid-arc-standalone/0.6.0/slc3_gcc323/ and say
source ./setup.sh
grid-update-crls

Now restart ganga, and all should we well. Currently source ./setup.sh has to be run first in every shell where you want to start GANGA. This is a bug and will be fixed...

3.3 A GANGA mini-tutorial

When you start GANGA you will be asked for your grid certificate password. After this, what you get is a python prompt where you can write regular python code if you want.

Let's send a test job, just to see if everything works. Type the following into GANGA:


j = Job()
j.application=Executable()
j.application.exe='echo'
j.application.args='Dear grid user. I, your humble servant node, have dutifully executed your job. You may now proceed to download my output. Regards, ${HOSTNAME}'
j.backend=NG()

The first command created a job object, the rest set the properties of the job object. In section 5 below, we will create a more advanced job object that can send athena jobs. Now try:

  1. Say jobs in GANGA. This lists your jobs, and the test job we just made is the last one on the list.
  2. Say j to print out the contents of the job object. See the properties we set above?
  3. Say j.submit() and watch the job get submitted to NorduGrid. Hopefully.
  4. Say j.backend to print the backend info for the job, and find the jobID. It looks somewhat like this: gsiftp://grid.titan.uio.no:2811/jobs/4641176455549910676906. In this case, the job went to grid.titan.uio.no.
  5. If you want, go to the nordugrid atlas monitor page, find the cluster your job went to (you need to hover the mouse over a grid name to find the right one, but you'll soon learn them by heart...), and check either the queue or the running jobs to check on your job.

Note that because of the changes we made in .gangarc, your job output will end up under /home/scratch/USERNAME/gangadir/workspace/Local/JOBNUMBER/.

When the job goes into the state COMPLETED (or, if you are unlucky, FAILED), it is done on the grid and the output (or the error file...) has been downloaded to your computer. You can check the output in one of two ways:

  • From within ganga, say j.peek() to list the output directory, and then e.g. j.peek('stdout.txt','emacs') to open the standard output in emacs. To pre-empt a bit, you can also do j.peek('MuidTau1p3p_HighPtAANtuple.root','root') to open an ntuple directly in root from within ganga smile
  • Exit ganga (or go to another shell window...) and cd to /home/scratch/USERNAME/gangadir/workspace/Local/JOBNUMBER/output to look at the files directly.

Have a look at the output, and then we'll go on to some real analysis.

4 - Find data sets for your analysis

Now the fun begins. You need to find some datasets to run on that suit your analysis - for the moment, this means ATLAS full simulation datasets, available on NorduGrid. All files you can easily get at are listed on this nice page made by Adrian:

The naming convention for an AOD set is like this:

triggerset_geometry_productiontype.inputfilenumber.physicsprocess.filetype.AOD.athenaversion_taskid

The most relevant things are:

  • physicsprocess - what goes on in the file. This we have to discuss separately...
  • athenaversion - you want 12.0.6.31 (written 12000631) or later
  • filetype - you want reco or merge. NOT fastsim.

You will work a lot with this later, but for now I have some suggestions. Select one signal and one background sample for this tutorial (and see if you find them in Adrians list).

4.1 Signal suggestions

For SUSY: trig1_misal1_csc11.005403.SU3_jimmy_susy.merge.AOD.v12000605

This is an mSUGRA SUSY signal set, used as 'truth' for many of the SUSY CSC notes.

For Higgs: trig1_misal1_mc12.005850.WH120bb_pythia.recon.AOD.v12000604

A 120GeV Higgs decaying to bbar, together with a W decaying to a lepton and a neutrino. Final state with two b-jets, a lepton and some missing Et.

4.2 Bakcground

For both: trig0_calib0_mc12.005200.T1_McAtNlo_Jimmy.recon.AOD.v12000603

Ttbar production, where the decay is required to not be all-hadronic. I.e. there are some leptons in the final state.

5 - Send grid jobs for these datasets

Now we will set up some real grid jobs, using the modified HighPtView we worked with above, and send them to the grid.

5.1 Athena job object

Copy the following to a file called ganga_hptv_job.py :


# Make the job object, name it
j2 = Job()
j2.name = 'NG07_ex2'

# Set the input data set, requesting only one file
j2.inputdata = DQ2Dataset()
j2.inputdata.dataset = 'trig1_misal1_csc11.005403.SU3_jimmy_susy.merge.AOD.v12000605'
j2.inputdata.number_of_files=5

# Set the application and some properties
j2.application = Athena()
j2.application.max_events = 100
j2.application.option_file = '/YOUR/ANALYSIS/PATH/12.0.7/PhysicsAnalysis/HighPtPhys/HighPtView/run/HPTV_prod.py'
j2.application.prepare(athena_compile=True)
j2.application.atlas_release = '12.0.7.1'

# Add another input file explicitly, since we include this file in HPTV_prod.py
j2.inputsandbox=['HighPtViewNtuple_topOptions.py']

# Split the job into subjobs with a more manageable number of input files?
# 10 AOD files per job is usually OK
j2.splitter=AthenaSplitterJob()
j2.splitter.numsubjobs=5

# Output data
j2.outputdata = ATLASOutputDataset()
j2.outputdata.outputdata = ['MuidTau1p3p_HighPtAANtuple.root']

# Backend parameters
j2.backend = NG()
j2.backend.check_availability = True
j2.backend.requirements.walltime=60
j2.backend.requirements.cputime=60
j2.backend.requirements.memory=1000

Remember to change the path that now says /YOUR/ANALYSIS/PATH/ to something more reasonable wink

Now, in ganga, run the file by saying execfile('ganga_hptv_job.py'). This saves you typing everything into ganga, and it lets you re-make complex jobs like this one quickly. Type j2 to get a printout of the new job object. Then submit it by saying j2.submit().

This submits 5 jobs to NorduGrid, each with one input file, reading 100 events from each file. While this runs, we'll start looking at one of the ntuples you produced above.

5.2 Submit grid jobs for your signal and background

Now make a job description file for your signal and background sets (i.e. change trig1_misal1_csc11.005403.SU3_jimmy_susy.merge.AOD.v12000605 to something else if needed), and send grid jobs for both of these. Stick to running only one input file per job and 100 events per job for now, so that the job goes fast.

While this runs, we'll do some more setup and start looking at an nuple.

6 - Install your own ROOT

The athena ROOT is rather old, and doesn't want to play with the HighPtView ntuples. Since it's very easy to install a newer version of ROOT, let's do so:

  1. cd to /home/scratch/USERNAME/
  2. Say =wget ftp://root.cern.ch/root/root_v5.17.02.Linux.slc4.gcc3.4.tar.gz=
  3. Say tar -zxvf root_v5.17.02.Linux.slc4.gcc3.4.tar.gz
  4. Download this file: setupROOT.sh
  5. Set up root by saying source setupROOT.sh (or just copy the contents of the file into your .bashrc file)
  6. Try saying root and look for the version in the welcome screen. It should be 5.17/02. (Or at least not 5.10/00e, which is the athena version...)
  7. Quit root by saying .q (Not the most obvious quit command I've seen...)

Note that we've had some problems also with the very latest root versions. The best is to download the source code from http://root.cern.ch/root/Version517.html and compile it yourself, but don't try this for this tutorial (it takes 1-2 hours...)

7 - Make a looper class to look at the ntuples

Ok, are you in the directory where you produced the file MuidTau1p3p_HighPtAANtuple.root earlier on? Good.

To begin with, just open the file with root: root MuidTau1p3p_HighPtAANtuple.root. Then open a browser: TBrowser tb; In the window, doubleclick ROOT files, then MuidTau1p3p_HighPtAANtuple.root. Now you see four trees - the one you want is FullRec0 which contains the fully reconstructed events. FastSim0 contains ATLFAST events (for comparison) and Truth0 contains truth information. Never mind the one called CollectionTree... Double-click FullRec0, and you get a (looooong) list of variables that can be plotted. Double-click El_E at the very top. A histogram pops up, and you have plotted the energy of all reconstructed electrons in your events. Congratulations.

But a full analysis needs to be more fancy than this. Let's see how. (Note: I will give you a somewhat generalized AnalysisLooper class with some examples in it below, but please start with making one for yourself. It's good to see how it's done.)

7.1 Make a looper

We will now use a tool in root called MakeClass, which auto-writes a tool to loop over all entries in the ntuple. One entry equals one event in our case, so the looper is very useful for doing analysis on an event-by-event basis. E.g. first check if an event has four leptons in it. If so, calculate their combined invariant mass and plot it. If not, move on to the next event.

This is how to do it:

  1. Start root: root
  2. Open the ntuple file: TFile *f = new TFile("MuidTau1p3p_HighPtAANtuple.root")
  3. Load the FullRec0 ntuple from the file: TTree *t = f->Get("FullRec0") (by the way, a tree is just a general type of ntuple...)
  4. Make your looper class: t->MakeClass("AnalysisLooper") (feel free to change the name AnalysisLooper if you want)

Root now auto-writes two files: AnalysisLooper.C and AnalysisLooper.h. All the dirty machinery is in the .h file, while the .C file only contains a function called Loop(), which is what you will call to loop over all the events.

One thing is missing - add the following to the other include statements at the top of AnalysisLooper.h:

#include <vector.h>

We need it because athena uses a lot of vector template objects...

7.2 Load and run the looper

Exit and re-start root. Then do the following:

.L AnalysisLooper.C
AnalysisLooper *a = new AnalysisLooper()
a->Loop()

In these steps you loaded the class into root, created a class object, and then called the loop function of the object.

Nothing happens. This is because Loop() is empty yet. Open AnalysisLooper.C in a text editor, find the line = // if (Cut(ientry) < 0) continue;= and just after it add this line:

cout << El_N << endl;

(before the two closing curly brackets). This tells Loop() to print out the number of electrons in the current event. Do the steps above again, and now you will see that Loop() does something.

8 - Generalise the looper to be able to read different ntuples, and get the grid output

The class you just made is pre-set to load and read just the one file you made it with. But we want it to read e.g. all the nice grid-produced ntuples that we're waiting for. AnalysisLooper can easily be generalized to read whatever files you want so long as they contain the same kind of ntuple. (In our case, an ntuple called FullRec0 with roughly the same branches in it.)

8.1 Download a premade AnalysisLooper

I have made such a generalized class for you - please download this file: AnalysisLooper.tar.gz. Then:

  1. Back up your AnalysisLooper, as it will get overwritten by the tar-file (unless you used another name...)
  2. Untar the file: tar -zxvf AnalysisLooper.tar.gz
  3. Look in the file runAnalysisLooper.C It's just a macro that loads AnalysisLooper, adds a file to it (one you should have in your analysis directory right now...) and then runs Loop().
  4. Run AnalysisLooper like this: root -q runAnalysisLooper.C
  5. You should now have gotten a file called AnalysisLooperOutput.root. Open this in root and look at the histograms in it. Then open AnalysisLooper.C and see what I did to make these plots and write them to file.
  6. Finally, look at the difference between your AnalysisLooper.h and mine, to see what I did to generalize the class. It's good to understand this for later reference.

8.2 Get the grid output and loop over it

By now your grid job(s) should have completed. You're now in a situation where you have a directory /home/scratch//USERNAME/gangadir/workspace/Local/JOBNUMBER that has 5 subdirectories (named 0, 1, 2 3, and 4). Each one contains a directory output and under there, you have MuidTau1p3p_HighPtAANtuple.root. It's useful to rename these so that they can be put in the same directory, and I've made a tool for this. Download fixTreeNames2.py.txt, rename it to fixTreeNames2.py (this is a twiki thing - it insists on adding the .txt...) and put it somewhere in your path (comment: do echo $PATH to find possible folders to place fixTreeNames2.py in. One possibility should be ~username/bin if it does not exist, create it.). Make it executable (chmod u+x fixTreeNames2.py) Then go to /home/scratch/USERNAME/gangadir/workspace/Local/JOBNUMBER and say

fixTreeNames2.py MuidTau1p3p_HighPtAANtuple.root

and you'll get some auto-renaming. Now you can make a data directory somewhere, e.g.

/home/scratch/USERNAME/data/trig1_misal1_csc11.005403.SU3_jimmy_susy.merge.AOD.v12000605

and move all the ntuple files here.

If you want the python script to also move all files to your designated data directory, you could modify the python script to take one more argument, namely the destination directory for your root files, and add a command to the end that actually moves, or if you wish copies, the files to the directory that you specified. The python file would then look like this:

#! /usr/bin/python

import os, dircache, string, sys

def printHelp():
    print "Usage: fixTreeNamesAndMove.py <rootfilename> <destination directory>"
    print "Run from ganga job number dir, e.g. gangadir/workspace/Local/105/"
    
    
if len(sys.argv)<3:
    printHelp()
    sys.exit()
        
if sys.argv[1]=="-h":
    printHelp()
    sys.exit()
else:
    fn = sys.argv[1]
    fdir=sys.argv[2]        
    print "Root file name: " + fn
    print "Destination directory: " +fdir

if not fn.endswith(".root"):
    sys.exit("Error - please give a filename of the type <name>.root")
                
l = dircache.listdir(".")
                
for d in l:
    try:
        int(d)
        lookslikeanint = 1
    except:
        lookslikeanint = 0
        
    if lookslikeanint:
        dir = d+"/output/"
        
        fnsplit = fn.split(".")
        fncore = fn[:-5]
            
        newfn = fncore+"_"+d+".root"
            
    if not os.path.exists(dir+fn):
        print "Error: Can't find file "+dir+fn
        continue
                            
    cmd = "mv "+dir+fn+" "+dir+newfn
    print cmd
    os.system(cmd)

    cmd = "cp "+dir+newfn+" "+fdir
    print cmd
    os.system(cmd)   



To execute it do for example:

fixTreeNames2.py MuidTau1p3p_HighPtAANtuple.root /home/scratch/USERNAME/data/trig1_misal1_csc11.005403.SU3_jimmy_susy.merge.AOD.v12000605/

Now you can add them all to the looper at the same time, by saying (e.g. in runAnalysisLooper.C):

a->AddFiles("/home/scratch/USERNAME/data/trig1_misal1_csc11.005403.SU3_jimmy_susy.merge.AOD.v12000605/*root");

Run the looper, and watch it cover all the 500 events you made - 100 in each of the 5 files.

9 - A simple analysis

Now we have all that we need for a simple analysis:

  • a signal dataset in ntuple form
  • a background dataset in ntuple form
  • a looper that can read either set, and make plots from them

Let's make a very simple analysis. The components are already included in the AnalysisLooper.C you downloaded above.

9.1 Trigger cut

In real data, not all events will be saved. They need to satisfy some trigger first, so if you want to perform an analysis you first need to make sure your events are triggered on. Let's make it simple first, and only require the events to have one identified electron with a transverse momentum of 25GeV or more, and that it's isolated. This means that there are no other particles in a cone of a certain width around the electron (i.e. it's unlikely to be an outlying part of a jet). To do this, we require

     // Check a trigger - if not passed, move on to next event
      // Trigger is 1 for passed, 0 for not passed...
      if (PassedEF_e25i==0)
        continue;

To see all possible triggers, look at the variable list in AnalysisLooper.h, or look at the ntuple file itself with a TBrowser. For more information on the trigger, see this reference:

9.2 Plot EtMissing, or calculate Meff or an invariant mass

Next, find some variable that gives an indication of the physics you want to study. For Higgs, this might be the invariant mass of four leptons, for SUSY it may be the effective mass. If needed, calculate the quantity you need. The most simple calculation of an effective mass would be

Double_t meff_all = MissingEt;
for (Int_t i = 0; i<Jet_C4_N; i++) meff_all += (*Jet_C4_p_T)[i];

(this isn't the best definition of Meff, but I'll leave that for later...)

In my AnalysisLooper, I plot MissingEt. Use that histogram for now.

9.3 Set cuts

Want to set some more cuts, e.g. require high-pT electrons, jets, event sphericity, whatever? Put it in AnalysisLooper smile

9.4 Scale to the right integrated luminosity, overlay histograms

The last, very important, thing for now is that you must scale the histograms to the same integrated luminosity if you want to compare signal to background. Sure, we've analysed 500 events of each right now, but how many events would we expect in some given number of inelastic p+p collisions? To know, you need to specify an integrated luminosity to compare for and to know the cross sections of each of your signals and backgrounds.

For a discussion on how to do this normalization, see appendix D below.

An integrated luminosity of 1 fb^(-1) (i.e. one inverse femtobarn) is a good choice.

A good starting point for finding cross sections for various processes is the SUSY working group MC production monitor page. This page, and some sub-pages, lists the numbers for all samples relevant for SUSY. The Higgs group probably has a similar page somewhere - can the first person to find it add it here?

10 - The way forward

Now you have all the tools you need. The next step is to use it all for a real analysis. Tips, for the case of doing a Higgs or SUSY search:

  • Change the athena job file to run on all input files (j2.inputdata.number_of_files=0) and all events in every file (j2.application.max_events=-1)
  • Run over a full signal set
  • Run over one or more background sets (in the end, run over all backgrounds relevant to your analysis smile
  • Analyse each of the sets, using the same cuts, triggers and event selections
  • Plot your main variable, i.e. invariant mass (higgs), Meff or EtMiss (SUSY) or whatever, for the signal
  • Sum up all backgrounds, plot on the same graph
  • NB: Remember to scale to the same total integrated luminosity
  • Is your signal above the background? Can the signal to background be improved by setting better cuts? Re-run analysis with better, more intelligent cuts wink
  • Show results to Farid, get approval.
  • Write thesis.
  • Party.

Appendix A: Understanding HighPtView

From central ATLAS production we get AOD files. AOD = Analysis Object Data. The AOD is a set of containers with various information, like electron data, muon data, trigger data etc. We want a data format that lets us

  • access all the information for a single event
  • do some processing, e.g. calculate invariant masses, and fill this in a histogram, for this single event
  • move on and do the same to the next event

The AOD is not very suitable for this. Enter HighPtView, which under the hood is just a configuration of the package EventView. It does the following:

  • Some overlap removal, i.e. make sure that every object that has been found in an event is given only one interpretation. It's either an electron or a muon, never both.
  • Some cuts, like only letting through electrons with pT > 15 GeV. This is realistic, as the ATLAS reconstruction will be very inefficient for softer (i.e. lower energy) particles than this.
  • Convert the AOD to an ntuple format, or more formally a ROOT tree, which has exactly the properties listed above.

To learn about ROOT trees, read the ROOT users guide chapter 12 and the online ROOT documentation.

One more thing: If we start talking about TopView or SUSYView, know that they are just extras built on top of HighPtView. The hierarchy is like this:

  • AOD: The base data format that we all have to access.
  • EventView: A big package of utility classes designed to read AODs and produce ntuples.
  • HighPtView: A configuration of EventView that has been agreed on by the ATLAS collaboration. There will be HighPtView-files centrally produced, but it is very useful to be able to re-run it yourself.
  • SUSYView: Add-on to HighPtView that calculates some SUSY-specific variables (sphericity, effective mass).
  • TopView: Ditto for top and Higgs physics. Has a lot of machinery to calculate invariant masses.

We will use just HighPtView for now - it's easy to go on to MySpecialView later on.

Appendix B: GANGA info

For a full GANGA/ATLAS tutorial, geared towards CERN users, see GangaTutorial44 by Johannes Elmsheuser. From his tutorial:

A few hints how to work with the Ganga Command Line Interface in Python (CLIP), after you have started Ganga with: ganga

  • The repository for input/output files for every job is located by default at: $HOME/gangadir/workspace/Local/
  • All commands typed on the GANGA command line can also be saved to single script files like mygangajob1.py, etc. and executed with: execfile('/home/Joe.User/mygangajob1.py')
  • The job repository can be viewed with: jobs
  • Subjobs of a specific job can be view with: subjobs jobid (e.g. jobid=42)
  • A running or queued job can be killed with: jobs(jobid).kill()
  • A completed job can be removed from the job repository with: jobs(jobid).remove()
  • The job output directrory of finished jobs that is retrieved back to the job repository can be viewed with: jobs(jobid).peek()
  • The stdout log file of a finished job can viewed with: jobs(jobid).peek('stdout', 'cat')
  • Export a job configuration to a file with: export(jobs(jobid), '~/jobconf.py')
  • Load a job configuration from a file with: load('~/jobconf.py')
  • Change the logging level of Ganga to get more information during job submission etc. with: config["Logging"]['GangaAtlas'] = 'INFO' or config["Logging"]['GangaAtlas'] = 'DEBUG'

There is also a full ganga+NorduGrid tutorial available here:

Distributed analysis on NorduGrid using GANGA at the 2007 ATLAS PAT Workshop in Bergen

Appendix C: Fancy stuff with loopers

Once you have a tree or ntuple and a looper class, you can do many nice things. Here are some tips that might come in handy:

  • For each event, there are entries that contain only a number, e.g. MissingEt, and entries that are arrays, e.g. the energy of all electrons in an event. In HighPtView, the arrays are of type vector. To access and loop over these, the syntax is like (*El_E)[0] where El_E is the electron energy array and [0] is the first entry.
  • In HighPtView, all particle arrays (electrons, muons, jets etc.) are ordered by their p_T, from highest to lowest.
  • Do you have a function that needs to be called for every event? Try putting it in a file MyFunction.h and then just include this file in AnalysisLooper.h like this: #include "MyFunction.h". Otherwise AnalysisLooper can get very big and messy.
  • Looper not looping fast enough? Try compiling it. Root can do this for you, if you load the class like this: gROOT->LoadMacro("AnalysisLooper.C++");. This will probably require some cleanup of the code since the premade AnalysisLooper usually doesn't compile at once, but it's a useful exercise...

If you want to see a full-blown analysis framework (under development, but still), download this file: HighPtSusyViewLooper.tar.gz. It contains my analysis looper for my 0-lepton di-jet analysis. It's a messy file with a lot of functions and calls, but it may give some useful tips. Note that I've changed it to a compilable program with a main() function at the top of the .cpp file.(NB: You won't be able to compile it since it depends on some numerical libraries that I haven't included.)

Appendix D: Events, luminosities and cross sections

A given physics process has a certain cross section, i.e. probability of happening, in a given type of collision. Example: In p+p collisions at 14TeV, the mSUGRA model SU3 has a total cross section of 18.59 pb (picobarn, i.e. 10^(-12) barn).

At the same time, we usually specify the amount of data an experiment has collected in terms of 'integrated luminosity', given in 'inverse picobarn' or 'inverse femtobarn'.

Here is how this ties together:

The formula for calculating the number of events of one type in a sample is

N = s * L

where N is the number, s (for sigma) is the cross section, and L is the integrated luminosity. Since [s] is barn, [L] must be 1/barn, or inverse barn.

For p+p collisions at 14TeV, the total cross section is approx. s_PP = 100mb = 0.1b. This means that for an 'integrated luminosity of 1 inverse femtobarn' we will have seen

N = 0.1 b * 1 fb^(-1) = 10^(14) fb * 1 fb^(-1) = 10^(14)

events in total. The term 'integrated luminosity' is therefore just a way to specify the total number of p+p events the experiment has seen.

..but, out of these 10^(14) events, how many will be mSUGRA SU3 events? (Assuming this model is true...) We calculate this in the same way:

N = 18.59 pb * 1 fb^(-1) = 18589 fb * 1 fb^(-1) = 18590

So there will be 18590 SU3 events. The number for other SUSY or background samples will be different.

So, an important point:

If you want to compare a signal to a background, you must normalize each histogram to the same integrated luminosity. 1 fb^(-1) is a good starting point. So if you have analyzed exactly 18590 events, you don't need to do anything. If you have 5000 events, you must scale your histogram up by (18590/5000). The same goes for the backgrounds.

Appendix E: Collected links

Here are all the links provided here and there in the above tutorial:

-- Main.bjornhs - 13 Sep 2007

Edit | Attach | Watch | Print version | History: r3 < r2 < r1 | Backlinks | Raw View | WYSIWYG | More topic actions
Topic revision: r3 - 2010-10-12 - JohannesElmsheuser
 
    • Cern Search Icon Cern Search
    • TWiki Search Icon TWiki Search
    • Google Search Icon Google Search

    Main All webs login

This site is powered by the TWiki collaboration platform Powered by PerlCopyright &© 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
or Ideas, requests, problems regarding TWiki? use Discourse or Send feedback