Monojet Analysis Code

The code to run the Dark Matter searches

The CERN analysis framework is based on the so-called CMG Tools.

The git repository containing the CMG Tools is https://github.com/CERN-PH-CMG/cmg-cmssw.

The CMG Tools code is split across two packages. The core is in the "heppy" packages (High Energy Physics in Python, since most of the analyzers are written in Python): Heppy and HeppyCore.

The analysis-specific code is in CMGTools/MonoXAnalysis; the current branch, for release CMSSW_7_6_3, is 76X.

How to get the CMG Tools MonoX Analysis code

To get the MonoXAnalysis code and use it, follow these instructions:

MiniAOD release for post-ICHEP 2016 (CMSSW_8_0_19). Work in progress, under construction.

This release in 80X splits out the CMGTools subsystem from CMSSW, while Heppy remains in CMSSW.

Installation instructions:

# setup CMSSW and the base git
cmsrel CMSSW_8_0_19 
cd CMSSW_8_0_19/src 
cmsenv
git cms-init

# add the central cmg-cmssw repository to get the Heppy 80X branch
git remote add cmg-central https://github.com/CERN-PH-CMG/cmg-cmssw.git  -f  -t heppy_80X

# configure the sparse checkout, and get the base heppy packages
cp /afs/cern.ch/user/c/cmgtools/public/sparse-checkout_80X_heppy .git/info/sparse-checkout
git checkout -b heppy_80X cmg-central/heppy_80X

# add your mirror, and push the 80X branch to it
git remote add origin git@github.com:YOUR_GITHUB_REPOSITORY/cmg-cmssw.git
git push -u origin heppy_80X

# now get the CMGTools subsystem from the cmgtools-lite repository
git clone -o cmg-central https://github.com/CERN-PH-CMG/cmgtools-lite.git -b 80X CMGTools
cd CMGTools 

# add your fork, and push the 80X branch to it
git remote add origin  git@github.com:YOUR_GITHUB_REPOSITORY/cmgtools-lite.git 
git push -u origin 80X

# download the files for the latest e/gamma corrections
# download the txt files with the corrections
cd EgammaAnalysis/ElectronTools/data
# corrections calculated with 12.9 fb-1 of 2016 data (ICHEP 16 dataset).
git clone -b ICHEP2016_v2 https://github.com/ECALELFS/ScalesSmearings.git

# download the package for the latest muon Kalman filter corrections
cd $CMSSW_BASE/src
git clone -o Analysis https://github.com/bachtis/analysis.git -b KaMuCa_V4 KaMuCa

# download the MELA package for ME calculations
git clone https://github.com/emanueledimarco/HiggsAnalysis-ZZMatrixElement.git ZZMatrixElement
(cd ZZMatrixElement ; git checkout from-v200p5 ; . setup.sh -j 12)


# compile
scram b -j 8

Description step 1: from MiniAOD to flat trees with the MonoXAnalysis python package

Standard configuration

The standard configuration file to run the MonoXAnalysis python code and produce the trees is the following: run_monojet_cfg.py

Description:

  • the cfg runs a cfg.Sequence of analyzers. The main ones are defined in dmCoreSequence, in CMGTools/MonoXAnalysis/python/analyzers/dmCore_modules_cff.py (a minimal sketch of this structure is shown after this list)
    • skimAnalyzer: runs CMGTools/RootTools/python/skimAnalyzerCount.py to store in SkimReport.txt (see next section) the number of events that have been processed
    • jsonAna: runs CMGTools/RootTools/python/JSONAnalyzer.py to apply the JSON-file skim to data (NB: the JSON files for each dataset are defined here: to-be-filled with the first 13 TeV data JSONs)
    • triggerAna: runs CMGTools/RootTools/python/analyzers/triggerBitFilter.py to apply the trigger filters (NB: the trigger bits to be requested for each dataset are defined in triggers_monojet.py)
    • pileUpAna: runs CMGTools/RootTools/python/analyzers/PileUpAnalyzer.py to compute the pile-up reweighting and store the weights in the trees
    • vertexAna: runs CMGTools/RootTools/python/analyzers/VertexAnalyzer.py to store the vertex information in the trees
    • lepAna: applies lepton cross-cleaning, muon ghost cleaning, basic and advanced electron and muon selections, and energy/momentum calibration/regression/smearing, and stores the lepton information in the trees
    • photonAna: applies the predefined selections on photon candidates and stores the photon information in the trees
    • tauAna: runs the tau identification and stores the tau information in the trees
    • jetAna: applies the jet selections, cleans the jet collections from the leptons, and can apply the JEC on the fly with a different global tag than the one used in the miniAOD
    • metAna: computes several MET flavours (standard PF MET, metNoMu, metNoEle, metNoPU)
    • monoJetSkim: applies the standard skim for the monojet analysis, i.e. metNoMu > 200 GeV; this cut is defined in the cfg
    • treeProducer: runs CMGTools/MonoXAnalysis/python/analyzers/treeProducerDarkMatterMonoJet.py, which produces the flat tree with the final variables
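A minimal sketch of this structure is given below. It is illustrative only: the import path is an assumption based on the file location quoted above, and the real configuration in run_monojet_cfg.py sets many more options.

import PhysicsTools.HeppyCore.framework.config as cfg
# assumed import path, following the file location quoted above
from CMGTools.MonoXAnalysis.analyzers.dmCore_modules_cff import dmCoreSequence

# heppy runs the analyzers listed above on every event,
# in the order given by the sequence (the tree producer runs last)
sequence = cfg.Sequence(dmCoreSequence)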

Trees description

Content of the directories:

  • The list of processed samples is described here: Monojet analysis twiki.
  • For each sample there is one directory, e.g. for the signal: /cmshome/dimarcoe/TREES_210415_MET200SKIM/Monojet_M_10_V
  • The corresponding .root file of the tree is at /cmshome/dimarcoe/TREES_210415_MET200SKIM/Monojet_M_10_V/treeProducerDarkMatterMonoJet/tree.root
  • Each sample directory also contains a lot of other material; the most useful piece is the /cmshome/dimarcoe/TREES_210415_MET200SKIM/skimAnalyzerCount/SkimReport.txt file, which reports the number of events the tree production ran over, needed for the event weight
  • Additional directories such as mjvars may exist, containing sets of friend trees that are described in the next section

Weights and filters: the friend trees already contain some useful event weights:

  • weight: cross-section weight, normalised to 1/fb. It uses the xsec value stored in the tree and is computed as xsec[fb] * genWeight / sum(genWeights) (see the sketch after this list)
  • events_ntot: the sum(genWeights) in the case of weighted samples (like madgraph samples, which have positive and negative weights), or the sum of processed events otherwise
  • vtxWeight: an estimate of the pile-up weight, based only on the number of vertices, derived from a loose selection of Z→μμ events in 2.11/fb
  • MET filters with event lists: uses the latest event lists provided by the MET scanners for 2.11/fb; there are two of them, ecalfilter and cscfilter. The others, taken from the miniAOD, are in the standard trees: hbheFilterNew25ns and hbheFilterIso
  • triggers: no trigger is required for MC, while the OR of the monojet triggers is required to make the data trees from the MET dataset. For the DoubleMuon / DoubleEG datasets the OR of the DoubleMu / DoubleEG triggers has been requested. The single bits can be required using the (boolean) variables in the trees
  • JSON: the latest golden JSON file is used (see below)
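A minimal sketch of how these weights can be used when drawing from a tree in which they are available (illustrative only: the file path, the tree name and the met_pt branch are placeholders, and the luminosity is an example value):

import ROOT

lumi_fb = 2.32                      # example target luminosity in fb-1
f = ROOT.TFile.Open("tree.root")    # placeholder path to a flat tree
t = f.Get("tree")                   # assumed tree name

# 'weight' is already normalised to 1/fb, so multiply by the target luminosity;
# 'vtxWeight' applies the pile-up reweighting described above
h = ROOT.TH1F("h_met", ";E_{T}^{miss} [GeV];Events", 40, 200, 1000)
t.Draw("met_pt>>h_met", "%g * weight * vtxWeight" % lumi_fb)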

Tree location

Depending on the skim, the trees and friend trees take 10-100 GB and are located either on EOS or on the Rome T2. There are three sets of trees, with different skims applied. Only the relevant data datasets are processed for each skim.

| *skim* | *public AFS dir* | *EOS at CERN* | *Rome T2* |
| >= 1e or 1μ loose | /afs/cern.ch/work/e/emanuele/TREES/TREES_25ns_1LEPSKIM_76X/ | /eos/cms/store/cmst3/group/susy/emanuele/monox/trees/TREES_25ns_1LEPSKIM_76X | /pnfs/roma1.infn.it/data/cms/store/user/emanuele/monox_trees/TREES_25ns_1LEPSKIM_76X/ |
| >= 1 lep tight and 1 jet pT>100 GeV | /afs/cern.ch/work/e/emanuele/TREES/TREES_25ns_1TLEP1JETSKIM_76X/ | /eos/cms/store/cmst3/group/susy/emanuele/monox/trees/TREES_25ns_1TLEP1JETSKIM_76X | /pnfs/roma1.infn.it/data/cms/store/user/emanuele/monox_trees/TREES_25ns_1TLEP1JETSKIM_76X/ |
| metNoMu > 200 GeV | /afs/cern.ch/work/e/emanuele/TREES/TREES_25ns_MET200SKIM_76X/ | /eos/cms/store/cmst3/group/susy/emanuele/monox/trees/TREES_25ns_MET200SKIM_76X | todo |
| >= 1 high pT γ | todo | todo | |

  • The MC and data are done with 76X re-reco (Fall15 and Dec16, respectively).
  • The JSON used to make the trees is the golden one for pp running (2.32fb-1)

How to produce the trees running the configuration files

  • for testing the cfg (it runs only on 500 events of one component, output dir Trash)
heppy Trash run_monojet_cfg.py -N 500 -o single -o test=synch-80X -o sample=TTbarDM -v -t

It creates a Trash directory with one tree-directory, named after the component that has run (chosen in the cfg file when test = 1). It runs over the number of events given by the -N option (500 in this example).

  • for running on all the Monte Carlo samples (it runs on all selectedComponents, output dir MCTrees); replace emanuele with your CERN username
# set runData = getHeppyOption("runData",False)
heppy_batch.py -o MCTrees  run_monojet_cfg.py  -b 'bsub -q 2nd -u emanuele -o std_output.txt -J heppymc < batchScript.sh' 
  • using condor:
 
heppy_batch.py run_wmass_cfg.py -o MCTrees -r /store/cmst3/group/wmass/w-helicity-13TeV/ntuplesRecoil/TREES_DY_1l_recoil_condor/MCTrees/ -b 'run_condor_simple.sh -t 3000 ./batchScript.sh'

  • for running on the data samples:
    • update the json file in the cfg run_monojet_cfg.py
    • check that the run ranges for the different pieces of the prompt reco and re-recos are OK
    • remove the cache if you already ran on a sub-piece of a growing dataset (rm ~/.cmgdataset/*Run2015D*)
    • check the trigger lists and veto triggers for each dataset in "DatasetsAndTriggers"
# set runData = getHeppyOption("runData",True)
heppy_batch.py -o DATATrees  run_monojet_cfg.py  -b 'bsub -q 2nd -u emanuele -o std_output.txt -J heppydata < batchScript.sh' 
It creates an output directory (DATATrees in this example) containing several tree-directories. The number of tree-directories for each component depends on the component's splitFactor, which is defined in samples_8TeV.py. At this point two scripts are available to check that all chunks terminated correctly and to merge the root files (together with all the cut-flow counters and averages):
  • deep check the output trees, and print the command to resubmit the failed chunks
cd MCTrees
cmgListChunksToResub -t treeProducerDarkMatterMonoJet/tree.root -z ; cmgListChunksToResub -t JSONAnalyzer/RLTInfo.root -z
A useful script, cmgCheckProduction.sh, is already available in $CMSSW_BASE/src/CMGTools/MonoXAnalysis/scripts/. You can use it to run the previous commands. With the -s option it saves the output of the previous commands in a bash script for later use. You can resubmit failed chunks any number of times; if some keep failing, a few failed chunks can be tolerated for a given MC sample (how many depends on the total number). The -r option makes the script write the commands to remove the bad chunks.

  • hadd the output files, and move the chunks outside the output directory, to be eventually removed
haddChunks.py -c -r MCTrees
Before using this script, if you fear you might lack space on AFS to store the merged trees (the hadd command does not remove the chunks), you should log into a local PC at CERN, for instance ssh pccmsrmXYZ, and create symbolic links to the directories on AFS using the linkChunks.sh script. If you create an MCTrees directory on the local PC, go inside it and then use:
linkChunks.sh /afs/<Path_to_MCTrees_on_AFS>/MCTrees
Now you can safely use haddChunks.py (provided you have enough space on the local PC). Remember that before hadding files there must be no empty directories: if there were failed chunks you did not want to resubmit, remove them.

WARNING: once you have merged all the trees, make sure that all of them are present. Samples that were not split into chunks are left untouched by haddChunks.py, so at the end only the link to the file on AFS will be present on the local PC. If this is the case, remember to copy these files from AFS to the local PC before removing the directories on AFS.

  • if the trees are too large to stay on AFS, it is preferable to copy them to EOS and leave only the directory structure and links to the files in place of the root files (e.g. to make friend trees using lxbatch). To do this, first run the archival on EOS: move just outside the directory whose content you want to copy and use the following command
$CMSSW_BASE/src/CMGTools/MonoXAnalysis/scripts/archiveTreesOnEOS.py -t treeProducerDarkMatterMonoJet <dir_to_copy>/ /eos/cms/<PATH_TO_EOS>/<destination_dir>
This copies all the files inside dir_to_copy into destination_dir. You will be shown which files will be copied and their destination paths, and the script will ask for confirmation before proceeding.

For example, the following command

$CMSSW_BASE/src/CMGTools/MonoXAnalysis/scripts/archiveTreesOnEOS.py -t treeProducerDarkMatterMonoJet TREES_25ns_MET200SKIM_76X/ /eos/cms/store/cmst3/group/susy/emanuele/monox/trees/
will copy the content of TREES_25ns_MET200SKIM_76X/ inside trees/.

  • then copy the directory structure, including the .url files, to AFS, excluding the root files. You can choose whatever path you like on AFS (it might be your public area, so that everyone can use your trees):
rsync -av --exclude '*.root' <LOCALDIR_WITH_TREES> <username>@lxplus:<PATH_ON_AFS>
In the above command, the path on AFS points to a directory inside which the structure of the local directory will be copied. It is good practice, although not necessary, to give this AFS directory the same name as the local one, just to make it easier to remember what it contains.

Using AAA for datasets not at CERN T2

If a dataset is not at CERN, it can be transferred there using the git issues here. If the sample needs to be accessed only a few times, it is more convenient to use AAA (a bit slower, but it avoids the transfer). To do this, you need a valid grid proxy:

voms-proxy-init --voms cms --valid 168:00
mv /tmp/x509up_<xxxx> <your_path_on_afs>/X509_USER_PROXY
setenv X509_USER_PROXY <your_path_on_afs>/X509_USER_PROXY

and then the code will be able to access datasets outside CERN automatically. The third command is for tcsh: if you have bash, then the right command is the following:

export X509_USER_PROXY='<your_path_on_afs>/X509_USER_PROXY'

Adding friend trees

Friend trees are a technique in ROOT to add a variable to an existing tree by creating a second tree that just contains the value of that variable for each entry of the main tree. We use friend trees to add variables that are too complex to compute on the fly from the flat tree (e.g. because they require looping over the objects in an event), and that are still in development, so we do not yet have them in the final trees.
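In plain ROOT, attaching a friend tree looks like this (a minimal sketch: the file names follow the conventions described below, the tree name is assumed to be "tree", and the drawn branch name is a placeholder):

import ROOT

f = ROOT.TFile.Open("treeProducerDarkMatterMonoJet/tree.root")  # main flat tree
t = f.Get("tree")                                                # assumed tree name

# the friend's branches become usable as if they belonged to 't'
t.AddFriend("mjvars/t", "mjvars/evVarFriend_Monojet_M_10_V.root")
t.Draw("nJetClean30")   # placeholder name for a variable stored in the friend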

For convenience, the way we create friend trees is to have small Python classes that compute the values of the friend-tree variables, plus two main Python scripts that take care of running those classes on all the trees, of the book-keeping, and so on.

For the final kinematic variables we use a driver script that runs both on data and on MC, macros/prepareEventVariablesFriendTree.py, together with the small Python classes it runs (a sketch of what such a class looks like is given below).

To run on MC, the following line in prepareEventVariablesFriendTree.py must be uncommented:

MODULES.append ( ('puWeights',VertexWeightFriend(pufile_mc,pufile_data,"pu_mc","pileup",name="puw",verbose=True,vtx_coll_to_reweight="nTrueInt") ) )
This will produce the puw variable in the friend trees. When you run on data, you must comment it again, otherwise the code will complain that nTrueInt is not present in the main trees.
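As an illustration of what such a class looks like, here is a minimal sketch. It is not an actual module from the repository: it assumes the convention that a module declares its output branches and returns their values for each event, and the JetClean_pt branch name is a placeholder (check the existing classes for the exact interface).

class ExampleFriendVariable:
    """Toy friend-tree module: computes one derived variable per event."""
    def listBranches(self):
        # names of the branches this module fills in the friend tree
        return ["htJetClean"]

    def __call__(self, event):
        # 'event' gives access to the main-tree branches; here we assume a
        # vector branch JetClean_pt (placeholder name) and sum it
        ht = sum(pt for pt in event.JetClean_pt)
        return {"htJetClean": ht}

# registered in prepareEventVariablesFriendTree.py in the same way as above:
# MODULES.append( ('exampleVars', ExampleFriendVariable()) )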

The corresponding directories of friend trees can be found here for the latest set of trees:

  • XXXXXXX/FRIENDS_EVENTVARS


NB: additional per-event weights were computed during the original tree production and are stored in the main trees, such as: puWeight (weight for pile-up reweighting), LepEff_2lep (weight for the lepton preselection data/MC SF in the di-lepton final state), Eff_3lep (weight for the lepton preselection data/MC SF in the three-lepton final state). How to apply event weights is discussed in the following.

An example of how to run the prepareEventVariablesFriendTree.py script, which will list the bsub commands needed to produce the friend trees for each sample:

mkdir {path to old trees}/TREES_MONOJET/FRIENDS_EVENTVARS
python prepareEventVariablesFriendTree.py -q 8nh -N 25000 {path to old trees}/TREES_MONOJET {path to old trees}/TREES_MONOJET/FRIENDS_EVENTVARS
Here {path to old trees}/TREES_MONOJET is the full path (e.g. /afs/.../user/.../directory_with_samples) to the directory where the folders with the trees are stored, while FRIENDS_EVENTVARS is the name of the directory where the friends will be stored (N.B.: you MUST CREATE this directory, the code will not do it by itself).

You can redirect the output of the previous command to a script file to submit the jobs. This output also contains some information (like the number of chunks created for each sample) that you should remove from the script before using it. The remaining part consists of the actual job-submission commands, which look like:

bsub -q <queue> $CMSSW_BASE/src/CMGTools/MonoXAnalysis/macros/lxbatch_runner.sh $CMSSW_BASE/src/CMGTools/MonoXAnalysis/macros $CMSSW_BASE python prepareEventVariablesFriendTree.py -N 25000 -T 'mjvars' -t treeProducerDarkMatterMonoJet {path to old trees}/TREES_MONOJET/ {path to old trees}/TREES_MONOJET/FRIENDS_EVENTVARS/ --vector  -d <sample_name> -c <number>
where the -c option followed by a number identifies a specific chunk. It is highly recommended to run a local test before using the queues: to do so, select one command line and run only the part starting from "python prepareEventVariablesFriendTree.py [...]".

If you used the rsync command to create the directory structure on AFS (see above), you can set {path to old trees}/TREES_MONOJET to that path. Then, after the friends are created, you can either leave them on AFS (they should not be very big files) or copy them manually to EOS.

The "-N 25000" option (without inverted commas) splits the job into 25k events/job. Same command, but using the prepareScaleFactorsFriendTree.py script, can be used to produce sfFriend trees (friend trees with scale factors to be used in MC). Both scripts can be found in the macros directory (N.B.: the command MUST be launched from inside macros).

  • To check that all the chunks run correctly, go inside the directory containing the friend root files and run the script
scripts/friendChunkCheck.sh -z <prefix>
where prefix is evVarFriend or sfFriend. The -z option is optional but useful: it also checks for zombie files.
  • To merge the chunks, run the script (from the same directory as above)
TTHAnalysis/macros/leptons/friendChunkAdd.sh <prefix>

  • To copy the friends manually to a directory on EOS: go inside the directory where the friends are stored (the chunks will likely be there as well) and use these commands.

files=`ls | grep -v chunk`
for file in $files; do cmsStage -f $file <eos_path_to_dir_with_rootfiles>; done
cmsStage is an old command to copy files to EOS (you can use eos cp instead). The path on EOS must start with /store when using cmsStage. The -f option forces overwriting of already existing files (so be careful if you have two files with the same name).

Description step 2: from trees to yields and plots

For the final steps of the analysis, such as computing yields, making plots or filling datacards, we use the python scripts in /python/plotter/.

Computing event yields and cut flows
The script to compute the yields is called mcAnalysis.py and takes as input:

  • a text file with the list of MC and data samples, e.g. as in mca.txt
    • the first column is the name you want to give to the sample (e.g. TTW); the data must be called "data", and samples derived from data by applying fake rates or similar should have "data" in their name.
      A plus sign at the end of the name means it is a signal.
    • the second column is the name of the dataset, i.e. the directory containing the trees. You can group multiple datasets into the same sample.
    • the third column, only for MC, is the cross section in pb (including any branching ratio and filter efficiencies, but no skim efficiency)
    • the fourth column, optional, is a cut to apply
    • then, after a semicolon, you can give labels, plot styles and normalization uncertainties, data/mc corrections, fake rates, ...
  • a text file with a list of cuts to apply, e.g. as in bins/3l_tight.txt (an illustrative sketch of both input files is given after this list)
    • the first column is a name of the cut, to put in the tables
    • the second column is the actual cut (same syntax as TTree::Draw; you can also use some extra functions defined in functions.cc)
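An illustrative sketch of the two input files, based only on the column descriptions above (process names, dataset names, cross sections and cut expressions are placeholders, not copied from the repository):

# sample list (mca-style): name : dataset : xsec [pb] : optional cut ; options
ZNuNu      : ZJetsToNuNu_HT100to200 : <xsec in pb>                 ; Label="Z#rightarrow#nu#nu"
WJets      : WJetsToLNu             : <xsec in pb> : genWeight>0   ; Label="W+jets"
SignalM10+ : Monojet_M_10_V         : <xsec in pb>                 ; Label="DM m=10 GeV"
data       : MET_Run2015D

# cut file: name : expression (TTree::Draw syntax, plus helpers from functions.cc)
entry point : 1
2j          : nJetClean >= 2
met250      : metNoMu_pt > 250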
You normally also have to specify some other options on the command line:
  • the name of the tree: --tree ttHLepTreeProducerTTH (which is the default, so you normally don't need it)
  • the path to the trees (e.g. -P /afs/cern.ch/work/g/gpetrucc/TREES_270314_HADD or better to your copy on a fast local disk )
  • --s2v which allows cut files to have the variables of the objects written as if they were scalars (e.g. LepGood1_pt) while the trees have variables saved as vectors (e.g. LepGood_pt[0]) (s2v stands for scalar to vector).
  • the luminosity, in fb–1 (e.g. -l 19.6 )
  • the weight to apply to MC events (e.g. -W 'puWeight*LepEff_2lep' to apply the PU re-weighting and the efficiency re-weighting for the first two leptons)
Options to select or exclude samples:
  • -p selects one or more processes, separated by a comma; regular expressions are used (e.g. -p 'ttH,TT[WZ],TT' to select only signal and main backgrounds)
  • --xp excludes one or more processes, separated by a comma (e.g. --xp 'data' to blind the yields)
  • --sp selects which processes are to be presented as signal; if not specified, the ones with a "+" in the samples file are the signals; (e.g. use --sp WZ in a control region targeting WZ)
  • --xf excludes one or more datasets (e.g. to skip the DoubleElectron and MuEG PDs do --xf 'DoubleEle.*,MuEG.*' ) ("f" is for "files")
Options to manipulate the cut list on the fly (can specify multiple times):
  • -X pattern removes the cut whose name contains the specified pattern (e.g. -X MVA will remove the 'lepMVA' cut in the example file bins/3l_tight.txt)
  • -I pattern inverts a cut
  • -R pattern newname newcut replaces the selected cut with a new cut, specifying its name and expression (e.g. -R 2b 1b 'nBJetLoose25 == 1' will replace the request of two b-jets with a request of one b-jet)
  • -A pattern newname newcut adds a new cut after the selected one (use "entry point" as pattern to add the cut at the beginning).
    Newly added cuts are not visible to option -A, so if you want to add two cuts C1 C2 after a cut C0, just do -A C0 C1 'whatever' -A C0 C2 'whatevermore' and you'll get C0 C1 C2.
  • -U pattern reads the cut list only up to the selected cut, ignoring any one following it
  • --n-minus-one will present, instead of the cut flow, the total yields after all cuts and after sets of N-1 cuts.
  • pedantic note: the options are processed in this order: A, U, I, X, R
Presentation options:
  • -f to get only the final yields, not the full cut flow (this also speeds up things, of course)
  • -G to not show the efficiencies
  • -e to show the uncertainties in the final yields ("e" for "errors")
  • -u to report unweighted MC yields (useful for debugging)
Other options:
  • -j to specify how many CPUs to use; 3-4 is usually fine for normal disks or AFS, you can go up to 8 or so with good SSD disks.
Example output:
$ python mcAnalysis.py -P /data1/emanuele/monox/TREES_040515_MET200SKIM --s2v -j 6 -l 5.0 -G mca-Phys14.txt sr/monojet.txt -F mjvars/t "/data1/emanuele/monox/TREES_040515_MET200SKIM/0_eventvars_mj_v1/evVarFriend_{cname}.root" 

     CUT           M10V        Top       GJets     DYJets      WJets      ZNuNu     ALL BKG
-------------------------------------------------------------------------------------------
entry point          2042      83584       3672      17163     206013      80738     391172
2j                   1089      13293       1993      11297     134831      55227     216643
pt110                1052      11172       1870      10704     127583      52315     203645
dphi jj            892.22       6905       1392       8971     105320      44903     167494
photon veto        892.22       6905       1392       8971     105320      44903     167494
lep veto           885.29       1414     677.67     576.30      33215      44533      80417
met250             596.73     440.60     223.33     166.42      10933      17822      29586
met300             408.85     177.69     100.41      59.35       4158       8064      12561
met400             206.50      43.75      20.90      13.75     852.12       2148       3079
met500             110.87      14.99       6.88       4.20     243.72     727.21     997.00

Making plots
The script to make the plots is called mcPlots.py and takes the same two input text files as mcAnalysis.py, plus a third file to specify the plots (e.g. see standard-candles/zjet-plots.txt).

  • the first column is the plot name, which will also be the histogram name in the output rootfile and the filename for the png or pdf images
  • the second is the expression to plot (again, you can use the extra functions in functions.cc); if you have colons in the expression, e.g. to call TMath::Hypot(x,y) you should escape them with a backslash ( TMath\:\:Hypot)
  • the third column is the binning, either as nbins,xmin,xmax or as [x0,x1,...,xn].
  • then you have options like labels (XTitle, YTitle), location of the legend (TL for top-left, TR for top-right), axis ticks (NXDiv), log scale (Logy).
    For plots with uneven binnings, you can put Density=True in the options to have the bin values correspond to event densities rather than event counts (i.e. so that a uniform distribution gives a flat histogram whatever the binning is); a sketch of a plot-file line is given right after this list.
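A sketch of a plot-file line, following the column description above (the variable name and binning are placeholders):

# name : expression : binning ; options
met : metNoMu_pt : 40,200,1000 ; XTitle="E_{T}^{miss,no#mu} [GeV]", Legend='TR', Logy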
Besides all the options of mcAnalysis.py, you usually also want to specify (a complete example command is given at the end of this section):
  • --print=png,pdf to produce plots in png and pdf format (otherwise they're only saved in a rootfile)
  • --pdir some/path/to/plots to specify the directory where to print the files
  • -f you normally want this option, to produce only the plots after all the cuts; otherwise it will also produce additional sets of plots at each step of the selection.
Other useful options
  • --sP to select which plots to make from the plot file instead of making all of them
  • -o to specify the output root file (normally produced in pdir, and called as the plot file if the option is not specified)
  • --rebin rebins all the plots by this factor (or a smaller one if needed to have it divide correctly the number of bins)
  • --showRatio adds a data/mc ratio
  • --showSigShape draws also an outline of the signal, normalized to the total MC yield; the signal is also included in the stack, unless the option --noStackSig is also given
  • --showSFitShape draws an outline of the "signal"+background in which the "signal" is scaled so that the total ("signal"+background) normalization matches the data; this is useful mainly in control regions, together with --sp to define what is the "signal"
  • --plotmode can be used to produce, instead of the stacked plots, non-stacked outlines normalized to the yield ( --plotmode=nostack) or normalized to unity ( --plotmode=norm)
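For reference, a complete mcPlots.py command then looks like the mcAnalysis.py example above, with the plot file appended and the printing options added (the plot-file name sr/monojet_plots.txt and the output directory are hypothetical placeholders; the other paths reuse the earlier example):

python mcPlots.py -P /data1/emanuele/monox/TREES_040515_MET200SKIM --s2v -j 6 -l 5.0 mca-Phys14.txt sr/monojet.txt sr/monojet_plots.txt -F mjvars/t "/data1/emanuele/monox/TREES_040515_MET200SKIM/0_eventvars_mj_v1/evVarFriend_{cname}.root" -f --print=png,pdf --pdir plots/sr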

Application of data/sim scale factors from friend trees

As described previously, while the trees already contain the reweighting factors for the base lepton selection, additional scale factors for the simulation are computed afterwards as friend trees. The main macro that computes the scale factors is macros/prepareScaleFactorsFriendTree.py, which uses classes defined under python/tools.

The trees with the scale factors are located in the friends directory within the main directory of the trees, and can be attached with the option
--FM sf/t /full/path/to/trees/friends/sfFriend_{cname}.root
where FM means 'friend for MC only', sf/t is the name of the directory and tree within the file, and sfFriend_{cname}.root is the pattern of the file name (the framework will replace {cname} with the name of the component, i.e. of the directory)

Currently, the following scale factors are provided, and can be added to the expression passed to the -W option:

  • SF_Lep{TightLoose,Tight}: scale factors for the lepton working points. SF_LepTightLoose is intended to be applied in the 2-lepton control regions, while SF_LepTight in the 1-lepton control regions (both e and μ)
  • SF_BTag: ND-provided reweighting for the CSV discriminator (4 pairs of systematic variations will be added later: SF_btagRwt_{JES,LF,Stats1,Stats2}{Up,Down})
  • SF_trig1lep: scale factor for the single-lepton trigger (both e and μ are present, but they are to be used for the 2e and 1e control samples only, since the 2μ and 1μ regions use the METNoMu triggers)
  • SF_trigmetnomu: scale factor for the METNoMu trigger. To be used for the signal selection and the 2μ and 1μ control regions
  • SF_NLO: a weight to apply NLO/LO k-factors depending on the pT of the W and Z. This means that the cross sections in the MCA files, like python/plotter/monojet/mca-74X-Vm.txt, have to be the LO ones
Note that the scale factors for lepton efficiencies are appropriate for samples that have prompt leptons, not for samples with fakes.

An example for the 2μ selection would be:

$ python mcAnalysis.py monojet/mca-74X-Vm.txt -P /data1/emanuele/monox/TREES_25ns_MET200SKIM_1DEC2015 --s2v -j 8 -l 2.215 -G monojet/zmumu_twiki.txt -F mjvars/t "/data1/emanuele/monox/TREES_25ns_MET200SKIM_1DEC2015/friends/evVarFriend_{cname}.root" --FM sf/t "/data1/emanuele/monox/TREES_25ns_MET200SKIM_1DEC2015/friends/sfFriend_{cname}.root" -W 'vtxWeight*SF_trigmetnomu*SF_LepTightLoose*SF_NLO' --sp DYJetsHT

An example for the 1e selection would be:

$ python mcAnalysis.py monojet/mca-74X-Ve.txt  -P /data1/emanuele/monox/TREES_25ns_1LEPSKIM_23NOV2015 --s2v -j 8 -l 2.215  -G   monojet/wenu_twiki.txt    -F mjvars/t "/data1/emanuele/monox/TREES_25ns_1LEPSKIM_23NOV2015/friends/evVarFriend_{cname}.root"   --FM sf/t "/data1/emanuele/monox/TREES_25ns_1LEPSKIM_23NOV2015/friends/sfFriend_{cname}.root"  -W 'vtxWeight*SF_trig1lep*SF_LepTight*SF_BTag*SF_NLO'  --sp WJetsHT 

The summary of the scale factors to be applied in the different selections is the following:

| *selection* | *trigger* | *lepton ID* | *b-tag* | *xsec* |
| Z→μμ | SF_trigmetnomu | SF_LepTightLoose | SF_BTag | SF_NLO |
| W→μν | SF_trigmetnomu | SF_LepTight | SF_BTag | SF_NLO |
| Z→ee | SF_trig1lep | SF_LepTightLoose | SF_BTag | SF_NLO |
| W→eν | SF_trig1lep | SF_LepTight | SF_BTag | SF_NLO |
| signal | SF_trigmetnomu | 1.0 | SF_BTag | SF_NLO |

Producing the 'prefit' plots

The script at the basis of the plotting is mcPlots.py, which inherits from mcAnalysis.py. There is a helper script, analysis.py, which puts together the correct options for each region or need. The script can be run with the option "-r" or "--dry-run", which prints the base command, so one can modify the options on the fly and then run it.

These are the commands used to produce the yields and the 'prefit' plots for all channels:

signal region plots:   vbfdm/analysis.py -r SR --pdir plots/SR/ -d
signal region yields:  vbfdm/analysis.py -r SR -d

The regions are: 'ZM' (Z to muons), 'WM' (W to muons), 'ZE' (Z to electrons), 'WE' (W to electrons), 'SR' (signal region). Similarly to the base scripts mcAnalysis.py and mcPlots.py, one can add some options. The most common ones are:

  • -U pattern reads the cut list only up to the selected cut, ignoring any one following it
  • --fullControlRegions loops over all the regions of the fit and makes plots / tables for all of them

Producing the inputs of the fit

The main inputs of the fit are the templates of the variable chosen for the fit (TH1 histograms), with alternative shapes obtained by varying weights according to given systematic uncertainties, and the transfer factors from the control regions to the signal region. Both can be produced by running analysis.py. Note that the latter needs the former as input.

  • vbfdm/analysis.py --propSystToVar : makes the nominal template and the alternative ones by propagating the systematic uncertainties. These are defined as varied weights in the files vbfdm/syst_"CR".txt for each region (where "CR" = SR, ZM, etc.). The output files go to the directory "templates" unless a different one is specified with --pdir
  • vbfdm/analysis.py --tF : creates all the transfer factors, running over the templates produced previously. Note that which systematics to consider in the numerator and denominator depends on the analysis; here it is hardcoded for the monojet / VBF H→invisible analysis

To do the same for the 2D case, one just has to add the option:

  • --twodim: this reads the plot file vbfdm/common_plots_2D.txt instead of vbfdm/common_plots.txt

Producing the datacards for shape analysis

The datacards for the shape analysis are made by running the selection on the fly to produce the templates for the fit, but they need the transfer factors previously produced and stored in a ROOT file. The main script is makeShapeCards.py, which has to be run with different options depending on the region (SR or any of the CRs) and on whether the fit is 1D or 2D. Together with the usual mca, cuts and plots files, the script also reads another argument with the list of normalisation systematics assigned to each process (e.g. "vbfdm/systsEnv.txt"). The main options are:
  • --region : specifies the region, which will be treated by combine as a dedicated "channel". "SR" is the one containing the signal process(es), i.e. the ones on which "mu" is measured
  • --processesFromCR <process1,process2,...> : specifies the list of processes, separated by commas (e.g. ZNuNu,W), that have to be constrained by the control regions through transfer factors. This is only for the SR
  • --correlateProcessCR 'process_to_be_constrained_in_SR,name_of_SR,name_of_the_histo_of_TF,name_of_ROOT_file_with_TF' : connects a given process to the "signal" process in this CR. This has to be accompanied by the exclusion of the processes that make up the signal in this CR, because the difference data - backgrounds_in_CR defines the signal. E.g., when running on ZM, one has to add --xp ZLL,EWKZLL

The helper script vbfdm/make_cards.sh helps in making the datacards for the signal region and the 4 CRs. The variable to be used in the shape analysis and its binning are hardcoded in the script. The typical way of running the script is to give these ordered arguments:

  • output_dir: where the datacards and the ROOT files with the combine inputs are written
  • luminosity: the luminosity, in 1/fb, to which the MC yields are normalised
  • sel_step: the selection step up to which cuts are applied; the subsequent cuts will be ignored. Note that this has to be consistent with the one used to produce the TFs
  • region: can be a single region, or "all" to make the SR and the 4 CRs in series.
Note that the correct variable has to be uncommented in the .sh file. To make the 2D fit datacards, add the additional argument at the end:
  • twodim: this adds the rebinning function to use the final bin for 2D->1D unrolling

E.g.: vbfdm/make_cards.sh cards 24.7 vbfjets all

Then one has to combine the datacards for the SR and the control regions into one. One can use combineCards.py, treating each region as a "channel" (but note that since the regions are correlated, one needs at least the SR, one Z CR and one W CR). The combination command is:

combineCards.py SR=vbfdm.card.txt ZM=zmumu.card.txt ZE=zee.card.txt WM=wmunu.card.txt WE=wenu.card.txt > comb.card.txt

Producing the 'postfit' plots

Producing pre/post-fit plots from the output of combine is a good diagnostic that the fit is working correctly (e.g. by looking at the post-fit plots in the control regions). To do that, one first has to run combine with the maximum-likelihood fit method, and then run a script to make the plots in the mcPlots style. To do that:
  • combine -M MaxLikelihoodFit --saveNormalizations --saveShapes --saveWithUncertainties comb.card.txt : runs the ML fit, saves the post-fit yields, and saves the pre-fit and post-fit shapes, both for the B-only fit (signal constrained to 0) and for the S+B fit. The output is in the mlfit.root file.
The file mlfit.root contains everything, but since the inputs have been converted to RooDataHist, all the TH1Fs have an x-axis corresponding to the observable and a bin content equal to the event density, i.e. events divided by the bin width. The script postFitPlots.py takes care of the conversion, applies the standard plot-style conventions, takes the data distribution and makes the data/prediction ratio plot. The data distribution is taken from the output of mcPlots.py on the desired variable, so mcPlots.py must have been run at least on that variable. The way to run it is:

python postFitPlots.py mcafile.txt plots.root varname mlfit.root region_name

E.g., for the ZM region and the "mjj_fullsel" variable: python postFitPlots.py vbfdm/mca-80X-muonCR.txt plots/ZMCR/vbfjets/plots.root mjj_fullsel mlfit.root ZM. The file plots.root is the one produced by the mcPlots.py script.
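For reference, the conversion mentioned above (from density back to event counts) is just a multiplication by the bin width. A minimal PyROOT sketch, where the histogram path inside mlfit.root is an assumption:

import ROOT

f = ROOT.TFile.Open("mlfit.root")
# path is an assumption: with --saveShapes the post-fit shapes are stored per channel
h = f.Get("shapes_fit_b/ZM/total_background").Clone("counts")

for i in range(1, h.GetNbinsX() + 1):
    w = h.GetXaxis().GetBinWidth(i)
    h.SetBinContent(i, h.GetBinContent(i) * w)   # density -> events
    h.SetBinError(i, h.GetBinError(i) * w)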

Producing the limit plots

-- EmanueleDiMarco - 2015-04-28
