13 TeV Second Gen LQ Pair Production Analysis Documentation
History
- 2016 data
- A paper was published in Phys. Rev. D, which is available on the PhysicsResultsEXO17003 twiki page.
- 2015 data
- A public PAS was produced for Moriond 2016, which is available on the PhysicsResultsEXO16007 twiki page.
- 2012 data
- A paper was published in Phys. Rev. D with both LQ1 and LQ2 analyses, which is available on the PhysicsResultsEXO12041 twiki page.
To-Do
Part I: Production of NTuples
Overview
Status
- All LQ Scalar signal, background, and single-muon data are processed.
Where to find NTuples.
Part II: Analysis of NTuples
All of this code is stored in a single github repository, which can be checked out as follows:
git clone git@github.com:DavidMorse/LQ2Analysis.git LQ2Analysis13TeV
cd LQ2Analysis13TeV
cmsrel CMSSW_10_2_18
cd CMSSW_10_2_18/src
cmsenv
cd -
Applying Rochester corrections to muon pT and estimating muon momentum scale systematics with the Rochester and Generalized Endpoint methods requires two additional packages:
git clone ssh://git@gitlab.cern.ch:7999/akhukhun/roccor.git RoccoR
git clone ssh://git@gitlab.cern.ch:7999/cms-muonPOG/GeneralizedEndpoint/GEScaleSyst.git GEScaleSyst
Some of these instructions are reiterated from the GitHub README file.
The goal of Part II is the creation of small flat analysis trees: the NTuples on EOS are usually on the order of 10 TB and need to be reduced before an interactive analysis can be run. Jobs are run on the lxplus batch system to analyze the NTuples. The ntuples contain variables such as:
- Vectors of values for kinematics, e.g. "PFJetPt" is a vector of doubles, with one pT value for each jet stored in the event.
- Vectors of values for quality criteria (id, isolation, etc)
- Event-level information, including MET quality flags, vectors of different PDF weights (for MC), vertex information, etc.
The analysis trees aim to be roughly a factor of 1000 smaller than the ntuples (order of 10 GB) and to contain only single-valued variables (ints and floats): in effect, one large flat ntuple. (The TNtuple class itself does not allow enough variables.) Values in the trees will be things like:
- The pT, eta, and phi of the leading and subleading muons, and the same for jets.
- Kinematic variables used for optimization, like the di-muon mass, the MT(muon,MET), the ST = (pt_muon1+pt_muon2+pt_jet1+pt_jet2), etc...
- Single event weights which include the pileup reweighting.
- Trigger flags, and event quality flags.
- Basically, anything needed to generate a histogram with the appropriate weights, and interactively changeable cuts.
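For orientation, the two derived variables named above can be sketched in plain Python (a simplified illustration, not the tree-making code itself, which runs over pyROOT branches):

```python
import math

def transverse_mass(pt_lep, phi_lep, met, phi_met):
    """MT(lepton, MET) from the lepton pT, the MET, and their azimuthal separation."""
    dphi = abs(phi_lep - phi_met)
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return math.sqrt(2.0 * pt_lep * met * (1.0 - math.cos(dphi)))

def st_uujj(pt_muon1, pt_muon2, pt_jet1, pt_jet2):
    """ST: scalar sum of the two leading muon pTs and two leading jet pTs."""
    return pt_muon1 + pt_muon2 + pt_jet1 + pt_jet2
```

With the muon and MET exactly back-to-back, MT reduces to 2*sqrt(pT*MET).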
The following are the steps for creating the trees.
Getting integrated luminosity and pileup weight
With nanoAOD tools it is no longer necessary to calculate pileup weights ourselves. Directions below are kept for posterity.
A shell script is available at this link, made for the newest samples. It will return root files with the PU histograms, which will be used interactively by the tree-making code to get the weights for each event.
It can be run like:
./GetPULumiInfo.tcsh
This is already done in practice. Histograms are available for the central value and for the systematic variations up and down.
The integrated luminosity is the official value (35.9 +/- 2.5%)/fb
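Schematically, the per-event pileup weight is the ratio of the normalized data and MC pileup distributions, evaluated at the event's pileup count. A minimal sketch with toy histogram contents (the real weights come from the root files produced above):

```python
def make_pu_weights(data_hist, mc_hist):
    """Bin-by-bin ratio of the normalized data and MC pileup histograms."""
    ndata, nmc = float(sum(data_hist)), float(sum(mc_hist))
    weights = []
    for d, m in zip(data_hist, mc_hist):
        if m <= 0:
            weights.append(0.0)  # empty MC bin: no weight defined
        else:
            weights.append((d / ndata) / (m / nmc))
    return weights

# Toy distributions: MC peaks at lower pileup than data.
pu_weights = make_pu_weights([10, 30, 60], [20, 40, 40])
```

Each MC event is then multiplied by the weight of its pileup bin, so the reweighted MC pileup profile matches the one in data.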
Get organized with a spreadsheet
A simple csv file is used to keep track of everything involved in the tree production. This includes the locations of the ntuple files on EOS, any
JSON file you wish to use on real data, cross-sections, original numbers of events (for normalization) and group names (output files will be merged according to their group).
An example of a spreadsheet (in progress) is at this link.
The spreadsheet must contain the following columns:
- SignalType: A unique identifier at the beginning of the names of the root files.
- Xsections: The cross-section to normalize the sample to (NLO is better!).
- N_orig: The original number of events in the sample (if you don't know this, see the next steps for instructions).
- Group: The group for the files. For instance, you might have three SignalTypes like WW, WZ, and ZZ, and want to put them in a Group called "DiBoson".
- CustomJson: The name of a JSON file specifying the good runs/lumis to use. This can be the same for every data type, or different, or 0 for MC.
- EOSDirectory: The EOS path where the files are kept for this SignalType. Should be a typical EOS path, e.g. /store/group/..../
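A sketch of reading such a spreadsheet with Python's csv module, using toy rows in the column layout above (the real file may carry extra columns, and the sample names, numbers, and paths here are placeholders):

```python
import csv
import io

# Toy CSV text in the layout described above (values are illustrative only).
CSV_TEXT = """SignalType,Xsections,N_orig,Group,CustomJson,EOSDirectory
WW,118.7,994012,DiBoson,0,/store/group/example/WW/
WZ,47.13,1000000,DiBoson,0,/store/group/example/WZ/
"""

def load_sample_info(text):
    """Parse the tree-production spreadsheet into one dict per sample."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["Xsections"] = float(row["Xsections"])  # pb, used for normalization
        row["N_orig"] = int(row["N_orig"])          # pre-skim event count
    return rows

samples = load_sample_info(CSV_TEXT)
```

The batcher can then loop over `samples`, list the root files under each EOSDirectory, and merge outputs by the Group field.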
This spreadsheet will be a command-line argument of the analysis batcher (discussed more in the steps below).
What if you don't know the original number of events in the ntuples?
Since the NTupleMaker applies a skim, histograms containing the original number of events are stored in the NTuple root files. This is needed for correct MC event weights.
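The normalization itself is simple: each MC event is scaled by cross-section times luminosity over the original event count. A sketch (the function name is illustrative, not from the batcher):

```python
def mc_event_weight(xsec_pb, int_lumi_invpb, n_orig, gen_weight=1.0):
    """Per-event MC weight normalizing a sample to the data luminosity.

    xsec_pb:        sample cross-section in pb (the CSV 'Xsections' column)
    int_lumi_invpb: integrated luminosity in pb^-1
    n_orig:         original, pre-skim number of events (the CSV 'N_orig' column)
    """
    return gen_weight * xsec_pb * int_lumi_invpb / float(n_orig)

# A 1 pb sample with 35900 generated events at 35.9 fb^-1 gets unit weight.
w = mc_event_weight(1.0, 35900.0, 35900)
```

This is why an incorrect N_orig (e.g. the post-skim count) would silently scale the whole sample wrongly.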
The analysis batcher is built to read the csv file and run a python script on every root file contained in the EOS directories specified in the CSV. Normally this is for making tree files, but it can also just make a txt file containing the number of events from the event-count histogram in the NTuples; we can then merge that info to get the totals and fix up the CSV.
In practice it is much easier than it sounds. Just run the following:
python AnalyzerMakerFastLocalHTCondor.py -i NTupleInfo2016Full_customNano.csv -py NTupleEvCounter.py -t EventCount_2016Full -j Cert_271036-284044_13TeV_23Sep2016ReReco_Collisions16_JSON.txt -p 0 -q espresso -s 100 --FileRefresh --merge
To clarify the arguments:
- -i CSV file: The CSV file you wish to run on.
- -t Tag name: Results will be output in a directory named after this tag.
- -j JSON file: Not important here.
- -p 0: Not important here.
- -q espresso: The HTCondor queue to use.
- -s 100: Number of files to analyze per batch job (the split number).
- --FileRefresh: The code automatically makes a list of good ntuple files and caches it locally, so it doesn't have to re-read directories every time. This flag forces it to re-read the directories.
- --merge: Compile the results. For event counting, this lists the total aggregated counts. For tree-making, it hadds the output into single root files (SummaryFiles).
This will produce an event count log (see screen output) and you can update the csv file accordingly.
Furthermore, if you ran the above command and don't want to wait for it to finish, you can kill the process, leave, and pick up where you left off. Here, the previous command made the directory "NTupleEvCounter_EventCount_Summer2016_Full_2016_05_12_23_20_07", and this is the command to continue where we left off:
python AnalyzerMakerFastLocalHTCondor.py -i NTupleInfo2016Full_customNano.csv -py NTupleEvCounter.py -t EventCount_2016Full -j Cert_271036-284044_13TeV_23Sep2016ReReco_Collisions16_JSON.txt -p 0 -q espresso -s 100 -d NTupleEvCounter_EventCount_Summer2016_Full_2016_05_12_23_20_07
Finally, add the "--merge" flag to take all the output files (in this case one txt file per root file) and sum the output:
python AnalyzerMakerFastLocalHTCondor.py -i NTupleInfo2016Full_customNano.csv -py NTupleEvCounter.py -t EventCount_2016Full -j Cert_271036-284044_13TeV_23Sep2016ReReco_Collisions16_JSON.txt -p 0 -q espresso -s 100 -d NTupleEvCounter_EventCount_Summer2016_Full_2016_05_12_23_20_07 --merge
A message on the screen will tell you the output file location. e.g.:
Output is in NTupleInfo2016Full_EventCountLog.csv
How to create the analysis trees.
Similar to what we did above, a different python file will be used to analyze the ntuples (using pyROOT) and make lightweight trees. There will be one output file for every NTuple file, but the batcher will use "hadd" to merge these files according to the "Group" in the CSV file. At the end of it all, we will have just a few root files like "SingleMuData.root", "ZJets.root", etc., which contain everything we need for each data or MC sample.
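Conceptually, the merge step groups the per-file outputs by the CSV's "Group" column and hadds each group. A sketch that only builds the hadd command strings (the helper name, output directory, and file names are made up for illustration):

```python
from collections import defaultdict

def hadd_commands(file_groups, outdir="SummaryFiles"):
    """Build one 'hadd' command per Group, merging that group's tree files.

    file_groups: list of (group, path) pairs, e.g. derived from the
    CSV 'Group' column and the per-NTuple output files.
    """
    merged = defaultdict(list)
    for group, path in file_groups:
        merged[group].append(path)
    return [
        "hadd -f {}/{}.root {}".format(outdir, group, " ".join(sorted(paths)))
        for group, paths in sorted(merged.items())
    ]

cmds = hadd_commands([("DiBoson", "WW_1.root"), ("ZJets", "DY_1.root"),
                      ("DiBoson", "WZ_1.root")])
```

This mirrors why the Group column matters: three SignalTypes with the same Group end up in one summary root file.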
These files can take up many GB. I recommend using an AFS work space like "/afs/cern.ch/work/u/username/..." and bumping up your quota to 100GB at the AFS workspaces page.
Running is similar to the previous step. This is the command:
Example for LQ:
python AnalyzerMakerFastLocalHTCondor.py -i NTupleInfo2016Full_customNano.csv -py NTupleAnalyzer_nanoAOD.py -t Full2016QuickTest -j Cert_271036-284044_13TeV_23Sep2016ReReco_Collisions16_JSON.txt -p 0 -q tomorrow -s 50 --merge
Example for HH:
python AnalyzerMakerFastLocalHTCondor.py -i NTupleInfo2016FullHH_customNano.csv -py NTupleAnalyzerHH_nanoAOD.py -t Full2016 -j Cert_271036-284044_13TeV_23Sep2016ReReco_Collisions16_JSON.txt -p 1 -q tomorrow -s 50 --merge
To detail the arguments:
- -i CSV file: The CSV file you wish to run on.
- -t Tag name: Results will be output in a directory named after this tag.
- -j JSON file: The default JSON file, overridden by the column in the CSV file. Always put something here.
- -p 0: Whether or not to store branches of PDF weights.
- -q tomorrow: The batch queue to use.
- -s 40: Number of files to analyze per batch job (the split number).
- -y 2016: Year of data and MC. Choices are 2016, 2017, and 2018.
- --FileRefresh: The code automatically makes a list of good ntuple files and caches it locally, so it doesn't have to re-read directories every time. This flag forces it to re-read the directories.
- --merge: Compile the results. For event counting, this lists the total aggregated counts. For tree-making, it hadds the output into single root files (SummaryFiles).
Other studies might require slight modifications to what you want to run. A couple are built in via the tag-name option:
- QCDNonIso: Removes muon isolation so that QCD studies can be performed.
- EMuSwitch: Replaces one muon with a HEEP electron, for ttbar studies in an enriched e-mu control region.
- QuickTest: Turns off systematics (much faster, but the output does not contain all information needed for the final analysis).
Examples of these runs are as follows. Typically you might not need the full CSV files for these, so you could make new ones with just the datasets you need:
python AnalyzerMakerFastLocalHTCondor.py -i NTupleInfo2016Full_customNano.csv -py NTupleAnalyzer_nanoAOD.py -t Full2016EMuSwitch -j Cert_271036-284044_13TeV_23Sep2016ReReco_Collisions16_JSON.txt -p 1 -q tomorrow -s 40 -y 2016 --merge
python AnalyzerMakerFastLocalHTCondor.py -i NTupleInfo2016Full_customNano.csv -py NTupleAnalyzer_nanoAOD.py -t Full2016QCDNonIsoQuickTest -j Cert_271036-284044_13TeV_23Sep2016ReReco_Collisions16_JSON.txt -p 1 -q tomorrow -s 40 -y 2016 --merge
python AnalyzerMakerFastLocalHTCondor.py -i NTupleInfo2016Full_customNano.csv -py NTupleAnalyzer_nanoAOD.py -t Full2016QuickTest -j Cert_271036-284044_13TeV_23Sep2016ReReco_Collisions16_JSON.txt -p 1 -q tomorrow -s 40 -y 2016 --merge
What if it takes too long?
It is also important to note that these jobs can take several hours to a day, depending on what you run. The AnalyzerMakerFastLocalHTCondor.py script stays alive and waits for the batch jobs to finish. If a file is missing, it relaunches the corresponding job. This is all transparent to the user, but it can occupy the screen for quite a while.
There are two tools to help out with this.
- The first (and best) way is to leave and come back later. When you see that the AnalyzerMakerFastLocalHTCondor script has submitted its jobs and is waiting for them to be done, you can just ctrl+c and come back later. The TagName option has made a working directory with a time-and-date stamp, and you can specify to pick up where you left off. So go home, come back, and re-issue the command, specifying the working directory with the "-d" option, like:
python AnalyzerMakerFastLocalHTCondor.py -i NTupleInfo2016Full_customNano.csv -py NTupleAnalyzer_nanoAOD.py -t Full2016 -j Cert_271036-284044_13TeV_23Sep2016ReReco_Collisions16_JSON.txt -p 1 -q tomorrow -s 40 -y 2016 --merge -d NTupleAnalyzer_Full2016_05_22_23_36_59
- Another helpful tool is "screen" (less preferred than simply leaving and coming back later). If you don't want to wait around for jobs to finish, use a "screen" session on lxplus. Note the lxplus machine you are on (e.g. lxplus730) and open a screen by typing "screen". Launch the jobs with AnalyzerMakerFastLocalHTCondor.py as described above. Type ctrl+A then D to detach. You can then log out, go home, etc., log back into the same machine (here lxplus730), and type "screen -r" to return to your screen.
Part III: Running the result producer
From the flat trees, the analysis and results are produced. This includes many steps, almost all of which are done in a single result-producer python script. In this script, you will find the following:
- The definition of the event weight, based on the weight variable in the trees, the HLT reweighting of MC with the single-muon trigger, and the integrated luminosity.
- Definition of a preselection - a selection which demands the LQ decay-product particles but doesn't make strong cuts. Normalization of the Z, W, and TTbar backgrounds is done at preselection.
- Functions that make histograms at preselection and final selection.
- Functions that can optimize signal-background separation with cuts based on S/sqrt(S+B) and store the result to a txt file.
- Functions that can make the event counts tables based on the above optimizations.
- Functions that can make cards compatible with the Higgs limit tools, to get the real "final" results.
- Unused functions which can perform analysis with TMVA (interesting, but not needed for our normal results). [deprecated]
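The S/sqrt(S+B) optimization mentioned above can be illustrated with a small stand-alone scan (a toy sketch with made-up numbers, not the function in the result producer):

```python
import math

def best_cut(signal, background, thresholds):
    """Scan lower-bound thresholds on one variable, maximizing S/sqrt(S+B).

    signal, background: lists of (value, weight) pairs for the cut variable,
    e.g. ST values with per-event weights.
    """
    best = (None, -1.0)
    for cut in thresholds:
        s = sum(w for v, w in signal if v > cut)
        b = sum(w for v, w in background if v > cut)
        if s + b > 0:
            significance = s / math.sqrt(s + b)
            if significance > best[1]:
                best = (cut, significance)
    return best

# Toy events: cutting at 600 keeps all but one signal event and little background.
sig = [(700, 1.0), (900, 1.0), (1200, 1.0)]
bkg = [(350, 5.0), (500, 3.0), (800, 1.0)]
cut, signif = best_cut(sig, bkg, [300, 600, 1000])
```

In the real code the scan runs per LQ mass point and the chosen cuts are written to the Opt*Cuts txt files.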
Basically, everything you need to run is in the main() function of the script. This covers a few background studies, plot production, and table production (the tables get parsed into higgs limit-tool cards). This is a brief overview of the commonly used parts of the code.
To run the code, execute:
python LQResultProducer.py -y [year] -b [b-tag category]
where the year can be 2016, 2017, 2018, or combined, and the b-tag category can be 0, 1, or 2 (1 for EXO-21-019).
Setup and directory
At the beginning of the main function, you will see the following lines, followed by the definition of the variable binnings for different histograms. The results in this example (running with the option -y 2016) would be output to a directory called "Results_Unblinded_2016". It will attempt to make this directory. If it already exists, that is fine. The *OptCutFile files that detail the final selection cuts must be present in a subdirectory (e.g., Results_Unblinded_2016/Final_selection).
def main():
#######################################################################################
###### The output directories, and the files that define final selection cuts #######
#######################################################################################
# Please retain the "scriptflag" comment. Some python scripts are available which search
# for this, and make use of it. e.g. For systematic variations, we can in batch instead
# of running serially, which speeds things up.
version_name = year+"_Unblinded" # scriptflag
#version_name = 'Testing_noQCD_14nov' # use sf tag above if this is the real folder
os.system('mkdir Results_'+version_name)
global MuMuOptCutDir
MuMuOptCutDir = 'Final_selection' # scriptflag
#os.system('mkdir Results_'+version_name+'/'+MuMuOptCutDir) # scriptflag
MuMuOptCutFile = 'Results_'+version_name+'/'+MuMuOptCutDir+'/Opt_LQuujj_Cuts.txt' # scriptflag
#MuMuOptCutFile = 'Results_'+version_name+'/OptLQ_uujjCuts_Smoothed_pol2cutoff.txt'
MuNuOptCutFile = 'Results_'+version_name+'/OptLQ_uvjjCuts_Smoothed_pol2cutoff.txt' # scriptflag
PDF Uncertainties
The PDF uncertainties are time consuming. By enabling the following lines (turning False to True), you will run the PDF studies. This will print some lists of uncertainties to the screen. I recommend not doing this routinely. You can paste the output of this routine into the "HARD CODED RESULTS USED IN ANALYSIS" section.
# ====================================================================================================================================================== #
# These are PDF uncertainty studies. Ignore these for now! ### STILL FIXING THIS PORTION
# ====================================================================================================================================================== #
if (False):
PDF4LHCUncStudy(MuMuOptCutFile,MuNuOptCutFile,version_name)
PDF4LHCPlotsFromResultDict('Results_'+version_name+'/PDFVariationsDictionary.json',version_name)
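As a rough illustration of what such a study computes, a PDF uncertainty from MC replicas is the spread of the selected yields over the replica set (a generic PDF4LHC-style sketch with toy numbers; the actual procedure lives in PDF4LHCUncStudy):

```python
import math

def pdf_replica_uncertainty(replica_yields):
    """Relative PDF uncertainty from MC replicas, PDF4LHC-style:
    the sample standard deviation of the yields over the replica set,
    relative to their mean."""
    n = len(replica_yields)
    mean = sum(replica_yields) / n
    var = sum((y - mean) ** 2 for y in replica_yields) / (n - 1)
    return math.sqrt(var) / mean

# Toy yields from four replica weights of the same selection.
rel_unc = pdf_replica_uncertainty([100.0, 104.0, 96.0, 100.0])
```

The numbers printed by the real routine are such relative uncertainties per selection, which is what gets pasted into the hard-coded results section.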
QCD Studies
QCD is negligible, and we just conduct a few studies to validate that. These studies make plots and produce a few numbers to cite in the analysis note. You can enable the QCD studies by turning on this part of the code:
# ====================================================================================================================================================== #
# This is the QCD study. It will make a couple plots, and test QCD contamination at final selection. We consider QCD negligible, but is good to test this once!
# ====================================================================================================================================================== #
if (False):
qcdselectionmumu = '((Pt_muon1>45)*(Pt_muon2>45)*(Pt_jet1>125)*(Pt_jet2>45)*(St_uujj>300)*(DR_muon1muon2>0.3))'
qcdselectionmunu = '((Pt_muon1>45)*(Pt_muon2<45.0)*(Pt_jet1>125)*(Pt_jet2>45)*(Pt_ele1<45.0)*(St_uvjj>300)*(DPhi_muon1met>0.8)*(DPhi_jet1met>0.5))'
QCDStudy(qcdselectionmumu,qcdselectionmunu,MuMuOptCutFile,MuNuOptCutFile,NormalWeightMuMu,NormalWeightMuNu,version_name)
TTbar E-Mu data-driven studies
The data-driven TTbar estimate relies on a calculation of the mu-mu to e-mu efficiency, and we also provide some characteristic plots and numbers in the analysis note. To create this material, you can turn on the following part of the main function. Once again, there is no need to run this all the time. When the study is stable, you just need to store the reweighting factor in the "HARD CODED RESULTS" section of the script, e.g. emu_id_eff = 0.5716.
# ====================================================================================================================================================== #
# The ttbar e-mu data-driven study. You only need to do this once, check the validation output, and use the scale-factor as global variable "emu_id_eff"
# The scale-factor should be near 0.5
# ====================================================================================================================================================== #
if (False):
# TTBar STudy
[Rtt_uujj,Rtt_uujj_err] = GetEMuScaleFactors( NormalWeightEMu+'*'+preselectionemu, EMuDirectory)
Rw_uvjj,Rz_uujj = [1.0,1.0]
# # PreSelection Plots
MakeBasicPlotEMu("St_uujj","S_{T}^{e #mu j j} [GeV]",stbinning,preselectionemu,NormalWeightEMu,EMuDirectory,'emuseltagfree','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
MakeBasicPlotEMu("Pt_miss","E_{T}^{miss} [GeV]",metbinning2,preselectionemu,NormalWeightEMu,EMuDirectory,'emuseltagfree','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
MakeBasicPlotEMu("M_uu","M^{e #mu} [GeV]",bosonbinning,preselectionemu,NormalWeightEMu,EMuDirectory,'emuseltagfree','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
MakeBasicPlotEMu("M_uujj2","M^{e/#mu j}_{2} [GeV]",lqbinning,preselectionemu,NormalWeightEMu,EMuDirectory,'emuseltagfree','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
MakeBasicPlotEMu("DR_muon1muon2","#DeltaR(#mu,e)",drbinning,preselectionemu,NormalWeightEMu,EMuDirectory,'emuseltagfree','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
MakeBasicPlotEMu("St_uujj","S_{T}^{e #mu j j} [GeV]",stbinning,preselectionemu,NormalWeightEMu,EMuDirectory,'emuselPAStagfree','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
MakeBasicPlotEMu("Pt_miss","E_{T}^{miss} [GeV]",metbinning2,preselectionemu,NormalWeightEMu,EMuDirectory,'emuselPAStagfree','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
MakeBasicPlotEMu("M_uu","M^{e #mu} [GeV]",bosonbinning,preselectionemu,NormalWeightEMu,EMuDirectory,'emuselPAStagfree','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
MakeBasicPlotEMu("M_uujj2","M^{e/#mu j}_{2} [GeV]",lqbinning,preselectionemu,NormalWeightEMu,EMuDirectory,'emuselPAStagfree','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
MakeBasicPlotEMu("DR_muon1muon2","#DeltaR(#mu,e)",drbinning,preselectionemu,NormalWeightEMu,EMuDirectory,'emuselPAStagfree','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
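The arithmetic behind the data-driven estimate can be sketched as follows: subtract the non-ttbar contamination from the e-mu data and scale by the emu_id_eff factor (the function name and toy numbers below are illustrative only):

```python
def ttbar_mumu_from_emu(n_emu_data, n_emu_nonttbar, emu_id_eff):
    """Data-driven ttbar estimate in the mu-mu channel: subtract non-ttbar
    contamination from the e-mu control-region data, then scale by the
    mu-mu / e-mu efficiency ratio (the 'emu_id_eff' scale factor,
    expected to be near 0.5)."""
    return emu_id_eff * max(n_emu_data - n_emu_nonttbar, 0.0)

# Toy numbers: 2000 e-mu data events, 120 of them non-ttbar,
# scale factor as quoted in the text above.
n_tt = ttbar_mumu_from_emu(2000.0, 120.0, 0.5716)
```

The validation plots made by the code above are there to check that the scaled e-mu sample reproduces the mu-mu ttbar shapes.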
Making analysis-note style plots
The analysis-note style plots have a ratio subplot, and no CMS stamp. Each plot is a single function-call of MakeBasicPlot(args). There is also an option to make a basic results table. You can turn all this on in the following part of the code:
# ====================================================================================================================================================== #
# This is a basic plotting routine to make Analysis-Note style plots with ratio plots.
# ====================================================================================================================================================== #
if (False):
# Get Scale Factors
[[Rz_uujj,Rz_uujj_err],[Rtt_uujj,Rtt_uujj_err]] = GetMuMuScaleFactors( NormalWeightMuMu+'*'+preselectionmumu, NormalDirectory, '(M_uu>80)*(M_uu<100)', '(M_uu>100)')
[[Rw_uvjj,Rw_uvjj_err],[Rtt_uvjj,Rtt_uvjj_err]] = GetMuNuScaleFactors( NormalWeightMuNu+'*'+preselectionmunu, NormalDirectory, '(MT_uv>70)*(MT_uv<110)*(JetCount<3.5)', '(MT_uv>70)*(MT_uv<110)*(JetCount>3.5)')
# Optionally, you can make an event-count table for each selection. Useful if testing a new optimization.
# We will do this later with full systematics for our set of stable cuts.
if False:
QuickTableTTDD(MuMuOptCutFile, preselectionmumu+"*(M_uu>100)",NormalWeightMuMu,Rz_uujj, Rw_uvjj,Rtt_uujj,0)
QuickTable(MuNuOptCutFile, preselectionmunu,NormalWeightMuNu,Rz_uujj, Rw_uvjj,Rtt_uvjj,0)
# Here are a few plots which are zoomed-in on control regions.
MakeBasicPlot("M_uu","M^{#mu #mu} [GeV]",bosonzoombinning_uujj_Z,preselectionmumu,NormalWeightMuMu,NormalDirectory,'controlzoom_ZRegion','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
...etc...
# UUJJ plots at preselection, Note that putting 'TTBarDataDriven' in the name turns on the use of data-driven ttbar e-mu sample in place of MC
MakeBasicPlot("Pt_jet1","p_{T}(jet_{1}) [GeV]",ptbinning,preselectionmumu,NormalWeightMuMu,NormalDirectory,'standard_TTBarDataDriven','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
... etc ...
Running the PAS-style plots
PAS-style plots have no ratio subplots, and have a CMS stamp. You can enable this portion of the main function to run the PAS-style plots. There is also a loop over LQ mass values that you can easily modify for different final-selection plots. Using the "tagfree" file tag will remove the CMS stamp.
# ====================================================================================================================================================== #
# This is a plotting routine for PAS-style publication-quality plots
# ====================================================================================================================================================== #
if False:
# Get Scale Factors
[[Rz_uujj,Rz_uujj_err],[Rtt_uujj,Rtt_uujj_err]] = GetMuMuScaleFactors( NormalWeightMuMu+'*'+preselectionmumu, NormalDirectory, '(M_uu>80)*(M_uu<100)', '(M_uu>100)')
[[Rw_uvjj,Rw_uvjj_err],[Rtt_uvjj,Rtt_uvjj_err]] = GetMuNuScaleFactors( NormalWeightMuNu+'*'+preselectionmunu, NormalDirectory, '(MT_uv>70)*(MT_uv<110)*(JetCount<3.5)', '(MT_uv>70)*(MT_uv<110)*(JetCount>3.5)')
...etc...
# The two flags are for regular plots, and tagfree plots (plots that don't say CMS Preliminary - for notes or thesis)
for flag in ['','tagfree']:
# Preselection plots in the UUJJ channel in the PAS style (no subplot)
MakeBasicPlot("Pt_jet1","p_{T}(jet_{1}) [GeV]",ptbinning,preselectionmumu,NormalWeightMuMu,NormalDirectory,'standardPAS_TTBarDataDriven'+flag,'uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
... etc ...
Running the full event yields (with all systematic variations) and making higgs limit-tool cards
There are two ways to do this. It is easiest to just let it run, but in practice it is time consuming (many systematic variations) and can take more than a day. The end goal is that it produces several txt files in the result directories containing the event yields, and then combines all that information into higgs limit-tool cards.
The long way to do this is to just let this part of the code run:
# ====================================================================================================================================================== #
# This runs a "FullAnalysis" - i.e. produces tables with full systematics included.
# ====================================================================================================================================================== #
# You can run this to make the full set of tables needed to construct the higgs card. This takes a long time!
# Alternatively, you can run > python SysBatcher.py --launch to do each table in a separate batch job
# When done, proceed to the next step to make higgs limit cards
if (True):
FullAnalysis(MuMuOptCutFile, preselectionmumu,preselectionmunu,NormalDirectory,NormalWeightMuMu,'TTBarDataDriven') # scriptflag
FullAnalysis(MuNuOptCutFile, preselectionmumu,preselectionmunu,NormalDirectory,NormalWeightMuNu,'normal') # scriptflag
if (True):
uujjcardfiles = MuMuOptCutFile.replace('.txt','_systable*.txt')
uvjjcardfiles = MuNuOptCutFile.replace('.txt','_systable*.txt')
uujjcards = ParseFinalCards(uujjcardfiles)
uvjjcards = ParseFinalCards(uvjjcardfiles)
finalcards = FixFinalCards([uujjcards,uvjjcards])
print 'Final Cards Available in ',finalcards
The short way circumvents running the FullAnalysis(...) functions by making multiple copies of the LQResultProducer, one for each systematic variation, and running them in batch in parallel. This is much faster, but you have to rely on another script to parse it all out. When the script finishes, you just have to run the second part of the above (the ParseFinalCards and FixFinalCards commands). To use this method, do the following and wait for the batch jobs to finish.
(1) Run
python SysBatcher.py --launch
(2) Wait for those batch jobs to finish, and then use just the second part of the above code
if (True):
uujjcardfiles = MuMuOptCutFile.replace('.txt','_systable*.txt')
uvjjcardfiles = MuNuOptCutFile.replace('.txt','_systable*.txt')
uujjcards = ParseFinalCards(uujjcardfiles)
uvjjcards = ParseFinalCards(uvjjcardfiles)
finalcards = FixFinalCards([uujjcards,uvjjcards])
print 'Final Cards Available in ',finalcards
What to do if systematics need to be evaluated at a different selection level
If systematic uncertainties are evaluated at a different selection than the final analysis, say the enhanced selection level, then you can just run the above twice, with a few minor changes:
(1) You will need a new cut file (Opt_LQuujj_Cuts.txt) with the different selection you want, and it should be kept under a different subfolder in the results directory with the name Enhanced_selection. So make this new folder:
mkdir [Results_directory_name]/Enhanced_selection
and make your cut file Opt_LQuujj_Cuts.txt and place it in the new subdirectory.
(2) Then modify LQResultProducer.py to point to the new cut file, so
MuMuOptCutDir = 'Final_selection' # scriptflag
needs to be changed to, say:
MuMuOptCutDir = 'Enhanced_selection' # scriptflag
(3) Then just run the SysBatcher.py again to get a new datacard with the new selection (it will be stored in the new folder).
(4) Change the names of the datacards to FinalCardsFinalSys_[year].txt and EnhancedCardsLQ_[year].txt, to distinguish them in the next step.
(5) Last, you will want to combine the datacards to take the systematics from EnhancedCardsLQ_[year].txt, and the events/stats from FinalCardsFinalSys_[year].txt. Simply run:
python CombineDatacards.py -d [Results_directory_name]
And a new datacard containing the enhanced selection systematics and the final selection event yields and stats will be created:
[Results_directory_name]/Final_selection/FinalCardsLQ_[year].txt
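Conceptually, the combination keeps the yields from one card and the systematics from the other. A toy sketch of that idea (not the actual CombineDatacards.py logic; real higgs datacards have more line types and a fixed header structure):

```python
def combine_cards(final_card_lines, enhanced_card_lines):
    """Sketch of the datacard combination described above: keep the event
    yields ('rate' and 'observation' lines) from the final-selection card
    and take the systematic-uncertainty (lnN) lines from the
    enhanced-selection card."""
    yields = [l for l in final_card_lines
              if l.startswith(("rate", "observation"))]
    systs = [l for l in enhanced_card_lines if " lnN " in l]
    return yields + systs

# Toy card fragments: yields differ between the two selections,
# the systematic is taken from the enhanced-selection card.
combined = combine_cards(
    ["observation 12", "rate 3.5 8.1", "lumi lnN 1.025 1.025"],
    ["observation 40", "rate 9.9 30.2", "lumi lnN 1.026 1.026"],
)
```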
Displaced Lepton reinterpretation
sed 's/LQ/BLCTau1/g' LQResultProducer.py > BLCTau1ResultProducer.py
sed 's/BLCTau1/BLCTau10/g' BLCTau1ResultProducer.py > BLCTau10ResultProducer.py
sed 's/BLCTau1/BLCTau100/g' BLCTau1ResultProducer.py > BLCTau100ResultProducer.py
sed 's/BLCTau1/BLCTau1000/g' BLCTau1ResultProducer.py > BLCTau1000ResultProducer.py
cat SysBatcher.py | sed 's/LQ/BLCTau1/g' | sed "s/,'uvjj'//g" > BLCTau1SysBatcher.py
cat BLCTau1SysBatcher.py | sed 's/BLCTau1/BLCTau10/g' | sed "s/,'uvjj'//g" > BLCTau10SysBatcher.py
cat BLCTau1SysBatcher.py | sed 's/BLCTau1/BLCTau100/g' | sed "s/,'uvjj'//g" > BLCTau100SysBatcher.py
cat BLCTau1SysBatcher.py | sed 's/BLCTau1/BLCTau1000/g' | sed "s/,'uvjj'//g" > BLCTau1000SysBatcher.py
Part IV: Making limit plots
The limit-setting code is now in a subdirectory of the main github repository. It has some standalone instructions, but since the LQResultProducer already produces a compatible FinalCardsLQ.txt, here are the short instructions.
(1) Check out the higgs limit tool code and the appropriate CMSSW version from the HiggsAnalysis CombinedLimit github.
Setting up the environment (once)
setenv SCRAM_ARCH slc7_amd64_gcc700
cmsrel CMSSW_10_2_13
cd CMSSW_10_2_13/src
cmsenv
git clone https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit.git HiggsAnalysis/CombinedLimit
cd HiggsAnalysis/CombinedLimit
Update to a recommended tag - currently the recommended tag is v8.1.0:
cd $CMSSW_BASE/src/HiggsAnalysis/CombinedLimit
git fetch origin
git checkout v8.1.0
scramv1 b clean; scramv1 b # always make a clean build, as scram doesn't always see updates to src/LinkDef.h
- when running with the HybridNew calculator, you can safely ignore the following warning from RooFit
WARNING:Eval -- RooStatsUtils::MakeNuisancePdf - no constraints found on nuisance parameters in the input model
(2) Put your FinalCardsLQ.txt file in the same directory as the RunStatsBasicCLs.py script.
(3) Run the single-channel limits (limit results will print to screen - save them!).
python RunStatsBasicCLs.py --do_BetaOne --do_BetaHalf -c LimitTest --Asymptotic_Only
(4) Run the combined limit. It will have to batch out some jobs, so it is a two step process:
- Run the script which creates and launches the jobs (one job per beta value)
- When jobs are complete, merge them. Results will print to the screen.
(5) Take the results from (3) and (4) and put them into the plot macros. The plot macros compile and run. An example of this would be:
{
gROOT->ProcessLine(".L ComboPlotLQ2.C++");
gROOT->ProcessLine("makePlots()");
gROOT->Reset();
gROOT->ProcessLine(".L BR_Sigma_MuMu_vsMass.C++");
gROOT->ProcessLine("makePlotsBO()");
gROOT->Reset();
gROOT->ProcessLine(".L BR_Sigma_MuNu_vsMass.C++");
gROOT->ProcessLine("makePlotsBH()");
gROOT->ProcessLine(".q");
}
Part V: Making publication-quality tables
The event yield and systematics tables can be created automatically from the FinalCardsLQ.txt file (the higgs limit-card file) produced in the previous step. It simply needs to be parsed into a LaTeX table.
To make tables of event yields with statistical errors, tables with the systematic uncertainties, and tables with the range of systematics across signal masses, run:
python MakeTablesFromDatacards.py -y [2016, 2017, 2018, or combined] -f your/path/to/FinalCardsLQ.txt -e your/path/to/EnhancedCardsLQ.txt
This script can handle both single-year and combined-year datacards. The datacard passed with '-f' will give the event yields and statistical errors, while the datacard passed with '-e' will give the systematic uncertainties. Inside MakeTablesFromDatacards.py, each table is printed with the following functions:
YieldsTable(): returns a LaTeX table with the event yields, statistical uncertainties, and systematic uncertainties (for the totals) for a single year, or the combination (if combined datacard).
SystematicsByYearTable(): returns a LaTeX table with each contribution to the systematic uncertainties for a single mass point, separately for all three years (if combined datacard) or for a single year.
SysRangesByYearTable(): returns a LaTeX table with the high and low ranges of systematic uncertainties for each contribution, separately for all three years (if combined datacard) or for a single year.
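The core of what these functions do is string formatting; below is a minimal sketch of the idea (the column layout and process names here are hypothetical, and the real FinalCardsLQ.txt parsing in MakeTablesFromDatacards.py is more involved):

```python
# Minimal sketch: format one background process and its per-mass yields as a
# LaTeX table row. Illustrative only; MakeTablesFromDatacards.py handles the
# actual datacard format and uncertainty columns.
def latex_row(process, yields):
    cells = " & ".join("%.1f" % y for y in yields)
    return "%s & %s \\\\" % (process, cells)

print(latex_row("ttbar", [123.4, 56.78, 9.0]))  # ttbar & 123.4 & 56.8 & 9.0 \\
```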
Obsolete
These scripts have been replaced by MakeTablesFromDatacards.py.
To get the content of the event yields table, do :
python CleanTableParse.py your/path/to/FinalCardsLQ.txt
To get tables of systematic uncertainties, do:
python SysTableParse.py your/path/to/FinalCardsLQ.txt
This last command also prints out the total percent uncertainties for each channel/mass. This list is used in the LQResultProducer.py in the "HARD CODED RESULTS" section to define the uncertainty band, used in the PAS-style plots.
Part VI: Making PAS-quality plots.
The output of the table production (Part V) will give you lists of systematic uncertainty percent values at each LQ mass. Use these in the "HARD CODED RESULTS" part of the LQ result producer to update the lines:
# These are the total background uncertainties. They are used just to make some error bands on plots.
totunc_uujj = [3.46, 2.55, 2.98, 3.67, 5.44, 5.73, 6.21, 6.47, 9.47, 11.16, 14.79, 19.75, 31.11, 35.41, 8.12, 8.12, 8.12, 8.12, 8.12]
totunc_uvjj = [13.37, 12.37, 13.35, 12.97, 13.62, 15.94, 16.23, 20.67, 23.94, 30.29, 35.3, 52.94, 69.57, 46.09, 42.3, 42.3, 42.3, 42.3, 42.3]
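These percent values are turned into an absolute band around the total background roughly as follows (a sketch with invented yields; the actual band is drawn inside LQResultProducer.py):

```python
# Sketch: convert per-point percent uncertainties into (low, high) band edges
# around the background prediction. The yields below are invented.
def error_band(yields, percent_unc):
    band = []
    for y, pct in zip(yields, percent_unc):
        err = y * pct / 100.0
        band.append((y - err, y + err))
    return band

bkg_yields = [1000.0, 400.0, 150.0]                # hypothetical yields
print(error_band(bkg_yields, [3.46, 2.55, 2.98]))  # first totunc_uujj values
```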
Then you can run the standard PAS plots in the LQResultProducer, i.e. enable this part of the main function:
# ====================================================================================================================================================== #
# This is a plotting routine for PAS-style publication-quality plots
# ====================================================================================================================================================== #
if False:
# Get Scale Factors
[[Rz_uujj,Rz_uujj_err],[Rtt_uujj,Rtt_uujj_err]] = GetMuMuScaleFactors( NormalWeightMuMu+'*'+preselectionmumu, NormalDirectory, '(M_uu>80)*(M_uu<100)', '(M_uu>100)')
[[Rw_uvjj,Rw_uvjj_err],[Rtt_uvjj,Rtt_uvjj_err]] = GetMuNuScaleFactors( NormalWeightMuNu+'*'+preselectionmunu, NormalDirectory, '(MT_uv>70)*(MT_uv<110)*(JetCount<3.5)', '(MT_uv>70)*(MT_uv<110)*(JetCount>3.5)')
...
# The two flags are for regular plots, and tagfree plots (plots that don't say CMS Preliminary - for notes or thesis)
for flag in ['','tagfree']:
# Preselection plots in the UUJJ channel in the PAS style (no subplot)
MakeBasicPlot("Pt_jet1","p_{T}(jet_{1}) [GeV]",ptbinning,preselectionmumu,NormalWeightMuMu,NormalDirectory,'standardPAS_TTBarDataDriven'+flag,'uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
MakeBasicPlot("Pt_jet2","p_{T}(jet_{2}) [GeV]",ptbinning,preselectionmumu,NormalWeightMuMu,NormalDirectory,'standardPAS_TTBarDataDriven'+flag,'uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
...
Making results for Vector LQ
The goal here will be to port all this machinery to making Vector LQ results with as little effort as possible. To do so, I've made some tweaks to the LQResultProducer, such that many of the output file names will contain the letters "LQ". From now on, "LQ" will indicate the normal (scalar) analysis, and the different vector scenarios will be denoted by "AM", "MC", "MM", and "YM". Correspondingly, I've organized the tree-making facility to name all the new signal files like this. e.g., see
the vector LQ csv file. After making trees with the vector LQ csv file, and putting those new files into the same directory as the scalar files, we can proceed with some modifications to the code to run on vector LQ.
Fixing up the cut cards
In the results directory, there are typically a couple txt files which store the optimized cuts. They are called something like "Opt_uvjjCuts_Smoothed_pol2cutoff.txt", and store text that looks like this:
opt_LQuujj300 = ((M_uu>100)*(St_uujj>380)*(M_uujj2>115))
opt_LQuujj350 = ((M_uu>115)*(St_uujj>460)*(M_uujj2>115))
opt_LQuujj400 = ((M_uu>125)*(St_uujj>540)*(M_uujj2>120))
.....
In contrast, our vector LQ samples have mass points every 100 GeV and go up to 1800 GeV. First, let's fix the file names of the cut cards for scalar:
mv Opt_uujjCuts_Smoothed_pol2cutoff.txt OptLQ_uujjCuts_Smoothed_pol2cutoff.txt
mv Opt_uvjjCuts_Smoothed_pol2cutoff.txt OptLQ_uvjjCuts_Smoothed_pol2cutoff.txt
Next, copy the content of these files to new vector YM files, changing the "LQ" to "YM", using only the values every 100 GeV, and extending to 1800 GeV (holding the last cuts fixed). For example:
Contents of OptYM_uujjCuts_Smoothed_pol2cutoff.txt:
opt_YMuujj300 = ((M_uu>100)*(St_uujj>380)*(M_uujj2>115))
opt_YMuujj400 = ((M_uu>125)*(St_uujj>540)*(M_uujj2>120))
opt_YMuujj500 = ((M_uu>150)*(St_uujj>685)*(M_uujj2>155))
opt_YMuujj600 = ((M_uu>175)*(St_uujj>820)*(M_uujj2>210))
opt_YMuujj700 = ((M_uu>195)*(St_uujj>935)*(M_uujj2>295))
opt_YMuujj800 = ((M_uu>215)*(St_uujj>1040)*(M_uujj2>400))
opt_YMuujj900 = ((M_uu>230)*(St_uujj>1135)*(M_uujj2>535))
opt_YMuujj1000 = ((M_uu>245)*(St_uujj>1210)*(M_uujj2>690))
opt_YMuujj1100 = ((M_uu>245)*(St_uujj>1210)*(M_uujj2>690))
opt_YMuujj1200 = ((M_uu>245)*(St_uujj>1210)*(M_uujj2>690))
opt_YMuujj1300 = ((M_uu>245)*(St_uujj>1210)*(M_uujj2>690))
opt_YMuujj1400 = ((M_uu>245)*(St_uujj>1210)*(M_uujj2>690))
opt_YMuujj1500 = ((M_uu>245)*(St_uujj>1210)*(M_uujj2>690))
opt_YMuujj1600 = ((M_uu>245)*(St_uujj>1210)*(M_uujj2>690))
opt_YMuujj1700 = ((M_uu>245)*(St_uujj>1210)*(M_uujj2>690))
opt_YMuujj1800 = ((M_uu>245)*(St_uujj>1210)*(M_uujj2>690))
Contents of OptYM_uvjjCuts_Smoothed_pol2cutoff.txt:
opt_YMuvjj300 = ((MT_uv>155)*(St_uvjj>455)*(M_uvjj>125))
opt_YMuvjj400 = ((MT_uv>205)*(St_uvjj>625)*(M_uvjj>175))
opt_YMuvjj500 = ((MT_uv>245)*(St_uvjj>800)*(M_uvjj>225))
opt_YMuvjj600 = ((MT_uv>275)*(St_uvjj>980)*(M_uvjj>280))
opt_YMuvjj700 = ((MT_uv>300)*(St_uvjj>1160)*(M_uvjj>330))
opt_YMuvjj800 = ((MT_uv>315)*(St_uvjj>1345)*(M_uvjj>380))
opt_YMuvjj900 = ((MT_uv>320)*(St_uvjj>1530)*(M_uvjj>435))
opt_YMuvjj1000 = ((MT_uv>320)*(St_uvjj>1720)*(M_uvjj>490))
opt_YMuvjj1100 = ((MT_uv>320)*(St_uvjj>1720)*(M_uvjj>490))
opt_YMuvjj1200 = ((MT_uv>320)*(St_uvjj>1720)*(M_uvjj>490))
opt_YMuvjj1300 = ((MT_uv>320)*(St_uvjj>1720)*(M_uvjj>490))
opt_YMuvjj1400 = ((MT_uv>320)*(St_uvjj>1720)*(M_uvjj>490))
opt_YMuvjj1500 = ((MT_uv>320)*(St_uvjj>1720)*(M_uvjj>490))
opt_YMuvjj1600 = ((MT_uv>320)*(St_uvjj>1720)*(M_uvjj>490))
opt_YMuvjj1700 = ((MT_uv>320)*(St_uvjj>1720)*(M_uvjj>490))
opt_YMuvjj1800 = ((MT_uv>320)*(St_uvjj>1720)*(M_uvjj>490))
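Rather than editing by hand, this copy/extend step can be scripted. A sketch (assuming the cut-card format shown above; the function is illustrative and not part of the repository):

```python
import re

def make_vector_card(scalar_text, old="LQ", new="YM", step=100, max_mass=1800):
    """Build a vector LQ cut card from scalar cut-card text: keep only the
    masses that are multiples of `step`, rename the label, and extend to
    `max_mass` holding the last available cuts fixed."""
    cuts = {}
    for line in scalar_text.splitlines():
        m = re.match(r"opt_%s([a-z]+)(\d+)\s*=\s*(.*)" % old, line.strip())
        if m:
            channel, mass, cut = m.group(1), int(m.group(2)), m.group(3)
            cuts.setdefault(channel, {})[mass] = cut
    out = []
    for channel, bymass in sorted(cuts.items()):
        for mass in range(min(bymass), max_mass + step, step):
            # exact scalar cut if it exists, else hold the highest one fixed
            nearest = max(m for m in bymass if m <= mass)
            out.append("opt_%s%s%d = %s" % (new, channel, mass, bymass[nearest]))
    return "\n".join(out)
```

Running this over OptLQ_uujjCuts_Smoothed_pol2cutoff.txt and writing the result out should reproduce the OptYM file listed above.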
These cuts are the same for all vector LQ samples, so just copy them to appropriately named files for "AM", "MC", and "MM":
sed 's/YM/MC/g' OptYM_uujjCuts_Smoothed_pol2cutoff.txt > OptMC_uujjCuts_Smoothed_pol2cutoff.txt
sed 's/YM/MC/g' OptYM_uvjjCuts_Smoothed_pol2cutoff.txt > OptMC_uvjjCuts_Smoothed_pol2cutoff.txt
sed 's/YM/AM/g' OptYM_uujjCuts_Smoothed_pol2cutoff.txt > OptAM_uujjCuts_Smoothed_pol2cutoff.txt
sed 's/YM/AM/g' OptYM_uvjjCuts_Smoothed_pol2cutoff.txt > OptAM_uvjjCuts_Smoothed_pol2cutoff.txt
sed 's/YM/MM/g' OptYM_uujjCuts_Smoothed_pol2cutoff.txt > OptMM_uujjCuts_Smoothed_pol2cutoff.txt
sed 's/YM/MM/g' OptYM_uvjjCuts_Smoothed_pol2cutoff.txt > OptMM_uvjjCuts_Smoothed_pol2cutoff.txt
Running on the vector samples
The LQResultProducer has been written such that we can just replace the "LQ" with any of the vector names and run. Since we are basically just interested in getting Higgs Limit cards for each of the vector samples, we can do the same with the SysBatcher.
First, clone the LQResultProducer and the SysBatcher for vector purposes:
sed 's/LQ/YM/g' LQResultProducer.py > YMResultProducer.py
sed 's/LQ/YM/g' SysBatcher.py > YMSysBatcher.py
sed 's/LQ/MM/g' LQResultProducer.py > MMResultProducer.py
sed 's/LQ/MM/g' SysBatcher.py > MMSysBatcher.py
sed 's/LQ/MC/g' LQResultProducer.py > MCResultProducer.py
sed 's/LQ/MC/g' SysBatcher.py > MCSysBatcher.py
sed 's/LQ/AM/g' LQResultProducer.py > AMResultProducer.py
sed 's/LQ/AM/g' SysBatcher.py > AMSysBatcher.py
Next, just launch the batchers and wait. I recommend not launching them all at once - wait an hour or two between each. Each of these opens the root files in an afs workspace, so it is best to stagger them a bit.
python YMSysBatcher.py --launch
python MMSysBatcher.py --launch
python MCSysBatcher.py --launch
python AMSysBatcher.py --launch
Then, you can produce the higgs limit cards (the "FinalCards" files) in the same way as for the scalar samples:
https://twiki.cern.ch/twiki/bin/view/Main/LQPairSecondGen13TeVOutline#Running_the_full_event_yields_wi
You will have to do that once, for each of YMResultProducer.py, MCResultProducer.py, MMResultProducer.py, and AMResultProducer.py. It will yield four card files: FinalCardsYM.txt, FinalCardsMC.txt, FinalCardsMM.txt and FinalCardsAM.txt.
Getting vector LQ limits
First, we have to fix one small problem. Because the Higgs limit tool presents the limit as a ratio of the assumed cross section (r), and reports it to 5 digits, the very large vector LQ signal cross section means we have to increase the precision on r. I find that the building works best on lxplus5, but running the combine tool works fine on lxplus6 thereafter.
cd CMSSW_6_1_1/src
sed -i 's#6.4#9.9#g' ./HiggsAnalysis/CombinedLimit/src/Asymptotic.cc
scram b -j 16
New flags have been built into the limit tool to deal with the vector scenario. These flags switch on the use of the vector LQ card file (e.g. FinalCardsYM.txt), and they can be used as follows:
python RunStatsBasicCLs.py --do_BetaOne --do_BetaHalf -c LimitTest --Asymptotic_Only --vectorYM
python RunStatsBasicCLs.py --do_BetaOne --do_BetaHalf -c LimitTest --Asymptotic_Only --vectorAM
python RunStatsBasicCLs.py --do_BetaOne --do_BetaHalf -c LimitTest --Asymptotic_Only --vectorMM
python RunStatsBasicCLs.py --do_BetaOne --do_BetaHalf -c LimitTest --Asymptotic_Only --vectorMC
Making results for RPV Susy In Progress
In this section we will follow the same methods as for the vector LQ samples to get limits on the RPV susy (232) samples. These samples have mass values ranging from 200 to 1000 GeV in 50 GeV increments. The scalar LQ samples were denoted "LQ" and the vector samples "YM", "MC", "MM", and "AM". For susy we will follow a similar convention and label everything "RV".
Creating the cut card
In this example, I will use the same cuts as the regular LQ signal. Dave is further testing the optimal cuts and may modify this; the cuts should eventually use the LQ optimization point at m_stop - 100 GeV.
Contents of OptRV_uujjCuts_Smoothed_pol2cutoff.txt:
opt_RVuujj300 = ((M_uu>100)*(St_uujj>380)*(M_uujj2>115))
opt_RVuujj350 = ((M_uu>115)*(St_uujj>460)*(M_uujj2>115))
opt_RVuujj400 = ((M_uu>125)*(St_uujj>540)*(M_uujj2>120))
opt_RVuujj450 = ((M_uu>140)*(St_uujj>615)*(M_uujj2>135))
opt_RVuujj500 = ((M_uu>150)*(St_uujj>685)*(M_uujj2>155))
opt_RVuujj550 = ((M_uu>165)*(St_uujj>755)*(M_uujj2>180))
opt_RVuujj600 = ((M_uu>175)*(St_uujj>820)*(M_uujj2>210))
opt_RVuujj650 = ((M_uu>185)*(St_uujj>880)*(M_uujj2>250))
opt_RVuujj700 = ((M_uu>195)*(St_uujj>935)*(M_uujj2>295))
opt_RVuujj750 = ((M_uu>205)*(St_uujj>990)*(M_uujj2>345))
opt_RVuujj800 = ((M_uu>215)*(St_uujj>1040)*(M_uujj2>400))
opt_RVuujj850 = ((M_uu>220)*(St_uujj>1090)*(M_uujj2>465))
opt_RVuujj900 = ((M_uu>230)*(St_uujj>1135)*(M_uujj2>535))
opt_RVuujj950 = ((M_uu>235)*(St_uujj>1175)*(M_uujj2>610))
opt_RVuujj1000 = ((M_uu>245)*(St_uujj>1210)*(M_uujj2>690))
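Note that the card above simply reuses the same-mass LQ cuts. If the m_stop - 100 GeV prescription is adopted instead, the mapping can be sketched as follows (illustrative only, not a repository script):

```python
def shifted_rv_card(lq_cuts, masses=range(200, 1001, 50), shift=100):
    """lq_cuts: {LQ mass: cut string}. Assign each stop mass m the LQ cut
    optimized at (m - shift); skip masses with no such LQ point."""
    out = []
    for m in masses:
        if m - shift in lq_cuts:
            out.append("opt_RVuujj%d = %s" % (m, lq_cuts[m - shift]))
    return "\n".join(out)

# e.g. a 400 GeV stop would pick up the 300 GeV LQ cuts
lq_cuts = {300: "((M_uu>100)*(St_uujj>380)*(M_uujj2>115))",
           350: "((M_uu>115)*(St_uujj>460)*(M_uujj2>115))"}
print(shifted_rv_card(lq_cuts, masses=[400, 450]))
```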
Run the event yields and produce higgs limit cards
Just as with the vector LQ, we can clone the LQResultProducer and the SysBatcher for the susy scenario. Since the RPV susy signal appears only in the mu-mu-j-j channel, we make one slight tweak to remove the uvjj running:
sed 's/LQ/RV/g' LQResultProducer.py > RVResultProducer.py
cat SysBatcher.py | sed 's/LQ/RV/g' | sed "s/,'uvjj'//g" > RVSysBatcher.py
Next, just launch the batcher and wait for the jobs to finish.
python RVSysBatcher.py --launch
Then, you can produce the higgs limit cards (the "FinalCards" files) in the same way as for the scalar samples:
https://twiki.cern.ch/twiki/bin/view/Main/LQPairSecondGen13TeVOutline#Running_the_full_event_yields_wi
To do so, modify the "FullAnalysis" portion of RVResultProducer.py as follows:
# ====================================================================================================================================================== #
# This runs a "FullAnalysis" - i.e. produces tables with full systematics included.
# ====================================================================================================================================================== #
# You can run this to make the full set of tables needed to construct the higgs card. This takes a long time!
# Alternatively, you can run > python SysBatcher.py --launch to do each table in a separate batch job
# When done, proceed to the next step to make higgs limit cards
if (True):
FullAnalysis(MuMuOptCutFile, preselectionmumu,preselectionmunu,NormalDirectory,NormalWeightMuMu,'TTBarDataDriven') # scriptflag
if (True):
uujjcardfiles = MuMuOptCutFile.replace('.txt','_systable*.txt')
uujjcards = ParseFinalCards(uujjcardfiles)
finalcards = FixFinalCards([uujjcards])
print 'Final Cards Available in ',finalcards
And then just run it, e.g.
python RVResultProducer.py
You should now have a "FinalCardsRV.txt" file in your results directory.
Getting RPV Susy limits
Proceed exactly as in the scalar and vector LQ cases, using the flag "--susyRV" during limit setting. To be precise, if you've never set up the Higgs limit tool in the LimitSetting directory before, just put FinalCardsRV.txt in the LimitSetting directory and do this:
cd LimitSetting
export SCRAM_ARCH=slc6_amd64_gcc491
cmsrel CMSSW_7_4_7
cd CMSSW_7_4_7/src
cmsenv
git clone https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit.git HiggsAnalysis/CombinedLimit
cd HiggsAnalysis/CombinedLimit
Update to a recommended tag - currently the recommended tag is v6.2.1:
cd $CMSSW_BASE/src/HiggsAnalysis/CombinedLimit
git fetch origin
git checkout v6.2.1
scramv1 b clean; scramv1 b # always make a clean build, as scram doesn't always see updates to src/LinkDef.h
sed -i 's#6.4#9.9#g' ./HiggsAnalysis/CombinedLimit/src/Asymptotic.cc
scram b -j 16
cd ../../
python RunStatsBasicCLs.py --do_BetaOne -c LimitTest --Asymptotic_Only --susyRV
Getting RPV Limit plots
A plot macro has been drafted at
LimitSetting/PlotMacros/BR_Sigma_MuMu_vsMassSUSYRPV.C
To run it, simply do:
root -b RunSusyRPVPlotsInRoot
An example output is at
this link
diHiggs
This section contains HH->bbZZ->bblljj specific instructions. The general description in
Part III: Running the result producer still applies, so it is useful to read through that description.
Setup and directory
Use the same setup procedure as in
Part II. If you already did this in Part II, the
HHResultProducer.py should be ready.
git clone git@github.com:DavidMorse/LQ2Analysis.git LQ2Analysis13TeV
cd LQ2Analysis13TeV
cmsrel CMSSW_10_2_12
cd CMSSW_10_2_12/src
cmsenv
cd -
Also make sure the files "OptCutFile*" are ready. See details in
Setup and directory.
The workable version of these files are in this
directory.
Result Producer
HHResultProducer.py is now in the github repository. The directories have been updated to automatically pick up the trees from the EOS directories described below. To run, simply do (see details in the next sections):
python HHResultProducer.py
It takes a while to run (particularly when the files need to be accessed on EOS), so don't be alarmed.
Procedures to run the Result Producer
Relevant Switches
The following are the main switches, allowing you to choose the channel, signal hypothesis, etc. The comments in the code are self-explanatory.
analysisChannel = 'muon'
analysisSignal = 'HHres'
recalRttRz = True
doindiv = True
use_bdtSp0 = False
Getting Fbd, Rtt, and Rz (QCD multijet, ttbar, and ZJets estimations)
This is done by enabling the following lines (turning False to True):
if False:
[[Rz_uujj,Rz_uujj_err],[Rtt_uujj,Rtt_uujj_err],[Fbd_uujj,Fbd_uujj_err]] = GetNormalizationScaleFactorsAndFbd(NormalWeight+'*'+preselection, NormalDirectory, dyControlRegion, ttControlRegion,0)
exit()
Then hard-code the outputs into these lines:
# QCD data-driven scale factor
useDataDrivenQCD = True
fbd_Muon = [1.406, 0.011] # Apr 05, 2019
fbd_Electron = [1.079, 0.009] # Apr 05, 2019
# tt, z SF calculated using Data with NEW method (Apr 05, 2019)
Rz_data_muon = [1.139, 0.006]
Rtt_data_muon = [0.906, 0.013]
Rz_data_electron = [1.241, 0.008]
Rtt_data_electron = [0.968, 0.018]
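These scale factors multiply the corresponding MC yields when forming the total background estimate. Schematically (invented yields; simple uncorrelated error propagation for illustration, not the actual analysis code):

```python
import math

def scaled_background(components):
    """components: list of (N_mc, sf, sf_err) per background.
    Returns (total, err), assuming uncorrelated SF uncertainties and
    neglecting MC statistical errors (illustration only)."""
    total = sum(n * sf for n, sf, _ in components)
    err = math.sqrt(sum((n * e) ** 2 for n, _, e in components))
    return total, err

# hypothetical yields combined with the Apr 05, 2019 muon-channel factors above
zjets, ttbar, qcd = 5000.0, 3000.0, 800.0
total, err = scaled_background([(zjets, 1.139, 0.006),
                                (ttbar, 0.906, 0.013),
                                (qcd,   1.406, 0.011)])
print(round(total, 1), round(err, 1))  # 9537.8 50.0
```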
Getting data-MC comparison plots (analysis-note style plots)
The general idea is the same as in
Making analysis-note style plots. Here you have to enable these lines:
# ====================================================================================================================================================== #
# This is a basic plotting routine to make Analysis Note style plots with ratio plots. AN Analysis-Note
# ====================================================================================================================================================== #
if True :
...
...
MakeBasicPlot("cosTheta_hbb_"+ell+ell,"cos(#Theta) (H->bb) "+latexEll+latexEll,costhetastarbinning,preselection,NormalWeight,NormalDirectory,'standard',ell+ell+'jj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,300)
...
There are two selections used in the HH analysis, "preselection" and "final selection". The distributions with the final selection are generated by the code starting from these lines:
#if analysisChannel=='electron' : exit()
# fixme turning off final selection
# Full Selection Plots
Running the full event yields (with all systematic variations) for HH
The procedure is the same as in
LQ Running the full event yields.... I copy it here with some modifications.
For the HH analysis, we will use the BDT distribution for extracting limits, so we need the BDT distribution and its systematic variations. The end goal of all this is to produce (i) several txt files in the result directories containing the event yields and (ii) several root files, each containing one systematic variation, and then to combine all that information into higgs limit tool cards.
The idea is to circumvent the running of the FullAnalysis(...) functions, by making multiple copies of the LQResultProducer - one for each systematic variation - and running them in batch in parallel. This is much faster, but you have to rely on another script to parse it all out. When the script finishes, you just have to run the final part of the code (with the ParseFinalCards and FixFinalCards commands). To use this method, do the following:
(1) Comment out the second and third parts
(set the first part to True and leave the 2nd and 3rd parts as False):
# ====================================================================================================================================================== #
# This runs a "FullAnalysis" - i.e. produces tables with full systematics included.
# ====================================================================================================================================================== #
# You can run this to make the full set of tables needed to construct the higgs card. This takes a long time!
# Alternatively, you can run > python SysBatcher.py --launch to do each table in a separate batch job
# When done, proceed to the next step to make higgs limit cards
if True :
#FullAnalysis(MuonOptCutFile, preselection,preselectionmunu,NormalDirectory,NormalWeight,'TTBarDataDriven') # scriptflag
#FullAnalysis(MuNuOptCutFile, preselection,preselectionmunu,NormalDirectory,NormalWeightMuNu,'normal') # scriptflag
finalSelectionMuon = preselection.replace(bTagPresel,bTagFinalsel)
finalWeightMuon = NormalWeight.replace(bTagPreselSF,bTagFinalSF)
#finalSelectionMuon = preselection.replace(bTagsel1loose,bTagsel2loose)
#finalWeightMuon = NormalWeight.replace(bTag1SF,bTag2SF)
#FullAnalysis(MuonOptCutFile, preselection,preselection,preselectionmunu,NormalDirectory,NormalWeight,NormalWeight'normal') # scriptflag #preselection (1 loose bTag)
FullAnalysis(MuonOptCutFile, preselection,finalSelectionMuon,preselectionmunu,NormalDirectory,NormalWeight,finalWeightMuon,'normal') # scriptflag #final selection
if False :
print ' doing RunPDFUncertainty test '
finalSelectionMuon = preselection.replace(bTagPresel,bTagFinalsel)
finalWeightMuon = NormalWeight.replace(bTagPreselSF,bTagFinalSF)
#FullAnalysis(MuonOptCutFile, preselection,preselection,preselectionmunu,NormalDirectory,NormalWeight,NormalWeight'normal') # scriptflag #preselection (1 loose bTag)
RunPDFUncertainty(MuonOptCutFile, preselection,finalSelectionMuon,preselectionmunu,NormalDirectory,NormalWeight,finalWeightMuon) # scriptflag #final selection
if False :
uujjcardfiles = MuonOptCutFile.replace('.txt','_systable*.txt')
#uvjjcardfiles = MuNuOptCutFile.replace('.txt','_systable*.txt')
uujjcards = ParseFinalCards(uujjcardfiles)
#uvjjcards = ParseFinalCards(uvjjcardfiles)#fixme using uujj for now for speed
#finalcards = FixFinalCards([uujjcards,uvjjcards])
finalcards = FixFinalCards([uujjcards])
print 'Final Cards Available in',finalcards
Run SysBatcherHH.py
python SysBatcherHH.py --launch
(2) Turn off the first part and turn on the second part:
if True :
print ' doing RunPDFUncertainty test '
finalSelectionMuon = preselection.replace(bTagPresel,bTagFinalsel)
finalWeightMuon = NormalWeight.replace(bTagPreselSF,bTagFinalSF)
#FullAnalysis(MuonOptCutFile, preselection,preselection,preselectionmunu,NormalDirectory,NormalWeight,NormalWeight'normal') # scriptflag #preselection (1 loose bTag)
RunPDFUncertainty(MuonOptCutFile, preselection,finalSelectionMuon,preselectionmunu,NormalDirectory,NormalWeight,finalWeightMuon) # scriptflag #final selection
Then run another script
python SysBatcherHHPDF.py --launch
(3) Wait for those batch jobs to finish. Use the script combine_SystHist.py to check the results and resubmit jobs if needed. YYY_MM_DD_TTTTTT are the directories created by (1) and (2).
## # check files
python combine_SystHist.py --checkfiles -t YYY_MM_DD_TTTTTT
python combine_SystHist.py --checkfiles -t YYY_MM_DD_TTTTTT --launch
## # check files pdfs
python combine_SystHist.py --checkfiles --checkpdffiles -t YYY_MM_DD_TTTTTT
python combine_SystHist.py --checkfiles --checkpdffiles -t YYY_MM_DD_TTTTTT --launch
After all jobs are done, run the following commands to combine the root files:
## combine qcdScales root and txt files and calculate max/min (up/down)
python combine_SystHist.py --QCDScale
# Combine PDF root and txt files and calculate max/min (up/down)
python combine_SystHist.py --PDF
# Combine all root files
python combine_SystHist.py --Standard
## organize root files and txt files
python combine_SystHist.py --organizefiles
python combine_SystHist.py --organizefiles --launch
(4) Then use just the third part of the above code (HHResultProducer.py) to combine the txt files: turn off the first and second parts, and turn the third part on.
if True :
uujjcardfiles = MuonOptCutFile.replace('.txt','_systable*.txt')
#uvjjcardfiles = MuNuOptCutFile.replace('.txt','_systable*.txt')
uujjcards = ParseFinalCards(uujjcardfiles)
#uvjjcards = ParseFinalCards(uvjjcardfiles)#fixme using uujj for now for speed
#finalcards = FixFinalCards([uujjcards,uvjjcards])
finalcards = FixFinalCards([uujjcards])
print 'Final Cards Available in',finalcards
Then run once for muon and again for electron channel:
python HHResultProducer.py
(5) You will see the final txt file "Results_Testing_diHiggs.../FinalCardsLQuu.txt" and the directory where the root files are stored, "rootfiles_final".
These outputs will be used in setting limits.
Limit calculation and Limit plots
Limits for bblljj channel
(1) The root files in rootfiles_final have distributions with very fine binning. We have to rebin the distributions before limit setting. The binning should be chosen to optimize the sensitivity while maintaining the stability of the fit.
cd LimitSetting/RebinHH
# Rebin uu
python rebinHistHH.py --resType HHres --channel uu
python refixHistHH.py --resType HHres --channel uu
python plotSystHistHH.py --resType HHres --channel uu
# Rebin ee
python rebinHistHH.py --resType HHres --channel ee
python refixHistHH.py --resType HHres --channel ee
python plotSystHistHH.py --resType HHres --channel ee
(2) Check out the higgs limit tool code following the instructions from
SWGuideHiggsAnalysisCombinedLimit, the same as described in
Part IV: Making limit plots
After running the following commands, take note of the results printed to the screen.
cd LimitSetting
# Run to get Limits
python RunStatsBasicCLsHH.py --do_BetaOne --HHresuujj -c LimitShapeHH --Asymptotic_Only --noAbsAcc --NpIgnore
python RunStatsBasicCLsHH.py --do_BetaOne --HHreseejj -c LimitShapeHH --Asymptotic_Only --noAbsAcc --NpIgnore
python RunStatsBasicCLsHH.py --doMakeCombineCards --HHresCombMuEle -c LimitShapeHH
python RunStatsBasicCLsHH.py --do_BetaOne --HHresCombMuEle -c LimitShapeHH --Asymptotic_Only --noAbsAcc --NpIgnore
(3) Use the values output on the screen to modify the code MakeLimitsPlotsHH.py. Then run the script to get the limit plots.
# plot observed limits to HH
cd PlotMacros
python MakeLimitsPlotsHH.py --channel muon --resType HHres --obslimit 1
python MakeLimitsPlotsHH.py --channel ele --resType HHres --obslimit 1
python MakeLimitsPlotsHH.py --channel CombMuEle --resType HHres --obslimit 1
Scripts for running Combine Tool
I have written a script (runCheckLimits.py) which helps guide you through the typical commands in the Combine tool. The input to the script is a directory with the following structure:
|---/CLSLimits/
|--- [CombMuEledatacard].txt
|---/LimitShapeHHuu/
|--- [datacard].txt
|--- [distribution].root
|---/LimitShapeHHee/
|--- [datacard].txt
|--- [distribution].root
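A quick pre-flight check that an input directory matches this layout might look like the following sketch (directory names as above; the helper itself is illustrative, not part of the repository):

```python
import glob
import os

def check_limit_dir(base):
    """Verify the runCheckLimits.py input layout described above.
    Returns a list of problems; an empty list means the structure looks OK."""
    problems = []
    if not glob.glob(os.path.join(base, "CLSLimits", "*.txt")):
        problems.append("missing combined datacard in CLSLimits/")
    for sub in ("LimitShapeHHuu", "LimitShapeHHee"):
        if not glob.glob(os.path.join(base, sub, "*.txt")):
            problems.append("missing datacard in %s/" % sub)
        if not glob.glob(os.path.join(base, sub, "*.root")):
            problems.append("missing distribution file in %s/" % sub)
    return problems
```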
Example commands:
## Remove NP
python runCheckLimits.py -ch uu --removeNP -v 1 --launch
## txt to workspace
python runCheckLimits.py -ch uu --text2workspace -v 1 --launch
## Fit and record range
python runCheckLimits.py -ch uu --FitDiagnos -v 1 --launch
## Limits
python runCheckLimits.py -ch uu --limits -v 1 --launch
## Run obs GOF and record the values
python runCheckLimits.py -ch uu --runGOF -v 1 --data --toy --pval --launch
## Run to get impacts
python runCheckLimits.py -ch uu --runImpactIni -v 1 --launch
python runCheckLimits.py -ch uu --runImpactDoFit -v 1 --launch
python runCheckLimits.py -ch uu --runImpactPDF -v 1 --launch
To run the spin-2 case, add --Spin2 to the command, like:
python runCheckLimits.py -ch uu --text2workspace -v 1 --launch --Spin2
The options for -ch can be uu, ee, or CombMuEle.
Trees
The relevant trees are on the cmsneu machine:
/media/dataPlus/dmorse/hhNtuples/NTupleAnalyzerHH_Full2016HH_v236_2017_04_03/SummaryFiles/
/media/dataPlus/dmorse/hhNtuples/NTupleAnalyzerHH_Full2016HH_v236_QCDNonIsoQuickTest_2017_05_15/SummaryFiles/
Gridpack generation
Follow directions at
QuickGuideMadGraph5aMCatNLO
Adding diboson resonance model
After downloading the genproductions package, you will need to edit the genproductions/bin/MadGraph5_aMCatNLO/gridpack_generation.sh file to compile the diHiggs resonance model in the following way:
After
# where to find the madgraph tarred distribution
Add
## Models for searches of diboson resonances
VVMODEL=dibosonResonanceModel.tar.gz
VVSOURCE=https://cms-project-generators.web.cern.ch/cms-project-generators/$VVMODEL
After
#load extra models if needed
if [ -e $CARDSDIR/${name}_extramodels.dat ]; then
echo "Loading extra models specified in $CARDSDIR/${name}_extramodels.dat"
#strip comments
sed 's:#.*$::g' $CARDSDIR/${name}_extramodels.dat | while read model
do
#get needed BSM model
if [[ $model = *[!\ ]* ]]; then
echo "Loading extra model $model"
wget --no-check-certificate https://cms-project-generators.web.cern.ch/cms-project-generators/$model
cd models
if [[ $model == *".zip"* ]]; then
unzip ../$model
elif [[ $model == *".tgz"* ]]; then
tar zxvf ../$model
elif [[ $model == *".tar"* ]]; then
tar xavf ../$model
else
echo "A BSM model is specified but it is not in a standard archive (.zip or .tar)"
fi
cd ..
fi
done
fi
Add
#get Diboson model
wget --no-check-certificate ${VVSOURCE}
cd models
tar xvzf ../${VVMODEL}
cd ..
Preparing run cards
There are 3 models, which all need runcards and gridpacks:
- Spin-0 Radion
- Spin-2 Bulk Graviton
- Spin-2 Randall-Sundrum (RS) Graviton
For each model you need 3 data files. Here is an example of the naming of the cards, for the Radion model with a resonance mass of 300 GeV:
Radion_hh_hZZToLLhbb_EMuOnly_narrow_M300_customizecards.dat
Radion_hh_hZZToLLhbb_EMuOnly_narrow_M300_proc_card.dat
Radion_hh_hZZToLLhbb_EMuOnly_narrow_M300_run_card.dat
These files, as well as example files for the 2 graviton models, can be found on lxplus at ~dmorse/public/HH/exampleMGrunCards/
New data files need to be created for each mass X and model Y, changing each instance of mass 300 to mass X and renaming the data files with model Y and mass X. The data files should always be kept, as they will need to be added to the HH github repository later. Be careful to keep track of all mass references and the model type.
With these three files the gridpack can be created following the instructions on the Twiki. I ran interactively, but it may be worthwhile to use the grid since there will be 16*3=48 gridpacks to produce.
This should be repeated for the following masses:
250
260
270
300
350
400
450
500
550
600
650
700
750
800
900
950
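The per-mass card generation can be scripted as pure text substitution on the M300 templates. A sketch (the Radion file naming is from the example above; the graviton prefixes, and the assumption that the mass appears only as "M300" or a bare "300" value, are mine):

```python
MODELS = ["Radion", "BulkGraviton", "RSGraviton"]   # graviton prefixes assumed
MASSES = [250, 260, 270, 300, 350, 400, 450, 500,
          550, 600, 650, 700, 750, 800, 900, 950]
CARD_KINDS = ["customizecards", "proc_card", "run_card"]
TEMPLATE_MASS = 300

def card_name(model, mass, kind):
    # file naming follows the Radion example above
    return "%s_hh_hZZToLLhbb_EMuOnly_narrow_M%d_%s.dat" % (model, mass, kind)

def retarget(template_text, mass):
    """Replace every reference to the template mass in a card's text.
    Assumes the mass appears only as 'M300' or as a bare '300' value."""
    text = template_text.replace("M%d" % TEMPLATE_MASS, "M%d" % mass)
    return text.replace(str(TEMPLATE_MASS), str(mass))

print(card_name("BulkGraviton", 500, "run_card"))
```

Looping `for model in MODELS: for mass in MASSES: for kind in CARD_KINDS:` then yields the full set of 16*3=48 card triplets.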
8 TeV Second Gen LQ Pair Production Analysis Documentation
History
- A public PAS was produced in 2013, which is available on the PhysicsResultsEXO12042 twiki page.
- The Ntuple files for this were good, but updates to the JEC are needed for publication, and updates to the new high pT muon ID will be helpful.
- The Ntuple files used for the public PAS are backed up on github in this spreadsheet.
- Useful links:
To-Do
- Overall goal is a full-reload of public PAS analysis, with new ntuples, and no other analysis changes.
- Tasks include:
- NTuple Updates
- Dedicated github branch for paper version
- Event counts and CSV preparation for new ntuple
- Update NTupleAnalyzer.py script for new muon ID
- Fix up the LQResultProducer.py script - remove clutter, improve comments.
- Port the limit-producer code from old usercode repo to new github area
- Updates for vector LQ - AODSIM dataset production in progress
Part I: Production of NTuples
Overview
Status
- All LQ Scalar signal, background, and single-mu data is processed.
- Vector LQ samples and RPV susy samples will need processing.
- Old MC was 53X_V7, and new is 53X_V19 - possible updates to ntuplemaker needed. New MC created in 5_3_16.
Where to find NTuples.
Part II: Analysis of NTuples
All of this code is stored in a single github repository, which can be checked out as follows:
git clone git@github.com:darinbaumgartel/LQ2Analysis8TeV.git LQ2Analysis8TeV
cd LQ2Analysis8TeV
git checkout paperversion
cmsrel CMSSW_6_2_0
cd CMSSW_6_2_0/src
cmsenv
cd -
Some of these instructions are re-iterated from the
github readme file
The goal of Part II is the creation of small flat analysis trees: the Ntuples on EOS are usually on the order of 10 TB and need to be reduced for interactive analysis. Jobs are run on the lxplus batch system to analyze the NTuples. The ntuples contain variables such as:
- Vectors of values for kinematics. i.e. "PFJetPt" is a vector of doubles, with one pT value for each jet stored in the event.
- Vectors of values for quality criteria (id, isolation, etc)
- Event-level information, including MET quality flags, vectors of different PDF weights (for MC), vertex information, etc.
The analysis trees aim to be roughly a factor of 1000 smaller than the ntuples (order of 10 GB), and to contain only single-valued variables (ints and floats) - essentially one large flat ntuple. (The TNtuple class itself does not allow enough variables.) Values in the trees will be things like:
- The pT of the leading muon, and of the subleading muon, and the same for jets. And the eta and phi values.
- Kinematic variables used for optimization, like the di-muon mass, the MT(muon,MET), the ST = (pt_muon1+pt_muon2+pt_jet1+pt_jet2), etc...
- Single event weights which include the pileup reweighting.
- Trigger flags, and event quality flags.
- Basically, anything needed to generate a histogram with the appropriate weights, and interactively changeable cuts.
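For illustration, the composite kinematic quantities above can be computed from the stored single-object kinematics as follows. This is a minimal pure-python sketch with illustrative function names; the actual tree-making code uses pyROOT.

```python
import math

def st_uujj(pt_mu1, pt_mu2, pt_jet1, pt_jet2):
    """S_T: scalar sum of the two leading muon and two leading jet pTs."""
    return pt_mu1 + pt_mu2 + pt_jet1 + pt_jet2

def mt(pt_lep, pt_miss, dphi):
    """Transverse mass of a lepton + MET system."""
    return math.sqrt(2.0 * pt_lep * pt_miss * (1.0 - math.cos(dphi)))

def dimuon_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass of two (approximately massless) muons from pt/eta/phi."""
    px1, py1, pz1 = pt1 * math.cos(phi1), pt1 * math.sin(phi1), pt1 * math.sinh(eta1)
    px2, py2, pz2 = pt2 * math.cos(phi2), pt2 * math.sin(phi2), pt2 * math.sinh(eta2)
    e1, e2 = pt1 * math.cosh(eta1), pt2 * math.cosh(eta2)
    m2 = (e1 + e2) ** 2 - (px1 + px2) ** 2 - (py1 + py2) ** 2 - (pz1 + pz2) ** 2
    return math.sqrt(max(m2, 0.0))
```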
The following are the steps for creating the trees.
Getting integrated luminosity and pileup weight
A shell script is available at
this link, and has been made for the newest
(Jan2012ReReco) datasets. It will return root files with the PU histograms, which will be used interactively by the tree-making code to get the weights for each event.
It can be run like:
./GetPULumiInfo.tcsh
This is already done in practice. Histograms are available for the
central value and for systematic variations
up and
down
The integrated luminosity is just the official value (19.712 +/- 0.513)/fb - from
this HN thread
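For illustration, the per-event pileup weight is just the data/MC ratio of the normalised pileup distributions, evaluated in the event's pileup bin. A minimal sketch, with toy histogram contents standing in for the real root files produced by GetPULumiInfo.tcsh:

```python
# Toy stand-ins for the data and MC pileup histograms (values illustrative;
# the real histograms come from the root files produced by the shell script).
data_pu = [0.0, 10.0, 30.0, 40.0, 15.0, 5.0]   # observed pileup distribution
mc_pu   = [0.0, 20.0, 30.0, 30.0, 15.0, 5.0]   # simulated pileup distribution

def pu_weight(n_true_interactions, data_hist, mc_hist):
    """Per-event pileup weight: data/MC shape ratio in the event's pileup bin."""
    i = min(int(n_true_interactions), len(data_hist) - 1)
    if mc_hist[i] == 0.0:
        return 0.0
    # Normalise both distributions so that only the shapes are compared.
    d = data_hist[i] / sum(data_hist)
    m = mc_hist[i] / sum(mc_hist)
    return d / m
```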
Get organized with a spreadsheet
A simple csv file is used to keep track of everything involved in the tree production. This includes the locations of the ntuple files on EOS, any
JSON file you wish to use on real data, cross-sections, original numbers of events (for normalization) and group names (output files will be merged according to their group).
An example of a spreadsheet (in progress) is at
this link .
The spreadsheet must contain the following columns:
SignalType |
A unique identifier at the beginning of the names of the root files. |
Xsections |
The cross-section to normalize the sample to (NLO is better!) |
N_orig |
The original number of events in the sample (If you don't know this, see next steps for instructions) |
Group |
The group for the files. For instance, you might have three SignalTypes like WW, WZ, and ZZ, and want to put them in a Group called "DiBoson" |
CustomJson |
The name of a Json file specifying the good runs/lumis to use. This can be the same for every data type, or different, or 0 for MC |
EOSDirectory |
The eos path where the files are kept for this signaltype. Should be like a typical EOS path e.g. /store/group/..../ |
This spreadsheet will be a command-line argument of the
analysis batcher (discussed more in steps below).
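A minimal sketch of reading such a spreadsheet and checking that the required columns are present (the rows and values here are illustrative, not real sample parameters):

```python
import csv
import io

# A toy two-row spreadsheet with the required columns (values illustrative).
csv_text = """SignalType,Xsections,N_orig,Group,CustomJson,EOSDirectory
WW,57.1,10000000,DiBoson,0,/store/group/exotica/WW/
SingleMuA,1.0,0,SingleMuData,Cert_2012.json,/store/group/exotica/SingleMuA/
"""

def read_sample_info(text):
    """Parse the sample spreadsheet into one dict per SignalType row."""
    rows = list(csv.DictReader(io.StringIO(text)))
    required = {"SignalType", "Xsections", "N_orig", "Group",
                "CustomJson", "EOSDirectory"}
    for row in rows:
        missing = required - set(row)
        if missing:
            raise ValueError("missing columns: %s" % sorted(missing))
    return rows

samples = read_sample_info(csv_text)
```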
May 14, 2014
A summer 2014 CSV file has been created with V00-03-11 ntuples for the reload of the public PAS version of the LQ2 analysis. A couple data jobs (out of thousands) are missing, but it is functional for preparing the analysis.
What if you don't know the original number of events in the ntuples?
Since the NTupleMaker
applies a skim, histograms containing the original (pre-skim) number of events are stored in the NTuple root files. This count is needed for correct MC event weights.
The
analysis batcher is built to read the CSV file and run a python script on every root file contained in the EOS directories specified in the CSV. Normally this is used for making tree files, but it can also just make a txt file containing the number of events from the event-count histogram in the NTuples; we can then merge that information to get the totals and fix up the CSV.
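Conceptually, the counting and the --merge aggregation amount to the following (the per-file counts here are hypothetical stand-ins for what is read from the event-count histograms):

```python
# Hypothetical per-file pre-skim counts keyed by (SignalType, file name); in
# reality these come from the event-count histogram in each NTuple root file.
per_file_counts = {
    ("WW", "file_1.root"): 4800000,
    ("WW", "file_2.root"): 5200000,
    ("ZZ", "file_1.root"): 9500000,
}

def total_events(counts):
    """Aggregate original event counts per SignalType, as the --merge step does."""
    totals = {}
    for (signal_type, _fname), n in counts.items():
        totals[signal_type] = totals.get(signal_type, 0) + n
    return totals
```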
In practice it is much easier than it sounds.
Just run the following:
python AnalyzerMakerFastLocal.py -i NTupleInfo2012Full_Summer2014.csv -py NTupleEvCounter.py -t EventCount_Summer2014_V01 -j Cert_190456-208686_8TeV_22Jan2013ReReco_Collisions12_JSON.txt -p 0 -q 8nh -s 250 --FileRefresh --merge
Just to clarify on the arguments:
-i CSV File |
The CSV file you wish to run on. |
-t Tag name |
Results will output in a directory specified by this tag-name |
-j JSON file |
Not important here |
-p 0 |
Not important here. |
-q 8nh |
The batch queue to use. |
-s 250 |
Number of files to analyze per batch job (the split number) |
--FileRefresh |
The code will automatically make a list of good ntuple files and store it locally for future use, so it doesn't have to re-read directories every time. This flag forces the directories to be re-read. |
--merge |
Compile the results. For event counting, this lists total aggregated counts. For tree-making, it hadds the output into single root files (SummaryFiles) |
This will produce an event count log (see screen output) and you can update the csv file accordingly.
May 14, 2014
The summer 2014 CSV file has been used to update the event counts. This is the syntax:
python AnalyzerMakerFastLocal.py -i NTupleInfo2012Full_Summer2014.csv -py NTupleEvCounter.py -t EventCount_Summer2014_V01 -j Cert_190456-208686_8TeV_22Jan2013ReReco_Collisions12_JSON.txt -p 0 -q 8nh -s 250 --FileRefresh
Furthermore, if you ran the above command and don't want to wait for it to finish, you can leave and pick up where you left off later. Here, the previous command made the directory "NTupleEvCounter_EventCount_Summer2014_V01_2014_05_12_23_20_07", and this is the command to continue where we left off:
python AnalyzerMakerFastLocal.py -i NTupleInfo2012Full_Summer2014.csv -py NTupleEvCounter.py -t EventCount_Summer2014_V01 -j Cert_190456-208686_8TeV_22Jan2013ReReco_Collisions12_JSON.txt -p 0 -q 8nh -s 250 -d NTupleEvCounter_EventCount_Summer2014_V01_2014_05_12_23_20_07
Finally, add the "--merge" flag to take all the output files (in this case one txt file per root file) and sum the output
python AnalyzerMakerFastLocal.py -i NTupleInfo2012Full_Summer2014.csv -py NTupleEvCounter.py -t EventCount_Summer2014_V01 -j Cert_190456-208686_8TeV_22Jan2013ReReco_Collisions12_JSON.txt -p 0 -q 8nh -s 250 -d NTupleEvCounter_EventCount_Summer2014_V01_2014_05_12_23_20_07 --merge
A message on the screen will tell you the output file location. e.g.:
Output is in /afs/cern.ch/user/d/darinb/LQ2014/NTupleAnalyzerV3B/NTupleInfo2012Full_Summer2014_EventCountLog.csv
The summer 2014 CSV file has been updated with these event counts.
How to create the analysis trees.
Similar to what we did above, a different python file will be used to analyze the ntuples (using pyROOT) and make lightweight trees. There will be one output file for every NTuple file, and the merge step will use "hadd" to combine these files according to the "Group" column in the CSV file. At the end of it all, we will have just a few root files, like "SingleMuData.root", "ZJets.root", etc., which contain everything we need for each data or MC sample.
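The grouping logic behind the merge can be sketched as follows (file and group names are illustrative; the real batcher shells out to hadd):

```python
import collections

# SignalType -> Group mapping as it would be read from the CSV (illustrative).
groups = {"WW": "DiBoson", "WZ": "DiBoson", "ZZ": "DiBoson", "DYJets": "ZJets"}
tree_files = ["WW_0.root", "WZ_0.root", "ZZ_0.root", "DYJets_0.root", "DYJets_1.root"]

def hadd_commands(files, group_of):
    """Build one 'hadd' command per Group, merging that group's tree files."""
    by_group = collections.defaultdict(list)
    for f in files:
        signal_type = f.rsplit("_", 1)[0]   # file names begin with the SignalType
        by_group[group_of[signal_type]].append(f)
    return ["hadd %s.root %s" % (g, " ".join(fs))
            for g, fs in sorted(by_group.items())]
```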
These files can take up several GB. I recommend using an AFS work space like "/afs/cern.ch/work/u/username/..." and bumping up your quota to 100GB at the
afs workspaces page
Running is similar to the previous step. This is the command:
python AnalyzerMakerFastLocal.py -i NTupleInfo2012Full_Summer2014.csv -py NTupleAnalyzer.py -t FullJun27 -j Cert_190456-208686_8TeV_22Jan2013ReReco_Collisions12_JSON.txt -p 1 -q 2nd -s 100 --merge
To detail the arguments:
-i CSV File |
The CSV file you wish to run on. |
-t Tag name |
Results will output in a directory specified by this tag-name |
-j JSON file |
Default JSON file. Overridden by the CustomJson column in the CSV file. Always put something here. |
-p 1 |
Whether or not to store branches of PDF weights (1 = store them, 0 = skip). |
-q 2nd |
The batch queue to use. |
-s 100 |
Number of files to analyze per batch job (the split number) |
-y 2016 |
Year of the data and MC. Choices are 2016, 2017, 2018 |
--FileRefresh |
The code will automatically make a list of good ntuple files and store it locally for future use, so it doesn't have to re-read directories every time. This flag forces the directories to be re-read. |
--merge |
Compile the results. For event counting, this lists total aggregated counts. For tree-making, it hadds the output into single root files (SummaryFiles) |
Other studies might require slight modifications to what you want to run. A couple are built in via the tagname option. These include:
QCDNonIso |
Removed muon isolation so that QCD studies can be performed |
EMuSwitch |
Replaces one muon with a HEEP electron, for ttbar studies in an enriched e-mu control region |
Examples of these runs are as follows. Typically you might not need the full CSV files for these, so you could make new ones with just the datasets you need
python AnalyzerMakerFastLocal.py -i NTupleInfo2012Full_Summer2014.csv -py NTupleAnalyzer.py -t FullJun27EMuSwitch -j Cert_190456-208686_8TeV_22Jan2013ReReco_Collisions12_JSON.txt -p 1 -q 2nd -s 100 -y 2016 --merge
python AnalyzerMakerFastLocal.py -i NTupleInfo2012Full_Summer2014.csv -py NTupleAnalyzer.py -t FullJun27QCDNonIsoQuickTest -j Cert_190456-208686_8TeV_22Jan2013ReReco_Collisions12_JSON.txt -p 0 -q 1nd -s 100 -y 2016 --merge
What if it takes too long?
It is also important to note that these jobs can take several hours or a day, depending on what you run. The
AnalyzerMakerFastLocal.py script keeps running and waits for the batch jobs to finish. If a file is missing, it relaunches the corresponding job. This is all transparent to the user, but it can occupy the screen for quite a while.
There are two tools to help out with this.
- The first (and best) way is to leave and come back later. When you see that the AnalyzerMakerFastLocal script has submitted its jobs and is waiting for them to finish, you can just ctrl+c and come back later. The TagName option has made a working directory with a time-and-date stamp, and you can specify to pick up where you left off. So go home, come back, and re-issue the command, specifying the working directory with the "-d" option, like:
python AnalyzerMakerFastLocal.py -i NTupleInfo2012Full.csv -py NTupleAnalyzer.py -t MyFirstFullRun -j Cert_190456-208686_8TeV_22Jan2013ReReco_Collisions12_JSON.txt -p 1 -q 1nd -s 15 -y 2016 -d NTupleAnalyzer_MyFirstFullRun_2013_10_22_23_36_59
- Another helpful tool is "screen". If you don't want to wait around for jobs to finish, use a "screen" session on lxplus. Note the lxplus machine you are on (e.g. lxplus430), and open a screen by typing "screen". Launch the jobs with AnalyzerMakerFastLocal.py as described above. Type ctrl+A then D to detach. You can then log out, go home, etc., and later log back in to lxplus430 and type "screen -r" to return to your session.
Part III: Running the result producer
From the flat trees, the analysis is performed and the results are produced. This includes many steps, almost all of which are done in a
single result producer python script . In this script, you will find the following:
- The definition of the event weight, according to the weight variable in the trees, the HLT reweighting of MC with a single-mu trigger, and the integrated luminosity.
- Definition of a preselection - a selection which demands the LQ decay product particles, but doesn't make strong cuts. Normalization of the Z, W, and TTbar backgrounds will be done at preselection.
- Functions that make histograms at preselection and final selection.
- Functions that can optimize signal-background separation with cuts based on S/sqrt(S+B) and store the result to a txt file.
- Functions that can make the event counts tables based on the above optimizations.
- Functions that can make cards compatible with the Higgs limit tools, to get the real "final" results.
- Unused functions which can perform analysis with TMVA (interesting, but not needed for our normal results). [deprecated]
Basically, everything you need to run is in the main() function of the script. This amounts to a few background studies, plot production, and table production (tables get parsed into Higgs limit tool cards). This is a brief overview of the commonly used parts of the code.
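Two of the ingredients above, the MC event weight and the S/sqrt(S+B) cut optimization, can be sketched in a few lines. The function names and toy inputs are illustrative, not the actual LQResultProducer code:

```python
import math

def event_weight(xsec_pb, lumi_invpb, n_orig, pu_w=1.0, hlt_w=1.0):
    """MC event weight: cross-section x luminosity / original event count,
    times per-event corrections (pileup, trigger)."""
    return xsec_pb * lumi_invpb / float(n_orig) * pu_w * hlt_w

def best_cut(signal, background, thresholds):
    """Scan thresholds on one variable, maximising S/sqrt(S+B).
    signal/background are lists of (value, weight) pairs."""
    best = (None, -1.0)
    for t in thresholds:
        s = sum(w for v, w in signal if v > t)
        b = sum(w for v, w in background if v > t)
        if s + b > 0:
            fom = s / math.sqrt(s + b)
            if fom > best[1]:
                best = (t, fom)
    return best
```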
Setup and directory
At the beginning of the main function, you will see the following lines, followed by the definition of the variable binnings for different histograms. The results in this example would be output to a directory called "Results_Testing_Aug8". It will attempt to make this directory. If it already exists, that is fine. The *OptCutFile files must be present in the directory - they detail the final selection cuts.
def main():
#######################################################################################
###### The output directories, and the files that define final selection cuts #######
#######################################################################################
# Please retain the "scriptflag" comment. Some python scripts are available which search
# for this, and make use of it. e.g. For systematic variations, we can run in batch instead
# of running serially, which speeds things up.
version_name = 'Testing_Aug8' # scriptflag
os.system('mkdir Results_'+version_name)
MuMuOptCutFile = 'Results_'+version_name+'/Opt_uujjCuts_Smoothed_pol2cutoff.txt' # scriptflag
MuNuOptCutFile = 'Results_'+version_name+'/Opt_uvjjCuts_Smoothed_pol2cutoff.txt' # scriptflag
PDF Uncertainties
The PDF uncertainties are time consuming. By enabling the following lines (turning False to True), you will run the PDF studies, which print lists of uncertainties to the screen. I recommend not running this routinely. You can paste the output of this routine into the "HARD CODED RESULTS USED IN ANALYSIS" section.
# ====================================================================================================================================================== #
# These are PDF uncertainty studies. Ignore these for now! ### STILL FIXING THIS PORTION
# ====================================================================================================================================================== #
if (False):
PDF4LHCUncStudy(MuMuOptCutFile,MuNuOptCutFile,version_name)
PDF4LHCPlotsFromResultDict('Results_'+version_name+'/PDFVariationsDictionary.json',version_name)
QCD Studies
QCD is negligible, and we just conduct a few studies to validate that. These studies make plots and produce a few numbers to cite in the analysis note. You can enable the QCD studies by turning on this part of the code:
# ====================================================================================================================================================== #
# This is the QCD study. It will make a couple plots, and test QCD contamination at final selection. We consider QCD negligible, but is good to test this once!
# ====================================================================================================================================================== #
if (False):
qcdselectionmumu = '((Pt_muon1>45)*(Pt_muon2>45)*(Pt_jet1>125)*(Pt_jet2>45)*(St_uujj>300)*(DR_muon1muon2>0.3))'
qcdselectionmunu = '((Pt_muon1>45)*(Pt_muon2<45.0)*(Pt_jet1>125)*(Pt_jet2>45)*(Pt_ele1<45.0)*(St_uvjj>300)*(DPhi_muon1met>0.8)*(DPhi_jet1met>0.5))'
QCDStudy(qcdselectionmumu,qcdselectionmunu,MuMuOptCutFile,MuNuOptCutFile,NormalWeightMuMu,NormalWeightMuNu,version_name)
TTbar E-Mu data-driven studies
The data-driven TTbar estimate relies on a calculation of the mu-mu to e-mu efficiency, and we also provide some characteristic plots and numbers in the analysis note. To create this material, you can turn on the following part of the main function. Once again, there is no need to run this all the time. When the study is stable, you just need to store the reweighting factor in the "HARD CODED RESULTS" section of the script - e.g. " emu_id_eff = 0.5716 "
# ====================================================================================================================================================== #
# The ttbar e-mu data-driven study. You only need to do this once, check the validation output, and use the scale-factor as global variable "emu_id_eff"
# The scale-factor should be near 0.5
# ====================================================================================================================================================== #
if (False):
# TTBar Study
[Rtt_uujj,Rtt_uujj_err] = GetEMuScaleFactors( NormalWeightEMu+'*'+preselectionemu, EMuDirectory)
Rw_uvjj,Rz_uujj = [1.0,1.0]
# # PreSelection Plots
MakeBasicPlotEMu("St_uujj","S_{T}^{e #mu j j} [GeV]",stbinning,preselectionemu,NormalWeightEMu,EMuDirectory,'emuseltagfree','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
MakeBasicPlotEMu("Pt_miss","E_{T}^{miss} [GeV]",metbinning2,preselectionemu,NormalWeightEMu,EMuDirectory,'emuseltagfree','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
MakeBasicPlotEMu("M_uu","M^{e #mu} [GeV]",bosonbinning,preselectionemu,NormalWeightEMu,EMuDirectory,'emuseltagfree','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
MakeBasicPlotEMu("M_uujj2","M^{e/#mu j}_{2} [GeV]",lqbinning,preselectionemu,NormalWeightEMu,EMuDirectory,'emuseltagfree','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
MakeBasicPlotEMu("DR_muon1muon2","#DeltaR(#mu,e)",drbinning,preselectionemu,NormalWeightEMu,EMuDirectory,'emuseltagfree','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
MakeBasicPlotEMu("St_uujj","S_{T}^{e #mu j j} [GeV]",stbinning,preselectionemu,NormalWeightEMu,EMuDirectory,'emuselPAStagfree','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
MakeBasicPlotEMu("Pt_miss","E_{T}^{miss} [GeV]",metbinning2,preselectionemu,NormalWeightEMu,EMuDirectory,'emuselPAStagfree','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
MakeBasicPlotEMu("M_uu","M^{e #mu} [GeV]",bosonbinning,preselectionemu,NormalWeightEMu,EMuDirectory,'emuselPAStagfree','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
MakeBasicPlotEMu("M_uujj2","M^{e/#mu j}_{2} [GeV]",lqbinning,preselectionemu,NormalWeightEMu,EMuDirectory,'emuselPAStagfree','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
MakeBasicPlotEMu("DR_muon1muon2","#DeltaR(#mu,e)",drbinning,preselectionemu,NormalWeightEMu,EMuDirectory,'emuselPAStagfree','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
Making analysis-note style plots
The analysis-note style plots have a ratio subplot, and no CMS stamp. Each plot is a single function-call of MakeBasicPlot(args). There is also an option to make a basic results table. You can turn all this on in the following part of the code:
# ====================================================================================================================================================== #
# This is a basic plotting routine to make Analysis-Note style plots with ratio plots.
# ====================================================================================================================================================== #
if (False):
# Get Scale Factors
[[Rz_uujj,Rz_uujj_err],[Rtt_uujj,Rtt_uujj_err]] = GetMuMuScaleFactors( NormalWeightMuMu+'*'+preselectionmumu, NormalDirectory, '(M_uu>80)*(M_uu<100)', '(M_uu>100)')
[[Rw_uvjj,Rw_uvjj_err],[Rtt_uvjj,Rtt_uvjj_err]] = GetMuNuScaleFactors( NormalWeightMuNu+'*'+preselectionmunu, NormalDirectory, '(MT_uv>70)*(MT_uv<110)*(JetCount<3.5)', '(MT_uv>70)*(MT_uv<110)*(JetCount>3.5)')
# Optionally, you can make an event-count table for each selection. Useful if testing a new optimization.
# We will do this later with full systematics for our set of stable cuts.
if False:
QuickTableTTDD(MuMuOptCutFile, preselectionmumu+"*(M_uu>100)",NormalWeightMuMu,Rz_uujj, Rw_uvjj,Rtt_uujj,0)
QuickTable(MuNuOptCutFile, preselectionmunu,NormalWeightMuNu,Rz_uujj, Rw_uvjj,Rtt_uvjj,0)
# Here are a few plots which are zoomed-in on control regions.
MakeBasicPlot("M_uu","M^{#mu #mu} [GeV]",bosonzoombinning_uujj_Z,preselectionmumu,NormalWeightMuMu,NormalDirectory,'controlzoom_ZRegion','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
...etc...
# UUJJ plots at preselection, Note that putting 'TTBarDataDriven' in the name turns on the use of data-driven ttbar e-mu sample in place of MC
MakeBasicPlot("Pt_jet1","p_{T}(jet_{1}) [GeV]",ptbinning,preselectionmumu,NormalWeightMuMu,NormalDirectory,'standard_TTBarDataDriven','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
... etc ...
Running the PAS-style plots
PAS-style plots have no ratio subplots, and have a CMS stamp. You can enable this portion of the main function to run the PAS-style plots. There is also a loop over LQ mass values that you can easily modify for different final-selection plots. Using the "tagfree" file tag will remove the CMS stamp.
# ====================================================================================================================================================== #
# This is a plotting routine for PAS-style publication-quality plots
# ====================================================================================================================================================== #
if False:
# Get Scale Factors
[[Rz_uujj,Rz_uujj_err],[Rtt_uujj,Rtt_uujj_err]] = GetMuMuScaleFactors( NormalWeightMuMu+'*'+preselectionmumu, NormalDirectory, '(M_uu>80)*(M_uu<100)', '(M_uu>100)')
[[Rw_uvjj,Rw_uvjj_err],[Rtt_uvjj,Rtt_uvjj_err]] = GetMuNuScaleFactors( NormalWeightMuNu+'*'+preselectionmunu, NormalDirectory, '(MT_uv>70)*(MT_uv<110)*(JetCount<3.5)', '(MT_uv>70)*(MT_uv<110)*(JetCount>3.5)')
...etc...
# The two flags are for regular plots, and tagfree plots (plots that don't say CMS Preliminary - for notes or thesis)
for flag in ['','tagfree']:
# Preselection plots in the UUJJ channel in the PAS style (no subplot)
MakeBasicPlot("Pt_jet1","p_{T}(jet_{1}) [GeV]",ptbinning,preselectionmumu,NormalWeightMuMu,NormalDirectory,'standardPAS_TTBarDataDriven'+flag,'uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
... etc ...
Running the full event yields (with all systematic variations) and making higgs limit-tool cards
There are two ways to do this. It is easy to just let this run, but in practice it has been time consuming (many systematic variations) and can take more than a day. The end-goal of this all is that it produces several txt files in the result directories containing the event yields, and then combines all that information into higgs limit tool cards.
The long way to do this is to just let this part of the code run:
# ====================================================================================================================================================== #
# This runs a "FullAnalysis" - i.e. produces tables with full systematics included.
# ====================================================================================================================================================== #
# You can run this to make the full set of tables needed to construct the higgs card. This takes a long time!
# Alternatively, you can run > python SysBatcher.py --launch to do each table in a separate batch job
# When done, proceed to the next step to make higgs limit cards
if (True):
FullAnalysis(MuMuOptCutFile, preselectionmumu,preselectionmunu,NormalDirectory,NormalWeightMuMu,'TTBarDataDriven') # scriptflag
FullAnalysis(MuNuOptCutFile, preselectionmumu,preselectionmunu,NormalDirectory,NormalWeightMuNu,'normal') # scriptflag
if (True):
uujjcardfiles = MuMuOptCutFile.replace('.txt','_systable*.txt')
uvjjcardfiles = MuNuOptCutFile.replace('.txt','_systable*.txt')
uujjcards = ParseFinalCards(uujjcardfiles)
uvjjcards = ParseFinalCards(uvjjcardfiles)
finalcards = FixFinalCards([uujjcards,uvjjcards])
print 'Final Cards Available in ',finalcards
The short way circumvents the running of the FullAnalysis(...) functions, by making multiple copies of the LQResultProducer - one for each systematic variation - and running them in batch in parallel. This is much faster, but you have to rely on another script to parse it all out. When the script finishes, you just have to run the second part of the above (with the ParseFinalCards and FixFinalCards commands). To use this method, do the following, and wait for the batch-jobs to finish.
(1) Run
python SysBatcher.py --launch
(2) Wait for those batch jobs to finish, and then use just the second part of the above code
if (True):
uujjcardfiles = MuMuOptCutFile.replace('.txt','_systable*.txt')
uvjjcardfiles = MuNuOptCutFile.replace('.txt','_systable*.txt')
uujjcards = ParseFinalCards(uujjcardfiles)
uvjjcards = ParseFinalCards(uvjjcardfiles)
finalcards = FixFinalCards([uujjcards,uvjjcards])
print 'Final Cards Available in ',finalcards
Part IV: Making limit plots
The limit-setting code is now in a
subdirectory of the github paperversion branch. It has some
standalone instructions, but since the LQResultProducer already produces a compatible FinalCards.txt, here are the short instructions.
(1) Checkout the higgs limit tool code and appropriate CMSSW version
setenv SCRAM_ARCH slc5_amd64_gcc472
cmsrel CMSSW_6_1_1
cd CMSSW_6_1_1/src
cmsenv
git clone https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit.git HiggsAnalysis/CombinedLimit
cd HiggsAnalysis/CombinedLimit
git pull origin master
git checkout V03-05-00
cd ../../
scramv1 b clean
scramv1 b -j 16
cmsenv
cd ../../
combine --help
(2) Put your FinalCards.txt file in the same directory as the RunBasicStatsCLs.py script.
(3) Run the single-channel limits (limit results will print to screen - save them!).
python RunBasicStatsCLs.py --do_BetaOne --do_BetaHalf -c LimitTest --Asymptotic_Only
(4) Run the combined limit. It will have to batch out some jobs, so it is a two step process:
- Run the script which creates and launches the jobs (one job per beta value)
- When jobs are complete, merge them. Results will print to the screen.
(5) Take the results from (3) and (4) and put them into the
plot macros. The plot macros compile and run. An example of this would be:
{
gROOT->ProcessLine(".L ComboPlotLQ2.C++");
gROOT->ProcessLine("makePlots()");
gROOT->Reset();
gROOT->ProcessLine(".L BR_Sigma_MuMu_vsMass.C++");
gROOT->ProcessLine("makePlotsBO()");
gROOT->Reset();
gROOT->ProcessLine(".L BR_Sigma_MuNu_vsMass.C++");
gROOT->ProcessLine("makePlotsBH()");
gROOT->ProcessLine(".q");
}
Part V: Making publication-quality tables
The event yield and systematics tables can be automatically created based on the FinalCards.txt file (the higgs limit-card file) which has already been created. It simply needs to be parsed into a LaTeX table. There are two scripts available.
To get the content of the event yields table, do:
python CleanTableParse.py your/path/to/FinalCards.txt
To get tables of systematic uncertainties, do:
python SysTableParse.py your/path/to/FinalCards.txt
This last command also prints out the total percent uncertainties for each channel/mass. This list is used in the LQResultProducer.py in the "HARD CODED RESULTS" section to define the uncertainty band, used in the PAS-style plots.
Part VI: Making PAS-quality plots.
The output of the table production (Part V) will give you lists of the total systematic uncertainty, in percent, at each LQ mass. Use this in the "HARD CODED RESULTS" part of the LQ result producer to update the lines:
# These are the total background uncertainties. They are used just to make some error bands on plots.
totunc_uujj = [3.46, 2.55, 2.98, 3.67, 5.44, 5.73, 6.21, 6.47, 9.47, 11.16, 14.79, 19.75, 31.11, 35.41, 8.12, 8.12, 8.12, 8.12, 8.12]
totunc_uvjj = [13.37, 12.37, 13.35, 12.97, 13.62, 15.94, 16.23, 20.67, 23.94, 30.29, 35.3, 52.94, 69.57, 46.09, 42.3, 42.3, 42.3, 42.3, 42.3]
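Applying these percentages is straightforward: each entry scales the background yield at the corresponding mass point into an up/down band. A sketch with hypothetical yields (only the percentages come from the analysis; the yields below are made up for illustration):

```python
# First few percent uncertainties per LQ mass point (from SysTableParse),
# paired with purely illustrative background yields at the same mass points.
totunc_uujj = [3.46, 2.55, 2.98]
bkg_yields  = [1200.0, 450.0, 160.0]  # hypothetical yields, for illustration

def error_band(yields, percent_unc):
    """(down, up) absolute band edges from percent total uncertainties."""
    band = []
    for y, p in zip(yields, percent_unc):
        delta = y * p / 100.0
        band.append((y - delta, y + delta))
    return band
```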
Then you can run the standard PAS plots in the LQResultProducer, i.e. enable this part of the main function:
# ====================================================================================================================================================== #
# This is a plotting routine for PAS-style publication-quality plots
# ====================================================================================================================================================== #
if False:
# Get Scale Factors
[[Rz_uujj,Rz_uujj_err],[Rtt_uujj,Rtt_uujj_err]] = GetMuMuScaleFactors( NormalWeightMuMu+'*'+preselectionmumu, NormalDirectory, '(M_uu>80)*(M_uu<100)', '(M_uu>100)')
[[Rw_uvjj,Rw_uvjj_err],[Rtt_uvjj,Rtt_uvjj_err]] = GetMuNuScaleFactors( NormalWeightMuNu+'*'+preselectionmunu, NormalDirectory, '(MT_uv>70)*(MT_uv<110)*(JetCount<3.5)', '(MT_uv>70)*(MT_uv<110)*(JetCount>3.5)')
...
# The two flags are for regular plots, and tagfree plots (plots that don't say CMS Preliminary - for notes or thesis)
for flag in ['','tagfree']:
# Preselection plots in the UUJJ channel in the PAS style (no subplot)
MakeBasicPlot("Pt_jet1","p_{T}(jet_{1}) [GeV]",ptbinning,preselectionmumu,NormalWeightMuMu,NormalDirectory,'standardPAS_TTBarDataDriven'+flag,'uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
MakeBasicPlot("Pt_jet2","p_{T}(jet_{2}) [GeV]",ptbinning,preselectionmumu,NormalWeightMuMu,NormalDirectory,'standardPAS_TTBarDataDriven'+flag,'uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
...
Making results for Vector LQ
The goal here will be to port all this machinery to making Vector LQ results with as little effort as possible. To do so, I've made some tweaks to the LQResultProducer, such that many of the output file names will contain the letters "LQ". From now on, "LQ" will indicate the normal (scalar) analysis, and the different vector scenarios will be denoted by "AM", "MC", "MM", and "YM". Correspondingly, I've organized the tree-making facility to name all the new signal files like this. e.g., see
the vector LQ csv file. After making trees with the vector LQ csv file, and putting those new files into the same directory as the scalar files, we can proceed with some modifications to the code to run on vector LQ.
Fixing up the cut cards
In the results directory, there are typically a couple txt files which store the optimized cuts. They are called something like "Opt_uvjjCuts_Smoothed_pol2cutoff.txt", and store text that looks like this:
opt_LQuujj300 = ((M_uu>100)*(St_uujj>380)*(M_uujj2>115))
opt_LQuujj350 = ((M_uu>115)*(St_uujj>460)*(M_uujj2>115))
opt_LQuujj400 = ((M_uu>125)*(St_uujj>540)*(M_uujj2>120))
.....
In contrast, our vector LQ samples have mass points every 100 GeV, going up to 1800 GeV. First, let's fix the file names of the cut cards for the scalar case:
mv Opt_uujjCuts_Smoothed_pol2cutoff.txt OptLQ_uujjCuts_Smoothed_pol2cutoff.txt
mv Opt_uvjjCuts_Smoothed_pol2cutoff.txt OptLQ_uvjjCuts_Smoothed_pol2cutoff.txt
Next, copy the content of these files to new vector "YM" files, changing the "LQ" to "YM", keeping only the values every 100 GeV, and extending to 1800 GeV (holding the last cuts fixed). For example:
Contents of OptYM_uujjCuts_Smoothed_pol2cutoff.txt:
opt_YMuujj300 = ((M_uu>100)*(St_uujj>380)*(M_uujj2>115))
opt_YMuujj400 = ((M_uu>125)*(St_uujj>540)*(M_uujj2>120))
opt_YMuujj500 = ((M_uu>150)*(St_uujj>685)*(M_uujj2>155))
opt_YMuujj600 = ((M_uu>175)*(St_uujj>820)*(M_uujj2>210))
opt_YMuujj700 = ((M_uu>195)*(St_uujj>935)*(M_uujj2>295))
opt_YMuujj800 = ((M_uu>215)*(St_uujj>1040)*(M_uujj2>400))
opt_YMuujj900 = ((M_uu>230)*(St_uujj>1135)*(M_uujj2>535))
opt_YMuujj1000 = ((M_uu>245)*(St_uujj>1210)*(M_uujj2>690))
opt_YMuujj1100 = ((M_uu>245)*(St_uujj>1210)*(M_uujj2>690))
opt_YMuujj1200 = ((M_uu>245)*(St_uujj>1210)*(M_uujj2>690))
opt_YMuujj1300 = ((M_uu>245)*(St_uujj>1210)*(M_uujj2>690))
opt_YMuujj1400 = ((M_uu>245)*(St_uujj>1210)*(M_uujj2>690))
opt_YMuujj1500 = ((M_uu>245)*(St_uujj>1210)*(M_uujj2>690))
opt_YMuujj1600 = ((M_uu>245)*(St_uujj>1210)*(M_uujj2>690))
opt_YMuujj1700 = ((M_uu>245)*(St_uujj>1210)*(M_uujj2>690))
opt_YMuujj1800 = ((M_uu>245)*(St_uujj>1210)*(M_uujj2>690))
Contents of OptYM_uvjjCuts_Smoothed_pol2cutoff.txt:
opt_YMuvjj300 = ((MT_uv>155)*(St_uvjj>455)*(M_uvjj>125))
opt_YMuvjj400 = ((MT_uv>205)*(St_uvjj>625)*(M_uvjj>175))
opt_YMuvjj500 = ((MT_uv>245)*(St_uvjj>800)*(M_uvjj>225))
opt_YMuvjj600 = ((MT_uv>275)*(St_uvjj>980)*(M_uvjj>280))
opt_YMuvjj700 = ((MT_uv>300)*(St_uvjj>1160)*(M_uvjj>330))
opt_YMuvjj800 = ((MT_uv>315)*(St_uvjj>1345)*(M_uvjj>380))
opt_YMuvjj900 = ((MT_uv>320)*(St_uvjj>1530)*(M_uvjj>435))
opt_YMuvjj1000 = ((MT_uv>320)*(St_uvjj>1720)*(M_uvjj>490))
opt_YMuvjj1100 = ((MT_uv>320)*(St_uvjj>1720)*(M_uvjj>490))
opt_YMuvjj1200 = ((MT_uv>320)*(St_uvjj>1720)*(M_uvjj>490))
opt_YMuvjj1300 = ((MT_uv>320)*(St_uvjj>1720)*(M_uvjj>490))
opt_YMuvjj1400 = ((MT_uv>320)*(St_uvjj>1720)*(M_uvjj>490))
opt_YMuvjj1500 = ((MT_uv>320)*(St_uvjj>1720)*(M_uvjj>490))
opt_YMuvjj1600 = ((MT_uv>320)*(St_uvjj>1720)*(M_uvjj>490))
opt_YMuvjj1700 = ((MT_uv>320)*(St_uvjj>1720)*(M_uvjj>490))
opt_YMuvjj1800 = ((MT_uv>320)*(St_uvjj>1720)*(M_uvjj>490))
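The copy-and-extend step can also be scripted. A sketch using just the first few scalar cuts (the real cut card has many more mass points; the function name is illustrative):

```python
# First few scalar cuts, keyed by LQ mass (copied from the scalar cut card).
scalar_cuts = {
    300: "((M_uu>100)*(St_uujj>380)*(M_uujj2>115))",
    350: "((M_uu>115)*(St_uujj>460)*(M_uujj2>115))",
    400: "((M_uu>125)*(St_uujj>540)*(M_uujj2>120))",
}

def vector_cut_lines(scalar, label="YM", step=100, max_mass=1800):
    """Keep only every-100-GeV points, relabel, and hold the last available
    cut fixed for the higher vector LQ mass points."""
    last = scalar[max(m for m in scalar if m % step == 0)]
    lines = []
    for mass in range(min(scalar), max_mass + step, step):
        cut = scalar.get(mass, last)
        lines.append("opt_%suujj%d = %s" % (label, mass, cut))
    return lines
```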
These cuts are the same for all vector LQ samples, so just copy them to appropriately named files for "AM", "MC", and "MM":
sed 's/YM/MC/g' OptYM_uujjCuts_Smoothed_pol2cutoff.txt > OptMC_uujjCuts_Smoothed_pol2cutoff.txt
sed 's/YM/MC/g' OptYM_uvjjCuts_Smoothed_pol2cutoff.txt > OptMC_uvjjCuts_Smoothed_pol2cutoff.txt
sed 's/YM/AM/g' OptYM_uujjCuts_Smoothed_pol2cutoff.txt > OptAM_uujjCuts_Smoothed_pol2cutoff.txt
sed 's/YM/AM/g' OptYM_uvjjCuts_Smoothed_pol2cutoff.txt > OptAM_uvjjCuts_Smoothed_pol2cutoff.txt
sed 's/YM/MM/g' OptYM_uujjCuts_Smoothed_pol2cutoff.txt > OptMM_uujjCuts_Smoothed_pol2cutoff.txt
sed 's/YM/MM/g' OptYM_uvjjCuts_Smoothed_pol2cutoff.txt > OptMM_uvjjCuts_Smoothed_pol2cutoff.txt
Running on the vector samples
The LQResultProducer has been written such that we can simply replace the "LQ" with any of the vector names and run. Since we are mainly interested in producing Higgs limit cards for each of the vector samples, we can do the same with the SysBatcher.
First, clone the LQResultProducer and the SysBatcher for vector purposes:
sed 's/LQ/YM/g' LQResultProducer.py > YMResultProducer.py
sed 's/LQ/YM/g' SysBatcher.py > YMSysBatcher.py
sed 's/LQ/MM/g' LQResultProducer.py > MMResultProducer.py
sed 's/LQ/MM/g' SysBatcher.py > MMSysBatcher.py
sed 's/LQ/MC/g' LQResultProducer.py > MCResultProducer.py
sed 's/LQ/MC/g' SysBatcher.py > MCSysBatcher.py
sed 's/LQ/AM/g' LQResultProducer.py > AMResultProducer.py
sed 's/LQ/AM/g' SysBatcher.py > AMSysBatcher.py
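The sed clones above can equivalently be scripted in one loop; a minimal Python sketch (file names as above):

```python
def clone_for_tag(src_path, new_tag, old_tag="LQ"):
    """Copy src_path to a new file, substituting old_tag with new_tag
    in both the file name and the file contents (equivalent to the
    sed commands above)."""
    with open(src_path) as f:
        text = f.read()
    dst_path = src_path.replace(old_tag, new_tag)
    with open(dst_path, "w") as f:
        f.write(text.replace(old_tag, new_tag))
    return dst_path

# Clone the result producer and batcher for every vector scenario:
# for tag in ("YM", "MM", "MC", "AM"):
#     for src in ("LQResultProducer.py", "SysBatcher.py"):
#         clone_for_tag(src, tag)
```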
Next, just launch the batchers and wait. I recommend not launching them all at once: wait an hour or two between each, since each batcher opens ROOT files in an AFS workspace and it is best to stagger the load.
python YMSysBatcher.py --launch
python MMSysBatcher.py --launch
python MCSysBatcher.py --launch
python AMSysBatcher.py --launch
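The staggered launching can be sketched as follows, assuming roughly an hour and a half between batchers (the `run` argument is injectable so the sketch can be dry-run; the gap length is a judgment call, not a fixed prescription):

```python
import subprocess
import time

def launch_staggered(batchers, gap_seconds=5400, run=subprocess.check_call):
    """Launch each SysBatcher in turn, sleeping between launches so
    the batch jobs do not all open ROOT files in the AFS workspace at
    the same time."""
    for i, script in enumerate(batchers):
        if i:
            time.sleep(gap_seconds)
        run(["python", script, "--launch"])

# launch_staggered(["YMSysBatcher.py", "MMSysBatcher.py",
#                   "MCSysBatcher.py", "AMSysBatcher.py"])
```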
Then, you can produce the Higgs limit cards (the "FinalCards" files) in the same way as for the scalar samples:
https://twiki.cern.ch/twiki/bin/view/Main/LQPairSecondGen13TeVOutline#Running_the_full_event_yields_wi
You will have to do that once for each of YMResultProducer.py, MCResultProducer.py, MMResultProducer.py, and AMResultProducer.py. It will yield four card files: FinalCardsYM.txt, FinalCardsMC.txt, FinalCardsMM.txt, and FinalCardsAM.txt.
Getting vector LQ limits
First, we have to fix one small problem. Because the Higgs limit tool presents the limit as a ratio (r) of the assumed cross section and reports it to five digits, the very large LQ signal cross sections mean we have to increase the precision on r. I find that the build works best on lxplus5, but running the combine tool thereafter works fine on lxplus6.
cd CMSSW_6_1_1/src
sed -i 's#6.4#9.9#g' ./HiggsAnalysis/CombinedLimit/src/Asymptotic.cc
scram b -j 16
New flags have been built into the limit tool to deal with the vector scenario. These flags switch on the use of the vector LQ card file (e.g. FinalCardsYM.txt), and they can be used as follows:
python RunStatsBasicCLs.py --do_BetaOne --do_BetaHalf -c LimitTest --Asymptotic_Only --vectorYM
python RunStatsBasicCLs.py --do_BetaOne --do_BetaHalf -c LimitTest --Asymptotic_Only --vectorAM
python RunStatsBasicCLs.py --do_BetaOne --do_BetaHalf -c LimitTest --Asymptotic_Only --vectorMM
python RunStatsBasicCLs.py --do_BetaOne --do_BetaHalf -c LimitTest --Asymptotic_Only --vectorMC
Making results for RPV Susy (In Progress)
In this section we follow the same methods as for the vector LQ samples to get limits on the RPV Susy (232) samples. These samples have mass values ranging from 200 to 1000 GeV in 50 GeV increments. The scalar LQ samples were denoted "LQ" and the vector samples "YM", "MC", "MM", and "AM"; for Susy we follow a similar convention and label everything "RV".
Creating the cut card
In this example, I will use the same cuts as for the regular LQ signal. Dave is further testing the optimal cuts and may modify this.
Cuts should use the m_stop - 100 GeV LQ optimization point.
Contents of OptRV_uujjCuts_Smoothed_pol2cutoff.txt:
opt_RVuujj300 = ((M_uu>100)*(St_uujj>380)*(M_uujj2>115))
opt_RVuujj350 = ((M_uu>115)*(St_uujj>460)*(M_uujj2>115))
opt_RVuujj400 = ((M_uu>125)*(St_uujj>540)*(M_uujj2>120))
opt_RVuujj450 = ((M_uu>140)*(St_uujj>615)*(M_uujj2>135))
opt_RVuujj500 = ((M_uu>150)*(St_uujj>685)*(M_uujj2>155))
opt_RVuujj550 = ((M_uu>165)*(St_uujj>755)*(M_uujj2>180))
opt_RVuujj600 = ((M_uu>175)*(St_uujj>820)*(M_uujj2>210))
opt_RVuujj650 = ((M_uu>185)*(St_uujj>880)*(M_uujj2>250))
opt_RVuujj700 = ((M_uu>195)*(St_uujj>935)*(M_uujj2>295))
opt_RVuujj750 = ((M_uu>205)*(St_uujj>990)*(M_uujj2>345))
opt_RVuujj800 = ((M_uu>215)*(St_uujj>1040)*(M_uujj2>400))
opt_RVuujj850 = ((M_uu>220)*(St_uujj>1090)*(M_uujj2>465))
opt_RVuujj900 = ((M_uu>230)*(St_uujj>1135)*(M_uujj2>535))
opt_RVuujj950 = ((M_uu>235)*(St_uujj>1175)*(M_uujj2>610))
opt_RVuujj1000 = ((M_uu>245)*(St_uujj>1210)*(M_uujj2>690))
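Once the m_stop - 100 GeV rule noted above is adopted instead of the copied LQ cuts, the card lines could be regenerated with a short script. A sketch, where the lq_cuts entries are illustrative placeholders (in practice, parse the full set from the smoothed LQ cut card):

```python
# Map m_LQ -> (M_uu, St_uujj, M_uujj2) smoothed cut thresholds.
# These example values are placeholders for illustration only.
lq_cuts = {
    200: (90, 320, 110),   # hypothetical entry for m_LQ = 200
    300: (100, 380, 115),
    400: (125, 540, 120),
}

def rv_card_line(m_stop, cuts):
    """Format one RV cut-card line, taking the cuts from the LQ
    optimization point at m_LQ = m_stop - 100 GeV."""
    m_uu, st, m_uujj2 = cuts[m_stop - 100]
    return ("opt_RVuujj%d = ((M_uu>%d)*(St_uujj>%d)*(M_uujj2>%d))"
            % (m_stop, m_uu, st, m_uujj2))
```

For example, rv_card_line(400, lq_cuts) would write the RV point at 400 GeV using the m_LQ = 300 GeV cuts.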
Run the event yields and produce higgs limit cards
Just as with the vector LQ, we can clone the LQResultProducer and the SysBatcher for the Susy scenario. Since the RPV Susy signal appears only in the mu-mu-j-j channel, we make one slight tweak to remove the uvjj running:
sed 's/LQ/RV/g' LQResultProducer.py > RVResultProducer.py
cat SysBatcher.py | sed 's/LQ/RV/g' | sed "s/,'uvjj'//g" > RVSysBatcher.py
Next, just launch the batcher and wait for the jobs to finish.
python RVSysBatcher.py --launch
Then, you can produce the Higgs limit cards (the "FinalCards" files) in the same way as for the scalar samples:
https://twiki.cern.ch/twiki/bin/view/Main/LQPairSecondGen13TeVOutline#Running_the_full_event_yields_wi
To do so, modify the "FullAnalysis" portion of the RVResultProducer.py as such:
# ====================================================================================================================================================== #
# This runs a "FullAnalysis" - i.e. produces tables with full systematics included.
# ====================================================================================================================================================== #
# You can run this to make the full set of tables needed to construct the higgs card. This takes a long time!
# Alternatively, you can run > python SysBatcher.py --launch to do each table in a separate batch job
# When done, proceed to the next step to make higgs limit cards
if (True):
FullAnalysis(MuMuOptCutFile, preselectionmumu,preselectionmunu,NormalDirectory,NormalWeightMuMu,'TTBarDataDriven') # scriptflag
if (True):
uujjcardfiles = MuMuOptCutFile.replace('.txt','_systable*.txt')
uujjcards = ParseFinalCards(uujjcardfiles)
finalcards = FixFinalCards([uujjcards])
print 'Final Cards Available in ',finalcards
And then just run it, e.g.
python RVResultProducer.py
You should now have a "FinalCardsRV.txt" file in your results directory.
Getting RPV Susy limits
Note: this section requires the version of RunStatsBasicCLs.py committed to the paperversion branch on GitHub on Aug 30. Please update your repository if needed.
Proceed exactly as in scalar and vector LQ case, using the flag "--susyRV" during limit setting. To be precise, if you've never set up the Higgs limit tool in the LimitSetting directory before, just put the FinalCardsRV.txt in the LimitSetting directory, and do this:
cd LimitSetting
setenv SCRAM_ARCH slc5_amd64_gcc472
cmsrel CMSSW_6_1_1
cd CMSSW_6_1_1/src
cmsenv
git clone https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit.git HiggsAnalysis/CombinedLimit
cd HiggsAnalysis/CombinedLimit
git pull origin master
git checkout V03-05-00
cd ../../
scramv1 b clean
scramv1 b -j 16
cmsenv
sed -i 's#6.4#9.9#g' ./HiggsAnalysis/CombinedLimit/src/Asymptotic.cc
scram b -j 16
cd ../../
python RunStatsBasicCLs.py --do_BetaOne -c LimitTest --Asymptotic_Only --susyRV
Getting RPV Limit plots
A plot macro has been drafted at
LimitSetting/PlotMacros/BR_Sigma_MuMu_vsMassSUSYRPV.C
To run it, simply do:
root -b RunSusyRPVPlotsInRoot
An example output is at
this link
Documentation for LQ post-processing
From:
https://docs.google.com/document/d/1XZ35T-IwXzUpUCou3aqcIW35Cxx8j6_2lBjWqxjcmFQ/edit#heading=h.qpfb6p4b5p2s:
Modified code is kept on this GitHub branch:
https://github.com/gmadigan/resonance_HH/tree/updates_for_LQ
Which has been forked from
https://github.com/menglu21/resonance_HH
To run, follow instructions in README.md or contact Meng Lu (meng.lu@cern.ch).
References for high-pT muons:
https://twiki.cern.ch/twiki/bin/view/CMS/MuonUL2016#High_pT_above_120_GeV
https://twiki.cern.ch/twiki/bin/view/CMS/MuonUL2017#High_pT_above_120_GeV
https://twiki.cern.ch/twiki/bin/view/CMS/MuonUL2018#High_pT_above_120_GeV
Changes (relative to HH analysis) have been made to the following files:
modified: crab/crab_script.py
modified: crab/samples2016.json
modified: crab/samples2016apv.json
modified: crab/samples2018.json
new file: modules/LQProducer.py
new file: modules/highPtMuonScaleResProducer.py
new file: modules/muonRECOSFProducer.py
new file: modules/topPtReweightProducer.py
crab/crab_script.py
This is where all the modules to run are specified. I have modified this to run the new modules I added (e.g., LQProducer) and removed modules we do not need. More modules will likely need to be added in the future for additional MC corrections. Current modules in use for MC are:
countHistogramsModule (saves the generated number of events for normalizing the samples)
PU reweighting
L1 Prefire corrections (for HCAL and Muon system)
Muon ID and isolation scale factors
Muon reconstruction scale factors
Muon scale and resolution corrections
Electron reconstruction scale factors
Electron ID scale factors
Jet energy corrections
b-jet efficiency scale factors
Top pT reweighting
LQ-specific selections and kinematic variables
For data:
Muon scale and resolution corrections
Jet energy corrections
LQ-specific selections and kinematic variables
crab/samples2016.json
Modified to only include background samples needed for LQ analysis, and single muon samples. Still need to add electron datasets.
crab/samples2016apv.json
Modified to only include background samples needed for LQ analysis, and single muon samples. Still need to add electron datasets.
crab/samples2018.json
Modified to only include background samples needed for LQ analysis, and single muon samples. Still need to add electron datasets.
(note that 2017 sample list has not been made yet)
modules/LQProducer.py
This module replaces HHProducer and will add many new branches (too many to list here!).
The branches can be categorized as follows:
Leading and sub-leading (trailing) muon kinematics (e.g., lead_muon_pt, trail_muon_pt), and Reco, Iso, ID, and HLT scale factors and their variations.
NOTE: HLT SFs set to 1 currently! Will need to write a module to get these SFs in future.
Leading and trailing electron kinematics
Leading and trailing jet kinematics, b-tag scores and b-tag efficiency scale factors
Several kinematic quantities, e.g., ST, M_uu, etc.
DeltaR and DeltaPhi for different muon, jet, and electron combinations.
MET kinematics and filters
Selections:
Muons must pass the high-pT ID and loose relative tracker-based isolation (muons must use TuneP momentum!)
Electrons must pass the HEEP ID
Jets must pass the tight ID
A loose skim on muons with pT < 20 GeV is applied here as well
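The muon selection above can be sketched in plain Python over per-muon records. The NanoAOD-style branch names (highPtId, tkIsoId, tunepRelPt) and the direction of the 20 GeV skim are assumptions for illustration, not verbatim LQProducer code:

```python
def pass_lq_muon(mu):
    """Selection sketch for LQProducer muons. In the NanoAOD
    convention (assumed here): highPtId == 2 is the global high-pT
    ID, tkIsoId >= 1 is the loose relative tracker-based isolation,
    and the TuneP pT is the stored pt scaled by tunepRelPt."""
    tunep_pt = mu["pt"] * mu["tunepRelPt"]
    return (mu["highPtId"] == 2        # high-pT ID
            and mu["tkIsoId"] >= 1     # loose tracker isolation
            and tunep_pt > 20.0)       # loose skim threshold
```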
modules/highPtMuonScaleResProducer.py
This module replaces muonScaleResProducer and will add 12 branches to the ntuples:
Muon_highPtScaleRoccor_pt
Muon_highPtScaleRoccorUp_pt
Muon_highPtScaleRoccorDown_pt
Muon_highPtResSmeared_pt
Muon_highPtResSmearedUp_pt
Muon_highPtResSmearedDown_pt
Muon_highPtResSmeared_eta
Muon_highPtResSmearedUp_eta
Muon_highPtResSmearedDown_eta
Muon_highPtResSmeared_phi
Muon_highPtResSmearedUp_phi
Muon_highPtResSmearedDown_phi
The first three branches are the central, up, and down systematic variations of muon momenta corrected with the Rochester corrections. Note that these corrections are applied only to muons with pT < 200 GeV, and correspond to scale and resolution corrections.
The last nine branches are the central, up, and down systematic variations of muon pT, eta, and phi after an additional MER smearing is applied. All three variables are modified, as the smearing is p-dependent (not pT-dependent) and will therefore change the full momentum vector. This additional smearing is a specific prescription for high-pT muons.
As the LQ analysis uses TuneP momentum, this module uses TuneP momentum rather than PF momentum for all muons.
NOTE: A systematic uncertainty on the muon momentum scale for high-pT muons with pT > 200 GeV, using the Generalized Endpoint method, is still needed.
modules/muonRECOSFProducer.py
This module will add two branches to the ntuples:
Muon_Reco_LooseRelTkIsoandHighPtIDandIPCut_SF
Muon_Reco_LooseRelTkIsoandHighPtIDandIPCut_SFerr
These branches are the muon reconstruction SF and its error. As the LQ analysis uses the TuneP momentum measurement for muons, it was important to implement this in a new module (rather than using the default muon reconstruction SF module, which uses PF momentum).
modules/topPtReweightProducer.py
This module will add three new branches to the ntuples:
TopPtWeight
TopPtWeight_up
TopPtWeight_down
These branches are the central, up, and down systematic variations of the event weights that correct for top-pT mismodeling. The corrections are year-dependent, and may or may not be needed in the analysis (EXO recommended them for EXO-21-019).
CAUTION: These corrections should only be applied to top MC samples, but I have not yet figured out how to restrict them; they are currently added to all samples. One option is to keep adding them to all samples and apply the weight only to top samples in the HHCoffea code.
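For reference, the standard data/NLO top-pT reweighting is the geometric mean of per-top scale factors SF(pT) = exp(a + b*pT). A sketch, assuming the commonly quoted Run 2 parameters (the module's year-dependent values may differ):

```python
import math

def top_pt_weight(pt_top, pt_antitop, a=0.0615, b=-0.0005):
    """Central top-pT event weight: sqrt(SF(t) * SF(tbar)) with
    SF(pt) = exp(a + b*pt). The default (a, b) are the widely quoted
    data/NLO values; treat them as placeholders for the module's
    year-dependent parameters. Apply to top MC samples only."""
    sf = lambda pt: math.exp(a + b * pt)
    return math.sqrt(sf(pt_top) * sf(pt_antitop))
```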
--
DavidMorse - 14 Jan 2015