13 TeV Second Gen LQ Pair Production Analysis Documentation

History

To-Do

Part I: Production of NTuples

Overview

Status

  • All LQ scalar signal, background, and single-muon data samples have been processed.

Where to find NTuples

Part II: Analysis of NTuples

All of this code is stored in a single GitHub repository, which can be checked out as follows:

git clone git@github.com:DavidMorse/LQ2Analysis.git LQ2Analysis13TeV
cd LQ2Analysis13TeV
cmsrel CMSSW_10_2_18
cd CMSSW_10_2_18/src
cmsenv
cd -

Applying Rochester corrections to muon pT and estimating muon momentum scale systematics with the Rochester and Generalized Endpoint methods requires two additional packages:

git clone ssh://git@gitlab.cern.ch:7999/akhukhun/roccor.git RoccoR
git clone ssh://git@gitlab.cern.ch:7999/cms-muonPOG/GeneralizedEndpoint/GEScaleSyst.git GEScaleSyst
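Below is a hedged sketch of applying a Rochester correction from pyROOT. The payload file name is an assumption; kScaleDT and kSpreadMC are the methods provided by the RoccoR package (kScaleDT for data, kSpreadMC for MC muons with a gen match):

import ROOT

ROOT.gROOT.ProcessLine(".L RoccoR/RoccoR.cc+")   # compile the correction class once
rc = ROOT.RoccoR("RoccoR/RoccoR2016.txt")        # assumed name of the 2016 payload

def corrected_pt(is_data, charge, pt, eta, phi, gen_pt=None):
    if is_data:
        sf = rc.kScaleDT(charge, pt, eta, phi)
    elif gen_pt is not None:
        # MC with a gen-matched muon: use the gen pt to correct the resolution
        sf = rc.kSpreadMC(charge, pt, eta, phi, gen_pt)
    else:
        sf = 1.0   # (the package also provides kSmearMC for unmatched MC muons)
    return pt * sf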

Some of these instructions are reiterated from the GitHub README file.

The goal of Part II is the creation of small flat analysis trees: the NTuples on EOS are usually on the order of 10 TB and need to be reduced before running an interactive analysis. Jobs are run on the lxplus batch system to analyze the NTuples. The NTuples contain variables such as:

  • Vectors of values for kinematics, e.g. "PFJetPt" is a vector of doubles, with one pT value for each jet stored in the event.
  • Vectors of values for quality criteria (id, isolation, etc)
  • Event-level information, including MET quality flags, vectors of different PDF weights (for MC), vertex information, etc.

The analysis trees aim to be roughly a factor of 1000 smaller than the NTuples (of order 10 GB) and to contain only single-valued variables (ints and floats), in effect one large flat ntuple. (The TNtuple class itself does not allow enough variables.) Values in the trees will be things like:

  • The pT of the leading and subleading muons, and likewise for jets, along with the corresponding eta and phi values.
  • Kinematic variables used for optimization, like the di-muon mass, the MT(muon,MET), the ST = (pt_muon1+pt_muon2+pt_jet1+pt_jet2), etc...
  • Single event weights which include the pileup reweighting.
  • Trigger flags, and event quality flags.
  • Basically, anything needed to generate a histogram with the appropriate weights and interactively changeable cuts (see the sketch below).
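Concretely, this flat format lets you apply cuts and weights directly in a TTree::Draw string. A minimal pyROOT sketch, assuming one of the merged group files produced below; the tree name and the event-weight branch name are hypothetical, while the kinematic branch names are those used throughout this documentation:

import ROOT

# Open one of the merged group files produced by the tree-making step
f = ROOT.TFile.Open("ZJets.root")
t = f.Get("PhysicalVariables")      # hypothetical tree name

# Cuts are products of boolean factors, multiplied by the event weight
selection = "(Pt_muon1>45)*(Pt_muon2>45)*(St_uujj>300)"
weight = "Weight_central"           # hypothetical name of the event-weight branch

h = ROOT.TH1F("h_st", ";S_{T} [GeV];Events", 50, 0.0, 2000.0)
t.Draw("St_uujj >> h_st", weight + "*" + selection)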

The following are the steps for creating the trees.

Getting integrated luminosity and pileup weight

With nanoAOD tools it is no longer necessary to calculate pileup weights ourselves. Directions below are kept for posterity.

A shell script is available at this link, and has been updated for the newest data. It will return root files with the PU histograms, which are used by the tree-making code to get the weights for each event.

It can be run like:

./GetPULumiInfo.tcsh

This is already done in practice. Histograms are available for the central value and for systematic variations up and down.
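For reference, the per-event pileup weight is just the ratio of the data and MC pileup distributions evaluated at the event's pileup value. A minimal sketch of the lookup, with hypothetical file and histogram names:

import ROOT

fpu = ROOT.TFile.Open("PUWeights_central.root")   # hypothetical output of GetPULumiInfo.tcsh
h_weights = fpu.Get("pileup_weights")             # hypothetical histogram of data/MC ratios

def pu_weight(n_true_interactions):
    # Look up the data/MC ratio in the bin corresponding to this event
    return h_weights.GetBinContent(h_weights.FindBin(n_true_interactions))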

The integrated luminosity is the official value of 35.9/fb, with a 2.5% uncertainty.

Get organized with a spreadsheet

A simple csv file is used to keep track of everything involved in the tree production. This includes the locations of the ntuple files on EOS, any JSON file you wish to use on real data, cross-sections, original numbers of events (for normalization) and group names (output files will be merged according to their group).

An example of a spreadsheet (in progress) is at this link .

The spreadsheet must contain the following columns:

SignalType: A unique identifier at the beginning of the names of the root files.
Xsections: The cross-section to normalize the sample to (NLO is better!).
N_orig: The original number of events in the sample (if you don't know this, see the next steps for instructions).
Group: The group for the files. For instance, you might have three SignalTypes like WW, WZ, and ZZ, and want to put them in a Group called "DiBoson".
CustomJson: The name of a JSON file specifying the good runs/lumis to use. This can be the same for every data type, or different, or 0 for MC.
EOSDirectory: The EOS path where the files are kept for this SignalType. Should be a typical EOS path, e.g. /store/group/..../
This spreadsheet will be a command-line argument of the analysis batcher (discussed more in steps below).
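For orientation, a hypothetical row with the columns above (all values are illustrative only) would look like:

SignalType,Xsections,N_orig,Group,CustomJson,EOSDirectory
WZ,47.13,1000000,DiBoson,0,/store/group/..../WZ/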

What if you don't know the original number of events in the ntuples?

Since the NTupleMaker applies a skim, histograms are stored in the NTuple root files which contain the original number of events. This is needed for correct MC event weights.
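Schematically, reading the per-file count amounts to the following; the histogram name here is an assumption, so check the actual NTuple files:

import ROOT

f = ROOT.TFile.Open("root://eoscms.cern.ch//store/group/..../ntuple_1.root")
h_count = f.Get("EventCounter")    # hypothetical name of the event-count histogram
print h_count.GetBinContent(1)     # pre-skim event count for this file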

The analysis batcher is built to read the CSV file and run a python script on every root file contained in the EOS directories specified in the CSV. Normally this is for making tree files, but it can also just make a txt file containing the number of events from the event-count histogram in the NTuples; we can then merge that info to get the totals and fix up the CSV.

In practice it is much easier than it sounds. Just run the following:

 
python AnalyzerMakerFastLocalHTCondor.py -i NTupleInfo2016Full_customNano.csv -py NTupleEvCounter.py -t EventCount_2016Full -j Cert_271036-284044_13TeV_23Sep2016ReReco_Collisions16_JSON.txt -p 0 -q espresso -s 100 --FileRefresh --merge

Just to clarify on the arguments:

-i CSV file: The CSV file you wish to run on.
-t Tag name: Results will be output in a directory specified by this tag name.
-j JSON file: Not important here.
-p 0: Not important here.
-q espresso: The HTCondor queue to use.
-s 100: Number of files to analyze per batch job (the split number).
--FileRefresh: The code automatically makes a list of good ntuple files and stores it locally for future use, so it doesn't have to re-read directories every time. This flag forces the directories to be re-read.
--merge: Compile the results. For event counting, this lists the total aggregated counts. For tree-making, it hadds the output into single root files (SummaryFiles).
This will produce an event count log (see screen output) and you can update the CSV file accordingly.

Furthermore, if you ran the above command and don't want to wait for it to finish, you can kill the process, leave, and pick up where you left off. Here, the previous command made the directory "NTupleEvCounter_EventCount_Summer2016_Full_2016_05_12_23_20_07", and this is the command to continue where we left off:

python AnalyzerMakerFastLocalHTCondor.py -i NTupleInfo2016Full_customNano.csv -py NTupleEvCounter.py -t EventCount_2016Full -j Cert_271036-284044_13TeV_23Sep2016ReReco_Collisions16_JSON.txt -p 0 -q espresso -s 100 -d NTupleEvCounter_EventCount_Summer2016_Full_2016_05_12_23_20_07

Finally, add the "--merge" flag to take all the output files (in this case one txt file per root file) and sum the output:

python AnalyzerMakerFastLocalHTCondor.py -i NTupleInfo2016Full_customNano.csv -py NTupleEvCounter.py -t EventCount_2016Full -j Cert_271036-284044_13TeV_23Sep2016ReReco_Collisions16_JSON.txt -p 0 -q espresso -s 100 -d NTupleEvCounter_EventCount_Summer2016_Full_2016_05_12_23_20_07 --merge

A message on the screen will tell you the output file location. e.g.:

Output is in  NTupleInfo2016Full_EventCountLog.csv 

How to create the analysis trees

Similar to what we did above, a different python file will be used to analyze the NTuples (using pyROOT) and make lightweight trees. There will be one output file for every NTuple file, and the merge step will use "hadd" to combine these files according to the "Group" in the CSV file. At the end of it all, we will have just a few root files like "SingleMuData.root", "ZJets.root", etc., which contain everything we need for each data or MC sample.
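For reference, the --merge step is effectively one hadd call per Group. A minimal sketch of the idea, with hypothetical group names and file patterns:

import glob, os

# One hadd call per Group in the CSV, merging the per-file tree outputs
for group in ["SingleMuData", "ZJets", "TTBar"]:
    inputs = " ".join(glob.glob("TreeOutput/*_%s_*.root" % group))
    os.system("hadd -f SummaryFiles/%s.root %s" % (group, inputs))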

These files can take up many GB. I recommend using an AFS work space like "/afs/cern.ch/work/u/username/..." and bumping up your quota to 100 GB at the AFS workspaces page.

Running is similar to the previous step. This is the command:

Example for LQ:

 python AnalyzerMakerFastLocalHTCondor.py -i NTupleInfo2016Full_customNano.csv -py NTupleAnalyzer_nanoAOD.py -t Full2016QuickTest -j Cert_271036-284044_13TeV_23Sep2016ReReco_Collisions16_JSON.txt  -p 0 -q tomorrow -s 50  --merge

Example for HH:

 python AnalyzerMakerFastLocalHTCondor.py -i NTupleInfo2016FullHH_customNano.csv -py NTupleAnalyzerHH_nanoAOD.py -t Full2016 -j Cert_271036-284044_13TeV_23Sep2016ReReco_Collisions16_JSON.txt  -p 1 -q tomorrow -s 50  --merge

To detail the arguments:

-i CSV file: The CSV file you wish to run on.
-t Tag name: Results will be output in a directory specified by this tag name.
-j JSON file: Default JSON file, overridden by the column in the CSV file. Always put something here.
-p 0: Whether or not to store branches of PDF weights.
-q tomorrow: The batch queue to use.
-s 40: Number of files to analyze per batch job (the split number).
-y 2016: Year of data and MC. Choices are 2016, 2017, and 2018.
--FileRefresh: The code automatically makes a list of good ntuple files and stores it locally for future use, so it doesn't have to re-read directories every time. This flag forces the directories to be re-read.
--merge: Compile the results. For event counting, this lists the total aggregated counts. For tree-making, it hadds the output into single root files (SummaryFiles).
Other studies might require slight modifications to what you want to run. A couple are built in via the tagname option. These include:

QCDNonIso: Removes the muon isolation requirement so that QCD studies can be performed.
EMuSwitch: Replaces one muon with a HEEP electron, for ttbar studies in an enriched e-mu control region.
QuickTest: Turns off systematics (much faster, but does not contain all information needed for the final analysis).
Examples of these runs are as follows. Typically you might not need the full CSV files for these, so you could make new ones with just the datasets you need.

python AnalyzerMakerFastLocalHTCondor.py -i NTupleInfo2016Full_customNano.csv -py NTupleAnalyzer_nanoAOD.py -t Full2016EMuSwitch -j Cert_271036-284044_13TeV_23Sep2016ReReco_Collisions16_JSON.txt  -p 1 -q tomorrow -s 40 -y 2016 --merge

python AnalyzerMakerFastLocalHTCondor.py -i NTupleInfo2016Full_customNano.csv -py NTupleAnalyzer_nanoAOD.py -t Full2016QCDNonIsoQuickTest -j Cert_271036-284044_13TeV_23Sep2016ReReco_Collisions16_JSON.txt  -p 1 -q tomorrow -s 40  -y 2016 --merge

python AnalyzerMakerFastLocalHTCondor.py -i NTupleInfo2016Full_customNano.csv -py NTupleAnalyzer_nanoAOD.py -t Full2016QuickTest -j Cert_271036-284044_13TeV_23Sep2016ReReco_Collisions16_JSON.txt  -p 1 -q tomorrow -s 40  -y 2016 --merge

What if it takes too long?

It is also important to note that these jobs can take several hours or a day, depending on what you run. The AnalyzerMakerFastLocalHTCondor.py script stays running and waits for the batch jobs to finish. If a file is missing, it relaunches the job. This is all transparent to the user, but it can occupy the screen for quite a while.

There are two tools to help out with this.

  • The first (and best) way is to leave and come back later. When you see that the AnalyzerMakerFastLocalHTCondor script has submitted its jobs and is waiting for them to finish, you can just ctrl+c and come back later. The TagName option has made a working directory with a time-and-date stamp, and you can specify it to pick up where you left off. So go home, come back, and re-issue the command, specifying the working directory with the "-d" option, like:
python AnalyzerMakerFastLocalHTCondor.py -i NTupleInfo2016Full_customNano.csv -py NTupleAnalyzer_nanoAOD.py -t Full2016 -j Cert_271036-284044_13TeV_23Sep2016ReReco_Collisions16_JSON.txt  -p 1 -q tomorrow -s 40  -y 2016 --merge -d NTupleAnalyzer_Full2016_05_22_23_36_59

  • Another helpful tool is "screen" (less preferred than leaving and coming back later). If you don't want to wait around for jobs to finish, use a "screen" on lxplus. Note the lxplus machine you are on (e.g. lxplus730), and open a screen by typing "screen". Launch the jobs with AnalyzerMakerFastLocalHTCondor.py as described above. Press ctrl+A then D to detach. You can then log out, go home, etc., log back into the same machine (lxplus730 in this example), and type "screen -r" to return to your screen.

Part III: Running the result producer

From the flat trees, the analysis and results are produced. This involves many steps, almost all of which are done in a single result producer python script. In this script, you will find the following:

  • The definition of the event weight, built from the weight variable in the trees, the HLT reweighting of MC for the single-muon trigger, and the integrated luminosity.
  • Definition of a preselection - a selection which demands the LQ decay product particles, but doesn't make strong cuts. Normalization of the Z, W, and TTbar backgrounds will be done at preselection.
  • Functions that make histograms at preselection and final selection.
  • Functions that can optimize signal-background separation with cuts based on S/sqrt(S+B) and store the result in a txt file (see the sketch after this list).
  • Functions that can make the event counts tables based on the above optimizations.
  • Functions that can make cards compatible with the Higgs limit tools, to get the real "final" results.
  • Unused functions which can perform analysis with TMVA (interesting, but not needed for our normal results). [deprecated]
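For concreteness, the figure of merit used by the optimization functions is S/sqrt(S+B). Here is a minimal sketch of a threshold scan; the yield functions are schematic placeholders for the weighted counts the real code extracts from the trees:

import math

def best_cut(cut_values, signal_yield, background_yield):
    # Scan candidate thresholds and keep the one maximizing S/sqrt(S+B)
    best, best_fom = None, -1.0
    for cut in cut_values:
        s = signal_yield(cut)       # weighted signal events passing the cut
        b = background_yield(cut)   # weighted background events passing the cut
        if s + b <= 0:
            continue
        fom = s / math.sqrt(s + b)
        if fom > best_fom:
            best, best_fom = cut, fom
    return best, best_fom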

Basically, everything you need to run is in the main() function of the script: a few background studies, plot production, and table production (tables get parsed into higgs limit-tool cards). Below is a brief overview of the commonly used parts of the code.

To run the code, execute:

python LQResultProducer.py -y [year] -b [b-tag category]

where the year can be 2016, 2017, 2018, or combined, and the b-tag category can be 0, 1, or 2 (1 for EXO-21-019).

Setup and directory

At the beginning of the main function, you will see the following lines, followed by the definition of the variable binnings for different histograms. The results in this example (running with the option -y 2016) would be output to a directory called "Results_2016_Unblinded". It will attempt to make this directory. If it already exists, that is fine. The *OptCutFile files that detail the final selection cuts must be present in a subdirectory (e.g., Results_2016_Unblinded/Final_selection).

def main():

   #######################################################################################
   ######  The output directories, and the files that define final selection cuts  #######
   #######################################################################################

   # Please retain the "scriptflag" comment. Some python scripts are available which search
   # for this, and make use of it. e.g. For systematic variations, we can run in batch instead
   # of serially, which speeds things up.

   version_name = year+"_Unblinded" # scriptflag
   #version_name = 'Testing_noQCD_14nov' # use sf tag above if this is the real folder
   os.system('mkdir Results_'+version_name) 

   global MuMuOptCutDir
   MuMuOptCutDir = 'Final_selection' # scriptflag
   #os.system('mkdir Results_'+version_name+'/'+MuMuOptCutDir) # scriptflag
   MuMuOptCutFile = 'Results_'+version_name+'/'+MuMuOptCutDir+'/Opt_LQuujj_Cuts.txt' # scriptflag
   #MuMuOptCutFile = 'Results_'+version_name+'/OptLQ_uujjCuts_Smoothed_pol2cutoff.txt'
   MuNuOptCutFile = 'Results_'+version_name+'/OptLQ_uvjjCuts_Smoothed_pol2cutoff.txt' # scriptflag

 

PDF Uncertainties

The PDF uncertainties are time-consuming. By enabling the following lines (turning False to True), you will run the PDF studies. This will print some lists of uncertainties to the screen. I recommend not running this routinely. You can paste the output of this routine into the "HARD CODED RESULTS USED IN ANALYSIS" section.


   # ====================================================================================================================================================== #
   # These are PDF uncertainty studies. Ignore these for now!      ### STILL FIXING THIS PORTION
   # ====================================================================================================================================================== #

   if (False):
      PDF4LHCUncStudy(MuMuOptCutFile,MuNuOptCutFile,version_name)
      PDF4LHCPlotsFromResultDict('Results_'+version_name+'/PDFVariationsDictionary.json',version_name)

QCD Studies

QCD is negligible, and we just conduct a few studies to validate that. These studies make plots and produce a few numbers to cite in the analysis note. You can enable the QCD studies by turning on this part of the code:

   # ====================================================================================================================================================== #
   # This is the QCD study. It will make a couple plots, and test QCD contamination at final selection. We consider QCD negligible, but it is good to test this once!
   # ====================================================================================================================================================== #
   if (False):
      qcdselectionmumu = '((Pt_muon1>45)*(Pt_muon2>45)*(Pt_jet1>125)*(Pt_jet2>45)*(St_uujj>300)*(DR_muon1muon2>0.3))'
      qcdselectionmunu = '((Pt_muon1>45)*(Pt_muon2<45.0)*(Pt_jet1>125)*(Pt_jet2>45)*(Pt_ele1<45.0)*(St_uvjj>300)*(DPhi_muon1met>0.8)*(DPhi_jet1met>0.5))'

      QCDStudy(qcdselectionmumu,qcdselectionmunu,MuMuOptCutFile,MuNuOptCutFile,NormalWeightMuMu,NormalWeightMuNu,version_name)

TTbar E-Mu data-driven studies

The data-driven TTbar estimate relies on a calculation of the mu-mu to e-mu efficiency, and we also provide some characteristic plots and numbers in the analysis note. To create this material, you can turn on the following part of the main function. Once again, there is no need to run this all the time. When the study is stable, you just need to store the reweighting factor in the "HARD CODED RESULTS" section of the script, e.g. "emu_id_eff = 0.5716".


   # ====================================================================================================================================================== #
   # The ttbar e-mu data-driven study. You only need to do this once, check the validation output, and use the scale-factor as global variable "emu_id_eff"
   # The scale-factor should be near 0.5
   # ====================================================================================================================================================== #
   if (False):
      # TTBar Study
      [Rtt_uujj,Rtt_uujj_err] = GetEMuScaleFactors( NormalWeightEMu+'*'+preselectionemu, EMuDirectory)
      Rw_uvjj,Rz_uujj = [1.0,1.0]

      # # PreSelection Plots
      MakeBasicPlotEMu("St_uujj","S_{T}^{e #mu j j} [GeV]",stbinning,preselectionemu,NormalWeightEMu,EMuDirectory,'emuseltagfree','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
      MakeBasicPlotEMu("Pt_miss","E_{T}^{miss} [GeV]",metbinning2,preselectionemu,NormalWeightEMu,EMuDirectory,'emuseltagfree','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
      MakeBasicPlotEMu("M_uu","M^{e #mu} [GeV]",bosonbinning,preselectionemu,NormalWeightEMu,EMuDirectory,'emuseltagfree','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
      MakeBasicPlotEMu("M_uujj2","M^{e/#mu j}_{2} [GeV]",lqbinning,preselectionemu,NormalWeightEMu,EMuDirectory,'emuseltagfree','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
      MakeBasicPlotEMu("DR_muon1muon2","#DeltaR(#mu,e})",drbinning,preselectionemu,NormalWeightEMu,EMuDirectory,'emuseltagfree','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)

      MakeBasicPlotEMu("St_uujj","S_{T}^{e #mu j j} [GeV]",stbinning,preselectionemu,NormalWeightEMu,EMuDirectory,'emuselPAStagfree','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
      MakeBasicPlotEMu("Pt_miss","E_{T}^{miss} [GeV]",metbinning2,preselectionemu,NormalWeightEMu,EMuDirectory,'emuselPAStagfree','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
      MakeBasicPlotEMu("M_uu","M^{e #mu} [GeV]",bosonbinning,preselectionemu,NormalWeightEMu,EMuDirectory,'emuselPAStagfree','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
      MakeBasicPlotEMu("M_uujj2","M^{e/#mu j}_{2} [GeV]",lqbinning,preselectionemu,NormalWeightEMu,EMuDirectory,'emuselPAStagfree','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
      MakeBasicPlotEMu("DR_muon1muon2","#DeltaR(#mu,e})",drbinning,preselectionemu,NormalWeightEMu,EMuDirectory,'emuselPAStagfree','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)

Making analysis-note style plots

The analysis-note style plots have a ratio subplot, and no CMS stamp. Each plot is a single function-call of MakeBasicPlot(args). There is also an option to make a basic results table. You can turn all this on in the following part of the code:


   # ====================================================================================================================================================== #
   # This is a basic plotting routine to make Analysis-Note style plots with ratio plots. 
   # ====================================================================================================================================================== #
   if (False):

      # Get Scale Factors
      [[Rz_uujj,Rz_uujj_err],[Rtt_uujj,Rtt_uujj_err]] = GetMuMuScaleFactors( NormalWeightMuMu+'*'+preselectionmumu, NormalDirectory, '(M_uu>80)*(M_uu<100)', '(M_uu>100)')
      [[Rw_uvjj,Rw_uvjj_err],[Rtt_uvjj,Rtt_uvjj_err]] = GetMuNuScaleFactors( NormalWeightMuNu+'*'+preselectionmunu, NormalDirectory, '(MT_uv>70)*(MT_uv<110)*(JetCount<3.5)', '(MT_uv>70)*(MT_uv<110)*(JetCount>3.5)')

      # Optionally, you can make an event-count table for each selection. Useful if testing a new optimization.
      # We will do this later with full systematics for our set of stable cuts.
      if False:
         QuickTableTTDD(MuMuOptCutFile, preselectionmumu+"*(M_uu>100)",NormalWeightMuMu,Rz_uujj, Rw_uvjj,Rtt_uujj,0)
         QuickTable(MuNuOptCutFile, preselectionmunu,NormalWeightMuNu,Rz_uujj, Rw_uvjj,Rtt_uvjj,0)

      # Here are a few plots which are zoomed in on control regions.
      MakeBasicPlot("M_uu","M^{#mu #mu} [GeV]",bosonzoombinning_uujj_Z,preselectionmumu,NormalWeightMuMu,NormalDirectory,'controlzoom_ZRegion','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
      ...etc...

      # UUJJ plots at preselection, Note that putting 'TTBarDataDriven' in the name turns on the use of data-driven ttbar e-mu sample in place of MC
      MakeBasicPlot("Pt_jet1","p_{T}(jet_{1}) [GeV]",ptbinning,preselectionmumu,NormalWeightMuMu,NormalDirectory,'standard_TTBarDataDriven','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
      ... etc ...

Running the PAS-style plots

PAS-style plots have no ratio subplots, and have a CMS stamp. You can enable this portion of the main function to run the PAS-style plots. There is also a loop over LQ mass values that you can easily modify for different final-selection plots. Using the "tagfree" file tag will remove the CMS stamp.

   # ====================================================================================================================================================== #
   # This is a plotting routine for PAS-style publication-quality plots
   # ====================================================================================================================================================== #

   if False:

      # Get Scale Factors
      [[Rz_uujj,Rz_uujj_err],[Rtt_uujj,Rtt_uujj_err]] = GetMuMuScaleFactors( NormalWeightMuMu+'*'+preselectionmumu, NormalDirectory, '(M_uu>80)*(M_uu<100)', '(M_uu>100)')
      [[Rw_uvjj,Rw_uvjj_err],[Rtt_uvjj,Rtt_uvjj_err]] = GetMuNuScaleFactors( NormalWeightMuNu+'*'+preselectionmunu, NormalDirectory, '(MT_uv>70)*(MT_uv<110)*(JetCount<3.5)', '(MT_uv>70)*(MT_uv<110)*(JetCount>3.5)')

      ...etc...

      # The two flags are for regular plots, and tagfree plots (plots that don't say CMS Preliminary - for notes or thesis)
      for flag in ['','tagfree']:
         # Preselection plots in the UUJJ channel in the PAS style (no subplot)
         MakeBasicPlot("Pt_jet1","p_{T}(jet_{1}) [GeV]",ptbinning,preselectionmumu,NormalWeightMuMu,NormalDirectory,'standardPAS_TTBarDataDriven'+flag,'uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
         ... etc ...

Running the full event yields (with all systematic variations) and making higgs limit-tool cards

There are two ways to do this. It is easy to just let this run, but in practice it has been time-consuming (many systematic variations) and can take more than a day. The end goal of this all is that it produces several txt files in the result directories containing the event yields, and then combines all that information into higgs limit tool cards.

The long way to do this is to just let this part of the code run:

   # ====================================================================================================================================================== #
   # This runs a "FullAnalysis" - i.e. produces tables with full systematis included. 
   # ====================================================================================================================================================== #

   # You can run this to make the full set of tables needed to construct the higgs card. This takes a long time!
   # Alternatively, you can run > python SysBatcher.py --launch to do each table in a separate batch job
   # When done, proceed to the next step to make higgs limit cards
   if (True):
      FullAnalysis(MuMuOptCutFile, preselectionmumu,preselectionmunu,NormalDirectory,NormalWeightMuMu,'TTBarDataDriven')  # scriptflag
      FullAnalysis(MuNuOptCutFile, preselectionmumu,preselectionmunu,NormalDirectory,NormalWeightMuNu,'normal')  # scriptflag

   if (True):
      uujjcardfiles = MuMuOptCutFile.replace('.txt','_systable*.txt')
      uvjjcardfiles = MuNuOptCutFile.replace('.txt','_systable*.txt')

      uujjcards = ParseFinalCards(uujjcardfiles)
      uvjjcards = ParseFinalCards(uvjjcardfiles)
      finalcards = FixFinalCards([uujjcards,uvjjcards])

      print 'Final Cards Available in ',finalcards

The short way circumvents the running of the FullAnalysis(...) functions, by making multiple copies of the LQResultProducer - one for each systematic variation - and running them in batch in parallel. This is much faster, but you have to rely on another script to parse it all out. When the script finishes, you just have to run the second part of the above (with the ParseFinalCards and FixFinalCards commands). To use this method, do the following, and wait for the batch-jobs to finish.

(1) Run

 
python SysBatcher.py --launch

(2) Wait for those batch jobs to finish, and then use just the second part of the above code

   if (True):
      uujjcardfiles = MuMuOptCutFile.replace('.txt','_systable*.txt')
      uvjjcardfiles = MuNuOptCutFile.replace('.txt','_systable*.txt')

      uujjcards = ParseFinalCards(uujjcardfiles)
      uvjjcards = ParseFinalCards(uvjjcardfiles)
      finalcards = FixFinalCards([uujjcards,uvjjcards])

      print 'Final Cards Available in ',finalcards

What to do if systematics need to be evaluated at a different selection level

If systematic uncertainties are evaluated at a different selection than the final analysis, say the enhanced selection level, then you can just run the above twice, with a few minor changes:

(1) You will need a new cut file (Opt_LQuujj_Cuts.txt) with the different selection you want, and it should be kept under a different subfolder in the results directory with the name Enhanced_selection. So make this new folder:

mkdir [Results_directory_name]/Enhanced_selection

and make your cut file Opt_LQuujj_Cuts.txt and place it in the new subdirectory.

(2) Then modify LQResultProducer.py to point to the new cut file, so

MuMuOptCutDir = 'Final_selection' # scriptflag

needs to be changed to, say:

MuMuOptCutDir = 'Enhanced_selection' # scriptflag

(3) Then just run the SysBatcher.py again to get a new datacard with the new selection (it will be stored in the new folder).

(4) Change the names of the datacards to FinalCardsFinalSys_[year].txt and EnhancedCardsLQ_[year].txt, to distinguish them in the next step.

(5) Last, you will want to combine the datacards to take the systematics from EnhancedCardsLQ_[year].txt, and the events/stats from FinalCardsFinalSys_[year].txt. Simply run:

python CombineDatacards.py -d [Results_directory_name]

And a new datacard containing the enhanced selection systematics and the final selection event yields and stats will be created:

[Results_directory_name]/Final_selection/FinalCardsLQ_[year].txt

Displaced Lepton reinterpretation

The displaced-lepton reinterpretation reuses the sed-cloning pattern described below for the vector LQ and RPV samples: clone the result producer and the SysBatcher for each lifetime hypothesis (BLCTau1, BLCTau10, BLCTau100, BLCTau1000).

sed 's/LQ/BLCTau1/g' LQResultProducer.py > BLCTau1ResultProducer.py
sed 's/BLCTau1/BLCTau10/g' BLCTau1ResultProducer.py > BLCTau10ResultProducer.py
sed 's/BLCTau1/BLCTau100/g' BLCTau1ResultProducer.py > BLCTau100ResultProducer.py
sed 's/BLCTau1/BLCTau1000/g' BLCTau1ResultProducer.py > BLCTau1000ResultProducer.py

cat SysBatcher.py | sed 's/LQ/BLCTau1/g' | sed "s/,'uvjj'//g"  > BLCTau1SysBatcher.py 
cat BLCTau1SysBatcher.py | sed 's/BLCTau1/BLCTau10/g' | sed "s/,'uvjj'//g"  > BLCTau10SysBatcher.py 
cat BLCTau1SysBatcher.py | sed 's/BLCTau1/BLCTau100/g' | sed "s/,'uvjj'//g"  > BLCTau100SysBatcher.py 
cat BLCTau1SysBatcher.py | sed 's/BLCTau1/BLCTau1000/g' | sed "s/,'uvjj'//g"  > BLCTau1000SysBatcher.py 

Part IV: Making limit plots

The limit-setting code is now in a subdirectory of the main GitHub repository. It has some standalone instructions, but since the LQResultProducer already produces a compatible FinalCardsLQ.txt, here are the short instructions.

(1) Check out the higgs limit tool code and the appropriate CMSSW version, from the HiggsAnalysis CombinedLimit GitHub.

Setting up the environment (once)

setenv SCRAM_ARCH slc7_amd64_gcc700
cmsrel CMSSW_10_2_13
cd CMSSW_10_2_13/src
cmsenv
git clone https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit.git HiggsAnalysis/CombinedLimit
cd HiggsAnalysis/CombinedLimit

Update to a recommended tag - currently the recommended tag is v8.1.0:

cd $CMSSW_BASE/src/HiggsAnalysis/CombinedLimit
git fetch origin
git checkout v8.1.0
scramv1 b clean; scramv1 b # always make a clean build, as scram doesn't always see updates to src/LinkDef.h

  • when running with the HybridNew calculator, you can safely ignore the following warning from RooFit
    WARNING:Eval -- RooStatsUtils::MakeNuisancePdf - no constraints found on nuisance parameters in the input model

(2) Put your FinalCardsLQ.txt file in the same directory as the RunStatsBasicCLs.py script.

(3) Run the single-channel limits (limit results will print to screen - save them!).

 python RunStatsBasicCLs.py --do_BetaOne --do_BetaHalf -c LimitTest --Asymptotic_Only

(4) Run the combined limit. It will have to batch out some jobs, so it is a two step process:

  • Run the script which creates and launches the jobs (one job per beta value)
    •  python AsympBatcher.py --do_Combo -q 8nh -n 1 -c LimitTest --Asymptotic_Only 
    • Job output will be in the "BatcherResults" folder.
  • When jobs are complete, merge them. Results will print to the screen.
    •  python AsympBatcherParser.py  

(5) Take the results from (3) and (4) and put them into the plot macros. The plot macros compile and run. An example of this would be:

{

gROOT->ProcessLine(".L ComboPlotLQ2.C++");
gROOT->ProcessLine("makePlots()");

gROOT->Reset();

gROOT->ProcessLine(".L BR_Sigma_MuMu_vsMass.C++");
gROOT->ProcessLine("makePlotsBO()");

gROOT->Reset();

gROOT->ProcessLine(".L BR_Sigma_MuNu_vsMass.C++");
gROOT->ProcessLine("makePlotsBH()");

gROOT->ProcessLine(".q");

}

Part V: Making publication-quality tables

The event yield and systematics tables can be automatically created based on the FinalCardsLQ.txt file (the higgs limit-card file) which has already been created. It simply needs to be parsed into a LaTeX table.

To make tables of event yields with statistical errors, tables with the systematic uncertainties, and tables with the range of systematics across signal masses, run:

python MakeTablesFromDatacards.py -y [2016, 2017, 2018, or combined] -f your/path/to/FinalCardsLQ.txt -e your/path/to/EnhancedCardsLQ.txt

This script can handle both single-year and combined-year datacards. The datacard passed with '-f' will give the event yields and statistical errors, while the datacard passed with '-e' will give the systematic uncertainties. Inside MakeTablesFromDatacards.py, each table is printed with the following functions:

YieldsTable(): returns a LaTeX table with the event yields, statistical uncertainties, and systematic uncertainties (for the totals) for a single year, or the combination (if combined datacard).

SystematicsByYearTable(): returns a LaTeX table with each contribution to the systematic uncertainties for a single mass point, separately for all three years (if combined datacard) or for a single year.

SysRangesByYearTable(): returns a LaTeX table with the high and low ranges of systematic uncertainties for each contribution, separately for all three years (if combined datacard) or for a single year.

Obsolete

These scripts have been replaced by MakeTablesFromDatacards.py.

To get the content of the event yields table, do:

python CleanTableParse.py your/path/to/FinalCardsLQ.txt

To get tables of systematic uncertainties, do:

python SysTableParse.py your/path/to/FinalCardsLQ.txt

This last command also prints out the total percent uncertainties for each channel/mass. This list is used in the LQResultProducer.py in the "HARD CODED RESULTS" section to define the uncertainty band, used in the PAS-style plots.

Part VI: Making PAS-quality plots

The output of the table production (Part V) will give you lists of systematic uncertainty percent values at each LQ mass. Use these in the "HARD CODED RESULTS" part of the LQ result producer to update the lines:

# These are the total background uncertainties. They are used just to make some error bands on plots. 
totunc_uujj = [3.46, 2.55, 2.98, 3.67, 5.44, 5.73, 6.21, 6.47, 9.47, 11.16, 14.79, 19.75, 31.11, 35.41, 8.12, 8.12, 8.12, 8.12, 8.12]
totunc_uvjj = [13.37, 12.37, 13.35, 12.97, 13.62, 15.94, 16.23, 20.67, 23.94, 30.29, 35.3, 52.94, 69.57, 46.09, 42.3, 42.3, 42.3, 42.3, 42.3]

Then you can run the standard PAS plots in the LQResultProducer, i.e. enable this part of the main function:


   # ====================================================================================================================================================== #
   # This is a plotting routine for PAS-style publication-quality plots
   # ====================================================================================================================================================== #

   if False:

      # Get Scale Factors
      [[Rz_uujj,Rz_uujj_err],[Rtt_uujj,Rtt_uujj_err]] = GetMuMuScaleFactors( NormalWeightMuMu+'*'+preselectionmumu, NormalDirectory, '(M_uu>80)*(M_uu<100)', '(M_uu>100)')
      [[Rw_uvjj,Rw_uvjj_err],[Rtt_uvjj,Rtt_uvjj_err]] = GetMuNuScaleFactors( NormalWeightMuNu+'*'+preselectionmunu, NormalDirectory, '(MT_uv>70)*(MT_uv<110)*(JetCount<3.5)', '(MT_uv>70)*(MT_uv<110)*(JetCount>3.5)')

      ...

      # The two flags are for regular plots, and tagfree plots (plots that don't say CMS Preliminary - for notes or thesis)
      for flag in ['','tagfree']:

         # Preselection plots in the UUJJ channel in the PAS style (no subplot)
         MakeBasicPlot("Pt_jet1","p_{T}(jet_{1}) [GeV]",ptbinning,preselectionmumu,NormalWeightMuMu,NormalDirectory,'standardPAS_TTBarDataDriven'+flag,'uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
         MakeBasicPlot("Pt_jet2","p_{T}(jet_{2}) [GeV]",ptbinning,preselectionmumu,NormalWeightMuMu,NormalDirectory,'standardPAS_TTBarDataDriven'+flag,'uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
               ...

Making results for Vector LQ

The goal here will be to port all this machinery to making Vector LQ results with as little effort as possible. To do so, I've made some tweaks to the LQResultProducer, such that many of the output file names will contain the letters "LQ". From now on, "LQ" will indicate the normal (scalar) analysis, and the different vector scenarios will be denoted by "AM", "MC", "MM", and "YM". Correspondingly, I've organized the tree-making facility to name all the new signal files like this. e.g., see the vector LQ csv file. After making trees with the vector LQ csv file, and putting those new files into the same directory as the scalar files, we can proceed with some modifications to the code to run on vector LQ.

Fixing up the cut cards

In the results directory, there are typically a couple txt files which store the optimized cuts. They are called something like "Opt_uvjjCuts_Smoothed_pol2cutoff.txt", and store text that looks like this:

opt_LQuujj300 = ((M_uu>100)*(St_uujj>380)*(M_uujj2>115))
opt_LQuujj350 = ((M_uu>115)*(St_uujj>460)*(M_uujj2>115))
opt_LQuujj400 = ((M_uu>125)*(St_uujj>540)*(M_uujj2>120))
.....

In contrast, our vector LQ samples have mass points every 100 GeV, going up to 1800 GeV. First, let's fix the file names of the scalar cut cards:

mv Opt_uujjCuts_Smoothed_pol2cutoff.txt OptLQ_uujjCuts_Smoothed_pol2cutoff.txt
mv Opt_uvjjCuts_Smoothed_pol2cutoff.txt OptLQ_uvjjCuts_Smoothed_pol2cutoff.txt

Next, copy the content of these files to new vector YM files, changing the "LQ" to "YM", using only the values every 100 GeV, and extending to 1800 (holding the last cuts fixed). For example:

Contents of OptYM_uujjCuts_Smoothed_pol2cutoff.txt:

opt_YMuujj300 = ((M_uu>100)*(St_uujj>380)*(M_uujj2>115))
opt_YMuujj400 = ((M_uu>125)*(St_uujj>540)*(M_uujj2>120))
opt_YMuujj500 = ((M_uu>150)*(St_uujj>685)*(M_uujj2>155))
opt_YMuujj600 = ((M_uu>175)*(St_uujj>820)*(M_uujj2>210))
opt_YMuujj700 = ((M_uu>195)*(St_uujj>935)*(M_uujj2>295))
opt_YMuujj800 = ((M_uu>215)*(St_uujj>1040)*(M_uujj2>400))
opt_YMuujj900 = ((M_uu>230)*(St_uujj>1135)*(M_uujj2>535))
opt_YMuujj1000 = ((M_uu>245)*(St_uujj>1210)*(M_uujj2>690))
opt_YMuujj1100 = ((M_uu>245)*(St_uujj>1210)*(M_uujj2>690))
opt_YMuujj1200 = ((M_uu>245)*(St_uujj>1210)*(M_uujj2>690))
opt_YMuujj1300 = ((M_uu>245)*(St_uujj>1210)*(M_uujj2>690))
opt_YMuujj1400 = ((M_uu>245)*(St_uujj>1210)*(M_uujj2>690))
opt_YMuujj1500 = ((M_uu>245)*(St_uujj>1210)*(M_uujj2>690))
opt_YMuujj1600 = ((M_uu>245)*(St_uujj>1210)*(M_uujj2>690))
opt_YMuujj1700 = ((M_uu>245)*(St_uujj>1210)*(M_uujj2>690))
opt_YMuujj1800 = ((M_uu>245)*(St_uujj>1210)*(M_uujj2>690))

Contents of OptYM_uvjjCuts_Smoothed_pol2cutoff.txt:

opt_YMuvjj300 = ((MT_uv>155)*(St_uvjj>455)*(M_uvjj>125))
opt_YMuvjj400 = ((MT_uv>205)*(St_uvjj>625)*(M_uvjj>175))
opt_YMuvjj500 = ((MT_uv>245)*(St_uvjj>800)*(M_uvjj>225))
opt_YMuvjj600 = ((MT_uv>275)*(St_uvjj>980)*(M_uvjj>280))
opt_YMuvjj700 = ((MT_uv>300)*(St_uvjj>1160)*(M_uvjj>330))
opt_YMuvjj800 = ((MT_uv>315)*(St_uvjj>1345)*(M_uvjj>380))
opt_YMuvjj900 = ((MT_uv>320)*(St_uvjj>1530)*(M_uvjj>435))
opt_YMuvjj1000 = ((MT_uv>320)*(St_uvjj>1720)*(M_uvjj>490))
opt_YMuvjj1100 = ((MT_uv>320)*(St_uvjj>1720)*(M_uvjj>490))
opt_YMuvjj1200 = ((MT_uv>320)*(St_uvjj>1720)*(M_uvjj>490))
opt_YMuvjj1300 = ((MT_uv>320)*(St_uvjj>1720)*(M_uvjj>490))
opt_YMuvjj1400 = ((MT_uv>320)*(St_uvjj>1720)*(M_uvjj>490))
opt_YMuvjj1500 = ((MT_uv>320)*(St_uvjj>1720)*(M_uvjj>490))
opt_YMuvjj1600 = ((MT_uv>320)*(St_uvjj>1720)*(M_uvjj>490))
opt_YMuvjj1700 = ((MT_uv>320)*(St_uvjj>1720)*(M_uvjj>490))
opt_YMuvjj1800 = ((MT_uv>320)*(St_uvjj>1720)*(M_uvjj>490))

These cuts are all the same for all vector LQ samples, so just copy them to appropriately named/structured files for "AM", "MC", and "MM":

sed 's/YM/MC/g' OptYM_uujjCuts_Smoothed_pol2cutoff.txt > OptMC_uujjCuts_Smoothed_pol2cutoff.txt
sed 's/YM/MC/g' OptYM_uvjjCuts_Smoothed_pol2cutoff.txt > OptMC_uvjjCuts_Smoothed_pol2cutoff.txt
sed 's/YM/AM/g' OptYM_uujjCuts_Smoothed_pol2cutoff.txt > OptAM_uujjCuts_Smoothed_pol2cutoff.txt
sed 's/YM/AM/g' OptYM_uvjjCuts_Smoothed_pol2cutoff.txt > OptAM_uvjjCuts_Smoothed_pol2cutoff.txt
sed 's/YM/MM/g' OptYM_uujjCuts_Smoothed_pol2cutoff.txt > OptMM_uujjCuts_Smoothed_pol2cutoff.txt
sed 's/YM/MM/g' OptYM_uvjjCuts_Smoothed_pol2cutoff.txt > OptMM_uvjjCuts_Smoothed_pol2cutoff.txt
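If you prefer to script the LQ-to-YM conversion described above rather than editing by hand, a minimal sketch (file names as in this section) is:

import re

# Keep only masses in 100 GeV steps, rename LQ -> YM, and extend to 1800 GeV
# by holding the last set of cuts fixed.
out, last_cut, last_mass = [], None, 0
for line in open("OptLQ_uujjCuts_Smoothed_pol2cutoff.txt"):
    m = re.match(r"opt_LQuujj(\d+) = (.*)", line.strip())
    if not m:
        continue
    mass, cut = int(m.group(1)), m.group(2)
    if mass % 100 == 0:
        out.append("opt_YMuujj%d = %s\n" % (mass, cut))
        last_cut, last_mass = cut, mass
for mass in range(last_mass + 100, 1900, 100):
    out.append("opt_YMuujj%d = %s\n" % (mass, last_cut))
open("OptYM_uujjCuts_Smoothed_pol2cutoff.txt", "w").writelines(out)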

Running on the vector samples

The LQResultProducer has been written such that we can just replace the "LQ" with any of the vector names and run. Since we are basically just interested in getting Higgs Limit cards for each of the vector samples, we can do the same with the SysBatcher.

First, clone the LQResultProducer and the SysBatcher for vector purposes:

sed 's/LQ/YM/g' LQResultProducer.py > YMResultProducer.py 
sed 's/LQ/YM/g' SysBatcher.py > YMSysBatcher.py 
sed 's/LQ/MM/g' LQResultProducer.py > MMResultProducer.py 
sed 's/LQ/MM/g' SysBatcher.py > MMSysBatcher.py 
sed 's/LQ/MC/g' LQResultProducer.py > MCResultProducer.py 
sed 's/LQ/MC/g' SysBatcher.py > MCSysBatcher.py 
sed 's/LQ/AM/g' LQResultProducer.py > AMResultProducer.py 
sed 's/LQ/AM/g' SysBatcher.py > AMSysBatcher.py 

Next, just launch the batchers and wait. I recommend not launching them all at once - wait an hour or two between each. Each of these opens the root files in an AFS workspace, so it is best to stagger them a bit.

python YMSysBatcher.py --launch
python MMSysBatcher.py --launch
python MCSysBatcher.py --launch
python AMSysBatcher.py --launch

Then you can produce the higgs limit cards (the "FinalCards" files) in the same way as for the scalar samples: https://twiki.cern.ch/twiki/bin/view/Main/LQPairSecondGen13TeVOutline#Running_the_full_event_yields_wi

You will have to do that once, for each of YMResultProducer.py, MCResultProducer.py, MMResultProducer.py, and AMResultProducer.py. It will yield four card files: FinalCardsYM.txt, FinalCardsMC.txt, FinalCardsMM.txt and FinalCardsAM.txt.

Getting vector LQ limits

First, we have to fix one small problem. Because the Higgs limit tool presents the limit as a ratio (r) of the assumed cross section, and reports it to 5 digits, the very large LQ signal cross sections mean we have to increase the r precision. I find that building works best on lxplus5, but running the combine tool works fine on lxplus6 thereafter.

cd CMSSW_6_1_1/src
sed -i 's#6.4#9.9#g' ./HiggsAnalysis/CombinedLimit/src/Asymptotic.cc
scram b -j 16

New flags have been built into the limit tool to deal with the vector scenario. These flags switch on the use of the vector LQ card file (e.g. FinalCardsYM.txt), and they can be used as follows:

python RunStatsBasicCLs.py --do_BetaOne --do_BetaHalf -c LimitTest --Asymptotic_Only --vectorYM
python RunStatsBasicCLs.py --do_BetaOne --do_BetaHalf -c LimitTest --Asymptotic_Only --vectorAM
python RunStatsBasicCLs.py --do_BetaOne --do_BetaHalf -c LimitTest --Asymptotic_Only --vectorMM
python RunStatsBasicCLs.py --do_BetaOne --do_BetaHalf -c LimitTest --Asymptotic_Only --vectorMC

Making results for RPV Susy (In Progress)

In this section we will follow the same methods as for the vector LQ samples to get limits on the RPV susy (232) samples. These samples have mass values ranging from 200 to 1000 in 50 GeV increments. The scalar LQ samples were denoted "LQ" and the vector samples "YM", "MC", "MM", and "AM". For Susy we will follow a similar convention and label everything "RV".

Creating the cut card

In this example, I will use the same cuts as the regular LQ signal. Dave is further testing the optimal cuts and can modify this.

Cuts should use the LQ optimization point at m_stop - 100 GeV (e.g., for an RPV sample with m_stop = 500 GeV, use the LQ 400 GeV cuts).

Contents of OptRV_uujjCuts_Smoothed_pol2cutoff.txt:

opt_RVuujj300 = ((M_uu>100)*(St_uujj>380)*(M_uujj2>115))
opt_RVuujj350 = ((M_uu>115)*(St_uujj>460)*(M_uujj2>115))
opt_RVuujj400 = ((M_uu>125)*(St_uujj>540)*(M_uujj2>120))
opt_RVuujj450 = ((M_uu>140)*(St_uujj>615)*(M_uujj2>135))
opt_RVuujj500 = ((M_uu>150)*(St_uujj>685)*(M_uujj2>155))
opt_RVuujj550 = ((M_uu>165)*(St_uujj>755)*(M_uujj2>180))
opt_RVuujj600 = ((M_uu>175)*(St_uujj>820)*(M_uujj2>210))
opt_RVuujj650 = ((M_uu>185)*(St_uujj>880)*(M_uujj2>250))
opt_RVuujj700 = ((M_uu>195)*(St_uujj>935)*(M_uujj2>295))
opt_RVuujj750 = ((M_uu>205)*(St_uujj>990)*(M_uujj2>345))
opt_RVuujj800 = ((M_uu>215)*(St_uujj>1040)*(M_uujj2>400))
opt_RVuujj850 = ((M_uu>220)*(St_uujj>1090)*(M_uujj2>465))
opt_RVuujj900 = ((M_uu>230)*(St_uujj>1135)*(M_uujj2>535))
opt_RVuujj950 = ((M_uu>235)*(St_uujj>1175)*(M_uujj2>610))
opt_RVuujj1000 = ((M_uu>245)*(St_uujj>1210)*(M_uujj2>690))

Run the event yields and produce higgs limit cards

Just as with the vector LQ, we can clone the LQResultProducer and the SysBatcher for the Susy scenario. Since the RPV susy signal is only in the mu-mu-j-j channel, we make one slight tweak to remove the uvjj running:

sed 's/LQ/RV/g' LQResultProducer.py > RVResultProducer.py 
cat SysBatcher.py | sed 's/LQ/RV/g' | sed "s/,'uvjj'//g"  > RVSysBatcher.py 

Next, just launch the batcher and wait for the jobs to finish.

python RVSysBatcher.py --launch

Then you can produce the higgs limit cards (the "FinalCards" files) in the same way as for the scalar samples: https://twiki.cern.ch/twiki/bin/view/Main/LQPairSecondGen13TeVOutline#Running_the_full_event_yields_wi . To do so, modify the "FullAnalysis" portion of RVResultProducer.py as follows:

   # ====================================================================================================================================================== #
   # This runs a "FullAnalysis" - i.e. produces tables with full systematis included. 
   # ====================================================================================================================================================== #

   # You can run this to make the full set of tables needed to construct the higgs card. This takes a long time!
   # Alternatively, you can run > python SysBatcher.py --launch to do each table in a separate batch job
   # When done, proceed to the next step to make higgs limit cards
   if (True):
      FullAnalysis(MuMuOptCutFile, preselectionmumu,preselectionmunu,NormalDirectory,NormalWeightMuMu,'TTBarDataDriven')  # scriptflag
   if (True):
      uujjcardfiles = MuMuOptCutFile.replace('.txt','_systable*.txt')
      uujjcards = ParseFinalCards(uujjcardfiles)
      finalcards = FixFinalCards([uujjcards])
      print 'Final Cards Available in ',finalcards

And then just run it, e.g.

python RVResultProducer.py

You should now have a "FinalCardsRV.txt" file in your results directory.

Getting RPV Susy limits

Proceed exactly as in the scalar and vector LQ cases, using the flag "--susyRV" during limit setting. To be precise, if you've never set up the Higgs limit tool in the LimitSetting directory before, just put FinalCardsRV.txt in the LimitSetting directory and do this:

cd LimitSetting
export SCRAM_ARCH=slc6_amd64_gcc491
cmsrel CMSSW_7_4_7
cd CMSSW_7_4_7/src 
cmsenv
git clone https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit.git HiggsAnalysis/CombinedLimit
cd HiggsAnalysis/CombinedLimit

Update to a recommended tag - currently the recommended tag is v6.2.1:

cd $CMSSW_BASE/src/HiggsAnalysis/CombinedLimit
git fetch origin
git checkout v6.2.1
scramv1 b clean; scramv1 b # always make a clean build, as scram doesn't always see updates to src/LinkDef.h

sed -i 's#6.4#9.9#g' ./HiggsAnalysis/CombinedLimit/src/Asymptotic.cc
scram b -j 16
cd ../../

python RunStatsBasicCLs.py --do_BetaOne -c LimitTest --Asymptotic_Only --susyRV

Getting RPV Limit plots

A plot macro has been drafted at LimitSetting/PlotMacros/BR_Sigma_MuMu_vsMassSUSYRPV.C

To run it, simply do:

root -b RunSusyRPVPlotsInRoot

An example output is at this link

diHiggs

This section contains HH->bbZZ->bblljj-specific instructions. The general description in Part III: Running the result producer still applies, so it is useful to read through that section.

Setup and directory

Use the same procedure as in Part II. If you did this in Part II, HHResultProducer.py should be ready.

git clone git@github.com:DavidMorse/LQ2Analysis.git LQ2Analysis13TeV
cd LQ2Analysis13TeV
cmsrel CMSSW_10_2_12
cd CMSSW_10_2_12/src
cmsenv
cd -

Also make sure the "OptCutFile*" files are ready; see details in Setup and directory. Working versions of these files are in this directory.

Result Producer

HHResultProducer.py is now in the GitHub repository. The directories have been updated to automatically get the trees from the EOS directories described below. To run, simply do (see details in the next sections):

python HHResultProducer.py

It takes a while to run (particularly when the files need to be accessed on EOS), so don't be alarmed.

Procedures to run the Result Producer

Relevant Switches

The following are the main switches allowing you to choose the channel, signal hypotheses, etc. The comments in the code are self-explanatory.

analysisChannel = 'muon'
analysisSignal = 'HHres'
recalRttRz = True
doindiv = True   
use_bdtSp0 = False

Getting Fbd, Rtt, and Rz (QCD multijet, ttbar, and ZJets estimations)

This is done by enabling the following lines (turning False to True):

   if False:
      [[Rz_uujj,Rz_uujj_err],[Rtt_uujj,Rtt_uujj_err],[Fbd_uujj,Fbd_uujj_err]] = GetNormalizationScaleFactorsAndFbd(NormalWeight+'*'+preselection, NormalDirectory, dyControlRegion, ttControlRegion,0)
      exit()

Then hard-code the outputs at these lines:

# QCD data-driven scale factor
useDataDrivenQCD = True
fbd_Muon     = [1.406, 0.011] # Apr 05, 2019
fbd_Electron = [1.079, 0.009] # Apr 05, 2019
# tt, z SF calculated using Data with NEW method (Apr 05, 2019)
Rz_data_muon  = [1.139, 0.006]
Rtt_data_muon = [0.906, 0.013]
Rz_data_electron  = [1.241, 0.008]
Rtt_data_electron = [0.968, 0.018]

Getting data-MC comparison plots (analysis-note style plots)

The general idea is the same as in Making analysis-note style plots. Here you have to enable these lines:

   # ====================================================================================================================================================== #
   # This is a basic plotting routine to make Analysis Note style plots with ratio plots. AN Analysis-Note
   # ====================================================================================================================================================== #
   if True :
        ...
        ...
        MakeBasicPlot("cosTheta_hbb_"+ell+ell,"cos(#Theta) (H->bb) "+latexEll+latexEll,costhetastarbinning,preselection,NormalWeight,NormalDirectory,'standard',ell+ell+'jj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,300)
        ...

There are two selections used in the HH analysis, "preselection" and "final selection". The final-selection distributions are generated by the code starting from the lines:

      #if analysisChannel=='electron' : exit()
      # fixme turning off final selection
      # Full Selection Plots

Running the full event yields (with all systematic variations) for HH

The procedure is the same as in LQ Running the full event yields.... I copy it here with some modifications.

For the HH analysis, we will use the BDT distribution for extracting limits, so we need the BDT distribution and its systematic variations. The end goal is to produce (i) several txt files in the result directories containing the event yields and (ii) several root files, each containing one systematic variation, and then to combine all that information into higgs limit tool cards.

The idea is to circumvent the running of the FullAnalysis(...) functions, by making multiple copies of the LQResultProducer - one for each systematic variation - and running them in batch in parallel. This is much faster, but you have to rely on another script to parse it all out. When the script finishes, you just have to run the final part of the code (with the ParseFinalCards and FixFinalCards commands). To use this method, do the following:

(1) Enable the first part: set it to True and leave the second and third parts as False.

   # ====================================================================================================================================================== #
   # This runs a "FullAnalysis" - i.e. produces tables with full systematis included. 
   # ====================================================================================================================================================== #

   # You can run this to make the full set of tables needed to construct the higgs card. This takes a long time!
   # Alternatively, you can run > python SysBatcher.py --launch to do each table in a separate batch job
   # When done, proceed to the next step to make higgs limit cards
   if True :
      #FullAnalysis(MuonOptCutFile, preselection,preselectionmunu,NormalDirectory,NormalWeight,'TTBarDataDriven')  # scriptflag
      #FullAnalysis(MuNuOptCutFile, preselection,preselectionmunu,NormalDirectory,NormalWeightMuNu,'normal')  # scriptflag
      
      finalSelectionMuon = preselection.replace(bTagPresel,bTagFinalsel)
      finalWeightMuon = NormalWeight.replace(bTagPreselSF,bTagFinalSF)
      #finalSelectionMuon = preselection.replace(bTagsel1loose,bTagsel2loose)
      #finalWeightMuon = NormalWeight.replace(bTag1SF,bTag2SF)
      #FullAnalysis(MuonOptCutFile, preselection,preselection,preselectionmunu,NormalDirectory,NormalWeight,NormalWeight'normal')  # scriptflag #preselection (1 loose bTag)
      FullAnalysis(MuonOptCutFile, preselection,finalSelectionMuon,preselectionmunu,NormalDirectory,NormalWeight,finalWeightMuon,'normal')  # scriptflag #final selection

   if False :
      print ' doing RunPDFUncertainty test '
      finalSelectionMuon = preselection.replace(bTagPresel,bTagFinalsel)
      finalWeightMuon = NormalWeight.replace(bTagPreselSF,bTagFinalSF)
      #FullAnalysis(MuonOptCutFile, preselection,preselection,preselectionmunu,NormalDirectory,NormalWeight,NormalWeight'normal')  # scriptflag #preselection (1 loose bTag)
      RunPDFUncertainty(MuonOptCutFile, preselection,finalSelectionMuon,preselectionmunu,NormalDirectory,NormalWeight,finalWeightMuon)  # scriptflag #final selection


   if False :
      uujjcardfiles = MuonOptCutFile.replace('.txt','_systable*.txt')
      #uvjjcardfiles = MuNuOptCutFile.replace('.txt','_systable*.txt')

      uujjcards = ParseFinalCards(uujjcardfiles)
      #uvjjcards = ParseFinalCards(uvjjcardfiles)#fixme using uujj for now for speed
      #finalcards = FixFinalCards([uujjcards,uvjjcards])
      finalcards = FixFinalCards([uujjcards])

      print 'Final Cards Available in',finalcards

Run SysBatcherHH.py

 
python SysBatcherHH.py --launch

(2) Turn off the first part and turn on the second part:

 
 
   if True :
      print ' doing RunPDFUncertainty test '
      finalSelectionMuon = preselection.replace(bTagPresel,bTagFinalsel)
      finalWeightMuon = NormalWeight.replace(bTagPreselSF,bTagFinalSF)
      #FullAnalysis(MuonOptCutFile, preselection,preselection,preselectionmunu,NormalDirectory,NormalWeight,NormalWeight'normal')  # scriptflag #preselection (1 loose bTag)
      RunPDFUncertainty(MuonOptCutFile, preselection,finalSelectionMuon,preselectionmunu,NormalDirectory,NormalWeight,finalWeightMuon)  # scriptflag #final selection

Then run another script

 
python SysBatcherHHPDF.py --launch

(3) Wait for those batch jobs to finish. Use the script combine_SystHist.py to check the results and resubmit jobs if needed. YYY_MM_DD_TTTTTT are the directories created by (1) and (2).

 
## # check files
python combine_SystHist.py --checkfiles -t YYY_MM_DD_TTTTTT
python combine_SystHist.py --checkfiles -t YYY_MM_DD_TTTTTT --launch

## # check files pdfs
python combine_SystHist.py --checkfiles --checkpdffiles -t YYY_MM_DD_TTTTTT
python combine_SystHist.py --checkfiles --checkpdffiles -t YYY_MM_DD_TTTTTT --launch

After all jobs are done, run the following commands to combine the root files:

 
## combine qcdScales root and txt files and calculate max/min (up/down)
python combine_SystHist.py --QCDScale

# Combine PDF root and txt files and calculate max/min (up/down)
python combine_SystHist.py --PDF

# Combine all root files
python combine_SystHist.py --Standard

##  organize root files and txt files
python combine_SystHist.py --organizefiles
python combine_SystHist.py --organizefiles --launch

(4) Then use just the third part of the above code (HHResultProducer.py) to combine the txt files. Turn off the first and second parts, and turn the third part on.

   if True :
      uujjcardfiles = MuonOptCutFile.replace('.txt','_systable*.txt')
      #uvjjcardfiles = MuNuOptCutFile.replace('.txt','_systable*.txt')

      uujjcards = ParseFinalCards(uujjcardfiles)
      #uvjjcards = ParseFinalCards(uvjjcardfiles)#fixme using uujj for now for speed
      #finalcards = FixFinalCards([uujjcards,uvjjcards])
      finalcards = FixFinalCards([uujjcards])

      print 'Final Cards Available in',finalcards

Then run once for the muon channel and again for the electron channel:

python HHResultProducer.py

(5) You will see the final txt file "Results_Testing_diHiggs.../FinalCardsLQuu.txt" and the directory where the root files are stored, "rootfiles_final".

These outputs will be used in setting limits.

Limit calculation and Limit plots

Limits for bblljj channel

(1) The root files in rootfiles_final have distributions with very fine binning. We have to rebin the distributions before limit setting. The binning should be chosen to optimize the sensitivity while maintaining the stability of the fit.

cd LimitSetting/RebinHH

# Rebin uu
python rebinHistHH.py --resType HHres --channel uu 
python refixHistHH.py --resType HHres --channel uu
python plotSystHistHH.py --resType HHres --channel uu

# Rebin ee 
python rebinHistHH.py --resType HHres --channel ee 
python refixHistHH.py --resType HHres --channel ee
python plotSystHistHH.py --resType HHres --channel ee
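For orientation, the rebinning step conceptually amounts to mapping the fine histogram onto coarser, variable-width bins. A minimal pyROOT sketch of that idea follows (illustrative only: the real binning logic lives in rebinHistHH.py, and the file names, histogram names, and bin edges here are placeholders):

import ROOT
from array import array

# Placeholders: use the actual input file and histogram names from rootfiles_final
fin = ROOT.TFile.Open('input_fine_binning.root')
h_fine = fin.Get('some_distribution')

# Coarser, variable-width edges: wider bins where statistics are poor keep the
# limit fit stable without losing sensitivity in the bulk (edges are illustrative)
edges = array('d', [0., 100., 200., 300., 450., 650., 1000., 2000.])
h_coarse = h_fine.Rebin(len(edges) - 1, 'some_distribution_rebinned', edges)

fout = ROOT.TFile('output_rebinned.root', 'RECREATE')
h_coarse.Write()
fout.Close()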

(2) Check out the higgs limit tool code following the instructions at SWGuideHiggsAnalysisCombinedLimit, the same as described in Part IV: Making limit plots.

After running the following commands, take note of the results output on the screen.

cd LimitSetting

# Run to get Limits
python RunStatsBasicCLsHH.py --do_BetaOne --HHresuujj -c LimitShapeHH --Asymptotic_Only --noAbsAcc --NpIgnore
python RunStatsBasicCLsHH.py --do_BetaOne --HHreseejj -c LimitShapeHH --Asymptotic_Only --noAbsAcc --NpIgnore

python RunStatsBasicCLsHH.py --doMakeCombineCards --HHresCombMuEle -c LimitShapeHH
python RunStatsBasicCLsHH.py --do_BetaOne --HHresCombMuEle -c LimitShapeHH --Asymptotic_Only --noAbsAcc --NpIgnore

(3) Use the values output on the screen to modify the code MakeLimitsPlots.py. Then run the script to get limit plots.

# plot observed limits to HH
cd PlotMacros
python MakeLimitsPlotsHH.py --channel muon  --resType HHres --obslimit 1
python MakeLimitsPlotsHH.py --channel ele  --resType HHres --obslimit 1
python MakeLimitsPlotsHH.py --channel CombMuEle  --resType HHres --obslimit 1

Scripts for running Combine Tool

I have written a script (runCheckLimits.py) which helps guide you through typical commands in the Combine tool. The input to the script is a directory with the following structure:

|---/CLSLimits/
      |--- [CombMuEledatacard].txt
      |---/LimitShapeHHuu/
             |--- [datacard].txt
             |--- [distribution].root
      |---/LimitShapeHHee/
             |--- [datacard].txt
             |--- [distribution].root

Examples of commands:

## Remove NP 
python runCheckLimits.py -ch uu --removeNP -v 1 --launch

## txt to workspace
python runCheckLimits.py -ch uu --text2workspace -v 1 --launch

## Fit and record range
python runCheckLimits.py -ch uu --FitDiagnos -v 1 --launch 

## Limits
python runCheckLimits.py -ch uu --limits -v 1 --launch
## Run obs GOF and record the values
python runCheckLimits.py -ch uu --runGOF -v 1 --data --toy --pval --launch

## Run to get impacts
python runCheckLimits.py -ch uu --runImpactIni -v 1 --launch
python runCheckLimits.py -ch uu --runImpactDoFit -v 1 --launch
python runCheckLimits.py -ch uu --runImpactPDF -v 1 --launch

To run the spin-2 case, add --Spin2 to the command, like:

python runCheckLimits.py -ch uu --text2workspace -v 1 --launch --Spin2

The options for -ch can be uu, ee, or CombMuEle.

Trees

The relevant trees are on the cmsneu machine:

/media/dataPlus/dmorse/hhNtuples/NTupleAnalyzerHH_Full2016HH_v236_2017_04_03/SummaryFiles/
/media/dataPlus/dmorse/hhNtuples/NTupleAnalyzerHH_Full2016HH_v236_QCDNonIsoQuickTest_2017_05_15/SummaryFiles/

Gridpack generation

Follow directions at QuickGuideMadGraph5aMCatNLO

Adding diboson resonance model

After downloading the genproductions package, you will need to edit the genproductions/bin/MadGraph5_aMCatNLO/gridpack_generation.sh file to compile the diHiggs resonance model in the following way:

After

 # where to find the madgraph tarred distribution

Add

## Models for searches of diboson resonances
VVMODEL=dibosonResonanceModel.tar.gz
VVSOURCE=https://cms-project-generators.web.cern.ch/cms-project-generators/$VVMODEL

After

 
  #load extra models if needed
  if [ -e $CARDSDIR/${name}_extramodels.dat ]; then
    echo "Loading extra models specified in $CARDSDIR/${name}_extramodels.dat"
    #strip comments
    sed 's:#.*$::g' $CARDSDIR/${name}_extramodels.dat | while read model
    do
      #get needed BSM model
      if [[ $model = *[!\ ]* ]]; then
        echo "Loading extra model $model"
        wget --no-check-certificate https://cms-project-generators.web.cern.ch/cms-project-generators/$model   
        cd models
        if [[ $model == *".zip"* ]]; then
          unzip ../$model
        elif [[ $model == *".tgz"* ]]; then
          tar zxvf ../$model
        elif [[ $model == *".tar"* ]]; then
          tar xavf ../$model
        else 
          echo "A BSM model is specified but it is not in a standard archive (.zip or .tar)"
        fi
        cd ..
      fi
    done
  fi

Add

  #get Diboson model
  wget --no-check-certificate ${VVSOURCE}
  cd models
  tar xvzf ../${VVMODEL}
  cd ..

Preparing run cards

There are 3 models, which all need runcards and gridpacks:

  1. Spin-0 Radion
  2. Spin-2 Bulk Graviton
  3. Spin-2 Randall-Sundrum (RS) Graviton

For each type you need 3 data files. Here is an example of the naming of the cards, for the Radion model with a resonance mass of 300 GeV:

Radion_hh_hZZToLLhbb_EMuOnly_narrow_M300_customizecards.dat
Radion_hh_hZZToLLhbb_EMuOnly_narrow_M300_proc_card.dat
Radion_hh_hZZToLLhbb_EMuOnly_narrow_M300_run_card.dat

These files, as well as example files for the 2 graviton models, can be found on lxplus at ~dmorse/public/HH/exampleMGrunCards/

New data files need to be created for each mass X and model Y, changing each instance of mass 300 into mass X, and renaming the data files with model Y and mass X. The data files should always be kept, as they will need to be added to the HH github repository later. Be careful to keep track of all mass references and the model type.

With these three files the gridpack can be created following the instructions on the Twiki. I ran interactively, but it may be worthwhile to use the grid since there will be 16*3=48 gridpacks to produce.

This should be repeated for the following masses (a scripted sketch follows the list):

250 260 270 300 350 400 450 500 550 600 650 700 750 800 900 950
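A rough way to script the card cloning is sketched below. It assumes the three example Radion M300 cards sit in the current directory and that every mass reference appears as the literal string 300; per the warning above, each replacement should still be checked by hand.

# Sketch: clone the M300 example cards for each mass point (300 already exists).
# Assumes the mass appears in the cards only as the literal string '300'.
masses = [250, 260, 270, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 900, 950]
base = 'Radion_hh_hZZToLLhbb_EMuOnly_narrow_M300_'
for suffix in ['customizecards.dat', 'proc_card.dat', 'run_card.dat']:
    text = open(base + suffix).read()
    for m in masses:
        outname = base.replace('M300', 'M%d' % m) + suffix
        open(outname, 'w').write(text.replace('300', str(m)))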

8 TeV Second Gen LQ Pair Production Analysis Documentation

History

To-Do

Part I: Production of NTuples

Overview

Status

  • All LQ Scalar signal, background, and single-mu data is processed.
  • Vector LQ samples and RPV susy samples will need processing.
  • Old MC was 53X_V7, and new is 53X_V19 - possible updates to ntuplemaker needed. New MC created in 5_3_16.

Where to find NTuples.

Part II: Analysis of NTuples

All of this code is stored in a single github repository, which can be checked out as follows:

git clone git@github.com:darinbaumgartel/LQ2Analysis8TeV.git LQ2Analysis8TeV
cd LQ2Analysis8TeV
git checkout paperversion
cmsrel CMSSW_6_2_0
cd CMSSW_6_2_0/src
cmsenv
cd -

Some of these instructions are reiterated from the github readme file

The goal of Part II is the creation of small flat analysis trees, since the NTuples on EOS are usually on the order of 10 TB and need to be reduced for interactive analysis. Jobs are run on the lxplus batch system to analyze the NTuples. The ntuples contain variables such as:

  • Vectors of values for kinematics. i.e. "PFJetPt" is a vector of doubles, with one pT value for each jet stored in the event.
  • Vectors of values for quality criteria (id, isolation, etc)
  • Event-level information, including MET quality flags, vectors of different PDF weights (for MC), vertex information, etc.

The analysis trees aim to be roughly a factor of 1000 smaller than the ntuples (order of 10 GB), and to contain only single-valued variables (ints and floats) - essentially a large flat ntuple. (The TNtuple class itself does not allow enough variables; see the sketch after the list below.) Values in the trees will be things like:

  • The pT of the leading muon, and of the subleading muon, and the same for jets. And the eta and phi values.
  • Kinematic variables used for optimization, like the di-muon mass, the MT(muon,MET), the ST = (pt_muon1+pt_muon2+pt_jet1+pt_jet2), etc...
  • Single event weights which include the pileup reweighting.
  • Trigger flags, and event quality flags.
  • Basically, anything needed to generate a histogram with the appropriate weights, and interactively changeable cuts.
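For orientation, booking such a flat tree in pyROOT looks schematically like the following (a sketch: the branch names match the examples above, while the file and tree names are illustrative):

# Sketch: a flat analysis tree with one single-valued branch per quantity.
# A plain TTree is used because TNtuple cannot hold enough variables.
import ROOT
from array import array

f = ROOT.TFile('tree_example.root', 'RECREATE')        # illustrative file name
t = ROOT.TTree('AnalysisTree', 'flat analysis tree')   # illustrative tree name

Pt_muon1 = array('f', [0.])
St_uujj  = array('f', [0.])
weight   = array('f', [0.])
t.Branch('Pt_muon1', Pt_muon1, 'Pt_muon1/F')
t.Branch('St_uujj',  St_uujj,  'St_uujj/F')
t.Branch('weight',   weight,   'weight/F')

# Per event: fill the arrays from the ntuple vectors, then call Fill()
Pt_muon1[0], St_uujj[0], weight[0] = 120.5, 830.0, 0.92  # dummy values
t.Fill()
f.Write()
f.Close()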

The following are steps for creating the trees

Getting integrated luminosity and pileup weight

A shell script is available at this link, and has been made for the newest (Jan2012ReReco) datasets. It will return root files with the PU histograms, which will be used interactively by the tree-making code to get the weights for each event.

It can be run like:

./GetPULumiInfo.tcsh

This is already done in practice. Histograms are available for the central value and for systematic variations up and down

The integrated luminosity is just the official value (19.712 +/- 0.513)/fb - from this HN thread

Get organized with a spreadsheet

A simple csv file is used to keep track of everything involved in the tree production. This includes the locations of the ntuple files on EOS, any JSON file you wish to use on real data, cross-sections, original numbers of events (for normalization) and group names (output files will be merged according to their group).

An example of a spreadsheet (in progress) is at this link .

The spreadsheet must contain the following columns:

SignalType A unique identifier at the beginning of the names of the root files.
Xsections The cross-section to normalize the sample to (NLO is better!)
N_orig The original number of events in the sample (If you don't know this, see next steps for instructions)
Group The group for the files. For instance, you might have three SignalTypes like WW, WZ, and ZZ, and want to put them in a Group called "DiBoson"
CustomJson The name of a Json file specifying the good runs/lumis to use. This can be the same for every data type, or different, or 0 for MC
EOSDirectory The eos path where the files are kept for this signaltype. Should be like a typical EOS path e.g. /store/group/..../
This spreadsheet will be a command-line argument of the analysis batcher (discussed more in steps below).

NEW May 14, 2014 A summer 2014 CSV file has been created with V00-03-11 ntuples for the reload of the public PAS version of the LQ2 analysis. A couple data jobs (out of thousands) are missing, but it is functional for preparing the analysis.

What if you don't know the original number of events in the ntuples?

Since the NTupleMaker applies a skim, histograms are stored in the NTuple root files which contain the original number of events. This is needed for correct MC event weights.

The analysis batcher is built to read the csv file and run a python script on every root file contained in the EOS directories specified in the CSV. Normally this is for making tree files, but it can also just make a txt file containing the number of events from the event count histogram in the NTuples, and we can merge that info to get the totals - and then fix up the CSV.
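Under the hood, the per-file counting just reads that stored histogram. A minimal pyROOT sketch (the histogram name and bin convention are placeholders - use whatever the NTupleMaker actually writes):

# Sketch: sum the original (pre-skim) event counts across ntuple files.
import ROOT

total = 0.0
for fname in ['ntuple_1.root', 'ntuple_2.root']:   # placeholder file list
    f = ROOT.TFile.Open(fname)
    h = f.Get('EventCount')                        # placeholder histogram name
    total += h.GetBinContent(1)                    # assumes bin 1 holds the pre-skim count
    f.Close()
print('Original number of events: %d' % total)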

In practice it is much easier than it sounds. Just run the following:

 
python AnalyzerMakerFastLocal.py -i NTupleInfo2012Full_Summer2014.csv -py NTupleEvCounter.py -t EventCount_Summer2014_V01 -j Cert_190456-208686_8TeV_22Jan2013ReReco_Collisions12_JSON.txt -p 0 -q 8nh -s 250 --FileRefresh --merge

To clarify the arguments:

-i CSV File The CSV file you wish to run on.
-t Tag name Results will output in a directory specified by this tag-name
-j JSON file Not important here
-p 0 Not important here.
-q 8nh The batch queue to use.
-s 100 Number of files to analyze per batch job (the split number)
--FileRefresh The code will automatically make a list of good ntuple files and store it locally for future use, so it doesn't have to re-read directories all the time. This flag forces the directories to be re-read.
--merge Compile the results. For event counting, this lists total aggregated counts. For tree-making, it hadds the output into single root files (SummaryFiles)
This will produce an event count log (see screen output) and you can update the csv file accordingly.

NEW May 14, 2014

The summer 2014 CSV file has been used to update the event counts. This is the syntax:

 
python AnalyzerMakerFastLocal.py -i NTupleInfo2012Full_Summer2014.csv -py NTupleEvCounter.py -t EventCount_Summer2014_V01 -j Cert_190456-208686_8TeV_22Jan2013ReReco_Collisions12_JSON.txt -p 0 -q 8nh -s 250 --FileRefresh

Furthermore, if you ran the above command and you don't want to wait for it to finish, you can leave and pick up where you left off. Here, the previous command made the directory "NTupleEvCounter_EventCount_Summer2014_V01_2014_05_12_23_20_07", and this is the command to continue where we left off:

python AnalyzerMakerFastLocal.py -i NTupleInfo2012Full_Summer2014.csv -py NTupleEvCounter.py -t EventCount_Summer2014_V01 -j Cert_190456-208686_8TeV_22Jan2013ReReco_Collisions12_JSON.txt -p 0 -q 8nh -s 250 -d NTupleEvCounter_EventCount_Summer2014_V01_2014_05_12_23_20_07

Finally, add the "--merge" flag to take all the output files (in this case one txt file per root file) and sum the output

python AnalyzerMakerFastLocal.py -i NTupleInfo2012Full_Summer2014.csv -py NTupleEvCounter.py -t EventCount_Summer2014_V01 -j Cert_190456-208686_8TeV_22Jan2013ReReco_Collisions12_JSON.txt -p 0 -q 8nh -s 250 -d NTupleEvCounter_EventCount_Summer2014_V01_2014_05_12_23_20_07 --merge

A message on the screen will tell you the output file location. e.g.:

Output is in  /afs/cern.ch/user/d/darinb/LQ2014/NTupleAnalyzerV3B/NTupleInfo2012Full_Summer2014_EventCountLog.csv 

The summer 2014 CSV file has been updated with these event counts.

How to create the analysis trees.

Similar to what we did above, a different python file will be used to analyze the ntuples (using pyROOT) and make lightweight trees. There will be one output file for every NTuple file, but the merge step will use "hadd" to combine these files according to the "Group" in the CSV file. At the end of it all, we will have just a few root files like "SingleMuData.root", "ZJets.root", etc., which contain everything we need for each data or MC sample.
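Conceptually, the merge step amounts to something like the sketch below (the real logic lives in AnalyzerMakerFastLocal.py; the CSV column names follow the spreadsheet description, and the paths are placeholders):

# Sketch of the per-Group merge: collect the per-file tree outputs by the CSV
# 'Group' column and hadd each group into one summary file.
import csv, glob, os
from collections import defaultdict

groups = defaultdict(list)
for row in csv.DictReader(open('NTupleInfo2012Full_Summer2014.csv')):
    # every per-file tree output whose name starts with this SignalType
    groups[row['Group']] += glob.glob('trees/' + row['SignalType'] + '*.root')

for group, files in groups.items():
    os.system('hadd -f SummaryFiles/%s.root %s' % (group, ' '.join(files)))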

These files can take up several GB. I recommend using an AFS work space like "/afs/cern.ch/work/u/username/..." and bumping up your quota to 100GB at the afs workspaces page

Running is similar to the previous step. This is the command:

python AnalyzerMakerFastLocal.py -i NTupleInfo2012Full_Summer2014.csv -py NTupleAnalyzer.py -t FullJun27 -j Cert_190456-208686_8TeV_22Jan2013ReReco_Collisions12_JSON.txt -p 1 -q 2nd -s 100  --merge

To detail the arguments:

-i CSV File The CSV file you wish to run on.
-t Tag name Results will output in a directory specified by this tag-name
-j JSON file Default JSON file. Over-ridden by column in CSV file. Always put something here.
-p 0 Whether or not to store branches of pdf weights.
-q 1nd The batch queue to use.
-s 15 Number of files to analyze per batch job (the split number)
-y 2016 Year of the data and MC. Choices are 2016, 2017, 2018
--FileRefresh The code will automatically make a list of good ntuple files and store it locally for future use, so it doesn't have to re-read directories all the time. This flag forces the directories to be re-read.
--merge Compile the results. For event counting, this lists total aggregated counts. For tree-making, it hadds the output into single root files (SummaryFiles)
Other studies might require slight modifications to what you want to run. A couple are built in via the tagname option. These include:

QCDNonIso Removes muon isolation so that QCD studies can be performed
EMuSwitch Replaces one muon with a HEEP electron, for ttbar studies in an enriched e-mu control region
Examples of these runs are as follows. Typically you might not need the full CSV files for these, so you could make new ones with just the datasets you need

python AnalyzerMakerFastLocal.py -i NTupleInfo2012Full_Summer2014.csv -py NTupleAnalyzer.py -t FullJun27EMuSwitch -j Cert_190456-208686_8TeV_22Jan2013ReReco_Collisions12_JSON.txt -p 1 -q 2nd -s 100 -y 2016 --merge

python AnalyzerMakerFastLocal.py -i NTupleInfo2012Full_Summer2014.csv -py NTupleAnalyzer.py -t FullJun27QCDNonIsoQuickTest -j Cert_190456-208686_8TeV_22Jan2013ReReco_Collisions12_JSON.txt -p 0 -q 1nd -s 100 -y 2016 --merge

What if it takes too long?

It is also important to note that these jobs can take several hours or a day, depending on what you run. The AnalyzerMakerFastLocal.py script stays on and waits for the batch jobs to finish. If a file is missing, it relaunches it. This is all transparent to the user, but it can stay on the screen for quite a while.

There are two tools to help out with this.

  • The first (and best) way is to leave and come back later. When you see that the AnalyzerMakerFastLocal script has submitted its jobs and is waiting for them to be done, you can just ctrl+c and come back later. The TagName option has made a working directory with a time-and-date stamp, and you can specify to pick up where you left off. So go home, come back, and re-issue the command specifying the working directory with the "-d" option, like:
python AnalyzerMakerFastLocal.py -i NTupleInfo2012Full.csv -py NTupleAnalyzer.py -t MyFirstFullRun -j Cert_190456-208686_8TeV_22Jan2013ReReco_Collisions12_JSON.txt -p 1 -q 1nd -s 15 -y 2016 -d NTupleAnalyzer_MyFirstFullRun_2013_10_22_23_36_59

  • Another helpful tool is "screen". If you don't want to wait around for jobs to finish, use a "screen" on lxplus. Note the lxplus machine you are on (e.g. lxplus430), and open a screen by typing "screen". Launch the jobs with AnalyzerMakerFastLocal.py as described above. Type ctrl+A+D to exit. You can then log out, go home, etc., and later log back in to lxplus430 and type "screen -r" to return to your screen.

Part III: Running the result producer

From the flat trees, the analysis is performed and results are produced. This includes many steps, almost all of which are done in a single result producer python script. In this script, you will find the following:

  • The definition of the event weight, according to the weight variable in the trees, HLT reweighting MC with a single-mu trigger, and the integrated luminosity.
  • Definition of a preselection - a selection which demands the LQ decay product particles, but doesn't make strong cuts. Normalization of the Z, W, and TTbar backgrounds will be done at preselection.
  • Functions that make histograms at preselection and final selection.
  • Functions that can optimize signal-background separation with cuts based on S/sqrt(S+B) and store the result to a txt file (see the sketch after this list).
  • Functions that can make the event counts tables based on the above optimizations.
  • Functions that can make cards compatible with the Higgs limit tools, to get the real "final" results.
  • Unused functions which can perform analysis with TMVA (interesting, but not needed for our normal results). [deprecated]
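Stripped to its essence, the optimization is a scan that maximizes S/sqrt(S+B) over candidate thresholds, as in this illustrative sketch (not the script's actual implementation; the yields are dummy numbers):

import math

def significance(s, b):
    return s / math.sqrt(s + b) if s + b > 0 else 0.0

# Placeholder (threshold, S, B) points, e.g. from scanning St_uujj cut values
scan = [(300, 90.0, 400.0), (500, 80.0, 120.0), (700, 60.0, 30.0), (900, 35.0, 8.0)]
best = max(scan, key=lambda p: significance(p[1], p[2]))
print('Best cut: St > %d with S/sqrt(S+B) = %.2f' % (best[0], significance(best[1], best[2])))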

Basically, everything you need to run is in the main() function of the script. This is just a few background studies, plot production, and table production (tables get parsed into higgs limit tool cards). This is a brief overview of the commonly used parts of the code.

Setup and directory

At the beginning of the main function, you will see the following lines, followed by the definition of the variable binnings for different histograms. The results in this example would be output to a directory called "Results_Testing_Aug8". It will attempt to make this directory. If it already exists, that is fine. The *OptCutFile files must be present in the directory - they detail the final selection cuts.

def main():

   #######################################################################################
        ######  The output directories, and the files that define final selection cuts  #######
   #######################################################################################

   # Please retain the "scriptflag" comment. Some python scripts are available which search
   # for this, and make use of it. e.g. For systematic variations, we can run in batch instead
   # of serially, which speeds things up.

   version_name = 'Testing_Aug8' # scriptflag
   os.system('mkdir Results_'+version_name) 

   MuMuOptCutFile = 'Results_'+version_name+'/Opt_uujjCuts_Smoothed_pol2cutoff.txt' # scriptflag
   MuNuOptCutFile = 'Results_'+version_name+'/Opt_uvjjCuts_Smoothed_pol2cutoff.txt' # scriptflag
 

PDF Uncertainties

The PDF uncertainties are time-consuming. By enabling the following lines (turning False to True), you will run the PDF studies. This will print out some lists of uncertainties on the screen. I recommend not doing this normally. You can paste the output of this routine into the "HARD CODED RESULTS USED IN ANALYSIS" section.


   # ====================================================================================================================================================== #
   # These are PDF uncertainty studies. Ignore these for now!      ### STILL FIXING THIS PORTION
   # ====================================================================================================================================================== #

   if (False):
      PDF4LHCUncStudy(MuMuOptCutFile,MuNuOptCutFile,version_name)
      PDF4LHCPlotsFromResultDict('Results_'+version_name+'/PDFVariationsDictionary.json',version_name)

QCD Studies

QCD is negligible, and we just conduct a few studies to validate that. These studies make plots and produce a few numbers to cite in the analysis note. You can enable the QCD studies by turning on this part of the code:

   # ====================================================================================================================================================== #
   # This is the QCD study. It will make a couple plots, and test QCD contamination at final selection. We consider QCD negligible, but it is good to test this once!
   # ====================================================================================================================================================== #
   if (False):
      qcdselectionmumu = '((Pt_muon1>45)*(Pt_muon2>45)*(Pt_jet1>125)*(Pt_jet2>45)*(St_uujj>300)*(DR_muon1muon2>0.3))'
      qcdselectionmunu = '((Pt_muon1>45)*(Pt_muon2<45.0)*(Pt_jet1>125)*(Pt_jet2>45)*(Pt_ele1<45.0)*(St_uvjj>300)*(DPhi_muon1met>0.8)*(DPhi_jet1met>0.5))'

      QCDStudy(qcdselectionmumu,qcdselectionmunu,MuMuOptCutFile,MuNuOptCutFile,NormalWeightMuMu,NormalWeightMuNu,version_name)

TTbar E-Mu data-driven studies

The data-driven TTbar estimate relies on a calculation of the mu-mu to e-mu efficiency, and we also provide some characteristic plots and numbers in the analysis note. To create this material, you can turn on the following part of the main function. Once again, there is no need to run this all the time. When the study is stable, you just need to store the reweighting factor in the "HARD CODED RESULTS" section of the script - e.g. " emu_id_eff = 0.5716 "


   # ====================================================================================================================================================== #
   # The ttbar e-mu data-driven study. You only need to do this once, check the validation output, and use the scale-factor as global variable "emu_id_eff"
   # The scale-factor should be near 0.5
   # ====================================================================================================================================================== #
   if (False):
      # TTBar Study
      [Rtt_uujj,Rtt_uujj_err] = GetEMuScaleFactors( NormalWeightEMu+'*'+preselectionemu, EMuDirectory)
      Rw_uvjj,Rz_uujj = [1.0,1.0]

      # # PreSelection Plots
      MakeBasicPlotEMu("St_uujj","S_{T}^{e #mu j j} [GeV]",stbinning,preselectionemu,NormalWeightEMu,EMuDirectory,'emuseltagfree','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
      MakeBasicPlotEMu("Pt_miss","E_{T}^{miss} [GeV]",metbinning2,preselectionemu,NormalWeightEMu,EMuDirectory,'emuseltagfree','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
      MakeBasicPlotEMu("M_uu","M^{e #mu} [GeV]",bosonbinning,preselectionemu,NormalWeightEMu,EMuDirectory,'emuseltagfree','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
      MakeBasicPlotEMu("M_uujj2","M^{e/#mu j}_{2} [GeV]",lqbinning,preselectionemu,NormalWeightEMu,EMuDirectory,'emuseltagfree','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
      MakeBasicPlotEMu("DR_muon1muon2","#DeltaR(#mu,e})",drbinning,preselectionemu,NormalWeightEMu,EMuDirectory,'emuseltagfree','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)

      MakeBasicPlotEMu("St_uujj","S_{T}^{e #mu j j} [GeV]",stbinning,preselectionemu,NormalWeightEMu,EMuDirectory,'emuselPAStagfree','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
      MakeBasicPlotEMu("Pt_miss","E_{T}^{miss} [GeV]",metbinning2,preselectionemu,NormalWeightEMu,EMuDirectory,'emuselPAStagfree','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
      MakeBasicPlotEMu("M_uu","M^{e #mu} [GeV]",bosonbinning,preselectionemu,NormalWeightEMu,EMuDirectory,'emuselPAStagfree','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
      MakeBasicPlotEMu("M_uujj2","M^{e/#mu j}_{2} [GeV]",lqbinning,preselectionemu,NormalWeightEMu,EMuDirectory,'emuselPAStagfree','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
      MakeBasicPlotEMu("DR_muon1muon2","#DeltaR(#mu,e})",drbinning,preselectionemu,NormalWeightEMu,EMuDirectory,'emuselPAStagfree','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)

Making analysis-note style plots

The analysis-note style plots have a ratio subplot, and no CMS stamp. Each plot is a single function-call of MakeBasicPlot(args). There is also an option to make a basic results table. You can turn all this on in the following part of the code:


   # ====================================================================================================================================================== #
   # This is a basic plotting routine to make Analysis-Note style plots with ratio plots. 
   # ====================================================================================================================================================== #
   if (False):

      # Get Scale Factors
      [[Rz_uujj,Rz_uujj_err],[Rtt_uujj,Rtt_uujj_err]] = GetMuMuScaleFactors( NormalWeightMuMu+'*'+preselectionmumu, NormalDirectory, '(M_uu>80)*(M_uu<100)', '(M_uu>100)')
      [[Rw_uvjj,Rw_uvjj_err],[Rtt_uvjj,Rtt_uvjj_err]] = GetMuNuScaleFactors( NormalWeightMuNu+'*'+preselectionmunu, NormalDirectory, '(MT_uv>70)*(MT_uv<110)*(JetCount<3.5)', '(MT_uv>70)*(MT_uv<110)*(JetCount>3.5)')

      # Optionally, you can make an event-count table for each selection. Useful if testing a new optimization
      # We will do this later with full systematics for our set of stable cuts. 
      if False:
         QuickTableTTDD(MuMuOptCutFile, preselectionmumu+"*(M_uu>100)",NormalWeightMuMu,Rz_uujj, Rw_uvjj,Rtt_uujj,0)
         QuickTable(MuNuOptCutFile, preselectionmunu,NormalWeightMuNu,Rz_uujj, Rw_uvjj,Rtt_uvjj,0)

      # Here are a few plots which are zoomed-in on control regions. 
      MakeBasicPlot("M_uu","M^{#mu #mu} [GeV]",bosonzoombinning_uujj_Z,preselectionmumu,NormalWeightMuMu,NormalDirectory,'controlzoom_ZRegion','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
      ...etc...

      # UUJJ plots at preselection, Note that putting 'TTBarDataDriven' in the name turns on the use of data-driven ttbar e-mu sample in place of MC
      MakeBasicPlot("Pt_jet1","p_{T}(jet_{1}) [GeV]",ptbinning,preselectionmumu,NormalWeightMuMu,NormalDirectory,'standard_TTBarDataDriven','uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
      ... etc ...

Running the PAS-style plots

PAS-style plots have no ratio subplots, and have a CMS stamp. You can enable this portion of the main function to run the PAS-style plots. There is also a loop over LQ mass values that you can easily modify for different final-selection plots. Using the "tagfree" file tag will remove the CMS stamp.

   # ====================================================================================================================================================== #
   # This is a plotting routine for PAS-style publication-quality plots
   # ====================================================================================================================================================== #

   if False:

      # Get Scale Factors
      [[Rz_uujj,Rz_uujj_err],[Rtt_uujj,Rtt_uujj_err]] = GetMuMuScaleFactors( NormalWeightMuMu+'*'+preselectionmumu, NormalDirectory, '(M_uu>80)*(M_uu<100)', '(M_uu>100)')
      [[Rw_uvjj,Rw_uvjj_err],[Rtt_uvjj,Rtt_uvjj_err]] = GetMuNuScaleFactors( NormalWeightMuNu+'*'+preselectionmunu, NormalDirectory, '(MT_uv>70)*(MT_uv<110)*(JetCount<3.5)', '(MT_uv>70)*(MT_uv<110)*(JetCount>3.5)')

      ...etc...

      # The two flags are for regular plots, and tagfree plots (plots that don't say CMS Preliminary - for notes or thesis)
      for flag in ['','tagfree']:
         # Preselection plots in the UUJJ channel in the PAS style (no subplot)
         MakeBasicPlot("Pt_jet1","p_{T}(jet_{1}) [GeV]",ptbinning,preselectionmumu,NormalWeightMuMu,NormalDirectory,'standardPAS_TTBarDataDriven'+flag,'uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
         ... etc ...

Running the full event yields (with all systematic variations) and making higgs limit-tool cards

There are two ways to do this. It is easy to just let this run, but in practice it has been time consuming (many systematic variations) and can take more than a day. The end-goal of this all is that it produces several txt files in the result directories containing the event yields, and then combines all that information into higgs limit tool cards.

The long way to do this is to just let this part of the code run:

   # ====================================================================================================================================================== #
   # This runs a "FullAnalysis" - i.e. produces tables with full systematis included. 
   # ====================================================================================================================================================== #

   # You can run this to make the full set of tables needed to construct the higgs card. This takes a long time!
   # Alternatively, you can run > python SysBatcher.py --launch to do each table in a separate batch job
   # When done, proceed to the next step to make higgs limit cards
   if (True):
      FullAnalysis(MuMuOptCutFile, preselectionmumu,preselectionmunu,NormalDirectory,NormalWeightMuMu,'TTBarDataDriven')  # scriptflag
      FullAnalysis(MuNuOptCutFile, preselectionmumu,preselectionmunu,NormalDirectory,NormalWeightMuNu,'normal')  # scriptflag

   if (True):
      uujjcardfiles = MuMuOptCutFile.replace('.txt','_systable*.txt')
      uvjjcardfiles = MuNuOptCutFile.replace('.txt','_systable*.txt')

      uujjcards = ParseFinalCards(uujjcardfiles)
      uvjjcards = ParseFinalCards(uvjjcardfiles)
      finalcards = FixFinalCards([uujjcards,uvjjcards])

      print 'Final Cards Available in ',finalcards

The short way circumvents the running of the FullAnalysis(...) functions, by making multiple copies of the LQResultProducer - one for each systematic variation - and running them in batch in parallel. This is much faster, but you have to rely on another script to parse it all out. When the script finishes, you just have to run the second part of the above (with the ParseFinalCards and FixFinalCards commands). To use this method, do the following, and wait for the batch-jobs to finish.
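The mechanism rests on the "# scriptflag" comments: the batcher makes one copy of the result producer per systematic variation, rewriting the flagged lines in each copy before submitting it as its own job. Schematically (a sketch of the idea, not SysBatcher's actual code; the systematic names are placeholders):

# Sketch of the scriptflag mechanism used by SysBatcher.py (illustrative only).
systematics = ['jesup', 'jesdown', 'muonup', 'muondown']  # placeholder names
src = open('LQResultProducer.py').readlines()

for syst in systematics:
    out = []
    for line in src:
        if '# scriptflag' in line and 'version_name' in line:
            # give each copy its own output directory tag
            line = line.replace('Testing_Aug8', 'Testing_Aug8_' + syst)
        out.append(line)
    open('LQResultProducer_%s.py' % syst, 'w').writelines(out)
    # ...each copy would then be submitted as a separate batch job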

(1) Run

 
python SysBatcher.py --launch

(2) Wait for those batch jobs to finish, and then use just the second part of the above code

   if (True):
      uujjcardfiles = MuMuOptCutFile.replace('.txt','_systable*.txt')
      uvjjcardfiles = MuNuOptCutFile.replace('.txt','_systable*.txt')

      uujjcards = ParseFinalCards(uujjcardfiles)
      uvjjcards = ParseFinalCards(uvjjcardfiles)
      finalcards = FixFinalCards([uujjcards,uvjjcards])

      print 'Final Cards Available in ',finalcards

Part IV: Making limit plots

The limit-setting code is now in a subdirectory of the github paperversion branch. It has some standalone instructions, but since the LQResultProducer already produces a compatible FinalCards.txt, here are the short instructions.

(1) Checkout the higgs limit tool code and appropriate CMSSW version

setenv SCRAM_ARCH slc5_amd64_gcc472
cmsrel CMSSW_6_1_1
cd CMSSW_6_1_1/src
cmsenv
git clone https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit.git HiggsAnalysis/CombinedLimit
cd HiggsAnalysis/CombinedLimit
git pull origin master
git checkout V03-05-00
cd ../../
scramv1 b clean
scramv1 b -j 16
cmsenv
cd ../../
combine --help

(2) Put your FinalCards.txt file in the same directory as the RunBasicStatsCLs.py script.

(3) Run the single-channel limits (limit results will print to screen - save them!).

 python RunStatsBasicCLs.py --do_BetaOne --do_BetaHalf -c LimitTest --Asymptotic_Only

(4) Run the combined limit. It will have to batch out some jobs, so it is a two step process:

  • Run the script which creates and launches the jobs (one job per beta value)
    •  python AsympBatcher.py --do_Combo -q 8nh -n 1 -c LimitTest --Asymptotic_Only 
    • Job output will be in the "BatcherResults" folder.
  • When jobs are complete, merge them. Results will print to the screen.
    •  python AsympBatcherParser.py  

(5) Take the results from (3) and (4) and put them into the plot macros. The plot macros compile and run. An example of this would be:

{

gROOT->ProcessLine(".L ComboPlotLQ2.C++");
gROOT->ProcessLine("makePlots()");

gROOT->Reset();

gROOT->ProcessLine(".L BR_Sigma_MuMu_vsMass.C++");
gROOT->ProcessLine("makePlotsBO()");

gROOT->Reset();

gROOT->ProcessLine(".L BR_Sigma_MuNu_vsMass.C++");
gROOT->ProcessLine("makePlotsBH()");

gROOT->ProcessLine(".q");

}

Part V: Making publication-quality tables

The event yield and systematics tables can be automatically created based on the FinalCards.txt file (the higgs limit-card file) which has already been created. It simply needs to be parsed into a LaTeX table. There are two scripts available.

To get the content of the event yields table, do :

python CleanTableParse.py your/path/to/FinalCards.txt

To get tables of systematic uncertainties, do:

python SysTableParse.py your/path/to/FinalCards.txt

This last command also prints out the total percent uncertainties for each channel/mass. This list is used in the LQResultProducer.py in the "HARD CODED RESULTS" section to define the uncertainty band, used in the PAS-style plots.
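The parsing itself is ordinary text-to-LaTeX conversion. Schematically it looks like the sketch below (under the simplifying assumption of plain process/yield/uncertainty rows; the real FinalCards.txt format is handled by the two scripts above):

# Sketch: format rows into LaTeX table lines; the rows are placeholders.
rows = [('ZJets', 412.3, 18.0), ('TTBar', 128.7, 9.5), ('WJets', 33.1, 4.2)]
print(r'\begin{tabular}{lcc}')
print(r'Process & Yield & Uncertainty \\ \hline')
for proc, yld, unc in rows:
    print(r'%s & %.1f & %.1f \\' % (proc, yld, unc))
print(r'\end{tabular}')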

Part VI: Making PAS-quality plots.

The output of the table production (Part V) will give you lists of systematic uncertainty percentages at each LQ mass. Use these in the "HARD CODED RESULTS" part of the LQ result producer to update the lines:

# These are the total background uncertainties. They are used just to make some error bands on plots. 
totunc_uujj = [3.46, 2.55, 2.98, 3.67, 5.44, 5.73, 6.21, 6.47, 9.47, 11.16, 14.79, 19.75, 31.11, 35.41, 8.12, 8.12, 8.12, 8.12, 8.12]
totunc_uvjj = [13.37, 12.37, 13.35, 12.97, 13.62, 15.94, 16.23, 20.67, 23.94, 30.29, 35.3, 52.94, 69.57, 46.09, 42.3, 42.3, 42.3, 42.3, 42.3]

Then you can run the standard PAS plots in the LQResultProducer, i.e. enable this part of the main function:


   # ====================================================================================================================================================== #
   # This is a plotting routine for PAS-style publication-quality plots
   # ====================================================================================================================================================== #

   if False:

      # Get Scale Factors
      [[Rz_uujj,Rz_uujj_err],[Rtt_uujj,Rtt_uujj_err]] = GetMuMuScaleFactors( NormalWeightMuMu+'*'+preselectionmumu, NormalDirectory, '(M_uu>80)*(M_uu<100)', '(M_uu>100)')
      [[Rw_uvjj,Rw_uvjj_err],[Rtt_uvjj,Rtt_uvjj_err]] = GetMuNuScaleFactors( NormalWeightMuNu+'*'+preselectionmunu, NormalDirectory, '(MT_uv>70)*(MT_uv<110)*(JetCount<3.5)', '(MT_uv>70)*(MT_uv<110)*(JetCount>3.5)')

      ...

      # The two flags are for regular plots, and tagfree plots (plots that don't say CMS Preliminary - for notes or thesis)
      for flag in ['','tagfree']:

         # Preselection plots in the UUJJ channel in the PAS style (no subplot)
         MakeBasicPlot("Pt_jet1","p_{T}(jet_{1}) [GeV]",ptbinning,preselectionmumu,NormalWeightMuMu,NormalDirectory,'standardPAS_TTBarDataDriven'+flag,'uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
         MakeBasicPlot("Pt_jet2","p_{T}(jet_{2}) [GeV]",ptbinning,preselectionmumu,NormalWeightMuMu,NormalDirectory,'standardPAS_TTBarDataDriven'+flag,'uujj',Rz_uujj, Rw_uvjj,Rtt_uujj,'',version_name,500)
               ...

Making results for Vector LQ

The goal here will be to port all this machinery to making Vector LQ results with as little effort as possible. To do so, I've made some tweaks to the LQResultProducer, such that many of the output file names will contain the letters "LQ". From now on, "LQ" will indicate the normal (scalar) analysis, and the different vector scenarios will be denoted by "AM", "MC", "MM", and "YM". Correspondingly, I've organized the tree-making facility to name all the new signal files like this. e.g., see the vector LQ csv file. After making trees with the vector LQ csv file, and putting those new files into the same directory as the scalar files, we can proceed with some modifications to the code to run on vector LQ.

Fixing up the cut cards

In the results directory, there are typically a couple txt files which store the optimized cuts. They are called something like "Opt_uvjjCuts_Smoothed_pol2cutoff.txt", and store text that looks like this:

opt_LQuujj300 = ((M_uu>100)*(St_uujj>380)*(M_uujj2>115))
opt_LQuujj350 = ((M_uu>115)*(St_uujj>460)*(M_uujj2>115))
opt_LQuujj400 = ((M_uu>125)*(St_uujj>540)*(M_uujj2>120))
.....

In contrast, our vector LQ samples have mass points every 100 GeV, going up to 1800 GeV. First, let's fix the file names of the cut cards for scalar:

mv Opt_uujjCuts_Smoothed_pol2cutoff.txt OptLQ_uujjCuts_Smoothed_pol2cutoff.txt
mv Opt_uvjjCuts_Smoothed_pol2cutoff.txt OptLQ_uvjjCuts_Smoothed_pol2cutoff.txt

Next, copy the content of these files to new vector YM files, changing the "LQ" to "YM", using only the values every 100 GeV, and extending to 1800 (holding the last cuts fixed). For example:

Contents of OptYM_uujjCuts_Smoothed_pol2cutoff.txt:

opt_YMuujj300 = ((M_uu>100)*(St_uujj>380)*(M_uujj2>115))
opt_YMuujj400 = ((M_uu>125)*(St_uujj>540)*(M_uujj2>120))
opt_YMuujj500 = ((M_uu>150)*(St_uujj>685)*(M_uujj2>155))
opt_YMuujj600 = ((M_uu>175)*(St_uujj>820)*(M_uujj2>210))
opt_YMuujj700 = ((M_uu>195)*(St_uujj>935)*(M_uujj2>295))
opt_YMuujj800 = ((M_uu>215)*(St_uujj>1040)*(M_uujj2>400))
opt_YMuujj900 = ((M_uu>230)*(St_uujj>1135)*(M_uujj2>535))
opt_YMuujj1000 = ((M_uu>245)*(St_uujj>1210)*(M_uujj2>690))
opt_YMuujj1100 = ((M_uu>245)*(St_uujj>1210)*(M_uujj2>690))
opt_YMuujj1200 = ((M_uu>245)*(St_uujj>1210)*(M_uujj2>690))
opt_YMuujj1300 = ((M_uu>245)*(St_uujj>1210)*(M_uujj2>690))
opt_YMuujj1400 = ((M_uu>245)*(St_uujj>1210)*(M_uujj2>690))
opt_YMuujj1500 = ((M_uu>245)*(St_uujj>1210)*(M_uujj2>690))
opt_YMuujj1600 = ((M_uu>245)*(St_uujj>1210)*(M_uujj2>690))
opt_YMuujj1700 = ((M_uu>245)*(St_uujj>1210)*(M_uujj2>690))
opt_YMuujj1800 = ((M_uu>245)*(St_uujj>1210)*(M_uujj2>690))

Contents of OptYM_uvjjCuts_Smoothed_pol2cutoff.txt:

opt_YMuvjj300 = ((MT_uv>155)*(St_uvjj>455)*(M_uvjj>125))
opt_YMuvjj400 = ((MT_uv>205)*(St_uvjj>625)*(M_uvjj>175))
opt_YMuvjj500 = ((MT_uv>245)*(St_uvjj>800)*(M_uvjj>225))
opt_YMuvjj600 = ((MT_uv>275)*(St_uvjj>980)*(M_uvjj>280))
opt_YMuvjj700 = ((MT_uv>300)*(St_uvjj>1160)*(M_uvjj>330))
opt_YMuvjj800 = ((MT_uv>315)*(St_uvjj>1345)*(M_uvjj>380))
opt_YMuvjj900 = ((MT_uv>320)*(St_uvjj>1530)*(M_uvjj>435))
opt_YMuvjj1000 = ((MT_uv>320)*(St_uvjj>1720)*(M_uvjj>490))
opt_YMuvjj1100 = ((MT_uv>320)*(St_uvjj>1720)*(M_uvjj>490))
opt_YMuvjj1200 = ((MT_uv>320)*(St_uvjj>1720)*(M_uvjj>490))
opt_YMuvjj1300 = ((MT_uv>320)*(St_uvjj>1720)*(M_uvjj>490))
opt_YMuvjj1400 = ((MT_uv>320)*(St_uvjj>1720)*(M_uvjj>490))
opt_YMuvjj1500 = ((MT_uv>320)*(St_uvjj>1720)*(M_uvjj>490))
opt_YMuvjj1600 = ((MT_uv>320)*(St_uvjj>1720)*(M_uvjj>490))
opt_YMuvjj1700 = ((MT_uv>320)*(St_uvjj>1720)*(M_uvjj>490))
opt_YMuvjj1800 = ((MT_uv>320)*(St_uvjj>1720)*(M_uvjj>490))
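Producing these YM files from the scalar files can also be scripted. A rough sketch, assuming the OptLQ file format shown above (one 'opt_LQuujjMASS = (...)' line per mass):

# Sketch: build the YM cut card from the scalar LQ card - keep the 100 GeV
# points and extend to 1800 by holding the last available cuts fixed.
cuts = {}
for line in open('OptLQ_uujjCuts_Smoothed_pol2cutoff.txt'):
    name, _, expr = line.partition(' = ')
    if name.startswith('opt_LQuujj'):
        cuts[int(name.replace('opt_LQuujj', ''))] = expr.strip()

out = open('OptYM_uujjCuts_Smoothed_pol2cutoff.txt', 'w')
for m in range(300, 1900, 100):
    nearest = max(k for k in cuts if k <= m)   # hold the last known cuts fixed
    out.write('opt_YMuujj%d = %s\n' % (m, cuts[nearest]))
out.close()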

These cuts are the same for all vector LQ samples, so just copy them to appropriately named/structured files for "AM", "MC", and "MM".

sed 's/YM/MC/g' OptYM_uujjCuts_Smoothed_pol2cutoff.txt > OptMC_uujjCuts_Smoothed_pol2cutoff.txt
sed 's/YM/MC/g' OptYM_uvjjCuts_Smoothed_pol2cutoff.txt > OptMC_uvjjCuts_Smoothed_pol2cutoff.txt
sed 's/YM/AM/g' OptYM_uujjCuts_Smoothed_pol2cutoff.txt > OptAM_uujjCuts_Smoothed_pol2cutoff.txt
sed 's/YM/AM/g' OptYM_uvjjCuts_Smoothed_pol2cutoff.txt > OptAM_uvjjCuts_Smoothed_pol2cutoff.txt
sed 's/YM/MM/g' OptYM_uujjCuts_Smoothed_pol2cutoff.txt > OptMM_uujjCuts_Smoothed_pol2cutoff.txt
sed 's/YM/MM/g' OptYM_uvjjCuts_Smoothed_pol2cutoff.txt > OptMM_uvjjCuts_Smoothed_pol2cutoff.txt

Running on the vector samples

The LQResultProducer has been written such that we can just replace the "LQ" with any of the vector names and run. Since we are basically just interested in getting Higgs Limit cards for each of the vector samples, we can do the same with the SysBatcher.

First, clone the LQResultProducer and the SysBatcher for vector purposes:

sed 's/LQ/YM/g' LQResultProducer.py > YMResultProducer.py 
sed 's/LQ/YM/g' SysBatcher.py > YMSysBatcher.py 
sed 's/LQ/MM/g' LQResultProducer.py > MMResultProducer.py 
sed 's/LQ/MM/g' SysBatcher.py > MMSysBatcher.py 
sed 's/LQ/MC/g' LQResultProducer.py > MCResultProducer.py 
sed 's/LQ/MC/g' SysBatcher.py > MCSysBatcher.py 
sed 's/LQ/AM/g' LQResultProducer.py > AMResultProducer.py 
sed 's/LQ/AM/g' SysBatcher.py > AMSysBatcher.py 

Next, just launch the batchers and wait. I recommend not launching them all at once - wait an hour or two between each. Each of these is opening the root files in an afs workspace - so it is best to stagger it a bit.

python YMSysBatcher.py --launch
python MMSysBatcher.py --launch
python MCSysBatcher.py --launch
python AMSysBatcher.py --launch

Then, you can produce the higgs limit cards (the "FinalCards" files) in the same way as for the scalar samples: https://twiki.cern.ch/twiki/bin/view/Main/LQPairSecondGen13TeVOutline#Running_the_full_event_yields_wi

You will have to do that once, for each of YMResultProducer.py, MCResultProducer.py, MMResultProducer.py, and AMResultProducer.py. It will yield four card files: FinalCardsYM.txt, FinalCardsMC.txt, FinalCardsMM.txt and FinalCardsAM.txt.

Getting vector LQ limits

First, we have to fix one small problem. Because the Higgs limit tool presents the limit as a ratio of the assumed cross section (r), and reports it to 5 digits, the very large LQ signal cross sections mean we have to increase the r precision. I find that the building works best on lxplus5, but running the combine tool works fine on lxplus6 thereafter.

cd CMSSW_6_1_1/src
sed -i 's#6.4#9.9#g' ./HiggsAnalysis/CombinedLimit/src/Asymptotic.cc
scram b -j 16

New flags have been built into the limit tool to deal with the vector scenario. These flags switch on the use of the vector LQ card file (e.g. FinalCardsYM.txt), and they can be used as follows:

python RunStatsBasicCLs.py --do_BetaOne --do_BetaHalf -c LimitTest --Asymptotic_Only --vectorYM
python RunStatsBasicCLs.py --do_BetaOne --do_BetaHalf -c LimitTest --Asymptotic_Only --vectorAM
python RunStatsBasicCLs.py --do_BetaOne --do_BetaHalf -c LimitTest --Asymptotic_Only --vectorMM
python RunStatsBasicCLs.py --do_BetaOne --do_BetaHalf -c LimitTest --Asymptotic_Only --vectorMC

Making results for RPV Susy (In Progress)

In this section we will follow the same methods as for the vector LQ samples to get limits on the RPV susy (232) samples. These samples have mass values ranging from 200 to 1000 in 50 GeV increments. The Scalar LQ samples were denoted "LQ" and the Vector samples "YM", "MC", "MM", and "AM". For Susy we will follow a similar convention and label everything "RV".

Creating the cut card

In this example, I will use the same cuts as the regular LQ signal. Dave is further testing the optimal cuts and can modify this.

Cuts should use the LQ optimization point at m_stop - 100 GeV.

Contents of OptRV_uujjCuts_Smoothed_pol2cutoff.txt:

opt_RVuujj300 = ((M_uu>100)*(St_uujj>380)*(M_uujj2>115))
opt_RVuujj350 = ((M_uu>115)*(St_uujj>460)*(M_uujj2>115))
opt_RVuujj400 = ((M_uu>125)*(St_uujj>540)*(M_uujj2>120))
opt_RVuujj450 = ((M_uu>140)*(St_uujj>615)*(M_uujj2>135))
opt_RVuujj500 = ((M_uu>150)*(St_uujj>685)*(M_uujj2>155))
opt_RVuujj550 = ((M_uu>165)*(St_uujj>755)*(M_uujj2>180))
opt_RVuujj600 = ((M_uu>175)*(St_uujj>820)*(M_uujj2>210))
opt_RVuujj650 = ((M_uu>185)*(St_uujj>880)*(M_uujj2>250))
opt_RVuujj700 = ((M_uu>195)*(St_uujj>935)*(M_uujj2>295))
opt_RVuujj750 = ((M_uu>205)*(St_uujj>990)*(M_uujj2>345))
opt_RVuujj800 = ((M_uu>215)*(St_uujj>1040)*(M_uujj2>400))
opt_RVuujj850 = ((M_uu>220)*(St_uujj>1090)*(M_uujj2>465))
opt_RVuujj900 = ((M_uu>230)*(St_uujj>1135)*(M_uujj2>535))
opt_RVuujj950 = ((M_uu>235)*(St_uujj>1175)*(M_uujj2>610))
opt_RVuujj1000 = ((M_uu>245)*(St_uujj>1210)*(M_uujj2>690))

Run the event yields and produce higgs limit cards

Just as with the vector LQ, we can clone the LQResultProducer and the SysBatcher for the Susy scenario. Since the RPV susy is only in the mu-mu-j-j channel, we make one slight tweak to remove the uvjj running:

sed 's/LQ/RV/g' LQResultProducer.py > RVResultProducer.py 
cat SysBatcher.py | sed 's/LQ/RV/g' | sed "s/,'uvjj'//g"  > RVSysBatcher.py 

Next, just launch the batcher and wait for the jobs to finish.

python RVSysBatcher.py --launch

Then, you can produce the higgs limit cards (the "FinalCards" files) in the same way as for the scalar samples: https://twiki.cern.ch/twiki/bin/view/Main/LQPairSecondGen13TeVOutline#Running_the_full_event_yields_wi . To do so, modify the "FullAnalysis" portion of the RVResultProducer.py as such:

   # ====================================================================================================================================================== #
   # This runs a "FullAnalysis" - i.e. produces tables with full systematis included. 
   # ====================================================================================================================================================== #

   # You can run this to make the full set of tables needed to construct the higgs card. This takes a long time!
   # Alternatively, you can run > python SysBatcher.py --launch to do each table in a separate batch job
   # When done, proceed to the next step to make higgs limit cards
   if (True):
      FullAnalysis(MuMuOptCutFile, preselectionmumu,preselectionmunu,NormalDirectory,NormalWeightMuMu,'TTBarDataDriven')  # scriptflag
   if (True):
      uujjcardfiles = MuMuOptCutFile.replace('.txt','_systable*.txt')
      uujjcards = ParseFinalCards(uujjcardfiles)
      finalcards = FixFinalCards([uujjcards])
      print 'Final Cards Available in ',finalcards

And then just run it, e.g.

python RVResultProducer.py

You should now have a "FinalCardsRV.txt" file in your results directory.

Getting RPV Susy limits

Note: this section requires the version of RunStatsBasicCLs.py committed to the paperversion branch on github on Aug 30. Please update your repository if needed.

Proceed exactly as in scalar and vector LQ case, using the flag "--susyRV" during limit setting. To be precise, if you've never set up the Higgs limit tool in the LimitSetting directory before, just put the FinalCardsRV.txt in the LimitSetting directory, and do this:

cd LimitSetting
setenv SCRAM_ARCH slc5_amd64_gcc472
cmsrel CMSSW_6_1_1
cd CMSSW_6_1_1/src
cmsenv
git clone https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit.git HiggsAnalysis/CombinedLimit
cd HiggsAnalysis/CombinedLimit
git pull origin master
git checkout V03-05-00
cd ../../
scramv1 b clean
scramv1 b -j 16
cmsenv

sed -i 's#6.4#9.9#g' ./HiggsAnalysis/CombinedLimit/src/Asymptotic.cc
scram b -j 16
cd ../../

python RunStatsBasicCLs.py --do_BetaOne -c LimitTest --Asymptotic_Only --susyRV

Getting RPV Limit plots

A plot macro has been drafted at LimitSetting/PlotMacros/BR_Sigma_MuMu_vsMassSUSYRPV.C

To run it, simply do:

root -b RunSusyRPVPlotsInRoot

An example output is at this link

Documentation for LQ post-processing

From: https://docs.google.com/document/d/1XZ35T-IwXzUpUCou3aqcIW35Cxx8j6_2lBjWqxjcmFQ/edit#heading=h.qpfb6p4b5p2s:

Modified code is kept on this GitHub branch: https://github.com/gmadigan/resonance_HH/tree/updates_for_LQ

Which has been forked from https://github.com/menglu21/resonance_HH

To run, follow instructions in README.md or contact Meng Lu (meng.lu@cern.ch)

References for high-pT muons:

https://twiki.cern.ch/twiki/bin/view/CMS/MuonUL2016#High_pT_above_120_GeV

https://twiki.cern.ch/twiki/bin/view/CMS/MuonUL2017#High_pT_above_120_GeV

https://twiki.cern.ch/twiki/bin/view/CMS/MuonUL2018#High_pT_above_120_GeV

Changes (relative to HH analysis) have been made to the following files:

  modified: crab/crab_script.py
  modified: crab/samples2016.json
  modified: crab/samples2016apv.json
  modified: crab/samples2018.json
  new file: modules/LQProducer.py
  new file: modules/highPtMuonScaleResProducer.py
  new file: modules/muonRECOSFProducer.py
  new file: modules/topPtReweightProducer.py

crab/crab_script.py

This is where all the modules to run are specified. I have modified this to run the new modules I added (e.g., LQProducer) and removed modules we do not need. More modules will likely need to be added in the future for additional MC corrections. Current modules in use for MC are:

  • countHistogramsModule (saves the generated number of events for normalizing the samples)
  • PU reweighting
  • L1 Prefire corrections (for HCAL and Muon system)
  • Muon ID and isolation scale factors
  • Muon reconstruction scale factors
  • Muon scale and resolution corrections
  • Electron reconstruction scale factors
  • Electron ID scale factors
  • Jet energy corrections
  • b-jet efficiency scale factors
  • Top pT reweighting
  • LQ-specific selections and kinematic variables

For data:

  • Muon scale and resolution corrections
  • Jet energy corrections
  • LQ-specific selections and kinematic variables

crab/samples2016.json, crab/samples2016apv.json, crab/samples2018.json

Each has been modified to include only the background samples needed for the LQ analysis, plus the single muon samples. Electron datasets still need to be added.

(Note that the 2017 sample list has not been made yet.)

modules/LQProducer.py

This module replaces HHProducer and will add many new branches (too many to list here!). The branches can be categorized as follows:

  • Leading and sub-leading (trailing) muon kinematics (e.g., lead_muon_pt, trail_muon_pt), and Reco, Iso, ID, and HLT scale factors and their variations. NOTE: HLT SFs are set to 1 currently! A module to get these SFs will need to be written in the future.
  • Leading and trailing electron kinematics
  • Leading and trailing jet kinematics, b-tag scores and b-tag efficiency scale factors
  • Several kinematic quantities, e.g., ST, M_uu, etc.
  • DeltaR and DeltaPhi for different muon, jet, and electron combinations
  • MET kinematics and filters

Selections:

  • Muons must pass the high-pT ID and loose relative tracker-based isolation (muons must use TuneP momentum!)
  • Electrons must pass the HEEP ID
  • Jets must pass the tight ID

A loose skim on muons with pT < 20 GeV is applied here as well.

modules/highPtMuonScaleResProducer.py

This module replaces muonScaleResProducer and will add 12 branches to the ntuples:

  Muon_highPtScaleRoccor_pt
  Muon_highPtScaleRoccorUp_pt
  Muon_highPtScaleRoccorDown_pt
  Muon_highPtResSmeared_pt
  Muon_highPtResSmearedUp_pt
  Muon_highPtResSmearedDown_pt
  Muon_highPtResSmeared_eta
  Muon_highPtResSmearedUp_eta
  Muon_highPtResSmearedDown_eta
  Muon_highPtResSmeared_phi
  Muon_highPtResSmearedUp_phi
  Muon_highPtResSmearedDown_phi

The first three branches are the central, up, and down systematic variations of muon momenta corrected with Rochester corrections. It is important to note that these corrections are only applied to muons with pT < 200 GeV, and correspond to scale and resolution corrections.

The last nine branches are the central, up, and down systematic variations of muon pT, eta, and phi after an additional MER smearing is applied. All three are modified, as the smearing is p-dependent (not pT-dependent) and will therefore modify all three variables. This additional smearing is a specific prescription for high-pT muons.

As the LQ analysis uses TuneP momentum, we need to use TuneP momentum for all muons rather than PF momentum in this module.

NOTE: A systematic uncertainty on the muon momentum scale for high-pT muons with pT > 200 GeV, using the Generalized Endpoint method, is still needed.

modules/muonRECOSFProducer.py

This module will add two branches to the ntuples:

  Muon_Reco_LooseRelTkIsoandHighPtIDandIPCut_SF
  Muon_Reco_LooseRelTkIsoandHighPtIDandIPCut_SFerr

These branches are the muon reconstruction SF and its error. As the LQ analysis uses the TuneP momentum measurement for muons, it was important to implement this in a new module (rather than using the default muon reconstruction SF module, which uses PF momentum).

modules/topPtReweightProducer.py

This module will add three new branches to the ntuples:

  TopPtWeight
  TopPtWeight_up
  TopPtWeight_down

These branches are the central, up, and down systematic variations of the event weights that correct for top pT mismodeling. The corrections are year-dependent, and may or may not be needed in the analysis (EXO recommended them for EXO-21-019).
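For reference, the commonly used CMS data/NLO parametrization of this weight looks like the following sketch (the parameters are illustrative - the year-dependent values actually used by topPtReweightProducer.py may differ):

import math

def top_pt_sf(pt):
    # Common exponential data/NLO parametrization; parameters are illustrative.
    # The pT is often capped (here at 500 GeV, an assumption) where the fit ends.
    return math.exp(0.0615 - 0.0005 * min(pt, 500.0))

def top_pt_event_weight(pt_top, pt_antitop):
    # Event weight: geometric mean of the top and antitop scale factors
    return math.sqrt(top_pt_sf(pt_top) * top_pt_sf(pt_antitop))

print(top_pt_event_weight(150.0, 230.0))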

CAUTION: These corrections should only be applied to TOP MC samples, but I have not yet figured out how to do this--they are currently added to all samples. One could add these to all samples and just apply the weight to top samples in the HHCoffea code?

-- DavidMorse - 14 Jan 2015
