W+Jets Differential Cross Section Analysis

The Basics

  • Calculate the cross-section of the W + jets process as a function of jet multiplicity, jet pT, and jet η.
    • Also possibly: jet multiplicity ratios
    • Also possibly: M(j_0...j_N), Σ(MET, mu pT, jet pTs), ∆η between objects, ∆φ between objects
  • Using the full 2011 dataset, starting with the muon analysis and the single-muon trigger.
    • Extending to the electron analysis once machinery is in place.
  • Data unfolding using singular value decomposition (SVD), implemented in ROOT as TSVDUnfold, with optimization of the regularization parameter (τ)
  • Distributions compared with various generator predictions using RIVET
  • Considering the standard assortment of systematic uncertainties
    • Jet energy scale and resolution
    • Lepton momentum scale and resolution
    • Reco/ID/Isolation uncertainties
    • Integrated luminosity
    • Shape of backgrounds

The procedure

TreeMaker overview

  • Start with a set of NTuples created from AOD using the Leptons+Jets group NTuple Facility
  • Prepare a spreadsheet (comma-separated text file) which details the NTuples to run on, like the 7TeV Example csv. The columns are:
    • SignalType: The ntuple names should start with this, e.g. all the TTJets ntuples start with "TTJets_TuneZ2_7TeV-madgraph". The tool will sort the files into these groups.
    • XSections: The cross-sections. Usually the best theory values, taken from the best known standard-model cross-sections twiki (or the corresponding twiki for 8TeV).
    • N_orig: The number of events in the (MC) sample. This can be taken from the LJFilter histograms in the ntuples, or, if you ran on the whole set successfully, from DAS. Just set it to "1" for real data.
    • Group: This is a group name. For example, WW, WZ, and ZZ could all belong to the group DiBoson, and they will be merged into a file "DiBoson.root" when the tool is done.
    • CastorDirectory: The directory on castor (these days on EOS), starting with 'store'...
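    • For illustration, a few rows might follow this layout (all names, numbers, and paths here are hypothetical; see the linked example CSV for real entries):
              SignalType,XSections,N_orig,Group,CastorDirectory
              WW_TuneZ2_7TeV,43.0,4225916,DiBoson,store/user/someone/ntuples/WW
              WZ_TuneZ2_7TeV,18.2,4265243,DiBoson,store/user/someone/ntuples/WZ
              SingleMu2011A,1,1,SingleMuData,store/user/someone/ntuples/SingleMu2011A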
  • Prepare a list of files on EOS, called 'masterdirlist.txt', in the current directory. This should list all files you will want to run on.
    • Example: 7TeV masterdirlist.txt
    • Beware of duplicates! If crab jobs produced duplicate files, either delete them or leave them out of masterdirlist.txt.
  • Skim ntuples storing only the final variables for measurement, and the necessary weights.
    • Example of the analysis code
    • How to run the analysis code in batch with my automated facility
              cmsrel CMSSW_4_2_8
              cd CMSSW_4_2_8/src
              cmsenv
              cvs co -d NTupleAnalyzerV2 UserCode/baumgartel/LQ/NTupleAnalyzerV2
              cd NTupleAnalyzerV2/WJetsAnalysis
              python AnalyzerMakerFastLocal.py -c NTupleAnalyzer_V00_02_06_WPlusJets.C -h NTupleAnalyzer_V00_02_06_WPlusJets.h -i NTupleInfo2011_V00_02_06_WPlusJets_withsys_EOS.csv -t MyFirstAnalyzerTest
              
  • The tool will ask you "Would you like to automatically submit and check jobs now?". If you are ready to run all jobs, choose "y". If you are testing your C code or something, choose "n".
    • If you choose "n", you can run a single job locally. First, look for a job run command.
      • List all the RootProcesses which are just files which have commands to run the C file.
      • ls -1 RootProcesses_*
      • Choose one and look at it, run it, etc.
      • cat RootProcesses_WToLNu_part2of135
      • root -b RootProcesses_WToLNu_part2of135
    • If you choose "y", you need to wait for the jobs to finish. The individual root files will "hadd" into group files, and the results will show to screen.
      • If you don't want to wait around for jobs to finish, use a "screen" on lxplus. Note the lxplus machine you are on (e.g. lxplus430), and open a screen by typing "screen". Launch the jobs with AnalyzerMakerFastLocal.py as described above. Type ctrl+A+D to exit. You can then log out, go home, etc, and log back into to lxplus430, and type "screen -r" to return to your screen.

Running unfolding jobs (needs updating)

  • A pyroot facility called WJetsTreeAnalyzer.py will take care of the unfolding, distribution creation, cross-section calculations, and systematics.
    • You need to update ROOT to run with TSVDUnfold. On lxplus:
              cd  NTupleAnalyzerV2/WJetsAnalysis
              source /afs/cern.ch/sw/lcg/external/gcc/4.3.2/x86_64-slc5/setup.csh
              source /afs/cern.ch/sw/lcg/app/releases/ROOT/5.32.02/x86_64-slc5-gcc43-opt/root/bin/thisroot.csh 
              python WJetsTreeAnalyzer.py
              

Unfolding and the standard output

The standard output

  • Running WJetsTreeAnalyzer.py will produce unfolded data distributions and a rescaling function capable of translating a reco distribution into the unfolded distribution, for the calculation of cross-sections and the plotting of the final distributions.
  • The standard output is a set of 2x2 plots for each variable (NJets, lead jet pT, lead jet eta, etc.).
    • Top left: A standard histogram of reconstructed data vs. reconstructed MC, with the binning and axis range of the final measurement distribution.
    • Bottom left: The gen-vs-reco response matrix to be used in the unfolding. Grid lines are superimposed to indicate two regions: the central region, in which the unfolded result will be derived, and the periphery, an underflow/overflow region. The reco is confined to the central region, whereas the gen spans the central region and the periphery (a minimal filling sketch follows this list).
    • Top right: The distributions of MC Reco and MC Gen, and of Data Reco and Data Unfolded, in the central region plus periphery. The periphery is marked with dashed lines. In the central region, variable binning has been chosen to keep the MC Reco distribution flat, with equal statistics in each bin; this is optimal for the unfolding and results in low values of the regularization parameter tau. Also drawn on this plot is the result of the unfolding procedure applied to the MC Reco, as a closure and sanity test of the unfolding procedure.
    • Bottom right: The final distribution with the "presentation binning" for UnfoldedData/RecoData, superimposed with GenMC/RecoMC, which should be roughly similar. UnfoldedMC/GenMC is also included, with a dashed line at unity, to better show the result of the sanity check on the unfolding procedure.
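
A minimal pyroot sketch of how such a response matrix can be filled (a toy spectrum and resolution with illustrative names, not the actual WJetsTreeAnalyzer.py code):

        import ROOT

        # Toy response matrix: x = reco, y = gen (the TSVDUnfold convention).
        nbins, lo, hi = 10, 0.0, 500.0
        response = ROOT.TH2D("response", ";reco;gen", nbins, lo, hi, nbins, lo, hi)
        rng = ROOT.TRandom3(1)
        for _ in range(100000):
            gen = rng.Exp(100.0)             # toy falling gen spectrum
            reco = rng.Gaus(gen, 0.1 * gen)  # toy 10% resolution
            if lo <= reco < hi:              # reco is confined to the central region
                response.Fill(reco, gen)     # gen may land in the overflow (periphery)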

Unfolding method

  • Using TSVDUnfold as implemented in ROOT
  • Regularized with an optimized parameter tau. Tau is an integer between 1 and the number of bins N used in the NxN response matrix.
  • Tau is optimized with the following procedure (a condensed sketch follows this list):
    • For each variable being studied, create a histogram of the MC Reco with proper normalization.
    • Generate M random pseudo-data histograms based on the MC Reco distribution. Each pseudo-data histogram is constructed like this:
      • Assign a random "shift" of the distribution, S, between -5% and 5%.
      • Loop over the bins of the MC Reco distribution. Round the content of each bin to the nearest integer and take the bin center, so that you have E events centered at x. Then, E times, create a new value x' smeared by a 5% Gaussian and shifted by the global shift S, and fill x' into the output histogram.
    • For each pseudo-data histogram, conduct the unfolding for each possible value of tau, giving M unfolded pseudo-data histograms per tau value. Compare these to the MC Gen histogram; whichever tau has the best average agreement is taken as the optimal tau.
      • The agreement can be quantified in terms of sum of absolute differences, sum of square differences, or a KS test. We find good results with a simple sum of absolute differences.
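
A condensed pyroot sketch of the tau optimization (histogram and function names are illustrative, not the actual WJetsTreeAnalyzer.py code; mc_reco, mc_gen, and response are assumed to be the properly normalized TH1D/TH2D objects described above):

        import random
        import ROOT

        def make_pseudo_data(mc_reco):
            # One pseudo-data histogram: global shift S in [-5%, +5%], 5% Gaussian smear.
            pseudo = mc_reco.Clone("pseudo")
            pseudo.Reset()
            S = random.uniform(-0.05, 0.05)
            for b in range(1, mc_reco.GetNbinsX() + 1):
                E = int(round(mc_reco.GetBinContent(b)))
                x = mc_reco.GetBinCenter(b)
                for _ in range(E):
                    pseudo.Fill(random.gauss(x * (1.0 + S), 0.05 * abs(x)))
            return pseudo

        def optimal_tau(mc_reco, mc_gen, response, M=100):
            # Unfold M pseudo-experiments for every tau and keep the tau whose
            # unfolded results have the smallest summed |unfolded - gen| on average.
            nbins = mc_reco.GetNbinsX()
            score = dict((tau, 0.0) for tau in range(1, nbins + 1))
            for _ in range(M):
                pseudo = make_pseudo_data(mc_reco)
                for tau in score:
                    svd = ROOT.TSVDUnfold(pseudo, mc_reco, mc_gen, response)
                    unfolded = svd.Unfold(tau)
                    score[tau] += sum(abs(unfolded.GetBinContent(b) - mc_gen.GetBinContent(b))
                                      for b in range(1, nbins + 1))
            return min(score, key=score.get)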

Generator Level Distributions with RIVET

Setting up Rivet for CMSSW.

  • You just need a fresh CMSSW release and a few packages.

mkdir RIVET_WJets
cd RIVET_WJets
cmsrel CMSSW_5_0_0
cd CMSSW_5_0_0/src/
cmsenv 

cvs co -r V01-01-41 Configuration/GenProduction/python/WToMuNu_TuneZ2_7TeV_pythia6_cff.py
cvs co Configuration/GenProduction/python/rivet_customize.py 
cvs co GeneratorInterface/RivetInterface 
scramv1 b -j 16

Setting up a "Rivet Analysis"

  • A Rivet analysis is basically a few pieces of code which specify the process you are looking at, the event and kinematic selections, the variables, the binning, etc. We don't have an official Rivet analysis yet, but one can be patched together like this:


cvs co -d WJetsAnalysis UserCode/baumgartel/LQ/NTupleAnalyzerV2/WJetsAnalysis/
cp WJetsAnalysis/CMS_WJets_TEST.cc GeneratorInterface/RivetInterface/src/CMS_WJets_TEST.cc
cp WJetsAnalysis/CMS_WJets_TEST.plot GeneratorInterface/RivetInterface/data/CMS_WJets_TEST.plot
cp WJetsAnalysis/CMS_WJets_TEST.info GeneratorInterface/RivetInterface/data/CMS_WJets_TEST.info
cp WJetsAnalysis/CMS_WJets_TEST.aida GeneratorInterface/RivetInterface/data/CMS_WJets_TEST.aida
scramv1 b -j 16
cd WJetsAnalysis

Methods of running

  • Rivet can be run on GEN events produced on the fly, using a generator configuration from one of the centrally produced MC samples. Below is an example with Pythia; alternatively, just use the WJetsAnalysis/Rivet_WJets_GenOnFly.py script, which you checked out in the step above and which is already set up for our Rivet analysis.
        cmsDriver.py Configuration/GenProduction/python/WToMuNu_TuneZ2_7TeV_pythia6_cff.py -s GEN \
        --datatier=GEN-SIM-RAW --conditions auto:mc --eventcontent RAWSIM --no_exec -n 10000 \
        --python_filename=Rivet_WJets_GenOnFly_Template.py --customise=Configuration/GenProduction/rivet_customize.py 
        
    • Then you can run this, generating events on the fly:
       
              cmsRun Rivet_WJets_GenOnFly.py
              
  • Alternatively, you can run from a GEN sample.
    • An example running on a GEN file is:
       
             cmsRun Rivet_WJets_FromCentralGen.py
            
    • Or you can use CRAB, e.g. with WJetsAnalysis/crab.cfg and WJetsAnalysis/Rivet_WJets_FromCentralGen.py:
       
             source /afs/cern.ch/cms/LCG/LCG-2/UI/cms_ui_env.csh 
             source /afs/cern.ch/cms/ccs/wm/scripts/Crab/crab.csh
             crab -create
             crab -submit
             crab -status     # Wait for output to be done...
             crab -getoutput
            
  • When the running is done, you need to locate the aida file. If you ran with plain cmsRun, it is in your current directory; if you ran with crab, it is in the crab output.
    • You can make plots from the .aida file straight to pdf:
             rivet-mkhtml output.aida
             
    • You can make plots from the .aida file into a root file populated with TGraphs:
             aida2root output.aida
             
  • You can also merge the final crab results, using aidamerge for the aida files and hadd for the root files. This process is automated with a script that makes sure compatible versions of CMSSW are used with Rivet. The usage is:
           python MergeAidaCrabDir.py [crab_directory] [mc_name_used_for_labeling]
           
    or, for example, for MadGraph:
            python MergeAidaCrabDir.py crab_0_120924_180819 MadGraph
           

Generator Level Distributions with Blackhat

Updated method from Git with files on EOS

Checkout and Setup

Set up the working area using sh (not csh or tcsh!)

sh
source /afs/cern.ch/cms/cmsset_default.sh
scram project -n CMSSW_5_3_8_Blackhat CMSSW CMSSW_5_3_8
cd CMSSW_5_3_8_Blackhat/src/
cmsenv

Check out the blackhat code and the fastjet install, and copy in the file lists directory

git clone git@github.com:darinbaumgartel/WJets7TeV.git WJetsAnalysis
cd WJetsAnalysis/BlackHatHistogramMaker
cvs co -d fastjet-2.4.4-install UserCode/baumgartel/WPlusJets/7TeV/BlackHat/fastjet-2.4.4-install

cd blackhat_hists
cp -r  ../blackhat_lists lists
cd ../

Replace the CURRENTDIR placeholder in the scripts with the correct absolute directory name

find . | xargs grep CURRENTDIR | grep ":" | cut -d":" -f1 | sort | uniq | xargs sed -i 's#CURRENTDIR#'`pwd`'#g' 

Setup and Recompile

source setup.sh
cd blackhat_hists
make clean && make

Notes on Modifications

  • Physics and binning can be modified in blackhat_hists/makeHistograms.cpp
  • EOS directories can be modified in blackhat_hists/runList_batch.sh
  • EOS Transfer from Castor can be done with the tool EOSTransfer/CastorToEOS.py

Running (In Progress)

If you made changes to the physics code:

make

Submit all of the jobs to the lxbatch system

cd blackhat_hists
./submit_all.sh

This takes something like 12-24 hours for all variations. Check the status of the jobs with the "bjobs" command. The histograms will show up in the directory ./hists/.

Clean up the individual job scripts and LSF* directories:

mkdir batch 
mv job_*sh LSFJOB_* batch

Move the histograms for the different PDFs/scales to their own directories, and combine them into the appropriate 1/2/3/4-jet root files, gathered into scale/PDF/error-set folders.

source setup.sh
python HistoGather.py

The final combined histograms:

ls MergedHistos/hists*

How to set up an area to use Blackhat nTupleReader (deprecated)

This is how to set up a new area with the external libraries needed for running over the Blackhat nTuples.

In these instructions, the new area is /afs/cern.ch/user/j/jhaley/work/CMSSW_5_3_8_Blackhat/src/, so change this to whatever you want. The old area, where the original libraries were compiled, is /afs/cern.ch/user/j/jhaley/work/CMSSW_5_3_8_LQ/src/, and in these instructions the libraries are simply copied to the new area. That should work if you keep the same version of CMSSW (5_3_8). If you want to change the version of CMSSW, you might have to get the tarballs for the nTupleReader and fastjet and recompile them from source.

# Create CMSSW area:

  export SCRAM_ARCH=slc5_amd64_gcc462
  scram project -n CMSSW_5_3_8_Blackhat CMSSW CMSSW_5_3_8
  cd CMSSW_5_3_8_Blackhat/src/
  cmsenv

# Copy over libs:

  cp -a /afs/cern.ch/user/j/jhaley/work/CMSSW_5_3_8_LQ/src/ntuplesreader-1.0-install .
  cp -a /afs/cern.ch/user/j/jhaley/work/CMSSW_5_3_8_LQ/src/fastjet-2.4.4-install .
  ln -s /afs/cern.ch/user/j/jhaley/work/CMSSW_5_3_8_LQ/src/blackhat_lists .

# Make setup script:

  cp -a /afs/cern.ch/user/j/jhaley/work/CMSSW_5_3_8_LQ/src/setup.sh .
  sed 's#/afs/cern.ch/user/j/jhaley/work/CMSSW_5_3_8_LQ/src/#/afs/cern.ch/user/j/jhaley/work/CMSSW_5_3_8_Blackhat/src/#g' setup.sh > tmp
  mv tmp setup.sh
  source setup.sh

# Set up blackhat code:

  mkdir blackhat_hists11
  cd blackhat_hists11
  cp -a /afs/cern.ch/user/j/jhaley/work/CMSSW_5_3_8_LQ/src/blackhat_hists11/*{.sh,.h,.cpp,Makefile,.perl} .
  ln -s ../blackhat_lists lists

# Edit Makefile for the new area:

  sed 's#/afs/cern.ch/user/j/jhaley/work/CMSSW_5_3_8_LQ/src/#/afs/cern.ch/user/j/jhaley/work/CMSSW_5_3_8_Blackhat/src/#g' Makefile > tmp
  mv tmp Makefile

# Recompile

  make clean && make

Read on for how to actually make the blackhat+sherpa histograms.

Instructions for making histograms from the Blackhat nTuples

Once all the external libraries are set up, here is what I do to make a new set of histograms.

1) Get the histogram making code from a previous area and compile it (description of each file at the end):

  cd ~jhaley/work/CMSSW_5_3_8_Blackhat/src/
  source setup.sh
  mkdir blackhat_hists12
  cd blackhat_hists12
  cp ../blackhat_hists11/*{.sh,.h,.cpp,Makefile,.perl} .
  ln -s ../blackhat_lists lists
  make

2) Make any changes to the code and recompile:

For example, change the cone radius for rejecting jets close to muons by modifying makeHistograms.cpp so that min_dR_jet_clep has the new value. Then recompile:

  make

3) Submit all of the jobs to the lxbatch system

  ./submit_all.sh

This takes something like 12-24 hours for all variations. Check the status of the jobs with the "bjobs" command. The histograms will show up in the directory ./hists/.

4) Some clean up and organization after all jobs are done:

4a) I clean up the individual job scripts and LSF* directories:

  mkdir batch 
  mv job_*sh LSFJOB_* batch

4b) Move the histograms for different PDF/scales to their own directories:

  ./mv_hists.sh

4c) Check that all the root files for each PDF/scale are there: I just do the stupid manual check of opening all of the files for each directory with ROOT and seeing that none are missing keys and that there is the same number (112) for each. (A scripted version of this check is sketched after the commands below.)

  root hists_CT10_r1.0_f1.0/*root
  root hists_cteq6m_r1.0_f1.0/*root
  root hists_MSTW2008nlo68cl_r1.0_f1.0/*root
  root hists_NNPDF21_100_r1.0_f1.0/*root 
  root hists_CT10_r0.5_f0.5/*root
  root hists_CT10_r2.0_f2.0/*root
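
If you prefer a scripted check, a short pyroot sketch along these lines would do (the directory pattern and the expected count of 112 are taken from the description above):

        import glob
        import ROOT

        # For each PDF/scale directory, open every root file, flag unreadable
        # ones, and report the file count and the per-file key counts.
        for d in sorted(glob.glob("hists_*_r*_f*")):
            names = sorted(glob.glob(d + "/*.root"))
            key_counts = set()
            for name in names:
                f = ROOT.TFile.Open(name)
                if not f or f.IsZombie():
                    print("broken file: " + name)
                    continue
                key_counts.add(f.GetListOfKeys().GetSize())
                f.Close()
            print("%s: %d files, key counts %s" % (d, len(names), sorted(key_counts)))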

5) Make the combined histograms:

  ./combine_all.sh 0
  ./combine_all.sh 1
  ./combine_all.sh 2
  ./combine_all.sh 3
  ./combine_all.sh 0 0.5 0.5
  ./combine_all.sh 0 2.0 2.0

6) The final combined histograms:

  ls -tlr hists_*/*all.root

Description of source code and scripts

setup.sh: Source this to set up the environment.

makeHistograms.cpp (--> makeHistograms.exe): Program for making histograms from the BH ntuples. You should only run this program over one "part" of the Blackhat samples at a time (born, loop, real, or vsub). After you have made histograms for each part, you should combine them using the combineHistograms program to get meaningful final histograms. You can split each part into as many pieces as you like; you just need to make sure that: 1) you never run makeHistograms on more than one part at a time; 2) you use the final binning from the beginning if you want meaningful statistical uncertainties.

combineHistograms.cpp (--> combineHistograms.exe): Program to combine the histograms made by the makeHistograms program. This program will take any number of input histogram files and combine them to produce a final set of histograms. However, if you want meaningful results, you must: 1) combine only files from the same process/multiplicity; 2) have at least one histogram file for each of the four parts (born, loop, real, and vsub), unless you only have/want born; 3) already have the final binning from when the histograms were created in the makeHistograms program.
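
As an illustration of what the combination amounts to (file and histogram names here are hypothetical; combineHistograms.exe is the real tool), the four parts of one process/multiplicity are summed with their stored uncertainties:

        import ROOT

        # Sum the born, loop, real, and vsub parts of one sub-sample.
        total = None
        for part in ["born", "loop", "real", "vsub"]:
            f = ROOT.TFile.Open("W2jets_" + part + ".root")  # hypothetical name
            h = f.Get("Pt_jet1")                             # hypothetical name
            h.SetDirectory(0)    # keep the histogram alive after the file closes
            if total is None:
                total = h
            else:
                total.Add(h)
            f.Close()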

BH_hist.cpp/h: A wrapper class for the histograms to help with the special treatment required for the statistical uncertainties. In addition to the main histogram, it uses a separate histogram to keep track of the sum of the squares of the "true" weights, for determining the correct final uncertainty on each bin. It is because of this special treatment that the histograms must have the final binning from the beginning.
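
A rough pyroot sketch of the idea (class and method names are hypothetical; see BH_hist.cpp/h for the real implementation). One way to realize it is to sum the sub-entries that share one "true" event weight per bin first, and square only the per-event sums:

        import math
        import ROOT

        class BHHistSketch(object):
            def __init__(self, name, nbins, lo, hi):
                self.h = ROOT.TH1D(name, name, nbins, lo, hi)
                self.w2 = ROOT.TH1D(name + "_w2", name, nbins, lo, hi)
                self.event = {}        # bin -> weight summed over this event
            def fill(self, x, w):
                b = self.h.FindBin(x)
                self.event[b] = self.event.get(b, 0.0) + w
            def end_event(self):
                # Square the per-event sums, not the individual sub-weights.
                for b, w in self.event.items():
                    self.h.AddBinContent(b, w)
                    self.w2.AddBinContent(b, w * w)
                self.event = {}
            def finalize(self):
                # Final per-bin uncertainty: sqrt of the summed squared weights.
                for b in range(1, self.h.GetNbinsX() + 1):
                    self.h.SetBinError(b, math.sqrt(self.w2.GetBinContent(b)))
                return self.h

Under this scheme, rebinning after the fact would mix per-event sums across bins, which is why the final binning must be fixed from the start.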

Particle.cpp/h: A helper class to keep the PDGID with the 4-vector for objects.

submit_all.sh: A simple wrapper script to call submit_pdf.sh and submit_scales.sh for each PDF and scale we use.

submit_pdf.sh: A simple wrapper script for starting jobs on lxbatch (via submitBatch.perl) that will run the script ./runList_batch.sh for a given PDF.

submit_scales.sh: A simple wrapper script for starting jobs on lxbatch (via submitBatch.perl) that will run the script ./runList_batch.sh for a given scale variation.

submitBatch.perl: A rather general purpose program for submitting a job to the lxbatch system that will run a given command (with arguments). This program will construct a small script that basically tells the batch job to cd back to the current directory and run the command that was passed.

runList_batch.sh: This is the script that actually executes the makeHistogram.exe program for each list of blackhat+sherpa ntuples. The ntuples are stored at neu.cern.ch:/mnt/3TB/jhaley/Blackhat_ntuples/, which is not mounted on the lxbatch nodes, so the files are first scp'ed to the local node and then run over.

mv_hists.sh: Move the histograms for different PDF/scales to their own directories.

combine_all.sh: A simple wrapper script to run the combineHistograms.exe program for all W+njet sub-samples in a directory for a given PDF/scale choice.

Running Sherpa from Central Sherpacks


cmsrel CMSSW_6_0_1
cd CMSSW_6_0_1/src
cmsenv
cvs co -r V00-05-00 GeneratorInterface/SherpaInterface
cp /afs/cern.ch/cms/generators/www/slc5_amd64_gcc462/sherpa/1.4.0/7TeV/sherpa_7TeV_Wleptonic_0j4incl_2mlnu7000_MASTER.tgz .
cp -r /afs/cern.ch/cms/generators/www/slc5_amd64_gcc462/sherpa/1.4.0/7TeV/python .
mkdir Configuration
mkdir Configuration/SherpaWork
mv python Configuration/SherpaWork/
scramv1 b -j 8
cmsDriver.py Configuration/SherpaWork/sherpa_7TeV_Wleptonic_0j4incl_2mlnu7000_MASTER_cff.py -s GEN -n 100 --no_exec --conditions auto:mc --eventcontent RAWSIM
sed -i 's/maxEventsToPrint = cms.int32(0)/maxEventsToPrint = cms.untracked.int32(0)/g' sherpa_7TeV_Wleptonic_0j4incl_2mlnu7000_MASTER_cff_py_GEN.py
cmsRun sherpa_7TeV_Wleptonic_0j4incl_2mlnu7000_MASTER_cff_py_GEN.py

-- DarinBaumgartel - 09-Oct-2013
