W+Jets Differential Cross Section Analysis
The Basics
- Calculate the cross-section of the W + jets process as a function of jet multiplicity, jet pT, and jet η.
- Also possibly: jet multiplicity ratios
- Also possibly: M(j_0...j_N), Σ(MET, mu PT, jet PTs), ∆η of objects, ∆φ of objects
- Using the full 2011 dataset, starting with the muon analysis and the single-mu trigger.
- Extending to the electron analysis once machinery is in place.
- Data unfolding using singular value decomposition (SVD), implemented in ROOT as TSVDUnfold, with optimization of the regularization parameter (τ)
- Distributions compared with various generator predictions using RIVET
- Considering the standard assortment of systematic uncertainties
- Jet energy scale and resolution
- Lepton momentum scale and resolution
- Reco/ID/Isolation uncertainties
- Integrated luminosity
- Shape of backgrounds
The procedure
- Start with a set of NTuples created from AOD using the Leptons+Jets group NTuple Facility
- Prepare a spreadsheet (comma-separated text file) which details the NTuples to run on, like the 7TeV Example csv. The columns are:
- SignalType: The ntuple names should start with this, e.g. all the TTJets ntuples start with "TTJets_TuneZ2_7TeV-madgraph". The tool will sort the files into these groups.
- XSections: The cross-sections. Usually taken from the best-known standard-model cross-sections twiki (or the corresponding twiki for 8 TeV).
- N_orig: The number of events in the (MC) sample. This can be taken from the LJFilter histograms in the ntuples, or, if you ran on the whole set successfully, from DAS. Just set it to "1" for real data.
- Group: This is a group name. For example, the WW, WZ, and ZZ samples could all belong to the group DiBoson, and they will be merged into a file "DiBoson.root" when the tool is done.
- CastorDirectory: The directory on castor (these days on EOS), starting with 'store'...
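As an illustration of the five columns described above (the sample names, cross-section values, event counts, and directories below are hypothetical, not copied from the real 7TeV Example csv), a spreadsheet might look like:

```
SignalType,XSections,N_orig,Group,CastorDirectory
TTJets_TuneZ2_7TeV-madgraph,165.0,3701947,TTBar,store/user/someuser/ntuples/TTJets
WW_TuneZ2_7TeV,43.0,4225916,DiBoson,store/user/someuser/ntuples/WW
SingleMu_Run2011A,1,1,Data,store/user/someuser/ntuples/SingleMu
```

Each row is one sample; rows sharing a Group are merged into one output root file.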
- Prepare a list of files on EOS, called 'masterdirlist.txt', in the current directory. This should list all the files you will want to run on.
- Example: 7TeV masterdirlist.txt
- Beware of duplicates! If crab jobs produced duplicate files, either delete them or leave them out of masterdirlist.txt.
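Duplicates from resubmitted crab jobs typically share the same job index but differ in the retry count and random suffix (e.g. ntuple_12_1_aBc.root vs ntuple_12_2_xYz.root). A minimal sketch for flagging them before building masterdirlist.txt, assuming that naming convention (the function name is our own, not part of any tool mentioned above):

```python
import re
from collections import defaultdict

def find_crab_duplicates(filenames):
    """Group files by (base name, job index); any group with more
    than one entry contains duplicate crab outputs."""
    groups = defaultdict(list)
    # crab output names look like <base>_<job>_<retry>_<hash>.root
    pattern = re.compile(r'^(.*)_(\d+)_(\d+)_\w+\.root$')
    for name in filenames:
        m = pattern.match(name)
        if m:
            base, job, _retry = m.groups()
            groups[(base, job)].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}

files = ['ntuple_12_1_aBc.root', 'ntuple_12_2_xYz.root', 'ntuple_13_1_qRs.root']
dups = find_crab_duplicates(files)
# only job index 12 appears twice
```

Keep exactly one file from each flagged group when writing masterdirlist.txt.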
- Skim ntuples storing only the final variables for measurement, and the necessary weights.
- The tool will ask you "Would you like to automatically submit and check jobs now?". If you are ready to run all jobs, choose "y". If you are still testing your code, choose "n".
- If you choose "n", you can run a single job locally. First, look for a job run command.
- If you choose "y", you need to wait for the jobs to finish. The individual root files will "hadd" into group files, and the results will show to screen.
- If you don't want to wait around for jobs to finish, use a "screen" session on lxplus. Note the lxplus machine you are on (e.g. lxplus430), and open a session by typing "screen". Launch the jobs with AnalyzerMakerFastLocal.py as described above, then press Ctrl+A followed by D to detach. You can then log out, go home, etc., log back into lxplus430, and type "screen -r" to return to your session.
Running unfolding jobs (needs updating)
Unfolding and the standard output
The standard output
- Running WJetsTreeAnalyzer.py will produce unfolded data distributions, together with a rescaling function that translates a reco distribution into the unfolded distribution, for the calculation of cross-sections and the plotting of the final distributions.
- The standard output is a set of 2×2 plots for each variable (NJets, lead jet PT, lead jet eta, etc.).
- Top Left: A standard histogram of reconstructed data vs reconstructed MC with the binning and axis range of the final measurement distribution.
- Bottom Left: The gen-vs-reco response matrix to be used in the unfolding. Grid lines are superimposed to indicate two regions. The central region is the region in which the unfolded result will be derived. The periphery is an underflow/overflow region. The reco is confined to the central region, whereas the gen spans the central region and periphery.
- Top right: The distributions of MC Reco, MC Gen, Data Reco and Data Unfolded in the central+periphery region. The periphery is marked with dashed lines. In the central region, variable binning has been chosen to keep the MC Reco distribution flat, with equal statistics in each bin. This is optimal for the unfolding and results in low values of the regularization parameter tau. Also drawn on this plot is the result of the unfolding procedure applied to the MC Reco, as a closure/sanity test of the unfolding procedure.
- Bottom right: The final distribution with the "presentation binning" for UnfoldedData/RecoData, superimposed with GenMC/RecoMC, which should be roughly similar. Also included is UnfoldedMC/GenMC with a dashed line at unity, to better show the result of the sanity check on the unfolding procedure.
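One simple way to realize the rescaling function mentioned above, sketched here under the assumption that it amounts to bin-by-bin correction factors Unfolded/Reco (the function and array names are our own, not taken from WJetsTreeAnalyzer.py):

```python
import numpy as np

def rescale_reco(reco_counts, unfolded_counts, new_reco):
    """Translate a reco-level distribution into an unfolded-level one
    using bin-by-bin factors Unfolded/Reco derived once from the data.
    Empty reco bins get a factor of 1 to avoid division by zero."""
    factors = np.divide(unfolded_counts, reco_counts,
                        out=np.ones_like(unfolded_counts, dtype=float),
                        where=reco_counts > 0)
    return factors * new_reco

reco = np.array([100., 50., 20.])
unfolded = np.array([110., 45., 22.])
print(rescale_reco(reco, unfolded, reco))  # recovers the unfolded counts
```

Applying the factors to the reco distribution they were derived from simply returns the unfolded distribution, which is a quick self-consistency check.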
Unfolding method
- Using TSVDUnfold as implemented in ROOT
- Regularized with an optimized parameter tau. Tau is an integer between 1 and the number of bins N used in the NxN response matrix.
- Tau is optimized with the following procedure:
- For each variable being studied, create a histogram of the MC Reco with proper normalization.
- Generate M random pseudo-data histograms based on the MC Reco distribution. Each pseudo-data histogram is constructed as follows:
- Assign a random "shift" of the distribution, S, between -5% and 5%.
- Loop over bins in the MC Reco Distribution. Round the content of each bin to the nearest integer and get the bin center, such that you have E events centered at x. E times, create a new value x' smeared by a 5% gaussian and shifted with the global shift S. Fill x' into the output histogram.
- For each pseudo-data histogram, conduct the unfolding for each possible value of tau, giving M unfolded pseudo-data histograms per tau. Compare these to the MC Gen histogram; the tau with the best average agreement is taken as the optimal tau.
- The agreement can be quantified in terms of sum of absolute differences, sum of square differences, or a KS test. We find good results with a simple sum of absolute differences.
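The pseudo-data construction and the agreement metric above can be sketched as follows. This is an illustration only, not the analysis code: the actual unfolding step uses TSVDUnfold in ROOT and is omitted here, and all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

def make_pseudo_data(reco_hist, bin_centers, bin_edges):
    """Build one pseudo-data histogram from an MC Reco histogram:
    pick a global shift S in [-5%, +5%], then for each bin with
    (rounded) content E at center x, draw E values x' smeared by
    a 5% gaussian and shifted by S, and histogram them."""
    shift = rng.uniform(-0.05, 0.05)
    values = []
    for content, x in zip(reco_hist, bin_centers):
        n_events = int(round(content))
        smeared = x * (1.0 + shift) * rng.normal(1.0, 0.05, n_events)
        values.extend(smeared)
    hist, _ = np.histogram(values, bins=bin_edges)
    return hist

def agreement(unfolded, gen_hist):
    """Sum of absolute differences: smaller means better agreement."""
    return np.abs(np.asarray(unfolded, float) - np.asarray(gen_hist, float)).sum()
```

In the full procedure, each of the M pseudo-data histograms would be unfolded for every candidate tau, and the tau minimizing the average agreement against MC Gen would be kept.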
Generator Level Distributions with RIVET
Setting up Rivet for CMSSW
- You just need a fresh CMSSW release and a few packages.
mkdir RIVET_WJets
cd RIVET_WJets
cmsrel CMSSW_5_0_0
cd CMSSW_5_0_0/src/
cmsenv
cvs co -r V01-01-41 Configuration/GenProduction/python/WToMuNu_TuneZ2_7TeV_pythia6_cff.py
cvs co Configuration/GenProduction/python/rivet_customize.py
cvs co GeneratorInterface/RivetInterface
scramv1 b -j 16
Setting up a "Rivet Analysis"
- A Rivet Analysis is basically a few pieces of code which specify the process you are looking at, the event/kinematic selections, the variables, the binning, etc. We don't have an official Rivet analysis yet, but one can be patched together like this:
cvs co -d WJetsAnalysis UserCode/baumgartel/LQ/NTupleAnalyzerV2/WJetsAnalysis/
cp WJetsAnalysis/CMS_WJets_TEST.cc GeneratorInterface/RivetInterface/src/CMS_WJets_TEST.cc
cp WJetsAnalysis/CMS_WJets_TEST.plot GeneratorInterface/RivetInterface/data/CMS_WJets_TEST.plot
cp WJetsAnalysis/CMS_WJets_TEST.info GeneratorInterface/RivetInterface/data/CMS_WJets_TEST.info
cp WJetsAnalysis/CMS_WJets_TEST.aida GeneratorInterface/RivetInterface/data/CMS_WJets_TEST.aida
scramv1 b -j 16
cd WJetsAnalysis
Methods of running
- Rivet can be run from a generator configuration (running GEN on the fly), using a generator configuration from one of the centrally produced MC samples. Below is an example with Pythia; alternatively, just use the WJetsAnalysis/Rivet_WJets_GenOnFly.py script, which you checked out in the step above and which is already set up for our Rivet analysis.
cmsDriver.py Configuration/GenProduction/python/WToMuNu_TuneZ2_7TeV_pythia6_cff.py -s GEN \
--datatier=GEN-SIM-RAW --conditions auto:mc --eventcontent RAWSIM --no_exec -n 10000 \
--python_filename=Rivet_WJets_GenOnFly_Template.py --customise=Configuration/GenProduction/rivet_customize.py
- Alternatively, you can run from a GEN sample.
- An example running on a GEN file is:
cmsRun Rivet_WJets_FromCentralGen.py
- Or you can use CRAB, e.g. with WJetsAnalysis/crab.cfg and WJetsAnalysis/Rivet_WJets_FromCentralGen.py:
source /afs/cern.ch/cms/LCG/LCG-2/UI/cms_ui_env.csh
source /afs/cern.ch/cms/ccs/wm/scripts/Crab/crab.csh
crab -create
crab -submit
crab -status # Wait for output to be done...
crab -getoutput
- When running is done, you need to locate the aida file. If you ran with just cmsRun, it is in your current directory. If you ran with crab, it is in the crab output.
- You can make plots from the .aida file straight to pdf:
rivet-mkhtml output.aida
- You can make plots from the .aida file into a root file populated with TGraphs:
aida2root output.aida
- You can also merge the final results from crab, using aidamerge for the aida files and hadd for the root files. This process is automated with a script that makes sure compatible versions of CMSSW are used with Rivet. The usage is:
python MergeAidaCrabDir.py [crab_directory] [mc_name_used_for_labeling]
or, e.g. for MadGraph:
python MergeAidaCrabDir.py crab_0_120924_180819 MadGraph
Generator Level Distributions with Blackhat
Updated method from Git with files on EOS
Checkout and Setup
Set up the working area using sh (not csh or tcsh!)
sh
source /afs/cern.ch/cms/cmsset_default.sh
scram project -n CMSSW_5_3_8_Blackhat CMSSW CMSSW_5_3_8
cd CMSSW_5_3_8_Blackhat/src/
cmsenv
Check out the blackhat code, build fastjet, and make a symbolic link to the file-lists directory
git clone git@github.com:darinbaumgartel/WJets7TeV.git WJetsAnalysis
cd WJetsAnalysis/BlackHatHistogramMaker
cvs co -d fastjet-2.4.4-install UserCode/baumgartel/WPlusJets/7TeV/BlackHat/fastjet-2.4.4-install
cd blackhat_hists
cp -r ../blackhat_lists lists
cd ../
Modify a couple of scripts to use the correct directory names
find . | xargs grep CURRENTDIR | grep ":" | cut -d":" -f1 | sort | uniq | xargs sed -i 's#CURRENTDIR#'`pwd`'#g'
Setup and Recompile
source setup.sh
cd blackhat_hists
make clean && make
Notes on Modifications
- Physics and binning can be modified in blackhat_hists/makeHistograms.cpp
- EOS directories can be modified in blackhat_hists/runList_batch.sh
- EOS Transfer from Castor can be done with the tool EOSTransfer/CastorToEOS.py
Running (In Progress)
If you made changes to the physics code:
make
Submit all of the jobs to the lxbatch system
cd blackhat_hists
./submit_all.sh
This takes something like 12-24 hours for all variations. Check the job status with the "bjobs" command. The histograms will show up in the directory ./hists/.
Clean up the individual job scripts and LSF* directories:
mkdir batch
mv job_*sh LSFJOB_* batch
Move the histograms for the different PDF/scale choices to their own directories, and combine them into the appropriate 1/2/3/4-jet root files, gathered by scale/pdf/errorset folder.
source setup.sh
python HistoGather.py
The final combined histograms:
ls MergedHistos/hists*
How to set up an area to use Blackhat nTupleReader (deprecated)
This is how to set up a new area with the external libraries needed for
running over the Blackhat nTuples.
In these instructions, the new area is /afs/cern.ch/user/j/jhaley/work/CMSSW_5_3_8_Blackhat/src/, so you will need to change this to whatever you want. The old area, where the original libraries were compiled, is /afs/cern.ch/user/j/jhaley/work/CMSSW_5_3_8_LQ/src/, and in these instructions the libraries are just copied to the new area. That should work if you keep the same version of CMSSW (5_3_8). If you want to change the version of CMSSW, then you might have to get the tarballs for the nTupleReader and fastjet and recompile them from source.
# Create CMSSW area:
export SCRAM_ARCH=slc5_amd64_gcc462
scram project -n CMSSW_5_3_8_Blackhat CMSSW CMSSW_5_3_8
cd CMSSW_5_3_8_Blackhat/src/
cmsenv
# Copy over libs:
cp -a /afs/cern.ch/user/j/jhaley/work/CMSSW_5_3_8_LQ/src/ntuplesreader-1.0-install .
cp -a /afs/cern.ch/user/j/jhaley/work/CMSSW_5_3_8_LQ/src/fastjet-2.4.4-install .
ln -s /afs/cern.ch/user/j/jhaley/work/CMSSW_5_3_8_LQ/src/blackhat_lists .
# Make setup script:
cp -a /afs/cern.ch/user/j/jhaley/work/CMSSW_5_3_8_LQ/src/setup.sh .
sed 's#/afs/cern.ch/user/j/jhaley/work/CMSSW_5_3_8_LQ/src/#/afs/cern.ch/user/j/jhaley/work/CMSSW_5_3_8_Blackhat/src/#g' setup.sh > tmp
mv tmp setup.sh
source setup.sh
# Set up blackhat code:
mkdir blackhat_hists11
cd blackhat_hists11
cp -a /afs/cern.ch/user/j/jhaley/work/CMSSW_5_3_8_LQ/src/blackhat_hists11/*{.sh,.h,.cpp,Makefile,.perl} .
ln -s ../blackhat_lists lists
# Edit Makefile for the new area:
sed 's#/afs/cern.ch/user/j/jhaley/work/CMSSW_5_3_8_LQ/src/#/afs/cern.ch/user/j/jhaley/work/CMSSW_5_3_8_Blackhat/src/#g' Makefile > tmp
mv tmp Makefile
# Recompile
make clean && make
Read on for how to actually make the blackhat+sherpa histograms.
Instructions for making histograms from the Blackhat nTuples
Once all the external libraries are set up, here is what I do to make a new set of histograms.
1) Get the histogram making code from a previous area and compile it (description of each file at the end):
cd ~jhaley/work/CMSSW_5_3_8_Blackhat/src/
source setup.sh
mkdir blackhat_hists12
cd blackhat_hists12
cp ../blackhat_hists11/*{.sh,.h,.cpp,Makefile,.perl} .
ln -s ../blackhat_lists lists
make
2) Make any changes to the code and recompile:
For example change the cone radius for rejecting jets close to muons by modifying makeHistograms.cpp so that min_dR_jet_clep is the new value. Then recompile...
make
3) Submit all of the jobs to the lxbatch system
./submit_all.sh
This takes something like 12-24 hours for all variations. Check the job status with the "bjobs" command. The histograms will show up in the directory ./hists/.
4) Some clean up and organization after all jobs are done:
4a) I clean up the individual job scripts and LSF* directories
mkdir batch
mv job_*sh LSFJOB_* batch
4b) Move the histograms for different PDF/scales to their own directories:
./mv_hists.sh
4c) Check that all the root files for each PDF/scale are there:
I just do a simple manual check: open all of the files in each directory with ROOT and see that none are missing keys and that there is the same number of files (112) for each.
root hists_CT10_r1.0_f1.0/*root
root hists_cteq6m_r1.0_f1.0/*root
root hists_MSTW2008nlo68cl_r1.0_f1.0/*root
root hists_NNPDF21_100_r1.0_f1.0/*root
root hists_CT10_r0.5_f0.5/*root
root hists_CT10_r2.0_f2.0/*root
5) Make the combined histograms:
./combine_all.sh 0
./combine_all.sh 1
./combine_all.sh 2
./combine_all.sh 3
./combine_all.sh 0 0.5 0.5
./combine_all.sh 0 2.0 2.0
6) The final combined histograms:
ls -tlr hists_*/*all.root
Description of source code and scripts
setup.sh:
Source this to set up the environment.
makeHistograms.cpp (--> makeHistograms.exe):
Program for making histograms from the BH ntuples.
You should only run this program over one "part" of the Blackhat
samples at a time (born, loop, real, or vsub). After you have made
histograms for each part, you should combine the histograms from
the separate parts using the combineHistograms program to get
meaningful final histograms. You can separate each part into as
many pieces as you like; you just need to make sure that:
1) You never run makeHistograms on more than one part at a time;
2) You use the final binning from the beginning if you want
meaningful statistical uncertainties.
combineHistograms.cpp (--> combineHistograms.exe):
Program to combine the histograms made by the makeHistograms
program.
This program will take any number of input histogram files and
combine them to produce a final set of histograms. However,
if you want meaningful results, you must:
1) Combine only files from the same process/multiplicity;
2) Have at least one histogram file for each of the four parts
(born, loop, real, or vsub); (Unless you only have/want born.)
3) Already have the final binning from when the histograms were
created in the makeHistograms program.
BH_hist.cpp/h:
A wrapper class for the histograms to help with the special
treatment required for the statistical uncertainties. In addition
to the main histogram, it uses a separate histogram to keep track
of the sum of the squares of the "true" weights for determining the
correct final uncertainty on each bin. It is because of this
special treatment that the histograms must have the final binning
from the beginning.
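The weight bookkeeping described above can be sketched as follows (the real implementation is the BH_hist C++ class; the class and method names here are hypothetical):

```python
import numpy as np

class WeightedHist:
    """Histogram that tracks sum(w) and sum(w^2) per bin, so the
    statistical uncertainty on each bin is sqrt(sum of w^2)."""
    def __init__(self, bin_edges):
        self.edges = np.asarray(bin_edges, float)
        self.sumw = np.zeros(len(self.edges) - 1)
        self.sumw2 = np.zeros(len(self.edges) - 1)

    def fill(self, x, w=1.0):
        # locate the bin containing x; silently drop under/overflow
        i = np.searchsorted(self.edges, x, side='right') - 1
        if 0 <= i < len(self.sumw):
            self.sumw[i] += w
            self.sumw2[i] += w * w

    def errors(self):
        return np.sqrt(self.sumw2)

h = WeightedHist([0., 1., 2.])
h.fill(0.5, w=3.0)
h.fill(0.5, w=4.0)
# bin 0: sumw = 7, error = sqrt(9 + 16) = 5
```

Note the error (5) is not sqrt(sumw): with non-unit weights, only the per-event sum of squared weights gives the correct bin uncertainty, which is why it must be accumulated at fill time in the final binning.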
Particle.cpp/h:
A helper class to keep the PDGID with the 4-vector for objects.
submit_all.sh:
A simple wrapper script to call submit_pdf.sh and submit_scales.sh
for each PDF and scale we use.
submit_pdf.sh:
A simple wrapper script for starting jobs on lxbatch (via
submitBatch.perl) that will run the script ./runList_batch.sh for a
given PDF.
submit_scales.sh:
A simple wrapper script for starting jobs on lxbatch (via
submitBatch.perl) that will run the script ./runList_batch.sh for a
given scale variation.
submitBatch.perl:
A rather general purpose program for submitting a job to the
lxbatch system that will run a given command (with arguments). This
program will construct a small script that basically tells the batch
job to cd back to the current directory and run the command that was
passed.
runList_batch.sh:
This is the script that actually executes the makeHistograms.exe
program for each list of blackhat+sherpa ntuples. The ntuples are
stored at neu.cern.ch:/mnt/3TB/jhaley/Blackhat_ntuples/, which is not
mounted on the lxbatch nodes, so the files are first scp'ed to the
local node and then run over.
mv_hists.sh:
Move the histograms for different PDF/scales to their own directories.
combine_all.sh:
A simple wrapper script to run the combineHistograms.exe program for all
W+njet sub-samples in a directory for a given PDF/scale choice.
Running Sherpa from Central Sherpacks
cmsrel CMSSW_6_0_1
cd CMSSW_6_0_1/src
cmsenv
cvs co -r V00-05-00 GeneratorInterface/SherpaInterface
cp /afs/cern.ch/cms/generators/www/slc5_amd64_gcc462/sherpa/1.4.0/7TeV/sherpa_7TeV_Wleptonic_0j4incl_2mlnu7000_MASTER.tgz .
cp -r /afs/cern.ch/cms/generators/www/slc5_amd64_gcc462/sherpa/1.4.0/7TeV/python .
mkdir Configuration
mkdir Configuration/SherpaWork
mv python Configuration/SherpaWork/
scramv1 b -j 8
cmsDriver.py Configuration/SherpaWork/sherpa_7TeV_Wleptonic_0j4incl_2mlnu7000_MASTER_cff.py -s GEN -n 100 --no_exec --conditions auto:mc --eventcontent RAWSIM
sed -i 's/maxEventsToPrint = cms.int32(0)/maxEventsToPrint = cms.untracked.int32(0)/g' sherpa_7TeV_Wleptonic_0j4incl_2mlnu7000_MASTER_cff_py_GEN.py
cmsRun sherpa_7TeV_Wleptonic_0j4incl_2mlnu7000_MASTER_cff_py_GEN.py
--
DarinBaumgartel - 09-Oct-2013