2012 CMSDAS Photon HLT Short Exercise
The purpose of this exercise is to familiarize you a bit with the HLT photon triggers, how they shape the data
we analyze and how those effects manifest in the reconstructed photon objects we use offline.
To do this we will employ some simple tools -- a small analyzer to pull photon and trigger information out of
the CMS data and drop it into a simple root tree, and a few macros to make histograms and manipulate them
a bit. In general the analysis philosophy here is sort of a "grab & go" approach: get the data we need and
detach ourselves from CMSSW as soon as possible, allowing
us to work more easily on a laptop or desktop, or wherever we may have
ROOT installed. For this exercise you don't
even particularly need to know what CMSSW is, but some familiarity with root will be helpful -- we'll try not to do
anything you won't find
here...
We're going to attempt two tasks in this exercise:
- Try to find at which offline transverse momentum a particular single photon trigger becomes fully efficient.
- Look a bit at the offline effects of introducing isolation requirements on a trigger.
Let's get at it then:
The "Grab & Go" Part
This part is somewhat optional, since it takes a bit of time and we've done this part for you. You should familiarize
yourself with the process though since in real life we will not be doing this for you.
kserver_init (respond to prompt for CERN username and passwd)
scramv1 project CMSSW CMSSW_4_2_8
cd CMSSW_4_2_8/src
cvs co -d CMSDASPhoton/CMSDASTreeMaker UserCode/DMason/CMSDASTreeMaker
scramv1 b
This pulls down the analyzer that produced the root trees we will use here, an example crab config
and some macro code. The analyzer is fairly minimalist -- there are certainly better and more sophisticated
ways to do this -- you will find many examples of different variations on this theme throughout the code
for the different exercises in CMSDAS. For a nice example of an analyzer used to make ntuples used in
photon SUSY analyses, take a look at Dongwook Jang's
SUSYNtuplizer -- there's a bit of a learning curve
to use this guy though, so we're starting simple here.
Brief Walkthrough of Analyzer Code
Wander into the
CMSDASPhoton/CMSDASTreeMaker directory that you pulled down from CVS. You'll see several directories:
<cmslpc03.fnal.gov> ls -1
BuildFile.xml
cmsdastreemaker_cfg.py
CVS
doc
interface
macro
python
src
test
interface houses the .h files for this guy -- there are two: one general one for the analyzer (
CMSDASTreeMaker.h), then another (
CMSDASTreeMakerBranchVars.h) which defines the branches in the tree that's
created. It's good to remember where that one is as a reference for what things are called when you're working with the ntuples. Photons, Vertices,
PfCandidates are collections of vectors of the various associated quantities for those objects.
src houses the actual analyzer code that makes the tree.
By the way -- you're already seeing the words "tree" and "ntuple" used interchangeably. Get used to that.
Let's walk through the analyzer .C code in src -- there are some features there worth understanding a bit. There is the constructor,
CMSDASTreeMaker::CMSDASTreeMaker(const edm::ParameterSet& iConfig) where all the input parameters from the config are defined, after that the destructor, which isn't really used here, then a
beginJob() and
beginRun() method -- these are useful for initialization -- and you see in the
beginRun method (which as you might guess is executed whenever a new run in the
data is encountered) there is some HLT related code. Since the HLT configuration can change whenever a new run is taken, you need to get the HLT menu here, including finding
what triggers are present. Here we play some games with parsing the names to simplify our tree a bit. CMS trigger paths contain a version number suffix in the name which may be incremented to signify some "not really fundamental" changes in the trigger like changes in prescale values or level-1 seed. Over the course of the 2011 run some triggers reached
as high as "v13", meaning to go and query whether one of these fired you must either know ahead of time which "v" was active for which run, or do what we do here, loop over a whole range of 13 trigger names.
To simplify things for the exercise we empirically search for the right "v" trigger in the menu during
beginRun() then strip that off, making it a branch variable for the trigger
branches in the tree. I.e. you'll find a trigger like
HLT_Photon30_CaloIdVL_v8 in the
HLT_Photon30_CaloIdVL branch in the tree, with the branch var
version
set to 8. There are wildcard parsers out there but they must be used with care.
This brings us to
CMSDASTreeMaker::analyze(const edm::Event& iEvent, const edm::EventSetup& iSetup), which is the meat of the code. It is called for each
event of the file you're reading/processing. There is really not a whole lot going on here except a series of sections, each grabbing the
Handle of an EDM collection we want
to fetch from the data file, looping over its contents, and doing something with them -- in this case getting the values we want from the various class variables in each object and
copying them into our tree. Most analyzers have sections very much like this. You'll find similar code inside what makes
PAT tuples.
Ntupling
To make the ntuples used here you run this thing. Included is a sample config, probably configured as last used, with parameters which set thresholds to keep photon objects and PfCandidates, as well as the list of triggers preserved in the tree and required to save an event. You can run this interactively over a small number of files (though there are limits
on this in the lpc farm), or to run over large datasets like the ones that make up our 2011 data you'd use
CRAB. Within the
crabconfigs subdirectory you'll find an example
crab config that was used to produce the ntuples. In this case they are brought back to the
CRAB working directory -- you might also want to ship them to mass storage like EOS instead. There are some commented out lines in the crab config that would drop the results into /pnfs resilient space.
As with all good cooking shows we've already done this for you. But now we hope you know how we did it.
Using the Ntuples
The ntuple/trees produced from the data are living on the lpc:
/uscms_data/d1/dmason/photons/CMSDAS/CMSSW_4_2_8/src/DMason/CMSDASTreeMaker/crabconfigs/Run2011A-03Oct2011-v1B.root
/uscms_data/d1/dmason/photons/CMSDAS/CMSSW_4_2_8/src/DMason/CMSDASTreeMaker/crabconfigs/Run2011A-05Aug2011-v1B.root
/uscms_data/d1/dmason/photons/CMSDAS/CMSSW_4_2_8/src/DMason/CMSDASTreeMaker/crabconfigs/Run2011A-05Jul2011ReReco-ECAL-v1B.root
/uscms_data/d1/dmason/photons/CMSDAS/CMSSW_4_2_8/src/DMason/CMSDASTreeMaker/crabconfigs/Run2011B-PromptReco-v1B.root
Together they include all the 2011 data, and consume about 17 gigs of space. You can pull them over to your laptop (though note this can take half our
allotted two hours), or run on them on the lpc. Wherever you use them, you will now want to get into an environment where you can run root. On the
lpc this is most easily done by wandering into your favorite recent CMSSW release area and doing an
eval `scramv1 runtime -sh` (or -csh depending
on your shell). You don't need to do anything with CMSSW -- this is just an expedient way to get your hands on root on the lpc machines.
We're going to let root make a skeleton analyzer from which to start analyzing this data. It is not the prettiest approach -- for a more complete
one take a look at the
SUSYNtuplizer referenced above -- but remember we're aiming for the easiest to get your hands on right away here.
So -- once you have your ntuples somewhere you want them to be, and you have access to root, fire it up & load in the ntuples -- you can do this by
just typing in commands or a small macro:
{
TChain *cmsdasTree = new TChain("tuple/Test");
cmsdasTree->Add("Run2011A-05Jul2011ReReco-ECAL-v1B.root");
cmsdasTree->Add("Run2011A-05Aug2011-v1B.root");
cmsdasTree->Add("Run2011A-03Oct2011-v1B.root");
cmsdasTree->Add("Run2011B-PromptReco-v1B.root");
std::cout << "loaded " << cmsdasTree->GetEntries() << " events into cmsdasTree " << std::endl;
}
Execute the above commands or the macro you've put them into in root -- it should tell you you have a considerable number of events loaded into cmsdasTree.
Dutifully following the ROOT User's Guide, do a cmsdasTree->MakeClass("WhateverYouWantToCallYourNtupleAnalyzerCode");
You'll have a skeleton code you can then work from to do these exercises. You'll have made two files, WhateverYouWantToCallYourNtupleAnalyzerCode.h and .C. Within
the .C file there is a Loop() method that is a good place to stick the meat of any code you write for the next steps.
To actually execute this you might want to construct another little macro:
{
.L WhateverYouWantToCallYourNtupleAnalyzerCode.C++O;
WhateverYouWantToCallYourNtupleAnalyzerCode k;
k.Loop();
}
The ++O above is excruciatingly important. You can just do a .L for this guy and something may run, but doing that invokes
ROOT's C++ interpreter, which does strange, insanely permissive things. A common rookie mistake is to forget the ++O for
a while, mess with your code, eventually see incredibly weird behavior (1+1=3 kinds of things), remember the ++O again,
compile, see the flood of errors the interpreter missed, and gape amazed at how the typos you introduced ever possibly worked
in the first place. Don't forget the ++O. Better still is to write a real class a la
SUSYNtuplizer, but that's beyond the scope of this
exercise...
Relative Trigger Efficiency & Where to Set an Offline Cut
The aim of this first sub-exercise is to find a good choice for a photon pT cut in your analysis for a particular trigger. The HLT bases its trigger
selections on the raw supercluster energy, which in the offline photon reconstruction has additional corrections applied. This, together with the fact
that the calibrations applied at data-taking time differ from the hopefully better ones we have for reconstructed or re-reconstructed data, results in
some level of smearing between the pt threshold applied by the trigger and a pt cut you apply offline in your analysis. You can get a good estimate of
where to set your offline cut by selecting events triggered by a lower threshold trigger, and then from those looking at how often your candidate trigger
also fired as a function of offline photon pT.
Modify the MakeClass thing you've created to book and fill two histograms of leading photon pT. For one of these, require that HLT_Photon50_CaloIdVL has
passed; for the other,
in addition require HLT_Photon75_CaloIdVL to have passed. Define a TFile where you write these histograms out and book
the histograms ahead of the for loop which runs through all the events, then be sure to do a .Write() after that. Having produced the histograms, in a separate
macro load them in and divide the Photon75 guy by the Photon50 guy. You should see something like this:
(insert plot here)
This is called a "trigger turn-on curve". That it is not a sharp step function but more rounded is an artifact of the smearing between the quantities used
by the HLT to make its cut and the reconstructed photon quantities in your analysis. You can try to correct for this, though that can be complicated and
can be a source of error. What is typically done is to find where your trigger is nearly fully efficient -- often above 99% -- to ensure you don't need to worry tremendously
about inefficiencies between the online HLT quantities and the offline analysis ones. An Erf() function is usually used to fit this and is provided for you in
the macro subdirectory. Fit the Erf() function to this ratio and find where the 75 GeV trigger is 99% efficient. Typically you would set an offline cut at the next
nice happy round number above this.
A couple of notes here -- first, this is actually not just a photon exercise but a general trigger task -- this kind of thing is done to find the turn-on curve of pretty
much any kind of trigger. Also, this is not a precise measurement of the trigger efficiency -- that is best done with a technique like the CMS official tag & probe.
There, within an independent sample (or as independent as your statistics may allow), you choose a "tag" selection and then measure your trigger efficiency
on an unbiased "probe" sample. Last year's exercise covered this technique, and you're encouraged to take a look at it!
Look at Effects of Isolations in Photon Triggers
Here we take two triggers, one with isolation applied and one without and look at the differences manifesting in the offline quantities governing photon ID -- there is some
overlap with this and the next photon short exercise. We'll hold off talking about the subtleties of isolations until then -- we're here interested in getting a feel for
what your triggers do to the data.
As you hopefully know at some level by now, and will know more clearly after the next two exercises, photons usually produce narrow showers in the EM calorimeter,
and the activity surrounding the crystals a prospective photon deposits energy in is used to determine whether it was actually a photon or not.
Typically, to measure the background effects of misreconstructed QCD jets in a photon analysis one defines a "fake photon" sample, often by inverting requirements
that the energy deposition surrounding the photon be below some threshold. Without justifying the particular requirements here, we're going to define a "photon candidate"
sample as one having:
Photon_TrackIsoPtHolDR03[i]+Photon_EcalIsoDR03[i]+Photon_HcalIsoDR03[i] < 6 GeV
And a "fake" photon sample with this quantity inverted -- i.e. >6 GeV.
Book and fill 12 histograms -- one for each of the 3 individual components in the sum above, for each combination of trigger (requiring
HLT_Photon90_CaloIdVL_IsoL vs requiring
HLT_Photon90_CaloIdVL) and sample (photon candidate vs fake photon). You should try weighting these histograms by 1/prescale, then compare the number of events in the
photon plots vs the number of events in the fake plots for either the isolated or non-isolated trigger.
--
DavidMason - 04-Jan-2012