The official code is in UserCode/HiggsAnalysis/HiggsTo2photons/h2gglobe
The suggested tag is baseline_workspace_08Dec2011_nw, which runs the standard 4-category version of the baseline analysis
run fitter.py: python fitter.py -i datafiles.dat -n NUMBER -j JOB (e.g.: for i in `seq 0 99`; do SUBMITTER( python fitter.py -i datafiles.dat -n 100 -j $i ); done). Note: you may have to set the DISPLAY variable to empty to avoid crashes: export DISPLAY=
run combiner.py: python combiner.py, with the correct files listed in filestocombine.dat
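The two steps above can be collected in a small driver script. This is a dry-run sketch using the file names quoted on this page; the echoed commands would be handed to the batch submitter (e.g. submission.sh) in a real run.

```shell
# Dry-run sketch of the fitter/combiner workflow described above.
# NJOBS and the file names come from this page; 'echo' stands in
# for the actual submission wrapper.
export DISPLAY=   # empty DISPLAY avoids ROOT graphics crashes in batch

NJOBS=100
for i in $(seq 0 $((NJOBS - 1))); do
    echo "python fitter.py -i datafiles.dat -n $NJOBS -j $i"
done

# once all jobs are back and listed in filestocombine.dat:
echo "python combiner.py"
```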
Some notes :
The submission script to run on cern batch is : /afs/cern.ch/user/o/obondu/public/forMorgan/submission.sh
The output of python fitter.py -i datafiles.dat -n NUMBER -j JOB is stored on castor /castor/cern.ch/user/o/obondu/Higgs/7TeV/5fb/test/
The output of python combiner.py is /castor/cern.ch/user/o/obondu/Higgs/7TeV/5fb/test/CMS-HGG_4763pb.root
The official datacard can be found here /afs/cern.ch/user/c/chanon/public/forMauro/cms-hgg-datacard.txt
The output of combine cms-hgg-datacard.txt -M Asymptotic -m 120 --generateBinnedWorkaround -S 1 ("-S 1" means that systematic uncertainties are included) is:
Sanity checks on the model: OK
Computing limit starting from observation
Will compute both limit(s) using minimizer Minuit2 with strategy 0 and tolerance 0.1
Median for expected limits: 1.54476
Sigma for expected limits: 0.788156
Restricting r to positive values.
Make global fit of real data
NLL at global minimum of data: -84995.8 (r = 1.99846e-06)
Make global fit of asimov data
NLL at global minimum of asimov: -84994.9 (r = 3.56712e-05)
At r = 2.235139: q_mu = 6.79976 q_A = 7.43612 CLsb = 0.00456 CLb = 0.54748 CLs = 0.00833
At r = 1.117571: q_mu = 2.29889 q_A = 2.10419 CLsb = 0.06455 CLb = 0.47325 CLs = 0.13639
At r = 1.676355: q_mu = 4.41133 q_A = 4.46089 CLsb = 0.01785 CLb = 0.50469 CLs = 0.03537
At r = 1.396963: q_mu = 3.30703 q_A = 3.19116 CLsb = 0.03447 CLb = 0.48706 CLs = 0.07077
At r = 1.536659: q_mu = 3.84982 q_A = 3.80513 CLsb = 0.02487 CLb = 0.49543 CLs = 0.05021
At r = 1.606507: q_mu = 4.12774 q_A = 4.12767 CLsb = 0.02109 CLb = 0.49999 CLs = 0.04219
At r = 1.571583: q_mu = 3.98801 q_A = 3.96488 CLsb = 0.02291 CLb = 0.49768 CLs = 0.04604
At r = 1.554121: q_mu = 3.91849 q_A = 3.88447 CLsb = 0.02388 CLb = 0.49656 CLs = 0.04809
At r = 1.545390: q_mu = 3.88387 q_A = 3.84455 CLsb = 0.02437 CLb = 0.49600 CLs = 0.04914
At r = 1.541025: q_mu = 3.86652 q_A = 3.82456 CLsb = 0.02463 CLb = 0.49572 CLs = 0.04968
-- Asymptotic --
Observed Limit: r < 1.5410
Expected 2.5%: r < 0.8381
Expected 16.0%: r < 1.1148
Expected 50.0%: r < 1.5448
Expected 84.0%: r < 2.1457
Expected 97.5%: r < 2.8508
Done in 0.62 min (cpu), 0.62 min (real)
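The Asymptotic limit summary can be pulled out of a combine log like the one above with a short awk snippet (a sketch; the log lines are pasted into a heredoc here so the example is self-contained):

```shell
# Extract observed and median expected limits from a combine log
# in the format shown above (pasted into a heredoc for the example).
cat > combine.log <<'EOF'
 -- Asymptotic --
Observed Limit: r < 1.5410
Expected 2.5%: r < 0.8381
Expected 16.0%: r < 1.1148
Expected 50.0%: r < 1.5448
Expected 84.0%: r < 2.1457
Expected 97.5%: r < 2.8508
EOF

obs=$(awk '/Observed Limit/ {print $NF}' combine.log)
med=$(awk '/Expected 50.0%/ {print $NF}' combine.log)
echo "observed r < $obs, median expected r < $med"
```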
Limit results from J. Tao
Steps: 1) prepare the RooWorkspace for each mass point Mass between 110 and 150 from CMS-HGG_4763pb.root -- "root -b -q InterpolateMass.C\(${Mass}\)", with Normalization.C for the SM Higgs; 2) "combine cms-hgg-datacard.txt -M Asymptotic -m ${Mass} --generateBinnedWorkaround -S 1" ("-t Ntoys" also works); 3) "python limit-plotter-complete.py Asymptotic sm -r -s" to get the results; the -r option means ratio and -s means smoothing; you can use none (absolute limit), one (ratio or smoothing), or both (ratio and smoothing) of these 2 options.
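The three steps can be strung together in a loop over mass points. A dry-run sketch (the 5 GeV step is an assumption; adjust the list to the mass points actually available, and replace echo with the real commands):

```shell
# Dry-run of the per-mass-point limit chain described above:
# interpolate the workspace, run combine, then plot once at the end.
for Mass in $(seq 110 5 150); do
    echo "root -b -q InterpolateMass.C\\(${Mass}\\)"
    echo "combine cms-hgg-datacard.txt -M Asymptotic -m ${Mass} --generateBinnedWorkaround -S 1"
done
echo "python limit-plotter-complete.py Asymptotic sm -r -s"
```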
the HiggsAnalysis/HiggsTo2photons package compiles with the following packages:
Test Release based on: CMSSW_4_2_8_patch2
Base Release in: /afs/cern.ch/cms/slc5_amd64_gcc434/cms/cmssw-patch/CMSSW_4_2_8_patch2
Your Test release in: /afs/cern.ch/user/o/obondu/scratch0/Higgs/GlobeTest/CMSSW_4_2_8_patch2__baseline_workspace_08Dec2011_nw
--- Tag --- -------- Package --------
V06-22-00 CondFormats/DataRecord
V00-03-01 CondFormats/EgammaObjects
baseline_workspace_08Dec2011_nw HiggsAnalysis/HiggsTo2photons
regression_Dec3d RecoEgamma/EgammaTools
---------------------------------------
total packages: 4 (4 displayed)
The NN function has to be included in GeneralFunctions_cc.h and GeneralFunctions_h.h files (see files in /afs/cern.ch/user/o/obondu/public/forHgg)
The NN weights (latest version from Hugues) is to be included in weights/TMVAClassification_MLP.class.C (see files in /afs/cern.ch/user/o/obondu/public/forHgg)
The input NN variables have to be recognized by globe (note that they ARE in V11_04_05 version of the non-reduced globetuples located in /castor/cern.ch/user/c/cmshgg/processed/V11_04_05): several files in the branchdef/ folder have to be modified (see files in /afs/cern.ch/user/o/obondu/public/forHgg)
With this in place, both the make in the h2gglobe folder and the scramv1 build in the CMSSW folder should work
Nicolas' core modifications are available here : /afs/cern.ch/user/c/chanon/public/forOlivier
Add NN variables to the reduction step
/sps/cms/chanon/save/ARCHIVE_OldCMSSWreleasesLxplus/CMSSW_4_2_8_patch7/src/HiggsAnalysis/HiggsTo2photons/h2gglobe/ contains an old installation of globe (LP?), with fixed pt cuts, and no energy regression
Example: go to /sps/cms/chanon/save/ARCHIVE_OldCMSSWreleasesLxplus/CMSSW_4_2_8_patch7/src/HiggsAnalysis/HiggsTo2photons/h2gglobe/PhotonAnalysis_scripts/ReduceGJet/ and create the jobs with createH2GGlobeReducer.bash; then go to the Jobs directory, where you can launch them with bsub -q 1nh RunCastorH2GGlobeReducer_0,1,2...
* (Tao) The root files are located in /castor/cern.ch/user/j/jtao/Hgg2011/h2gglobe_V11_04_05_reduction_jtao/ with the NN variables (pho_r19, pho_maxoraw, pho_cep, pho_lambdaratio, pho_lambdadivcov, pho_etawidth, pho_brem, pho_smaj, pho_e2x2 and pho_e5x5):
1) Data: PhotonRun2011A_Clean and PhotonRun2011B_Clean, 50 jobs each, with root files in /castor/cern.ch/user/j/jtao/Hgg2011/h2gglobe_V11_04_05_reduction_jtao/Data/ and the event statistics in data_stat.log. All 100 jobs (50 jobs x 2 samples) were copied to castor successfully, with the log file data_scp.log. (There are two other data samples, PhotonRun2011A and PhotonRun2011B, in /castor/cern.ch/user/c/cmshgg/processed/V11_04_05/Data/ -- what is the difference? What does "Clean" mean?)
2) MC samples, including bkg and Hgg signal samples: 20 jobs for each bkg sample (BoxPt10to25/BoxPt25to250/BoxPt250/DiPhotonJets/DYJetsToLL_M50/GJet_Pt-20_pp/GJet_Pt-20_pf/QCDPt30to40_pp/QCDPt30to40_pf/QCDPt30to40_ff/QCDPt40_pp/QCDPt40_pf/QCDPt40_ff) and 1 job for each signal sample, GluGluToHToGG/VBF_HToGG/TTH_HToGG/WH_ZH_HToGG, with mass points (90 95 100 105 110 115 120 121 123 125 130 135 140 145 150 155 160)GeV. All MC root files are located in /castor/cern.ch/user/j/jtao/Hgg2011/h2gglobe_V11_04_05_reduction_jtao/MC_Sig_Fall11_S6. mc_stat.log shows the event statistics. All 328 jobs (20 jobs x 13 bkg samples + 4 jobs x 17 mass points) were copied to castor, with the log file mc_scp.log.
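The job counts quoted above can be sanity-checked with shell arithmetic:

```shell
# Sanity check of the job bookkeeping quoted above.
data_jobs=$((50 * 2))          # 50 jobs x 2 data samples
mc_jobs=$((20 * 13 + 4 * 17))  # 20 jobs x 13 bkg samples + 4 signal samples x 17 mass points
echo "data: $data_jobs jobs, MC: $mc_jobs jobs"
```

This reproduces the 100 data jobs and 328 MC jobs stated above.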
Create diphoton minitrees
Nicolas hacked globe to produce diphoton minitrees after the preselection. The changes are in h2gglobe/PhotonAnalysis/src/statanalysis.cc: a minitree is created that includes many isolation variables and the NN output for the leading and trailing photons. The selection was changed from l.phoSUPERTIGHT to l.phoNOCUTS.
To produce the minitrees, go to:
/sps/cms/chanon/save/ARCHIVE_OldCMSSWreleasesLxplus/CMSSW_4_2_8_patch7/src/HiggsAnalysis/HiggsTo2photons/h2gglobe/PhotonAnalysis_scripts/ReduceGJet/
create the jobs with createH2GGlobeLooper.bash,
then go to the Jobs_Looper directory and launch the jobs with bsub -q 1nh RunCastorH2GGlobeLooper_0,1,2...
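The launch step can be scripted rather than typed per job. A dry-run sketch; NJOBS here is a placeholder (count the RunCastorH2GGlobeLooper_* scripts that createH2GGlobeLooper.bash actually produced):

```shell
# Dry-run submission of the looper jobs to the 1nh queue.
# NJOBS is a placeholder, e.g. NJOBS=$(ls RunCastorH2GGlobeLooper_* | wc -l)
NJOBS=3
for i in $(seq 0 $((NJOBS - 1))); do
    echo "bsub -q 1nh RunCastorH2GGlobeLooper_$i"
done
```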
Optimization code
Nicolas is working on it
Getting the brazilian flag plot
Run interpolation between mass points in h2gglobe/Macros/InterpolateMass.C
In recent versions of globe: h2gglobe/Macros/limit-plotter-complete.py.
Limit result with re-optimized CiC4+NN (by J. Tao)
* I compared the results with the following cases:
(1) Rerunning with the official RooWorkspace: CMS-HGG_4763pb.root (official)
(2) Running from the reduction step (Data & MC processed samples V11_04_05 with additional NN variables), then reproducing the RooWorkspace with the same CiC selections (Re CiC)
(3) Analysis with re-optimized CiC4 and photon NN selections, input without shift (CiC+NN)
(4) Same as (3), but with MC S4ratio rescaling, 1.007 (EB) and 1.009 (EE), from Fan (CiC+NNSF, where SF means shift)
* 20120202_GlobalAnaWithNN.pdf has a bug in the data analysis: I forgot to use the JSON files for the data.
* 20120205GlobalAnaWithNNDebug.pdf with the bug fixed.
* The selections with re-optimized CiC4+NN CANNOT give a better or even equivalent result. How should the re-optimized CiC4+NN be done in the global framework?
* The macro that Lyon uses for training: TMVAClassification.C
* From Tao: Photons were selected with ET>20 and |eta|<2.5 excluding [1.4442,1.566], a pixel-seed veto and some ISO selections (hoe<0.05 && IsoEcal<4.2 && IsoHcal<2.2), in PhotonCandDis.cxx after the event analysis with my own analyzer. Then the selections TrkISO<2.0 && SigIetaIeta<0.01(EB)/0.03(EE) && MC-truth selections were used in the training TMVAClassification.C
* MC samples used for training and testing by Tao: based on my presentation of 19 May, the "weight files" from the MVA analysis were obtained with Fall10 samples after the Jinan FCPPL workshop. The prompt photons were selected from the DiphotonBox & Born MC samples.
For the Spring11 samples, the GJet and QCD samples with the DoubleEMEnriched filter are available, with a further requirement of 2 photons with pT>20GeV and |eta|<2.6. The statistics are not sufficient. These samples are used to validate the shape of the NN output, as shown in my presentation of 19 May (TaoSlides.ppt) and in 20110525 NN Higgs. The corresponding Spring11 MC samples, matching the Fall10 ones described above, are still being processed. Unfortunately, CRAB errors (60317) kept occurring.
With the 2010 (36pb-1) and 2011 (3.013fb-1, up to run 177515 with the 07 Oct. JSON file) data samples, after the kinematic requirements ET>30GeV, |eta|<2.5 excluding the transition region (1.4442, 1.566), we select the photons except the 2 most isolated ones with the combined-iso method. With the total 3.139fb-1 of data, we select 14149 photons: 4474/5589 unconverted/converted in the barrel and 1851/2235 unconverted/converted in the endcap. If we also apply the Sigma_IetaIeta cuts, 2953 photons are left: 953/930 unconverted/converted in the barrel and 503/567 unconverted/converted in the endcap.
For the comparisons of the NN inputs with MC, we select the fake photons after matching with MC truth, from the GJet (/GJet_Pt-20_doubleEMEnriched_TuneZ2_7TeV-pythia6/Summer11-PU_S4_START42_V11-v1/AODSIM) and QCD (/QCD_Pt-*_doubleEMEnriched_TuneZ2_7TeV-pythia6/Summer11-PU_S3_START42_V11-v2/AODSIM with *=30to40, 40) samples.
Fake photons from MC were selected with the basic selections (ET, eta, spike removal and pixel-seed veto) and the ISO selections, without matching to any real photon from MC truth, i.e. the same bkg selections as used in the MVA analysis.
1) The photons were selected with the CiC4 selections but failing at least one of the CiC ISO selections (RelativeCombinedISO_SelectedVertex, RelativeCombinedISO_WorstVertex, RelativeTrackISO). The same selections were used for both data and MC, to check the signal contamination from such selections. For the EB case, the signal fraction is about ~11%, as seen in the plot FakePhotonDistributionsDataMC_FailAnyCiCisoEB.pdf.
2) With the CiC4 selections except the ISO selections, the relations between the etaWidth and the isolation variables in EB were plotted for signal photons and fake photons respectively: EtaWidth_RelCombinedISO_EB_True.gif / EtaWidth_RelCombinedISO_EB_Fake.gif for the RelativeCombinedISO_SelectedVertex case, with the 2D scatter points and the TProfile distribution in the same plot.
For the fake case, more relations between etaWidth and isolation variables are listed as follows: EtaWidth_RelTrackISO_EB_Fake.gif for the relative track isolation in CiC4, EtaWidth_TrackISODR04RhoCorr_EB_Fake.gif for the absolute track isolation in DR=0.4 after rho correction, EtaWidth_EcalISODR04RhoCorr_EB_Fake.gif for the absolute ECAL isolation in DR=0.4 after rho correction, and EtaWidth_HcalISODR04RhoCorr_EB_Fake.gif for the absolute HCAL isolation in DR=0.4 after rho correction.
3) The results should/will be checked with the global analysis.
cvs co -r defautTag -d higgsAnalysis UserCode/hbrun/higgsAnalysis
cd higgsAnalysis
set the environment and compile
source prepareEnv.(c)sh
source compile_dir.csh
run the code
./theEXE.exe
what is the result?
the list of passing events is in passingEvents.log
we have 22299 events (should have 28500)
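That shortfall is roughly 78% of the expected events; a quick check with the numbers above:

```shell
# Fraction of expected events actually selected (numbers from above).
frac=$(awk 'BEGIN { printf "%.1f", 100 * 22299 / 28500 }')
echo "selected $frac% of the expected events"
```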
what are the input files?
sample name = /GluGluToHToGG_M-120_7TeV-powheg-pythia6/Summer11-PU_S3_START42_V11-v2/AODSIM
IPN tree tuples: /pnfs/in2p3.fr/data/cms/t2data/store/user/obondu/GluGlu_120_v2/GluGlu_120_*.root
list of the CIC variables of the first 10000 events
* NNworkingslides.pdf: Fan Jiawei and Hugues' slides from the 19 May meeting
* TaoSlides.ppt: Tao's slides from the 19 May meeting
* 20110525 NN Higgs: Tao's slides from the 26 May meeting. For the Hgg part, there is a bug in the normalization of the Born MC samples: I forgot to multiply by the matching efficiency (0.35, madgraph samples). I corrected this in my presentation for the IHEP weekly meeting, 20110527 NN Higgs. With the K-factor, data and MC agree well. The Mgg mass distribution changes with the NN cut; please have a look at these 2 pdf files, TaoDiPhotonDataMC_mH115_NN.pdf and TaoDiPhotonDataMC_mH120_NN.pdf.
* NNshapeTests.pdf: Fan Jiawei and Hugues slides with tests about NN output
* 20110601NNadditinalVar.ppt: some plots from J. Tao and H. Xiao for the additional variable. Jiawei and Hugues, please check the additional variable λ-/σηη2, including the data/MC agreement and its performance in the MVA analysis.
* 201000615 NN Higgs.pdf: given by Tao on 15 June. I have checked the definition of Lambda+-. It is true that we should use Cee, Cpp and Cep for the calculation, not the sigma ones. I am trying the corrected variables and redoing the MVA analysis.
* NNshapeTests.pdf: Some slides from Fan and Hugues in order to understand the NN output shape, some DATA/MC comparisons and some performance checks
Cross check with training the NN with diphotonborn
* signal dataset
/DiPhotonBorn_Pt10to25_TrackingParticles_7TeV-pythia6/Spring11-PU_S1_START311_V1G1-v1/AODSIM
/DiPhotonBorn_Pt25to250_TrackingParticles_7TeV-pythia6/Spring11-PU_S1_START311_V1G1-v1/AODSIM
/DiPhotonBorn_Pt250toinf_TrackingParticles_7TeV-pythia6/Spring11-PU_S1_START311_V1G1-v1/AODSIM
(dataset for comparison /GJet_Pt-20_doubleEMEnriched_TuneZ2_7TeV-pythia6/Spring11-PU_S1_START311_V1G1-v1/AODSIM )
*background dataset
/QCD_Pt-20to30_EMEnriched_TuneZ2_7TeV-pythia6/Spring11-PU_S1_START311_V1G1-v1/AODSIM
/QCD_Pt-30to80_EMEnriched_TuneZ2_7TeV-pythia6/Spring11-PU_S1_START311_V1G1-v1/AODSIM
/QCD_Pt-80to170_EMEnriched_TuneZ2_7TeV-pythia6/Spring11-PU_S1_START311_V1G1-v1/AODSIM
*statistics = DiPhotonBorn reduced so that, after cuts, it has the same statistics as QCD EMEnriched after cuts
*I put weights in the training to take into account the different cross sections of each sample
*what are the cuts = pho_et>20 && (abs(pho_eta)<=2.5)&&(!((abs(pho_eta)>1.4442)&&(abs(pho_eta)<1.566))) && pho_hoe<0.05 && pho_hasPixelSeed==0 && pho_IsoHollowTrkCone<2 && pho_IsoEcalRechit<4.2 && pho_IsoHcalRechit<2.2 && ((pho_isEB==1&&pho_sigmaIetaIeta<0.01)||(pho_isEE==1&&pho_sigmaIetaIeta<0.03))
*variables : cEP, phi_width/eta_width, lambdaRatio, eMax/eSCraw, eta_width, e2x2/e5x5, lambda-/cEE, r19
cross check for the pt categories test
* signal dataset
/GJet_Pt-20_doubleEMEnriched_TuneZ2_7TeV-pythia6/Spring11-PU_S1_START311_V1G1-v1/AODSIM
*background dataset
/QCD_Pt-20to30_EMEnriched_TuneZ2_7TeV-pythia6/Spring11-PU_S1_START311_V1G1-v1/AODSIM
/QCD_Pt-30to80_EMEnriched_TuneZ2_7TeV-pythia6/Spring11-PU_S1_START311_V1G1-v1/AODSIM
/QCD_Pt-80to170_EMEnriched_TuneZ2_7TeV-pythia6/Spring11-PU_S1_START311_V1G1-v1/AODSIM
*statistics = same statistics in the 2 samples
*I put weights in the training to take into account the different cross sections of each sample
*what are the cuts = pho_et>20 && (abs(pho_eta)<=2.5)&&(!((abs(pho_eta)>1.4442)&&(abs(pho_eta)<1.566))) && pho_hoe<0.05 && pho_hasPixelSeed==0 && pho_IsoHollowTrkCone<2 && pho_IsoEcalRechit<4.2 && pho_IsoHcalRechit<2.2 && ((pho_isEB==1&&pho_sigmaIetaIeta<0.01)||(pho_isEE==1&&pho_sigmaIetaIeta<0.03))
*variables : cEP, phi_width/eta_width, lambdaRatio, eMax/eSCraw, eta_width, e2x2/e5x5, lambda-/cEE, r19
*pt categories : 20to30, 30to40, 40to50, 50to60, pt>60
one training with no categories, only the cuts applied, for the efficiency-rejection plot
* 20110706_NN.ppt and the update based on the discussion 20110706_NN_update.ppt
* 20110817_NN_PT30.ppt with ET >30GeV. Bkg rejection is about 50%-60% with ET>30 GeV when keeping 90% signal efficiency
* 20110908HggIHEPIPNL.ppt: A simple presentation prepared in a short time.
* 20110918FakPhotonAndDataMCComparisonHgg.ppt: all photons with ET>30GeV. With the EGM-10-006 Loose selections, only 17 photons were selected (excluding the 2 most isolated with combIso). With the EGM-10-006 Loose selections except Sig_IeIe, 33 photons were selected (excluding the 2 most isolated). The pileup reweighting and FastJet rho correction were applied for the Data/MC comparisons.
-- OlivierBondu - 20-May-2011