TMVA Training (TMVA_Training)
BDT CHALLENGE 2018
Produced on 31-08-2018
Signal:
/dcache/atlas/susy/mmorgens/Tau3Mu/ntuples/Tau3MuHF/20180725/hist-300560.MC16a.root
Background:
/dcache/atlas/susy/mmorgens/Tau3Mu/ntuples/Tau3MuHF/20180725/hist-data15_13TeV_periodE_main.root
Note: I removed the Pt vertex variable due to the discrepancy between muons and ID tracks.
Here are the BDT settings and the variable names used by Marcus:
BDT configuration flags:
MinNodeSize=1:MaxDepth=5
tmva_sig_train_cut: "train_flag > 0.5 && pass_trigger"
tmva_sig_test_cut: "train_flag <= 0.5 && pass_trigger"
tmva_bkg_train_cut: "train_flag > 0.5 && pass_trigger"
tmva_bkg_test_cut: "train_flag <= 0.5 && pass_trigger"
- triplet_sa0xy_sig
- triplet_slxy_sig
- triplet_life_time_sig
- HT
- Tt_hard
- run1_isolation_02
- triplet_vertex_pval
- calo_met
My configuration (copy-pasted)
TMVA_Marcus_challenge_n1d5_loose:
description: "Configuration to test overlap with Marcus"
option: "MinNodeSize=1:MaxDepth=5"
tmva_sig_train_cut: "randomVal > 0.5 && (triggerPass & 1)"
tmva_sig_test_cut: "randomVal <= 0.5 && (triggerPass & 1)"
tmva_bkg_train_cut: "randomVal > 0.5 && (triggerPass & 1)"
tmva_bkg_test_cut: "randomVal <= 0.5 && (triggerPass & 1)"
variables:
- Sa0xy
- SLxy
- SlifeTime
- Ht_1jet
- TtH
- Isolation_ConeX_20
- Pv_vertex
- Mt_Calo
Caveat
I merged the full functionality into the Analysis package for release 21, so this package is now obsolete!
In short
This is just a telegraphic recap of what has to be done with this package (not for new users).
- Produce "bdtTree" samples
- Add their names in Tau3MuTMVA_Training.cpp after line 120
- Prepare a BDTConfig file with the list of BDT configurations
- Submit to Stoomboot: bash submit_Tau3MuTMVA_Training.sh <bdt config file> <number of configs to run>
- Re-run analysis to produce "invM" samples to be used with HistFitter
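For reference, an entry in the BDTConfig file follows the same layout as the configuration shown elsewhere on this page; a minimal entry could look like the sketch below (the entry name, description, and variable subset are placeholders):

```
TMVA_example_conf:
    description: "Example entry (placeholder values)"
    option: "MinNodeSize=1:MaxDepth=5"
    tmva_sig_train_cut: "randomVal > 0.5 && (triggerPass & 1)"
    tmva_sig_test_cut: "randomVal <= 0.5 && (triggerPass & 1)"
    tmva_bkg_train_cut: "randomVal > 0.5 && (triggerPass & 1)"
    tmva_bkg_test_cut: "randomVal <= 0.5 && (triggerPass & 1)"
    variables:
        - Sa0xy
        - SLxy
```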
Where to find it
This package is originally found on STOOMBOOT:
/project/atlas/users/mbedog/TMVA_Training
(gitLab?)
Contents
Contents are:
- Tau3MuTMVA_Training.cpp (main implementation)
- submit_Tau3MuTMVA_Training.sh (script to submit the job to Stoomboot)
- executable_Tau3MuTMVA_Training.sh (script which is then run on Stoomboot)
Purpose
This package is used for the Tau->3Mu analysis at ATLAS. Since for that analysis we train a BDT to distinguish our signal from the backgrounds, this code is used to train the BDT methods.
Presently we are using BDTG, as it is slightly more stable than BDT(A) and in particular more regular in its response shape, which makes it easier to fit a smooth curve to the response.
After generating Loosely selected samples of the chosen input variables (the inputs for the BDT), BDTs can be trained on these. The package described here is responsible for that training.
Description
It is based on the TMVA training example, and allows the user to train BDTs according to specific variable selections.
The variables for each BDT configuration are listed in a dedicated text file, which is then read in by the class BDTList (coming from the package Tau3MuMethods in testDERIV).
As inputs it uses ntuples which can be made with the tool Tau3Mu_BDTFiller from the same Tau3MuMethods package; these are referred to as "bdtTree" samples.
The ntuple is generally written out after Loose selection, though training on events which pass the full Tight selection is currently being tested (in Run 2 we expect to have sufficient statistics to permit BDT training after applying the Tight selection).
Running the code
The code has been updated to use an intuitive config file. The whole code is now in source/.
ROOT needs to be set up before running.
You can compile and run the code locally:
g++ Tau3MuTMVA_Training.cpp -o run.x `root-config --cflags --glibs` -lTMVA
and then run
./run.x <bdt config file> <position of the config inside the file>
This will cause the outputs to be stored locally!
Note as well that the code runs only one configuration by default (because it is meant to run on Stoomboot after testing).
One can change this simply by putting a loop into main() at the end of Tau3MuTMVA_Training.cpp.
It is better to run everything on Stoomboot (it takes no more than a few minutes on the short queue):
bash submit_Tau3MuTMVA_Training.sh <bdt config file> <number of configs to run>
(The number here can be a gross over-estimate, as the extra jobs will simply terminate immediately.)
The reason one needs to declare a maximum number of jobs at this stage is that the number of existing configurations is only known inside the bdt config file, which the bash script never opens.
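If one wanted to derive that number automatically, a sketch like the following could count the entries, assuming (hypothetically) that each configuration starts with an unindented name ending in a colon, as in the example configuration on this page:

```shell
# Count the BDT configurations in a config file so the submit script
# would not need a hand-typed upper bound. The file created here is a
# two-entry example; the name pattern assumed is hypothetical.
cat > bdt_config_example.txt <<'EOF'
TMVA_conf_A:
  option: "MinNodeSize=1:MaxDepth=5"
TMVA_conf_B:
  option: "MinNodeSize=5:MaxDepth=3"
EOF

# Match only unindented lines consisting of a name followed by ':'.
n_configs=$(grep -c '^[A-Za-z0-9_]*:$' bdt_config_example.txt)
echo "$n_configs"   # prints 2 for the example file above
```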
Running in this configuration places the outputs inside the BDT_Settings folder on /data/atlas/users/mbedog (this can be changed in executable_Tau3MuTMVA_Training.sh).
The input "bdtTree" samples of choice are to be set inside Tau3MuTMVA_Training.cpp just after line 120.
Note that the variable "label_sample" is used to make the outputs unique, so as not to mix BDT trainings with different configurations.
Evaluate the result
To see the classic BDT training plots one uses the TMVAGui in ROOT; this can be done freely after running the training:
setupATLAS; lsetup root
TMVA::TMVAGui("/data/atlas/users/mbedog/BDT_Settings/TMVA_W_conf_08_afterTight.root")
(The file name is that of your output .root file.)
Better plots for the input variables can be made using the default plotting of the testDERIV code (mergeMultiSource).
After running BDT Training
The next step in the analysis is to use the BDTs trained here to perform the final signal-versus-background selection and obtain an Upper Limit on the branching ratio of tau->3mu.
In plain words, one re-runs the analysis (or ./RUNANALYSIS) code of testDERIV to produce "invM" samples, which are then processed with the HistFitter code to obtain an estimate of the best cut on the BDT value and the corresponding expected Upper Limit on the branching fraction.
--
MatteoBedognetti - 2017-02-08