Introduction

How to set up PhysicsAnpttH

  • This package is still under development.
  • If "git clone" above fails outside CERN, please try getting CERN Kerberos ticket first:
kinit -f -r7d -A $USER@CERN.CH

Notes and Presentations

On release 20

Set up ttH analysis package

  • This section describes how to set up this code for the first time using release 20.7 of ATLAS analysis software
  • Please run these commands only once and then exit the shell:
mkdir -p ~/testarea/AnpttH
cd ~/testarea/AnpttH
git clone https://:@gitlab.cern.ch:8443/ustc/PhysicsAnpttH.git
source PhysicsAnpttH/macros/setup/first_setup_rel20.sh
exit

Set up PhysicsAnpProd

  • ustc/PhysicsAnpProd is a standalone package that reads (D)xAOD and produces ROOT ntuples with flat or vector branches
  • This section describes how to set up this code for the first time using release 20.7 of ATLAS software
  • Please run these commands only once and then exit the shell:
mkdir -p ~/testarea/AnpProd20
cd ~/testarea/AnpProd20
git clone https://:@gitlab.cern.ch:8443/ustc/PhysicsAnpProd.git
source PhysicsAnpProd/macros/setup/first_setup_rel20.sh
exit

Set up AnpBatch package for managing batch jobs

  • AnpBatch contains scripts that help with managing batch jobs at CERN and USTC
  • There are two main macros for managing jobs:
    • subCERN.py prepares shell scripts for the individual jobs and submits them to LXBATCH
    • procJob.py copies a job's input files to the local disk of the worker node and copies the output ROOT files back (see the sketch below)
  • Commands to check out this package:
cd ~/testarea/
git clone https://:@gitlab.cern.ch:8443/ustc/AnpBatch.git
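
Neither script is reproduced here, but the staging pattern that procJob.py implements can be sketched in Python (hypothetical function and argument names, for illustration only):

# Hypothetical sketch of the procJob.py staging pattern (not the actual script):
# copy the job's input files to local scratch on the worker node, run the job
# there, then copy the output ROOT files back to the shared output directory.
import shutil, subprocess, tempfile
from pathlib import Path

def run_staged_job(input_files, command, output_dir):
    work = Path(tempfile.mkdtemp(prefix="job_"))   # local scratch on the worker node
    local_inputs = [shutil.copy(f, work) for f in input_files]
    subprocess.run(command + local_inputs, cwd=work, check=True)
    for out in work.glob("*.root"):                # copy the results back
        shutil.copy(out, output_dir)
    shutil.rmtree(work)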

On release 21

Set up ttH analysis package

  • This section describes how to set up this code for the first time using release 21 of ATLAS analysis software
  • Please run these commands only once and then exit the shell:
mkdir -p ~/testarea/AnpttH
cd ~/testarea/AnpttH
mkdir source build run
cd source/
git clone https://:@gitlab.cern.ch:8443/ustc/PhysicsAnpttH.git
source PhysicsAnpttH/macros/setup/first_setup_rel21.sh
exit

Set up PhysicsAnpProd

  • ustc/PhysicsAnpProd is a standalone package that reads (D)xAOD and produces ROOT ntuples with flat or vector branches
  • This section describes how to set up this code for the first time using release 21 of ATLAS software
  • Please run these commands only once and then exit the shell:
mkdir -p ~/testarea/AnpProd21/
cd ~/testarea/AnpProd21/
mkdir source build run
cd source
git clone https://:@gitlab.cern.ch:8443/ustc/PhysicsAnpProd.git
source PhysicsAnpProd/macros/setup/first_setup_rel21.sh
exit

Workflow examples

Examples of running ttH locally

Running locally is helpful for testing.

  • Step 0: set up the environment.
cd ~/testarea/AnpBase20/
source setup_atlas_analysis_release.sh
  • Step 1: make mini-ntuple.
cd ${TestArea}/PhysicsAnpttH/
python macros/runttHMiniNtp.py ${path_of_input_ntuples} --do-flip-ntuple=2 --do-tau -o out_minintp.root -n 0

  • Step 2: event selection.
    • --do-Zincl: select inclusive Z events.
    • --noTrigSF: run without the lepton trigger scale factors (for testing)
python macros/runttHPlot.py ${path_of_input_minintuples} --btag-wp=70 -o out.root -n 0

  • Step 3: make tables and stacked plots
    • This step needs both the data and MC output ROOT files from step 2.
    • --draw-region: show the name of the control/signal region on the plots
    • --ilumi: the integrated luminosity
python macros/plotCand.py ${path_of_MC_dir} --data-file=${path_of_data.root}/data.root --xsec-list=data/plot/xsec_list.txt --get-regions --counts-dir=Counts --config-path=data/plot/plot_stack_2l.txt -r -s --do-fixrange --ilumi=79888.3 --draw-region -o plots

Example of running the ttH validation workflow with batch jobs

Before following the instructions below, you have to set up your local environment.

cd ~/testarea/AnpBase20/
source setup_atlas_analysis_release.sh

Step 1: Generate mini-ntuples of Data and MC

Prepare the input ntuple list:

  • subCERN.py helps with preparing and dividing the input Data/MC ntuples for each job.
  • The list of input Data/MC ntuples is needed (see the sketch after the command below).
  • ${tag} = data (or mc16)
python ../../macros/makeFileList.py /lustre/AtlUser/fuhe/NtpR21/HIGG8D1_v6/${tag} -o input_ntuples_R21_v06_${tag}.txt --dsid --match=.root
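
The real logic lives in makeFileList.py; the sketch below is a hypothetical reimplementation of the listing step (walk the ntuple directory, keep files whose names match ".root", parse the DSID from the path), shown only to illustrate what the command produces:

# Hypothetical sketch of the file-list step (the real code is macros/makeFileList.py)
import re
from pathlib import Path

def make_file_list(top_dir, out_name, match=".root"):
    entries = []
    for path in sorted(Path(top_dir).rglob("*")):
        if path.is_file() and match in path.name:
            m = re.search(r"\.(\d{6})\.", str(path))   # DSID, e.g. 410470
            entries.append((m.group(1) if m else "", str(path)))
    with open(out_name, "w") as out:
        for _dsid, path in sorted(entries):            # grouped by DSID
            out.write(path + "\n")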

Prepare the script to submit batch jobs:

  • A shell script that prepares batch jobs with subCERN.py is provided as an example.
  • The paths of the input and output files in this script should be changed to local ones.
vi macros/tth/runCERN_base20_tth.sh   # replace the old path of the ntuples with the new one.
source macros/tth/runUSTC_base20_tth.sh mini ${YOUR TAG}
Submit batch jobs:
cd ${TestArea}/PhysicsAnpttH/work/batch/tth/Mini/
source config-data-${YOUR TAG}/submit_all.sh
source config-mc16-${YOUR TAG}/submit_all.sh

Step 2: Make hist root files of Data and MC

Prepare the input mini-ntuple list:

  • ${tag} = data (or mc16)
python ../../macros/makeFileList.py /lustre/AtlUser/fuhe/Mini/HIGG8D1_v6/${tag} -o input_ntuples_R21_v06_${tag}.txt --dsid --match=.root

Prepare the script to submit batch jobs:

  • The paths of the input and output files in this script should be changed to local ones.
vi macros/tth/runCERN_base20_tth.sh   # replace the old path of the mini-ntuples with the new one.
source macros/tth/runUSTC_base20_tth.sh hist ${YOUR TAG}
Submit batch jobs:
cd ${TestArea}/PhysicsAnpttH/work/batch/tth/Hist/
source config-data-${YOUR TAG}/submit_all.sh
source config-mc16-${YOUR TAG}/submit_all.sh

Step 3: Make stacked plots and tables

Use the outputs from the previous step. The data files need to be hadded into one file. The MC files can stay unhadded, but MC hist files with the same DSID should be hadded into one; see the sketch after the hadd command below.

cd ${TestArea}/PhysicsAnpttH/work/batch/tth/Hist/out/tth_hist_${date}_Anp_data_${version}
hadd data.root job_0*
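
Merging the MC files by DSID can be scripted; the helper below is a hypothetical sketch (not part of the package) that groups the hist files by the DSID appearing in their paths and merges each group with hadd:

# Hypothetical helper: group MC hist files by DSID and hadd each group into one file.
import re, subprocess
from collections import defaultdict
from pathlib import Path

def hadd_by_dsid(hist_dir):
    groups = defaultdict(list)
    for f in Path(hist_dir).rglob("*.root"):
        m = re.search(r"(\d{6})", str(f))              # DSID, e.g. 410470, from the path
        if m:
            groups[m.group(1)].append(str(f))
    for dsid, files in groups.items():
        subprocess.run(["hadd", f"mc_{dsid}.root"] + files, check=True)
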
Make plots and tables.
python macros/plotCand.py ${path_of_MC_dir}/* --data-file=${path_of_data.root}/data.root --xsec-list=data/plot/xsec_list.txt --get-regions --counts-dir=Counts --config-path=data/plot/plot_stack_2l.txt -r -s -o plots

Plans for migrating old ttH code to PhysicsAnpttH

  • Migrate code to make mini-ntuples
    • Ask Rhys for commands to make mini-ntuples with the old code
  • Use new batch scripts to make new mini-ntuples
    • AnpBatch/macros
    • Example of using new batch script for RPC analysis
  • Migrate code to select control regions
  • Update code to make plots and tables
    • Plot making is already migrated by Rhys
  • Update this TWiki with complete instructions

Presentation

  • PLV_validate -- MCP-18-April
  • PLV_WP -- MCP-09-May

Timeline

  • New PhysicsAnpttH code working with release 20.7 ntuples - mid-January
  • New PhysicsAnpttH code working with release 21 ntuples - late January to early February
  • Validate PromptLeptonIso/Veto with release 21 data/MC - February
  • Calibrate new muon working points - February to March
  • Study old variables with detailed truth for prompt and non-prompt leptons - March
  • Study new variables with detailed truth - April to July (possibly longer, depending on outcome)
  • Start physics analysis project - June

Fake Study

This study is devoted to improving the performance of the prompt lepton tagger (PromptLeptonVeto in release 21). We will start with a truth study of B decays, using the ttbar MC sample from the MUON5 derivation.

Make ntuples

Make the list of input DxAODs, e.g.

 ${DAOD_path}= /lustre/AtlUser/fuhe/DAOD/MUON5/mc16_13TeV.410470.PhPy8EG_A14_ttbar_hdamp258p75_nonallhad.deriv.DAOD_MUON5.e6337_e5984_s3126_r10201_r10210_p3584/ 

cd ~/testarea/AnpBatch
python macros/makeFileList.py ${DAOD_path} -o input_muon5_daod_new_v1.txt

Generate rel21 ntuples:

cd ~/testarea/AnpProd21/source
source setup_atlas_analysis_release.sh
cd PhysicsAnpProd
# modify runUSTC_prodntp.sh, to use the new input DxAOD
source runUSTC_prodntp.sh new_v1

Fill histograms with ntuples locally

For every reco-lepton that passes (or fails) the tight isolation cut, the lepton is tagged if its truth parents or grandparents contain a B meson, and the truth information of that B is filled; see the sketch after the command below. For the moment, the available ntuples are

 user.rroberts:user.rroberts.mc16_13TeV.410501.PowhegPythia8EvtGen_A14_ttbar.DAOD_MUON5.e5458_s3126_r9364_r9315_p3263.ntp_v1_out 
  • --rm-match: remove the truth lepton that is matched to the reco-lepton from the B children.

python macros/runFakeStudy.py ntp_v3_test.root -o out.root -n 0 --rm-match
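
For illustration, the tagging described above reduces to an ancestor search over the truth record; the sketch below uses hypothetical ntuple accessors (parents, pdg_id) and is not the actual runFakeStudy.py code:

# Minimal sketch of the B-ancestor tagging (hypothetical truth-record accessors)
def is_b_meson(pdg_id):
    # B-meson PDG IDs have |id| in the 500-599 range (511, 521, 531, ...),
    # also for excited states once the extra digits are stripped off
    return 500 <= abs(pdg_id) % 10000 < 600

def has_b_ancestor(truth_lepton, depth=2):
    # tag the lepton if a B meson appears among its parents or grandparents
    if depth == 0:
        return False
    for parent in truth_lepton.parents:
        if is_b_meson(parent.pdg_id) or has_b_ancestor(parent, depth - 1):
            return True
    return False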

Make plots

Compare the selected B mesons, i.e. those with a non-prompt lepton that passed the 'FixedCutTight' isolation, with the original B mesons.

  • --logz: draw the 2D histograms with a logarithmic z axis
  • --local: make the SV study plots
python macros/plotFakeStudy.py out.root -s -o plots --ignore

LeptonTaggers Package

An Athena-based tool that can be used to decorate the DxAOD with the Prompt Lepton Tagger output and its input variables.

Instructions for developing in Athena

First time setup

mkdir LeptonTaggers
cd LeptonTaggers
mkdir build source run
cd build
setupATLAS
acmSetup --sourcedir=../source AthDerivation,21.2.51.0  
# AthDerivation,21.2.46.0 will forward JetTagNonPrompt to LeptonTaggers,
# so that you don't have to check out JetTagNonPrompt

After that, the next time you log in, you only have to do the following:

cd LeptonTaggers/build
acmSetup

Check out package

cd source
acm sparse_clone_project athena fuhe/athena   #init-workdir
cd athena
git checkout Add_NonPromptLeptonVertexingAlg_21.2 # check out the work branch
# You can follow the latest athena repo by:
### git fetch upstream
### git checkout -b master-my-topic upstream/${release} --no-track
acm add_pkg athena/PhysicsAnalysis/AnalysisCommon/LeptonTaggers
acm compile

The MUON5 derivation has included the JetTagNonpromptLepton package with PromptLeptonVeto since AthDerivation,21.2.40.0. To run the derivation, you can use the commands below:

cd run
ln -s /eos/user/f/fuhe/data/r21/Aod_test/mc16_13TeV.410501.PowhegPythia8EvtGen_A14_ttbar_hdamp258p75_nonallhad.merge.AOD.e5458_s3126_r9364_r9315/AOD.11182705._002433.pool.root.1 .
Reco_tf.py --inputAODFile AOD.11182705._002433.pool.root.1 --outputDAODFile output.pool.root --reductionConf MUON5 --maxEvents 10

Instructions for developing locally

First time setup

mkdir -p LeptonTaggers/source
cd LeptonTaggers/source
git clone https://:@gitlab.cern.ch:8443/ustc/Physics/LeptonTaggers.git
git clone https://:@gitlab.cern.ch:8443/fuhe/DerivationFrameworkMuons.git
source LeptonTaggers/macros/setup/first_setup_rel21.sh

Next time setup

source setup_atlas_analysis_release.sh

Make MUON5 DxAOD

  • test 410470 MC AOD: /eos/user/f/fuhe/data/LeptonTagger/AOD/mc16_13TeV.410470.PhPy8EG_A14_ttbar_hdamp258p75_nonallhad.merge.AOD.e6337_e5984_s3126_r9364_r9315
cd ../run
ln -s /eos/user/f/fuhe/data/LeptonTagger/AOD/mc16_13TeV.410470.PhPy8EG_A14_ttbar_hdamp258p75_nonallhad.merge.AOD.e6337_e5984_s3126_r9364_r9315/AOD.14761208._002797.pool.root.1
Reco_tf.py --inputAODFile AOD.14761208._002797.pool.root.1 --outputDAODFile output.pool.root --reductionConf MUON5 --maxEvents 1000   &>log & tail -f log

Prompt Lepton Tagger Development

This part shows how the new "PromptLeptonImprovedVeto" was trained. The training needs detailed information on dedicated secondary vertices, which for the moment is only available in the MUON5 derivation. Alternatively, you can simply use the "PromptLeptonImprovedInput_*" variables saved in your DxAOD.

You first need to git clone the packages and set up your release following the instructions: Setup release 21.

Make ntuples from DxAOD

Here is a test DxAOD with 200 events: /eos/user/f/fuhe/data/LeptonTagger/DxAOD/MUON5_DAOD-2020-03-24_PLIV_test/DAOD_MUON5.output.pool.root

cd ~/testarea/AnpProd21/source
source setup_atlas_analysis_release.sh
cd ../run
athena $SourceArea/PhysicsAnpProd/share/PhysicsAnpProd_ReadxAODr21.py -c "inputDir='MUON5';doDetails=True;dumpSG=False;EvtMax=200"
# inputDir='MUON5': MUON5 is the directory that contains the DxAODs you want to run over.
# Use "EvtMax=-1" to run over all the events.

Make mini-ntuples from the ntuples

cd ~/testarea/AnpttH/source
source setup_atlas_analysis_release.sh
cd ../run
python $SourceArea/PhysicsAnpttH/macros/runFakeStudy.py ${your input ntuples} -o out_mini.root --make-mini -n 2000 --save-tracks
# -n 2000: run over 2000 events; use "-n 0" to run over all the events
# --save-tracks: save the vectors of tracks near the lepton; they are used in the RNN training/evaluation (see the sketch below).
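
Conceptually, the track collection behind --save-tracks is a cone match around the lepton; the sketch below uses hypothetical accessors, and the DeltaR < 0.4 and pT > 500 MeV cuts are illustrative defaults only (they echo the ideas listed later on this page):

# Hypothetical sketch of the cone-based track selection around a lepton
import math

def nearby_tracks(lepton, tracks, dr_max=0.4, pt_min=500.0):
    selected = []
    for trk in tracks:
        dphi = math.remainder(trk.phi - lepton.phi, 2 * math.pi)  # wrap to [-pi, pi]
        deta = trk.eta - lepton.eta
        if math.hypot(deta, dphi) < dr_max and trk.pt > pt_min:   # pT in MeV
            selected.append(trk)
    return selected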

Decorate the mini-ntuple with RNN outputs

Evaluate the RNN from your local RNN JSON files and decorate the mini-ntuples with the RNN outputs. You can skip this step if you do not have a local RNN to save. Instructions for training a local RNN and converting it to the JSON format can be found at: Retrained RNN.

cd ~/testarea/AnpttH/source
source setup_atlas_analysis_release.sh
cd ../run
# Add RNN for the Muons
python $SourceArea/PhysicsAnpttH/macros/runMVA.py ${your mini-ntuples from last step} --do-eval-rnn -o out_rnn_eval_muon.root -n 2000 --trees="tree_prepCandMVAMuon" --do-muon

# Add RNN for the Electrons
python $SourceArea/PhysicsAnpttH/macros/runMVA.py ${your mini-ntuples from last step} --do-eval-rnn -o out_rnn_eval_elec.root -n 2000 --trees="tree_prepCandMVAElec" --do-elec
# -n 2000: will run 2000 events, use "-n 0" will run all the events

Train the BDTs

# Train for electrons
python $SourceArea/PhysicsAnpttH/macros/runMVA.py ${your mini-ntuples from last step} --do-train -o out.root --mva-path="TMVADataLoader" --do-elec --training-var=$BDT -n 0 --do-norm-sig-to-bkg --hist-norm="NormPt"

# Train for the muons
python $SourceArea/PhysicsAnpttH/macros/runMVA.py ${your mini-ntuples from last step} --do-train -o out.root --mva-path="TMVADataLoader" --do-muon --training-var=$BDT -n 0 --do-norm-sig-to-bkg --hist-norm="NormPt"

# Be careful with the tree name of the mini-ntuple.
# --mva-path="TMVADataLoader": path of the output BDT files.
# --training-var=$BDT: the name of your BDT, whose inputs are defined in "python/PhysicsAnpttHMVATrain.py"
# --do-norm-sig-to-bkg: do the pT-bin normalization, i.e. normalize the non-prompt (signal) to the prompt (background) shape (see the sketch below).
# --hist-norm="NormPt": the pT-bin calculation is based on this histogram, defined in "config/PrepMVATrain.xml"
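
For illustration, the pT-bin normalization behind --do-norm-sig-to-bkg can be sketched with plain numpy (hypothetical arrays; in the real code the binning comes from the "NormPt" histogram):

# Sketch: per-lepton weights that reshape the signal pT spectrum to the background one
import numpy as np

def pt_bin_weights(sig_pt, bkg_pt, bin_edges):
    sig_counts, _ = np.histogram(sig_pt, bins=bin_edges)
    bkg_counts, _ = np.histogram(bkg_pt, bins=bin_edges)
    ratio = np.divide(bkg_counts, sig_counts,
                      out=np.zeros(len(sig_counts)), where=sig_counts > 0)
    # look up the weight of the bin each signal lepton falls into
    idx = np.clip(np.digitize(sig_pt, bin_edges) - 1, 0, len(ratio) - 1)
    return ratio[idx]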

Evaluate the BDTs and save the outputs

python $SourceArea/PhysicsAnpttH/macros/runMVA.py ${your mini-ntuples} --do-eval -o out_eval.root --mva-path="TMVADataLoader" --do-muon --save-rnn-vars
# --mva-path="TMVADataLoader": path of the BDT weight files from the last step
# --save-rnn-vars: save the local RNN variables to your mini-ntuples if they already exist in ${your mini-ntuples}

Make ROC plots

# Step 1: generate histogram
python $SourceArea/PhysicsAnpttH/macros/runMVA.py ${your mini-ntuples} --do-plot --do-muon -o out_test_plot.root -n 0 --is-testing
# --is-testing: only plot the events that were not used in the training.

# Step 2: Make plots
python $SourceArea/PhysicsAnpttH/macros/plotMVAROC.py  out_test_plot.root -o plots --draw-atlas -s
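
For reference, a ROC curve is built from the score histograms produced in step 1 by scanning the cut on the discriminant; a minimal numpy sketch (hypothetical histogram arrays, not the plotMVAROC.py code) is:

# Sketch: ROC points (signal efficiency vs. background rejection) from score histograms
import numpy as np

def roc_from_hists(sig_hist, bkg_hist):
    sig_hist = np.asarray(sig_hist, dtype=float)
    bkg_hist = np.asarray(bkg_hist, dtype=float)
    # fraction of events above a cut placed at each bin boundary
    sig_eff = np.cumsum(sig_hist[::-1])[::-1] / sig_hist.sum()
    bkg_eff = np.cumsum(bkg_hist[::-1])[::-1] / bkg_hist.sum()
    rejection = np.divide(1.0, bkg_eff, out=np.full_like(bkg_eff, np.inf),
                          where=bkg_eff > 0)
    return sig_eff, rejection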

Presentations

PromptLeptonVeto research and development

Prompt Lepton Tagger Questions

  • Light-flavor (LF) fakes for leptons.
    • Above ~10 GeV, muon LF fakes from pion/kaon decays are negligible.
    • Above ~10 GeV, LF fakes from jets mis-identified as electrons are suppressed quite well by the electron LH. Isolation variables are also effective and already included in PLV.
    • In the same-sign WW VBS analysis, MC statistics are lacking.
    • Above ~10 GeV, we rely on the LH to reject LF fakes for electrons. Below ~10 GeV, this still needs to be studied.

Ideas and plans (2018-12)

  • Apply a lower track pT threshold to include more tracks; start with 500 MeV.
  • Try a larger cone size ($\Delta R(track, lepton) = 0.4$ in the past).
  • See how many charged B children have been reconstructed as reco-tracks.
  • Train the secondary-vertex reconstruction algorithm or related variables on b-jet vs. prompt-lepton samples.
  • Collect information from the deeper calorimeter layers.

Ideas and plans (2019-02)

  • Select the reconstructed secondary vertices.
    • These suffer from pileup vertices and background primary vertices.
  • Understand the K0 cases: reconstruction or energy deposits within the calorimeter.
    • The new variable calErel may be helpful in this case.
    • Compared the energies of the different calorimeter layers: no significant difference.
  • Study and understand why the fake non-prompt leptons pass PLV < -0.5 but fail FixedCutTight.
    • Use non-isolated leptons for the training.

Ideas and plans (2019-03)

  • Background primary-vertex suppression
    • Mainly from the underlying event and b-jet hadronization.
    • Merge 2-track vertices (chi2/ndof < 0.5) that are close to each other (within 1 mm), then refit the tracks to get the merged vertex. USTC_Internal_meeting_15Mar
  • Remove the lepton itself from the primary-vertex reconstruction and refit the primary vertex.
  • Calculate the charge of the vertex: no significant difference.
  • StudyMisidWithPhysicsAnpttH

Recurrent Neural Networks

Open questions

  • Understand the truth matching of prompt/non-prompt for leptons and vertices
    • Consider removing leptons near a b-jet (or with a non-prompt vertex nearby) from the prompt category.
  • Study the probability that tracks are not used for the vertex reconstruction.
  • Understand why the distance between the combined vertex and the refitted primary vertex for prompt leptons is greater than 1 sigma (on average).
  • Make the event display for the b-quark decay.

Retrained RNN with track IP information

Run the code

  • Run Docker/Singularity:
singularity run -e docker://gitlab-registry.cern.ch/fuhe/basic-python-image:latest

# 1. Convert the ROOT file to an HDF5 file.
python3 PromptRNN/macros/createRNNDataFromNtuple.py ${path_mini_ntp} -o elec_rnn_fullrun2 --nclass=4 --ntrack=5 --tree-name=tree_prepCandMVAElec
# this will give you an HDF5 file named: elec_rnn_ntrack5_nvar6_nclass4_fcharm15.h5

# 2. Train the RNN
python3 PromptRNN/macros/trainRNN.py elec_rnn_fullrun2_ntrack6_nvar6_nclass4_fcharm15.h5 -w --rate=0.20
# -w: scale the different lepton categories to fixed fractions in each pT bin.
# --rate: the dropout rate for the Dropout layers

# 3. Make plots
python3 PromptRNN/macros/plotRNNScore.py $input -k $model -b -o plot-rnn --nclass=4 --do-all-rocs
# -k $model: specify the output directory of the RNN training (contains: weights, RNN structure, model, predictions, loss)
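
For orientation, the kind of model trainRNN.py builds can be sketched with Keras; the layer sizes below are assumptions (5 tracks x 6 variables per lepton, 4 lepton classes, dropout rate 0.2), not the actual architecture:

# Hypothetical Keras sketch of a track-sequence RNN classifier
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Masking(mask_value=0.0, input_shape=(5, 6)),  # zero-padded missing tracks
    layers.LSTM(32),                                     # summarize the track sequence
    layers.Dropout(0.2),                                 # --rate=0.20
    layers.Dense(4, activation="softmax"),               # one score per lepton class
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])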

Lightweight Trained Neural Network (lwtnn)

  • Link to its git: https://github.com/lwtnn/lwtnn
  • Converts the output of Keras into a specific JSON format, so that it can be used in a C++ production environment with lwtnn.

# 1. Clone the project from github:
git clone https://github.com/lwtnn/lwtnn.git

# 2. Prepare the input-variable JSON list:
python3 PromptRNN/macros/makeInputVarJSON.py train-data/tracks_ntrack5_nvar6_nclass3_fcharm15_nevt20m.h5 -o input_vars.json

# 3. Use lwtnn to convert the Keras architecture and weights files:
lwtnn/converters/kerasfunc2json.py architecture.json weights.h5 variables.json
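
kerasfunc2json.py expects the architecture and the weights in separate files; assuming `model` is the trained functional-API Keras model from the training step, they can be written out like this:

# Save the network structure and the weights separately for kerasfunc2json.py
with open("architecture.json", "w") as arch:
    arch.write(model.to_json())       # architecture only
model.save_weights("weights.h5")      # weights only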

Speed up with GPU (optional)

  • For TensorFlow releases 1.15 and newer, CPU and GPU support are included in a single package. For releases 1.14 and older, the CPU and GPU packages are separate (tensorflow-gpu).
  • Make GPU available to the Docker: Using NVIDIA GPU within Docker Containers
  • HTCondor tutorial with GPU + Docker/Singularity: GPUs
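
To verify that TensorFlow actually sees the GPU inside the container before launching a long training, a quick check (TensorFlow 2.x API) is:

# Prints a non-empty list if a GPU is visible to TensorFlow
import tensorflow as tf
print(tf.config.list_physical_devices("GPU"))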