NewBDT (2022-10-12, OndrejKovanda)

Tests of the new 2020 BDT

The most recent versions of the analysis ntuples were merged and the new BDT algorithm was applied to them. They can be compared to the 2015/16 analysis data processed with the 2016 BDT.

BDT 2020

This BDT uses the following list of discriminating variables:

No pileup variable was used in this BDT.

Running

When the addClassBDT_2020_groupVersion.cpp macro is run, it reports a missing branch in the input ntuples:

Error in <TTree::SetBranchStatus>: unknown branch -> closeTrkDOCA_T0134217728_LooSiHi1Pt05_f2dc2
Error in <TTree::SetBranchAddress>: unknown branch -> closeTrkDOCA_T0134217728_LooSiHi1Pt05_f2dc2

This should be one of the variables entering the BDT, so it should be clarified whether this could affect the results.

SB data comparison

Number of entries in SB, Original ntuple: 2452402
Number of entries in SB, New ntuple: 2454693
Difference Orig - New: -2291

i.e. the new ntuple contains 2291 more events. Analysis preselections are applied to both.

The following figures show the invariant mass distribution in the top three 18% signal efficiency bins.

Corresponding bin edges:

2016 BDT ... {0.2455,0.3312,0.4163,1}

2020 BDT ... {0.2774,0.3662,0.4418,1}

Figures: mass distribution in BDT bin 1, BDT bin 2 and BDT bin 3.

Data SB blinded fits in top three bins

First, the fitter was validated against the shape parameters obtained from data SB fits in all four bins, as presented in the 2015/2016 internal note. The model used in both cases is a 1st order Chebychev + exponential. The Chebychev slope is constrained to be linear in BDT, and the exponential constant is constrained to be the same in all BDT bins.

Fitter validation against 15/16 bkg shape parameters in data SB

Next, this fitter was used to perform the same flavour of the fit in the top three bins only, comparing the 2016 and 2020 BDTs:

2016 BDT	2016 BDT bin 1	2016 BDT bin 2	2016 BDT bin 3	fitLog_2016BDT.txt
2020 BDT	2020 BDT bin 1	2020 BDT bin 2	2020 BDT bin 3	fitLog_2020BDT.txt

Background yields comparison:

Combinatorial
	2016 BDT	2020 BDT
Bin 1	1.0950e+03 +/- 7.29e+01	4.4930e+02 +/- 5.53e+01
Bin 2	1.6882e+02 +/- 3.04e+01	6.8667e+01 +/- 1.65e+01
Bin 3	2.2281e+01 +/- 9.91e+00	2.1687e+01 +/- 1.07e+01

SSSV
	2016 BDT	2020 BDT
Bin 1	2.1811e+02 +/- 4.79e+01	1.8530e+02 +/- 3.90e+01
Bin 2	1.3447e+02 +/- 2.32e+01	8.4999e+01 +/- 1.37e+01
Bin 3	3.4279e+01 +/- 8.52e+00	2.1437e+01 +/- 8.08e+00

Unblinding - re-applying preselection cuts on loose ntuples

The mass spectrum looks strange, and there are some negative-mass entries.

Figures: unblinded region with missing entries; negative entries.

Checking Bs MC

These are the mass distribution comparisons between the 2016 BDT applied to the old derivation and 2020 BDT applied to the new derivation.

Figures: bin1, bin2, bin3.

Bin 0 lower edge

The lower edge of BDT bin 0 (72 % signal efficiency) was identified by ordering the MC events by BDT value and summing their weights (CombWeights branch) until the ratio (summed weights)/(total sum of weights) reached 0.28, i.e. until 72 % of the weighted signal events pass that BDT cut. The result is:

Crossed 0.72 signal efficiency point at BDT value: 0.164033
Previous entry has BDT: 0.164031
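
The scan described above can be sketched as follows. This is only a minimal illustration, assuming the BDT scores and per-event weights have already been read into plain Python lists; the function name and the toy values are hypothetical and are not taken from the analysis code.

```python
def bdt_cut_for_efficiency(scores, weights, efficiency):
    """Return the BDT value at which the running weight sum, taken in
    ascending BDT order, first reaches (1 - efficiency) of the total."""
    total = sum(weights)
    target = (1.0 - efficiency) * total  # weight that must lie at or below the cut
    running = 0.0
    for score, weight in sorted(zip(scores, weights)):
        running += weight
        if running >= target:
            return score
    raise ValueError("target not reached; efficiency must lie in (0, 1]")


# Toy input (hypothetical values): four events with unequal weights.
toy_scores = [0.4, 0.1, 0.3, 0.2]
toy_weights = [3.0, 1.0, 2.0, 2.0]
print(bdt_cut_for_efficiency(toy_scores, toy_weights, 0.5))
```

Sorting the (score, weight) pairs once keeps the scan linear after the sort, and the same function can be called for the 18 %, 36 %, 54 % and 72 % points.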

The same approach was used to validate the other bin edges of the 2020 BDT found earlier by Aidan:

What was found: 18 % eff ... 0.439777, 36 % eff ... 0.363047, 54 % eff ... 0.274418, 72 % eff ... 0.164033

What Aidan found: 18 % eff ... 0.4418, 36 % eff ... 0.3662, 54 % eff ... 0.2774

UPDATE 25.9.20 :

The disagreement observed in the bin edges was due to an incorrect calculation of the weights. The "CombWeights" branch contains only the QLC*DDW weights, and it must be multiplied further by PVWeight, Muon{1,2}_trigger_sf and Muon{1,2}_reco_eff_sf. With that we indeed get the same bin edges as Aidan found, together with the (hopefully now correct) lower edge of bin 0.

18 % eff ... 0.441817, 36 % eff ... 0.366231, 54 % eff ... 0.277443, 72 % eff ... 0.167089
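
The corrected per-event weight is a plain product of the branches named above. A minimal sketch, assuming each event is available as a dictionary keyed by branch name and that Muon{1,2}_... expands to the Muon1_... and Muon2_... branches (the helper name and the dictionary layout are hypothetical):

```python
import math

# Branches multiplied into the full weight; the brace expansion of
# Muon{1,2}_* into two separate branches is an assumption.
WEIGHT_BRANCHES = (
    "CombWeights",        # QLC * DDW only
    "PVWeight",
    "Muon1_trigger_sf",
    "Muon2_trigger_sf",
    "Muon1_reco_eff_sf",
    "Muon2_reco_eff_sf",
)


def full_event_weight(event):
    """Product of all weight branches for one event (dict: branch -> value)."""
    return math.prod(event[name] for name in WEIGHT_BRANCHES)
```

Keeping the branch list in one place makes it harder to omit a factor when the same weight is needed in several scripts.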

2016 BDT values of events in 2020 BDT bins and vice versa

Signal MC

Regarding the two MC derivations:

2016 derivation: 166218 events in total, 120682 in the top 4 BDT bins, out of those 113728 match an event from the 2020 derivation

2020 derivation: 166752 events in total, 118016 in the top 4 BDT bins, out of those 110886 match an event from the 2016 derivation

There are 156694 events shared between the two derivations.
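
The matching counts above can be obtained with a set intersection. The sketch below assumes events are identified by a (run number, event number) pair; the page does not state which key was actually used, so the key choice and the function names are hypothetical.

```python
def shared_events(derivation_a, derivation_b):
    """Set of event keys present in both derivations."""
    return set(derivation_a) & set(derivation_b)


def count_matched(selected, other):
    """Number of selected events that also appear in the other derivation."""
    other_keys = set(other)
    return sum(1 for key in selected if key in other_keys)


# Toy input (hypothetical keys): (run number, event number) pairs.
deriv_2016 = [(1, 10), (1, 11), (2, 20)]
deriv_2020 = [(1, 11), (2, 20), (2, 21)]
print(len(shared_events(deriv_2016, deriv_2020)))
```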

Taking into account only the events in common, the current bin edges still correspond more or less to 18% signal efficiencies:

Common events - efficiency
	2016 BDT	2020 BDT
bin 0	0.178217	0.180164
bin 1	0.182015	0.179978
bin 2	0.184372	0.179547
bin 3	0.19332	0.180383

Full Fit on 2016/2020 BDT

feature	2016 analysis fitter	new fitter
comb bkg (chebychev)	yes	yes
sssv bkg (exponential)	yes	yes
Bs (double gaussian)	yes	yes
Bd (double gaussian)	yes	yes
Peaking bkg (double gaussian)	yes	yes
Peaking bkg constraint	yes	yes
Smearing parameters	yes	no
Relative efficiency in bins	yes	no
BDT mean + constraint	no	yes

Validation of the new fitter

In terms of Bs/Bd yields:

	N Bs	N Bd
new fitter	80.83 +/- 21.0	-10.96 +/- 19.1
15/16 result	80 +/- 22	-12 +/- 20
Fitter validation BDT bin 0	Fitter validation BDT bin 1	Fitter validation BDT bin 2	Fitter validation BDT bin 3

The model fitted by the new fitter has the background parameters initialized from the individual fits and the signal normalization initialized with the SM expectations: 91 Bs and 10 Bd.

New BDT

Mass projections:

Figures: fit of the 2020 derivation in BDT bins 0, 1, 2 and 3 (massPlot_BDT1ETA1_2020Deriv_NOsignalMassShift.png, massPlot_BDT2ETA1_2020Deriv_NOsignalMassShift.png, massPlot_BDT3ETA1_2020Deriv_NOsignalMassShift.png, massPlot_BDT4ETA1_2020Deriv_NOsignalMassShift.png).

Fit log: fitLog_2020BDT_allBins.txt

Bs yield: 75 +/- 18

Bd yield: -9 +/- 16

Shape-wise comparison: new vs. old derivation with 2016 BDT

Apr 22: working on the reference channel fit, some x-checks are in place to see if the new derivation is OK. It was projected vs. the 15/16 derivation, v2 ntuples. Note that the 2020 v4 is not the latest greatest at this point, as it has been replaced by 2021 v2. The 2021 v2 vs 2020 v4 comparison will be done by Joe on the reference channel, here I'm adding the missing link comparing 2020 derivation to 15/16 derivation. It's the full 15/16 Bmumu data.

Shape-wise comparison of 2020 derivation to 15/16 one, under the 15/16 BDT

Running the ntupling on data17_Main data18_Main

Gradually moving to the latest ntuples (v2, 2021 derivation) in May 2022. These have not yet been produced for data17_Main and data18_Main. I got the how-to on running the ntupling from here:

https://twiki.cern.ch/twiki/bin/viewauth/AtlasProtected/BPhysicsRareDecaysBsmumuNtupleMaker .

I first checked if it works on 2015 period D Bmumu data - reproduced the ntuples and cross-checked them against what's on eos. The distributions of several variables seem identical in the reference and in my reproduction. I also checked the BJpsiK channel - all seems to work there as well. I'm therefore ready to ntuple the data17_Main and data18_Main.

Ntupling for 2017_Main and 2018_Main completed successfully after several resubmissions of failed jobs.

Preselection code was updated for the Bmumu process as well to account for the int-like trigger matching description in v2 ntuples.

Then I found out that some of the branch names had changed: those of the isolation variables that enter the BDT calculation. They were, however, not changed in the BDT applicator itself. The change is described here: https://indico.cern.ch/event/968954/contributions/4078161/attachments/2129095/3585188/brare_ww_20201023-1.pdf , so essentially we need to change the iso in the BDT applicator to BEJ and the dca/zca to BEL. However, there are two occurrences of each in the applicator: one for the TMVA reader of the xml weight files, the other for the input/output tree branch name. The former must not be changed; the latter must.

After applying the above-mentioned changes to the preselection and the BDT macro, I ran the BDT on the v2 ntuples and cross-checked the output mass distribution against the 2020 v4 ntuples that were used for the BDT comparisons earlier (2015/16 data SB; the v2 ntuples have not been unblinded). This tests the new tools on the new derivation against the older tools on the older derivation.

SB mass distribution, 2015 & 2016 data, top 4 2020 BDT bins	difference thereof
Shape-wise comparison of 2020 derivation to 15/16 one, under the 15/16 BDT	difference

FR2 SB fits

The same model as used in 15/16 was fitted to the Full Run 2 sample sidebands. The BDT bins were used as determined in the previous studies on the 2020 v4 ntuples, i.e. BDTBins = {0.1671,0.2774,0.3662,0.4418,1}.

Figures: BDT bin 0, BDT bin 1, BDT bin 2, BDT bin 3

The fit result: result.root

For future convenience, let's list the parameter values in the bins:

	av BDT	nCOMB	nSSSV
bin 0	2.0474e-01	2.1862e+04 +/- 3.41e+02	2.0076e+03 +/- 2.29e+02
bin 1	3.1154e-01	1.9449e+03 +/- 1.29e+02	8.9262e+02 +/- 9.70e+01
bin 2	3.9776e-01	2.4779e+02 +/- 3.89e+01	4.3567e+02 +/- 3.53e+01
bin 3	4.7235e-01	4.8540e+01 +/- 1.66e+01	1.8126e+02 +/- 1.81e+01

expConst	-7.5872e-03 +/- 6.00e-04
slope_0_cmb	-2.0741e-01 +/- 1.47e-01
slope_1_bdt	1.5670e-02 +/- 7.30e-01
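
As a worked example of the linear constraint, the per-bin combinatorial slope can be evaluated from the fitted values listed above. The sketch assumes the constraint has the form slope = slope_0_cmb + slope_1_bdt * (average BDT), which is suggested by the parameter names but is not spelled out on this page.

```python
# Fitted central values copied from the tables above.
SLOPE_0_CMB = -2.0741e-01
SLOPE_1_BDT = 1.5670e-02
AV_BDT = (2.0474e-01, 3.1154e-01, 3.9776e-01, 4.7235e-01)


def comb_slope(av_bdt, slope_0=SLOPE_0_CMB, slope_1=SLOPE_1_BDT):
    """Chebychev slope in a bin; ASSUMED form: slope_0 + slope_1 * <BDT>."""
    return slope_0 + slope_1 * av_bdt


slopes = [comb_slope(value) for value in AV_BDT]
for index, slope in enumerate(slopes):
    print(f"bin {index}: slope = {slope:.4f}")
```

Under this assumed form the central slope changes by only about 0.004 between bin 0 and bin 3, which is small compared with the quoted uncertainty on slope_0_cmb.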

This was a bit rushed: looking back at the evolution of the shape parameters with the average BDT, I see that we may not want to introduce the linear trend into the fit for the COMB slope:

Evolution of bkg shape parameters in the FR2 data sidebands

Preliminary MC mixture

We do not have all the weights yet, but we need to mix MC16a, d and e approximately according to the collected effective luminosities. The effective luminosities per period are in the BLS trigger table: BphysTriggers_FullRun2_(1).xls

We have multiple triggers in 2017 and 2018. The same approach as in Run 1 was adopted: divide the events into mutually exclusive categories, Trigger_1 && !(Trigger_2 || Trigger_3 || ...), Trigger_2 && !(Trigger_3 || ...), and so on, with the triggers ordered by descending prescale (the most prescaled triggers are the most exclusive). Events in each category were then weighted by the ratio of the effective luminosity collected by that category's trigger to that collected by the least prescaled trigger. E.g. in 2018 we have:

Category 1: HLT_2mu4_bBmumu_Lxy0_L1BPH-2M9-2MU4_BPH-0DR15-2MU4 && !(HLT_mu6_mu4_bBmumu_Lxy0_L1BPH-2M9-MU6MU4_BPH-0DR15-MU6MU4 || HLT_2mu6_bBmumu_Lxy0_L1BPH-2M9-2MU6_BPH-2DR15-2MU6)
Category 2: HLT_mu6_mu4_bBmumu_Lxy0_L1BPH-2M9-MU6MU4_BPH-0DR15-MU6MU4 && !(HLT_2mu6_bBmumu_Lxy0_L1BPH-2M9-2MU6_BPH-2DR15-2MU6)
Category 3: HLT_2mu6_bBmumu_Lxy0_L1BPH-2M9-2MU6_BPH-2DR15-2MU6
Category 1 will be weighted by 26.2/62.739, Category 2 will be weighted by 53.353/62.739
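
The 2018 categorisation and weights quoted above can be sketched as follows. The trigger names and luminosities are copied from the text; the function names, the representation of an event as a set of fired trigger names, and the weight of 1 for Category 3 (62.739/62.739) are assumptions made for the illustration.

```python
# 2018 triggers ordered from most to least prescaled, with the
# effective luminosities quoted in the text.
TRIGGERS_2018 = (
    "HLT_2mu4_bBmumu_Lxy0_L1BPH-2M9-2MU4_BPH-0DR15-2MU4",
    "HLT_mu6_mu4_bBmumu_Lxy0_L1BPH-2M9-MU6MU4_BPH-0DR15-MU6MU4",
    "HLT_2mu6_bBmumu_Lxy0_L1BPH-2M9-2MU6_BPH-2DR15-2MU6",
)
EFF_LUMI_2018 = (26.2, 53.353, 62.739)


def trigger_category(fired):
    """1-based exclusive category: index of the least prescaled trigger
    that fired, or None if no trigger in the list fired."""
    category = None
    for index, name in enumerate(TRIGGERS_2018, start=1):
        if name in fired:
            category = index  # a later (less prescaled) trigger overrides
    return category


def lumi_weight(fired):
    """Effective-luminosity ratio relative to the least prescaled trigger."""
    category = trigger_category(fired)
    if category is None:
        return 0.0
    return EFF_LUMI_2018[category - 1] / EFF_LUMI_2018[-1]
```

Taking the last fired trigger in the ordered list is equivalent to the Trigger_k && !(Trigger_k+1 || ...) definition, so each event falls into exactly one category.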

This leads to the following behaviour on the preselected FR2 sample of Bs MC, which is otherwise unweighted and has no BDT cut applied:

Effective luminosity weighting on Bs MC, FR2

As it happens, I did not get this quite right. The above weights may reproduce the average reweighting, but what we need instead is a per-event weight reflecting the prescales at a given luminosity (~= pileup) value. These are, in a slightly convoluted way, already present in some form in the ntuples as "PV weights".

The PV weights are calculated by the PRW tool (Athena) during the ntupling process. The PRW tool uses two inputs:

1. pileup profiles generated in the MC (NTUP_PILEUP files on the grid) - sample specific
2. luminosity files generated by the lumicalc tool based on GRL and trigger - these include the pileup information and corresponding trigger prescale in each LB in data.

The MC events are then assigned a random run number and, based on their generated \mu, a weight accounting for the prescale of a given trigger in that run, averaged over LBs with the same average \mu. On top of the prescale weight, a weight reproducing the actual average \mu profile in data is applied (correct me if I'm wrong) by taking the generated profile histograms in MC and data, dividing them, and using the ratio to reweight the MC based on its actual generated \mu. The documentation of the PRW tool is here: https://twiki.cern.ch/twiki/bin/view/AtlasProtected/ExtendedPileupReweighting.
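
The profile-ratio step, as understood above, can be illustrated with two binned \mu histograms. This is not the PRW tool or its API, only a minimal sketch of dividing normalised histograms; the function name, the list-of-bin-contents representation and the choice of weight 0 for empty MC bins are assumptions.

```python
def profile_weights(data_hist, mc_hist):
    """Per-bin weights: normalised data profile over normalised MC profile.

    Both inputs are lists of bin contents with identical binning in mu.
    Bins that are empty in MC get weight 0 (an arbitrary choice here).
    """
    data_total = sum(data_hist)
    mc_total = sum(mc_hist)
    weights = []
    for n_data, n_mc in zip(data_hist, mc_hist):
        if n_mc == 0:
            weights.append(0.0)
        else:
            weights.append((n_data / data_total) / (n_mc / mc_total))
    return weights


# Toy profiles (hypothetical values), three mu bins each.
print(profile_weights([1.0, 2.0, 1.0], [2.0, 2.0, 0.0]))
```

An MC event would then pick up the weight of the bin containing its generated \mu, multiplied by the prescale weight described above.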

In the actual use, there's one PRW tool for each trigger category. The categories, fortunately, correspond to what's above, i.e. we've got for N triggers:
Cat 1: Softest_trigger && !(||all other triggers)
Cat 2: Second_Softest_trigger && !(||(Third_Softest_trigger ... Stiffest_Trigger))
...
Cat N: Stiffest_trigger

The categories are most relevant for 2017 and 2018, although they have been used for 2015 as well (see later). The PRW tool for each category is then fed the lumifiles for each of the triggers that fall into it. The prescale weight is then handled for the OR of the triggers based on their actual prescale at a given \mu.

Contrary to the baseline use of the 2015's mu6mu4 trigger for Bmumu, a category of 2mu4 && mu6mu4 was introduced in the ntuples too, but is then discarded at the preselection (going from Loose to Nominal).

The PRW tool inputs used for the v2 ntuples are as follows:

PRW files found in: /eos/atlas/atlascerngroupdisk/phys-beauty/BsMuMuRun2/PRW/v_05, obtained according to https://gitlab.cern.ch/atlas-physics/beauty/rare/bmumu-run2/AnalysisTools/-/blob/master/Pileup_Files/mc16d_get_ntup_prw.sh. These are the standard NTUP_PILEUP files for each of our MC samples (process, campaign). Downloaded from the grid and renamed for convenience. I tried to re-download mc16e bsmumu and cross-checked against the one in the folder - they are identical.

LUMI files found in: /eos/atlas/atlascerngroupdisk/phys-beauty/BsMuMuRun2/Lumifiles/v_10, obtained with the online lumicalc tool. It used the recommended GRLs from https://twiki.cern.ch/twiki/bin/viewauth/AtlasProtected/GoodRunListsForAnalysisRun2 , which have not changed since then: the website lists the same ones as were used in the lumicalc tool. I've cross-checked this against the logfiles in the above folder. Aside from that, the recommended settings of the lumicalc tool were used, including the LAr veto (LARBadChannelsOflEventVeto -RUN2-UPD4-10).

In summary, all looks up to date even today.

Fits to FR2 bkg MC

With the weighting checked, I proceeded to make projections of the background shapes on the PV-weighted FR2 bkg MC, in the 4 preliminary BDT bins:

Figures: BDT bin 0, BDT bin 1, BDT bin 2, BDT bin 3

	av BDT	nCOMB	nSSSV
bin 0	2.0504e-01	7.8427e+04 +/- 3.15e+02	1.7570e+03 +/- 1.51e+02
bin 1	3.1022e-01	8.2781e+03 +/- 1.09e+02	1.7570e+03 +/- 1.51e+02
bin 2	3.9497e-01	1.0160e+03 +/- 3.90e+01	8.0790e+02 +/- 3.62e+01
bin 3	4.7586e-01	1.3047e+02 +/- 1.38e+01	3.1443e+02 +/- 1.93e+01

expConst	-1.4041e-02 +/- 5.46e-04
slope_0_cmb	-2.6408e-01 +/- 4.18e-02
slope_1_bdt	8.6599e-02 +/- 1.94e-01

Evolution of bkg shape parameters in the FR2 MC fit

-- OndrejKovanda - 2020-08-31

Attachments

Topic attachments
I	Attachment	History	Action	Size	Date	Who	Comment
png	BDT_16vs20_allBins.png	r1	manage	12.6 K	2020-10-07 - 11:35	OndrejKovanda
xls	BphysTriggers_FullRun2_(1).xls	r1	manage	188.0 K	2022-07-14 - 10:48	OndrejKovanda
png	data1516_2016v2_vs_2020v4_BDT2016.png	r1	manage	49.4 K	2022-04-11 - 09:51	OndrejKovanda
png	difference_2020_v4_vs_2021_v2_BDT20_SB.png	r1	manage	29.9 K	2022-05-16 - 11:57	OndrejKovanda
png	eff_lumi_weighted_FR2_Bs_MC.png	r1	manage	28.5 K	2022-09-21 - 19:45	OndrejKovanda
png	evolution_bdt_FR2_SB_first.png	r1	manage	29.1 K	2022-05-16 - 17:06	OndrejKovanda
png	evolution_bdt_FR2_bkg_PVWeights_only.png	r1	manage	29.2 K	2022-10-12 - 10:59	OndrejKovanda
txt	fitLog_2016BDT.txt	r1	manage	256.5 K	2020-09-03 - 10:42	OndrejKovanda	3Bin_blindedData
txt	fitLog_2020BDT.txt	r3 r2 r1	manage	238.3 K	2021-07-01 - 17:22	OndrejKovanda
txt	fitLog_2020BDT_allBins.txt	r1	manage	225.4 K	2021-07-01 - 17:16	OndrejKovanda
png	massPlot_BDT1ETA1_2016Deriv_fullFitValidation_SM-initialization.png	r2 r1	manage	30.6 K	2020-10-13 - 17:54	OndrejKovanda
png	massPlot_BDT1ETA1_2020Deriv_NOsignalMassShift.png	r1	manage	31.4 K	2021-07-01 - 17:11	OndrejKovanda
png	massPlot_BDT1ETA1_FR2_SB_first.png	r1	manage	26.7 K	2022-05-16 - 12:13	OndrejKovanda
png	massPlot_BDT1ETA1_FR2_bkg_PVWeights_only_2.png	r1	manage	34.0 K	2022-10-12 - 10:59	OndrejKovanda
png	massPlot_BDT1ETA1_NewBDT_lin_slope_const_expConst_3bin.png	r1	manage	32.5 K	2020-09-03 - 10:42	OndrejKovanda	3Bin_blindedData
png	massPlot_BDT1ETA1_OldBDT_lin_slope_const_expConst_3bin.png	r1	manage	32.4 K	2020-09-03 - 10:42	OndrejKovanda	3Bin_blindedData
png	massPlot_BDT2ETA1_2016Deriv_fullFitValidation_SM-initialization.png	r2 r1	manage	30.8 K	2020-10-13 - 17:54	OndrejKovanda
png	massPlot_BDT2ETA1_2020Deriv_NOsignalMassShift.png	r1	manage	30.4 K	2021-07-01 - 17:11	OndrejKovanda
png	massPlot_BDT2ETA1_FR2_SB_first.png	r1	manage	25.6 K	2022-05-16 - 12:13	OndrejKovanda
png	massPlot_BDT2ETA1_FR2_bkg_PVWeights_only_2.png	r1	manage	31.3 K	2022-10-12 - 10:59	OndrejKovanda
png	massPlot_BDT2ETA1_NewBDT_lin_slope_const_expConst_3bin.png	r1	manage	31.1 K	2020-09-03 - 10:42	OndrejKovanda	3Bin_blindedData
png	massPlot_BDT2ETA1_OldBDT_lin_slope_const_expConst_3bin.png	r1	manage	32.4 K	2020-09-03 - 10:42	OndrejKovanda	3Bin_blindedData
png	massPlot_BDT3ETA1_2016Deriv_fullFitValidation_SM-initialization.png	r2 r1	manage	29.3 K	2020-10-13 - 17:54	OndrejKovanda
png	massPlot_BDT3ETA1_2020Deriv_NOsignalMassShift.png	r1	manage	29.4 K	2021-07-01 - 17:11	OndrejKovanda
png	massPlot_BDT3ETA1_FR2_SB_first.png	r1	manage	24.9 K	2022-05-16 - 12:13	OndrejKovanda
png	massPlot_BDT3ETA1_FR2_bkg_PVWeights_only_2.png	r1	manage	31.8 K	2022-10-12 - 10:59	OndrejKovanda
png	massPlot_BDT3ETA1_NewBDT_lin_slope_const_expConst_3bin.png	r1	manage	28.9 K	2020-09-03 - 10:42	OndrejKovanda	3Bin_blindedData
png	massPlot_BDT3ETA1_OldBDT_lin_slope_const_expConst_3bin.png	r1	manage	29.3 K	2020-09-03 - 10:42	OndrejKovanda	3Bin_blindedData
png	massPlot_BDT4ETA1_2016Deriv_fullFitValidation_SM-initialization.png	r2 r1	manage	30.5 K	2020-10-13 - 17:54	OndrejKovanda
png	massPlot_BDT4ETA1_2020Deriv_NOsignalMassShift.png	r1	manage	29.3 K	2021-07-01 - 17:11	OndrejKovanda
png	massPlot_BDT4ETA1_FR2_SB_first.png	r1	manage	24.3 K	2022-05-16 - 12:13	OndrejKovanda
png	massPlot_BDT4ETA1_FR2_bkg_PVWeights_only_2.png	r1	manage	31.5 K	2022-10-12 - 10:59	OndrejKovanda
png	mass_distribution_2020_v4_vs_2021_v2_BDT20_SB.png	r1	manage	46.3 K	2022-05-12 - 15:16	OndrejKovanda
png	negativeEntries.png	r1	manage	17.4 K	2020-09-15 - 10:57	OndrejKovanda
root	result.root	r1	manage	564.2 K	2022-05-16 - 12:02	OndrejKovanda
png	validationSummary.png	r1	manage	97.5 K	2020-09-03 - 10:29	OndrejKovanda
png	variableTable.png	r1	manage	115.5 K	2020-09-03 - 10:24	OndrejKovanda
png	weirdBlinding.png	r1	manage	19.7 K	2020-09-15 - 10:51	OndrejKovanda

Topic revision: r22 - 2022-10-12 - OndrejKovanda
