Background estimation from data

We estimate the background contamination in the selected events by constraining the invariant mass of the selected objects.

Summary of the method

Below we explore how to use a $\chi^{2}$-based method to subtract the background from the data.

The basic idea is to build a robust $\chi^{2}$ that, under a change of variables (e.g. flipping the momentum of the leading jet), keeps a similar distribution for background events but not for signal events. Having constructed this $\chi^{2}$, one selects events by rejecting those with a $\chi^{2}$ value above a given threshold. The selection is done in two different ways:

  1. normal selection;
  2. flipped selection (the $\chi^{2}$ is computed using the flipped variables).
The $\chi^{2}$ is well constructed if:
  • selection 2 yields more or less the same background events as selection 1
  • selection 2 yields far fewer signal events but leaves the distributions of interest (kinematics, b-tagging, etc.) invariant
Suppose one is interested in the distribution $h(x)$ of a variable $x$. The procedure described above yields two distinct distributions, $h_{1}(x)$ and $h_{2}(x)$, one for each event selection. If the first requirement above is met, taking the difference of these distributions effectively eliminates the background contributions. If the second requirement is also met, then $h_{1}(x)-h_{2}(x)$ is equivalent, up to normalization, to the $h(x)$ obtained from a 100% pure sample. Note that the final distribution is unbiased only if the second requirement is fulfilled (the distributions of interest must be left invariant for signal events).
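The subtraction can be illustrated with a minimal Python sketch. The bin contents are invented toy numbers, the function name is hypothetical, and the flipped selection is idealised so that no signal survives it. The uncertainty treats the two selections as statistically independent Poisson counts, which is an assumption made here for simplicity and is not stated in the text.

```python
def subtract_histograms(h_normal, h_flipped):
    """Bin-by-bin difference h1(x) - h2(x).

    The uncertainty assumes independent Poisson counts in each histogram
    (an assumption of this sketch, not a statement from the analysis).
    """
    if len(h_normal) != len(h_flipped):
        raise ValueError("histograms must share the same binning")
    diff = [n1 - n2 for n1, n2 in zip(h_normal, h_flipped)]
    err = [(n1 + n2) ** 0.5 for n1, n2 in zip(h_normal, h_flipped)]
    return diff, err


# Toy bin contents (invented for illustration only).
background = [40, 30, 20]   # identical in both selections (requirement 1)
signal = [10, 50, 15]       # present only in the normal selection (idealised)

h1 = [b + s for b, s in zip(background, signal)]  # normal selection
h2 = list(background)                             # flipped selection
diff, err = subtract_histograms(h1, h2)
print(diff)
```

In this idealised case the difference reproduces the signal histogram exactly; with a realistic flipped selection some signal survives and the difference is only proportional to it.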

For our purpose, the measurement of $V_{tb}$, we are interested in subtracting the background contributions to the number of b-tags measured in an event. Therefore, in the construction of our $\chi^{2}$ for the di-leptonic channel we require that, for events selected by 1) or 2), the evolution of the number of b-tags measured in an event as a function of the discriminant threshold:

  • is independent of the selection used, for background events
  • lowers the statistics equally for events with 0, 1, 2 or $\geq 3$ b-tags, for signal events
These requirements translate into the following two constraints:
  • $\frac{ N_{evts}(k\ \mathrm{b\mbox{-}tags} |_{\chi^{2} \leq \chi^{2}_{cut}}) } { N_{evts}(k\ \mathrm{b\mbox{-}tags} |_{\chi^{2}_{flip} \leq \chi^{2}_{cut}}) } = 1$ for background events
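  • $\frac{ N_{evts}(k\ \mathrm{b\mbox{-}tags} |_{\chi^{2} \leq \chi^{2}_{cut}}) } { N_{evts}(k\ \mathrm{b\mbox{-}tags} |_{\chi^{2}_{flip} \leq \chi^{2}_{cut}}) } = \ldots = \frac{ N_{evts}(j\ \mathrm{b\mbox{-}tags} |_{\chi^{2} \leq \chi^{2}_{cut}}) } { N_{evts}(j\ \mathrm{b\mbox{-}tags} |_{\chi^{2}_{flip} \leq \chi^{2}_{cut}}) } > 1$ for pure signal events (the ratio is the same for every b-tag multiplicity, and exceeds 1 because the flipped selection keeps fewer signal events)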
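As a rough illustration, the ratio entering both constraints can be computed per b-tag multiplicity as in the Python sketch below. The event layout (dictionaries with keys `chi2`, `chi2_flip` and `n_btags`) and the function name are hypothetical, chosen only for this example, and the toy events are invented.

```python
from collections import Counter


def btag_ratios(events, chi2_cut):
    """Ratio N(k b-tags | chi2 <= cut) / N(k b-tags | chi2_flip <= cut), k = 0..3.

    Multiplicities of 3 or more are merged into k = 3.
    Returns None for a multiplicity with no event in the flipped selection.
    """
    normal = Counter(min(ev["n_btags"], 3) for ev in events
                     if ev["chi2"] <= chi2_cut)
    flipped = Counter(min(ev["n_btags"], 3) for ev in events
                      if ev["chi2_flip"] <= chi2_cut)
    return {k: (normal[k] / flipped[k] if flipped[k] else None)
            for k in range(4)}


# Toy background-like events (invented): both selections keep the same events.
toy_events = [
    {"chi2": 0.5, "chi2_flip": 0.7, "n_btags": 0},
    {"chi2": 0.8, "chi2_flip": 0.4, "n_btags": 1},
    {"chi2": 3.0, "chi2_flip": 2.5, "n_btags": 0},
]
ratios = btag_ratios(toy_events, chi2_cut=1.0)
print(ratios)
```

For a background-like sample the ratios should be compatible with 1 for every multiplicity; for pure signal they should share a common value different from 1.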

Next we discuss the construction of the $\chi^{2}$ with the two requirements above in mind.

Jet + lepton invariant mass

Our starting point is the distribution of the invariant mass of the jet and the lepton associated with a top decay, $s_{l+j}^{1/2}$, without taking into account the missing energy from the neutrino. At Monte-Carlo level the distribution of $s_{l+j}^{1/2}$ is characterized by an average value of 95.2 $\pm$ 0.2 GeV and a width of 32.3 $\pm$ 0.1 GeV. For each selected event (passing the kinematical cuts defined here) we proceed as follows:
  1. select the 2 highest $p_{T}$ leptons as the leptons from W decay generated after the top decay;
  2. if the number of selected jets is higher than 2, then we select 3 jets using a likelihood ratio method based on the jet's $p_{T}$, $min(\Delta\eta)$ and $max(\Delta\eta)$ measured with respect to the 2 leptons, as it was defined here;
  3. for each of the 3x2=6 jet+lepton combinations we compute the following $\chi^{2}$: $\chi^{2}(jet,lepton) = \frac{ [ 95.2 - s^{1/2}(jet,lepton) ] ^2 } { 32.3^{2} + \epsilon^{2} } $ where $\epsilon$ is estimated by propagating the energy measurement errors of the lepton and the jet to the value of the invariant mass;
  4. from the 3x2 combination matrix obtained for the possible values of $\chi^{2}$ in the event we start by choosing the lowest entry to assign a jet to a lepton. The other jet+lepton candidate is found from the second lowest entry that does not correspond to the lepton or jet already chosen before;
  5. the total $\chi^{2}$ is built from the sum of the $\chi^{2}(jet,lepton)$ chosen;
  6. we repeat the procedure for computing the total $\chi^{2}$ after flipping the 3-momentum of the leading jet, and we call the result $\chi^{2}(flip)$.
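Steps 3 to 6 above can be sketched in Python as follows. This is only an illustration under stated assumptions: four-vectors are plain `(E, px, py, pz)` tuples in GeV, $\epsilon$ is passed as a single constant instead of being propagated per pair, the leading jet is taken to be the one with the highest transverse momentum, and all function names and toy four-vectors are hypothetical.

```python
import math

M_LJ_MEAN = 95.2    # GeV, average of the jet+lepton invariant mass (from the text)
M_LJ_WIDTH = 32.3   # GeV, width of the jet+lepton invariant mass (from the text)


def invariant_mass(p1, p2):
    """Invariant mass of two four-vectors given as (E, px, py, pz)."""
    e, px, py, pz = (p1[i] + p2[i] for i in range(4))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def pair_chi2(jet, lepton, eps=0.0):
    """chi2(jet, lepton); eps is a placeholder for the propagated mass error."""
    mass = invariant_mass(jet, lepton)
    return (M_LJ_MEAN - mass) ** 2 / (M_LJ_WIDTH ** 2 + eps ** 2)


def total_chi2(jets, leptons, eps=0.0):
    """Take the lowest entry, then the next lowest entry that reuses neither
    its jet nor its lepton, and sum the two chi2 values."""
    entries = sorted((pair_chi2(jet, lep, eps), ji, li)
                     for ji, jet in enumerate(jets)
                     for li, lep in enumerate(leptons))
    used_jets, used_leptons, chosen = set(), set(), []
    for chi2, ji, li in entries:
        if ji in used_jets or li in used_leptons:
            continue
        chosen.append((ji, li))
        used_jets.add(ji)
        used_leptons.add(li)
        if len(chosen) == 2:
            break
    total = sum(pair_chi2(jets[ji], leptons[li], eps) for ji, li in chosen)
    return total, chosen


def flip_leading_jet(jets):
    """Copy of the jet list with the 3-momentum of the highest-pT jet reversed."""
    lead = max(range(len(jets)), key=lambda i: math.hypot(jets[i][1], jets[i][2]))
    energy, px, py, pz = jets[lead]
    flipped = list(jets)
    flipped[lead] = (energy, -px, -py, -pz)
    return flipped


# Toy massless four-vectors (invented for illustration only).
jets = [(50.0, 50.0, 0.0, 0.0), (40.0, 0.0, 40.0, 0.0), (30.0, 0.0, 0.0, 30.0)]
leptons = [(50.0, -50.0, 0.0, 0.0), (40.0, 0.0, -40.0, 0.0)]

chi2, pairs = total_chi2(jets, leptons)
chi2_flip, pairs_flip = total_chi2(flip_leading_jet(jets), leptons)
print(chi2, pairs, chi2_flip, pairs_flip)
```

In this toy configuration the flip moves the leading jet away from its lepton partner in invariant mass, so the assignment changes and the total $\chi^{2}$ increases.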
Using this procedure we obtain the following results:

1) After the event selection and the choice of objects we might already have made a type I error (rejection of signal objects). To estimate the probability of making such a mistake we check, after the full event and object selection is done, how many of the selected objects (leptons and jets) can be matched to the ones generated in the top decay. The table below summarizes the probability of making this kind of error in our analysis:

Probability for missing signal objects from the top decay estimated from Madgraph samples
| Object | Event selection: 1 missed | Event selection: 2 missed | Choice of objects in selected events: 1 missed | Choice of objects in selected events: 2 missed | Total probability |
| Lepton | 0.0176 $\pm$ 0.0006 | 0.0014 $\pm$ 0.0006 | | | 0.0190 $\pm$ 0.0008 |
| Jet | 0.267 $\pm$ 0.009 | 0.025 $\pm$ 0.002 | 0.040 $\pm$ 0.003 | 0.0008 $\pm$ 0.0004 | 0.33 $\pm$ 0.01 |

This table shows us that around 33% of the selected signal events will, in reality, be background events, because signal objects are discarded by our selection. Moreover, it allows us to conclude that errors are made mainly in the jet selection, since the jet purity after the full selection is 0.67 $\pm$ 0.01 (much lower than the lepton purity, 0.98 $\pm$ 0.02).
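The totals and the jet purity quoted above can be cross-checked with a short Python sketch. Combining the individual uncertainties in quadrature is an assumption made here (the text does not say how the total uncertainty was obtained), and the function name is hypothetical.

```python
import math

# (probability, uncertainty) for each step, copied from the table above.
lepton_terms = [(0.0176, 0.0006), (0.0014, 0.0006)]
jet_terms = [(0.267, 0.009), (0.025, 0.002), (0.040, 0.003), (0.0008, 0.0004)]


def combine(terms):
    """Sum the probabilities; add the uncertainties in quadrature (assumed)."""
    total = sum(value for value, _ in terms)
    error = math.sqrt(sum(err * err for _, err in terms))
    return total, error


lepton_total, lepton_error = combine(lepton_terms)
jet_total, jet_error = combine(jet_terms)
jet_purity = 1.0 - jet_total

print(round(lepton_total, 4), round(jet_total, 2), round(jet_purity, 2))
```

Under this assumption the combined values agree, after rounding, with the totals in the table and with the quoted jet purity.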

2) The previous result compels us to divide the events into two categories:

  • pure signal events: $t\bar{t}$ di-leptonic events in which the 2 leptons and the 2 jets from the top decay are correctly selected
  • background events: $t\bar{t}$ di-leptonic events in which at least one of the 2 leptons and the 2 jets from the top decay was discarded by the event selection, plus non $t\bar{t}$ di-leptonic events.
The category of an event is found by accessing the MC truth for the reconstructed objects (leptons and jets) and for the generated hard process. The invariant mass of the jet+lepton combinations in the different event types is plotted below:

minv_jl_combinatorial.png
Invariant mass for different lepton+jet combinations in analyzed data samples

3) Below we obtain the $\chi^{2}$ and $\chi^{2}(flip)$ distributions for signal and background events. We also show the $\chi^{2}$ distributions for each (jet,lepton) pair (2 entries per event). Flipping the momentum of the leading jet leaves the background distribution almost invariant but clearly damps the signal distribution. However, this effect is not optimal, mainly because we are not accounting for the missing $E_{T}$ and, as a result, the invariant mass distribution is very wide.

$\chi^{2}$ distributions obtained in signal and background events
| Number of entries per event | $\chi^{2}$ distributions in pure signal events | $\chi^{2}$ distributions in background events (including wrongly selected $t\bar{t}\rightarrow e\mu$ events) | Ratio of $\chi^{2}$ distributions with respect to the background |
| 1 ($\sum\chi^{2}(j_{i},l_{i})$) | [plot] | [plot] | [plot] |
| 2 ($\chi^{2}(j_{i},l_{i})$, i=1,2) | [plot] | [plot] | [plot] |

4) We finally turn to the event selection using $\chi^{2}$ cuts. To improve the efficiency of the method we select events in which each individual contribution satisfies $\chi^{2}(lepton,jet)< \chi^{2}(cut)$. From the previous results we choose $\chi^{2}(cut)=1$. To check whether the requirements for the $\chi^{2}$ are met we plot the distributions of $\frac{ N_{evts}(k\ \mathrm{b\mbox{-}tags} |_{\chi^{2} \leq \chi^{2}_{cut}}) } { N_{evts}(k\ \mathrm{b\mbox{-}tags} |_{\chi^{2}_{flip} \leq \chi^{2}_{cut}}) }$ in signal and background events. The result is the following:

From these distributions we see that the requirements for a good $\chi^{2}$ are met for higher values of the b-tag discriminant. Therefore, we choose $\Delta_{b-tag}=5.3$ (also known as the medium cut) to plot the b-tag distributions for the normal and flipped event selections. Below we also plot the resulting difference distributions.
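The selection on the individual $\chi^{2}$ values and the b-tag counting at the medium cut can be sketched in Python as below. The event layout (keys `chi2_pairs`, `chi2_pairs_flip` and `jet_disc`), the function names and the toy events are hypothetical, and a jet is counted as b-tagged when its discriminant exceeds the threshold, which is an assumption of this sketch.

```python
from collections import Counter

CHI2_CUT = 1.0   # chi2(cut) chosen in the text
BTAG_CUT = 5.3   # medium cut on the b-tag discriminant


def n_btags(discriminants, threshold=BTAG_CUT):
    """Number of jets whose discriminant exceeds the threshold (assumed convention)."""
    return sum(1 for value in discriminants if value > threshold)


def btag_multiplicity(events, chi2_key):
    """Counts of events with 0, 1, 2 and >= 3 b-tags among events in which
    every individual chi2(lepton, jet) stored under chi2_key is below the cut."""
    counts = Counter()
    for event in events:
        if all(chi2 < CHI2_CUT for chi2 in event[chi2_key]):
            counts[min(n_btags(event["jet_disc"]), 3)] += 1
    return [counts[k] for k in range(4)]


# Toy events (invented for illustration only).
toy_events = [
    {"chi2_pairs": (0.2, 0.5), "chi2_pairs_flip": (0.3, 4.0), "jet_disc": [6.1, 7.0, 1.2]},
    {"chi2_pairs": (0.9, 0.1), "chi2_pairs_flip": (0.8, 0.2), "jet_disc": [0.5, 2.0]},
    {"chi2_pairs": (1.5, 0.1), "chi2_pairs_flip": (0.4, 0.3), "jet_disc": [5.9, 1.0]},
]

normal = btag_multiplicity(toy_events, "chi2_pairs")
flipped = btag_multiplicity(toy_events, "chi2_pairs_flip")
difference = [n - f for n, f in zip(normal, flipped)]
print(normal, flipped, difference)
```

With so few toy events the difference can be negative in some bins; in the real analysis the subtraction is meaningful only at the level of the full distributions.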

Estimating b-tagging efficiency

Having found the distribution of the b-tag multiplicity in an event, one can estimate the b-tagging efficiency $\epsilon_{b}$ (assuming R=1, i.e. the top always decays to a b quark). The expected number of events with k b-tags, normalized to the total number of selected events, is given by:
  • $\bar{N}_{0}=(1-\epsilon_{b})^{2}$
  • $\bar{N}_{1}=2\times(1-\epsilon_{b})\times\epsilon_{b}$
  • $\bar{N}_{2}=\epsilon_{b}^{2}$
For each value of the b-tagging discriminator $\Delta$, the corresponding efficiency can be estimated by maximizing the following likelihood:

$ L =\prod_{i=0}^{2} Poisson(\bar{N}_{i},N_{i}) $

The uncertainty associated with $\epsilon_{b}$ can be estimated by fitting a parabola ($a+bx+cx^{2}$) to $-\log L$ in the neighbourhood of its minimum. Since $-\log L$ rises by 1/2 at one standard deviation from the minimum, the b-tagging efficiency measured for a given value of $\Delta$ is then given by $\epsilon_{b}=\epsilon_{b}^{o}\pm 1/\sqrt{2c}$, where $\epsilon_{b}^{o}$ is the position of the minimum. The plots below show the results obtained using this method.

| Likelihood distributions for different $\Delta_{trk\ counting}$ cuts | b-tagging efficiency curve |
| [plot] | [plot] |
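The likelihood fit described in this section can be sketched in Python as follows. Several choices are assumptions of the sketch and do not come from the analysis: the expected counts are the fractions $\bar{N}_{k}$ multiplied by the total number of observed events, the minimum is located with a simple grid scan, the parabola is taken through the three grid points around the minimum, and the uncertainty is the half-width at which $-\log L$ rises by 1/2, i.e. $1/\sqrt{2c}$. The observed counts are invented.

```python
import math


def expected_fractions(eps_b):
    """Fractions of events with 0, 1 and 2 b-tags for efficiency eps_b (R = 1)."""
    return [(1 - eps_b) ** 2, 2 * (1 - eps_b) * eps_b, eps_b ** 2]


def neg_log_likelihood(eps_b, observed):
    """-log L for a product of Poisson terms, one per b-tag multiplicity."""
    n_total = sum(observed)
    nll = 0.0
    for fraction, n_obs in zip(expected_fractions(eps_b), observed):
        mu = n_total * fraction   # expected count (assumed normalisation)
        nll -= n_obs * math.log(mu) - mu - math.lgamma(n_obs + 1)
    return nll


def fit_efficiency(observed, steps=1000):
    """Grid scan of -log L followed by a three-point parabola at the minimum."""
    step = 1.0 / steps
    grid = [i * step for i in range(1, steps)]
    nll = [neg_log_likelihood(eps, observed) for eps in grid]
    i = min(range(1, len(grid) - 1), key=lambda k: nll[k])
    curvature = nll[i - 1] - 2 * nll[i] + nll[i + 1]   # equals 2 * c * step**2
    c = curvature / (2 * step * step)
    eps_hat = grid[i] - step * (nll[i + 1] - nll[i - 1]) / (2 * curvature)
    sigma = 1 / math.sqrt(2 * c)
    return eps_hat, sigma


# Toy observed counts with 0, 1 and 2 b-tags (invented for illustration only).
observed_counts = [250, 500, 250]
eps_hat, sigma = fit_efficiency(observed_counts)
print(eps_hat, sigma)
```

For these toy counts the fitted efficiency comes out close to 0.5, as expected from the symmetric input.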

-- PedroSilva - 26 May 2008

Topic attachments
| Attachment | Revision | Size | Date | Who |
| btagefficiency_jlinvmass_aftersub.png | r1 | 9.2 K | 2008-06-03 - 09:31 | PedroSilva |
| btagefficiency_likelihood_jlinvmass_aftersub.png | r1 | 11.7 K | 2008-06-03 - 09:31 | PedroSilva |
| chi2_single_bckgEvents_3j.png | r1 | 12.5 K | 2008-05-29 - 02:24 | PedroSilva |
| chi2_single_ratio_3j.png | r1 | 18.2 K | 2008-05-29 - 02:24 | PedroSilva |
| chi2_single_signalEvents_3j.png | r1 | 11.3 K | 2008-05-29 - 02:24 | PedroSilva |
| chi2_sum_bckgEvents_3j.png | r1 | 14.2 K | 2008-05-29 - 02:24 | PedroSilva |
| chi2_sum_ratio_3j.png | r1 | 15.8 K | 2008-05-29 - 02:25 | PedroSilva |
| chi2_sum_signalEvents_3j.png | r1 | 13.4 K | 2008-05-29 - 02:25 | PedroSilva |
| minv_jl_combinatorial.png | r6 | 19.4 K | 2008-05-28 - 12:31 | PedroSilva |
| nBtags_bckg_evol_singleChi2_3j.png | r1 | 15.4 K | 2008-05-29 - 02:33 | PedroSilva |
| nBtags_bckg_subtract_singleChi2_3j.png | r1 | 15.6 K | 2008-05-29 - 02:36 | PedroSilva |
| nBtags_signal_evol_singleChi2_3j.png | r1 | 14.9 K | 2008-05-29 - 02:33 | PedroSilva |
| nBtags_signal_subtract_singleChi2_3j.png | r1 | 16.1 K | 2008-05-29 - 02:36 | PedroSilva |
Topic revision: r12 - 2008-06-26 - PedroSilva
 