Review Twiki for B2G-20-003 (X --> aa --> bbbb)

Latest note: http://cms.cern.ch/iCMS/user/noteinfo?cmsnoteid=CMS%20AN-2019/085 (v2)

Color Key:

Authors agree, and information is incorporated in next version of Note.

Authors agree, but this action item has not been accomplished at this time.

Authors disagree or no change is needed for the Note.

Authors are confused or uncertain, clarification is needed!

Questions from October 4th presentation in B2G DiBoson: (https://indico.cern.ch/event/852500/)

  • slide 3 - the a --> bb branching ratio is assumed to be 100%. Let's make sure this is clearly stated in the AN.
    • We have made this explicit in the note, in the section detailing the signal samples.
  • slide 6 (and backup 29 and 30) - trigger efficiency vs mass: we don't currently have an MC correction for the slight turn-on; this will eventually need to be applied. [Petar]
  • slide 7 - we were asked how high in X mass will our requests go. [Alberto]
    • We should make sure that the request is high enough also for where we expect to have sensitivity for 2017 and 18 datasets as well.
  • slide 8 - does it make sense to allow the jet eta to be as high as 2.4, since we are applying the double b-tagger and this eta is beyond the tracker region? We should check the efficiency of the double b-tagger vs eta. Also, for what eta range are the SFs applicable? Let's check with the BTV POG. [Petar]
    • We have checked with BTV. They said that the SFs should apply up to 2.5, and that the b-tagging is "valid" to 2.5. We also plotted the eta of the jets in the signal region (see control plots in note), and it seems that the bb-tagging is acting as an implicit cut on the eta. For now we are not making a change, as adding an explicit cut will not lead to a different result, and the procedure as it stands now has the blessing of BTV.
  • slide 12 - For plot on the left, let’s also plot the ratios of A/B and C/D to show they are the same/flat. [Jennifer]
    • We will add this information to future iterations of this plot!
  • slide 13 - It was requested that we should better describe in the AN how the ttbar shape and normalization is allowed to float etc in the fit. [Jennifer]
    • We will provide a better description in the next version of the Note: the ttbar shape is controlled by the parameter alpha in the ttbar HT reweighting, i.e. the slope of the exponential dependence on HT. Alpha is varied by 20% in each direction and the resulting ttbar is propagated as an alternative template through the entire procedure. The same is true for the ttbar normalization. The amount of ttbar subtracted from the ABCD regions has a small effect on the fit: this secondary effect on the QCD is correlated with the ttbar contribution. (A short illustrative sketch of the reweighting is given just after this question list.)
  • slide 17 :
    • clearly the background rapidly increases below 50 GeV. Is 25 GeV truly as low as we can go? We should evaluate quantitatively where the sensitivity “falls off the cliff”. [Petar]
      • Theoretically you could see decays down to twice the B mass; however, Sal has commented that below 25 GeV we should not trust our MC modeling of the hadronization of b-quark pairs. Going lower than that would require some quantification of the uncertainty in our simulation of these jets, and obtaining such an uncertainty would likely be an analysis of its own, similar to the SMP one performed to get the N2 and soft-drop mass spectra of AK8 jets at CMS (that one went to masses as low as 10 GeV).
    • why does ttbar have that shape? We would expect it to peak more. (Marc, can you remind us?) [Petar]
      • The ttbar component shouldn't peak sharply at the W or top mass, since neither of those is a true bb decay. The case where the b from the top and a c from the W are found in the same AK8 jet is the best way for ttbar to give you a bb-tagged component. Such jets, however, have masses anywhere between the W and the top mass, hence the "stretched" appearance of the ttbar and the lack of a real resonance.
    • we should rebin the avg jet mass plots [Petar]
      • Agreed, we will rebin for future iterations of the Note.
    • it was pointed out that the bkg uncertainty here looks too small. What is the reason? EH - I think it's because we have not yet added the stat uncertainty in region B bin by bin; we should double-check. [Alberto]
      • The background uncertainty did not take into account all the different sources. This will be fixed in the next iteration of the Note.
  • We should make a Q&A twiki to collect questions received/answered (eventually this will be moved to the official review twiki when we have a CADI line). [Petar]
    • You are currently on the Q&A twiki for this analysis.
  • Why did we choose to parameterize C/D vs the subleading jet mass? Would we benefit from parameterizing vs the average jet mass in the long run? We should show this.
    • We explored the fitting space and ABCD axes somewhat stochastically and used what worked best in the MC. We can investigate other combinations now that the procedure is settled.
  • It was requested that we run over the QCD b-enriched samples (Eva will start to bring over those NanoAODs). We can then check if there is an effect from the heavy-flavor composition. Also, we may have better stats with that sample for MC-related studies. [Clemens]
    • Not all of these samples have been produced. We will do this when we have access to them.
  • It was requested that we explore other b-taggers such as DeepAK8 and Kevin's b-tagger. [Petar]
    • We will do this after resolving the other issues/questions raised in the review so far.
  • delta eta and dijet mass are correlated; this will eventually be dealt with by a 2D mass fit. [Alberto]
    • I'm not quite sure what is meant here. If there is a dependence we don't seem to be sensitive to it?
  • We should follow up with Scott about the theory ref. [John H]
    • We will push our theory colleague to provide us with a reference. This may not be available until later in the review process.
  • Currently our limit assumes f/N=1; in the long run we will also want to show model-independent limits, e.g. a temperature plot of the cross section (z-axis) vs the two masses, a la SUSY. [John H]
    • Absolutely, the eventual goal is a 3-D limit in the a mass, the X mass, and f/N. As the cross section scales with f/N via a simple relation (no sample re-generation is needed), we will continue the review process and the limit setting with f/N = 1.
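
As a small illustration of the ttbar HT reweighting described above (slide 13 item), here is a minimal Python sketch of how the nominal and the +/-20% varied ttbar weights could be built; the array contents and the nominal value of alpha are hypothetical placeholders, not the values used in the analysis.

    import numpy as np

    # Hypothetical per-event HT values (GeV) for the ttbar MC.
    ht = np.array([650.0, 820.0, 1100.0, 1430.0])

    alpha_nom = -1.0e-3          # placeholder slope of the exponential HT dependence
    alpha_up  = 1.2 * alpha_nom  # alpha varied up by 20%
    alpha_dn  = 0.8 * alpha_nom  # alpha varied down by 20%

    def ht_weight(alpha, ht):
        # w = exp(alpha * HT); here we also renormalize so that the total ttbar
        # yield is unchanged (an assumption made for this sketch only).
        w = np.exp(alpha * ht)
        return w * len(ht) / w.sum()

    w_nom = ht_weight(alpha_nom, ht)  # nominal ttbar weights
    w_up  = ht_weight(alpha_up, ht)   # propagated as an alternative template
    w_dn  = ht_weight(alpha_dn, ht)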

Questions from Petar Maksimovic on v2 of note

Physics:
~~~~~~~~
I have two important requests for the next version of the note:

1) It is not clear what is in the final likelihood. Is the search
in the average jet mass only, and not in the mass of the resonance?
Or will you fit m_ave for a sliding window on X?

We are going to set limits in the X-a plane. As the resulting plots are quite complicated to interpret, we will continue to use the a-mass as a pedagogical tool, but this will be made clear in the next version of the note.

Also, at this point it is impossible to tell how the background
estimate works with the final likelihood -- which shapes are floating,
and which are obtained from elsewhere.

We will make this explicit in the next version of the note. The fit to the C/D region is just a starting point to get templates which describe the relationship between A and B. The uncertainties in that fit provide additional morphing dimensions for the shape of the background in the signal region. Another way to put it: the C/D region is used to define the prefit template for A/B, but Combine is free to morph within the uncertainties to get A/B. Disagreement between A/B and C/D does not play any part in the minimization of the likelihood.
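
To make the "free to morph within the uncertainties" statement above more concrete, here is a minimal numpy sketch of piecewise-linear vertical template interpolation between a prefit template and its up/down variations; Combine's actual interpolation is smoother, and the template names and numbers below are hypothetical.

    import numpy as np

    def morph(nominal, up, down, theta):
        # theta = 0 returns the prefit shape, theta = +1 (-1) returns the up (down)
        # variation, and intermediate values interpolate bin by bin.
        nominal, up, down = map(np.asarray, (nominal, up, down))
        if theta >= 0:
            return nominal + theta * (up - nominal)
        return nominal + theta * (nominal - down)

    # Hypothetical prefit QCD template for region A, derived from the C/D fit,
    # with one morphing dimension coming from the uncertainties of that fit.
    qcd_nom  = [120.0, 95.0, 60.0, 30.0]
    qcd_up   = [130.0, 99.0, 64.0, 34.0]
    qcd_down = [110.0, 91.0, 56.0, 26.0]

    print(morph(qcd_nom, qcd_up, qcd_down, 0.5))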


2) The background estimation procedure is also not very clear.
You are doing ABCD using double-b discriminator vs \Delta\eta,
but then you correct the obtained pass-fail ratio C/D as a function
of the jet mass of the subleading jet. You do this in bins of
the average jet mass... which is (obviously) correlated with the
subleading jet mass... Isn't there an easier way to do this? At
least, could you please try explaining this a bit more clearly?

There is a mass dependence for all the bb-taggers. We have to account for it when deriving an A/B or C/D rate. These rates are measured after the bb tag is applied to one jet (which is why we measure them as a function of a particular jet mass, not of the average mass). Incidentally, the average jet mass could be used for this as well, but we prefer not to fit in the variable we will be using to set the limit. Again, we will try to make all this clearer in the next version of the note.
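
As a rough illustration (not the analysis code), the sketch below bins pass and fail events in the subleading-jet soft-drop mass, forms the pass-to-fail ratio, and fits it with a low-order polynomial; the toy inputs, the binning, and the polynomial order are all hypothetical.

    import numpy as np

    rng = np.random.default_rng(0)
    # Toy soft-drop masses (GeV) of the subleading jet for events that pass
    # (C-like) and fail (D-like) the bb-tag requirement.
    msd_pass = rng.uniform(30.0, 300.0, 2000)
    msd_fail = rng.uniform(30.0, 300.0, 40000)

    bins = np.linspace(30.0, 300.0, 10)
    centers = 0.5 * (bins[:-1] + bins[1:])
    n_pass, _ = np.histogram(msd_pass, bins=bins)
    n_fail, _ = np.histogram(msd_fail, bins=bins)

    ratio = n_pass / n_fail                             # R_pass/fail per mass bin
    ratio_err = ratio * np.sqrt(1.0 / n_pass + 1.0 / n_fail)

    # Parameterize R_pass/fail vs m_SD with a 2nd-order polynomial (order is a placeholder).
    coeffs = np.polyfit(centers, ratio, deg=2, w=1.0 / ratio_err)
    print(np.poly1d(coeffs))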

L66-69 For the future, if you do want to go below 25 GeV:

1) would it be crazy to look for boosted \Upsilon(4S)? (It
certainly is in our data...) While this does not include
the shower, it may still be useful as a check.

Theoretically you could see decays down to twice the B mass; however, Sal has commented that below 25 GeV we should not trust our MC modeling of the hadronization of b-quark pairs. Going lower than that would require some quantification of the uncertainty in our simulation of these jets, and obtaining such an uncertainty would likely be an analysis of its own, similar to the SMP one performed to get the N2 and soft-drop mass spectra of AK8 jets at CMS (that one went to masses as low as 10 GeV).

2) g->bb may also be useful to check what happens at low masses,
and there's infrastructure to do all this...

We should brainstorm a bit about this at some point.

Yes, indeed the SF measurement of the bb taggers exploits this. If I understand correctly, this information is already being fully exploited.

Fig.2 This plot is quite nice. I presume this could be recycled in
other searches...?

Indeed, as long as they match our very loose pre-selection (two AK8 jets must be present).

Please make the following plots (and add to the note):
- 1D plot of \Delta\eta (data & MC)
- 2D plot of \eta_1 vs \eta_2 (one for data, one for MC)
- 2D plot of \Delta\eta vs double-b discriminator (MC only)
- Fig.9 (double-b tagger) for \Delta\eta < 1.5 and \Delta\eta > 1.5

OK, we have added these.

L265 Why 2nd leading jet? (The HH analysis is using the leading jet,
so there must be a reason you prefer the subleading one.)

We tried both iterations and the results are very similar. We are happy to switch to the leading jet if that is the preference of the reviewers.

Eq.3 Can you give a rough, back-of-the-envelope explanation for why
C/D depends on the soft-drop mass?

The pass-to-fail ratio of a double-b tagger is sensitive to the gluon fraction in the data, as well as to the PF multiplicity (which influences the mass).

If you remove the "worst offenders" in terms of double-b tagging,
e.g., require > -0.8 or something like that, then would that
"fix" your C/D parameterization and make it flat(er)?

We tried this. It removes some events, but the final measurement basically doesn't change; we think it just reduces the statistics in the fail region and makes the C/D uncertainty larger.

Fig.11 I'm not sure I understand these two plots. So this is QCD MC,
but with some ttbar subtracted? So in the left plot we didn't
subtract all ttbar? Is that the origin of the bump in R_p/f
at the top mass?

In data we will subtract the ttbar taken from MC, as well as the up/down nuisance shapes of the ttbar (normalization up and down, and the shape change due to the uncertainty in the ttbar reweighting parameter). We do the same thing in the QCD MC, and the resulting changes to the fit are shown in this plot. The next version of the note will clarify this with a separate plot, instead of trying to include it here.
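
A minimal sketch of the subtraction described above, assuming hypothetical binned yields: each ttbar variation gives an alternative ttbar-subtracted input, and hence an alternative pass-to-fail fit (the dashed lines in Fig. 11).

    import numpy as np

    # Hypothetical binned yields in one ABCD region (all numbers are placeholders).
    data        = np.array([520.0, 400.0, 310.0, 250.0])
    tt_nom      = np.array([ 60.0,  45.0,  30.0,  20.0])
    tt_norm_up  = 1.2 * tt_nom                            # ttbar normalization +20%
    tt_norm_dn  = 0.8 * tt_nom                            # ttbar normalization -20%
    tt_shape_up = np.array([ 55.0,  46.0,  33.0,  24.0])  # HT-reweighting alpha varied up
    tt_shape_dn = np.array([ 65.0,  44.0,  27.0,  16.0])  # HT-reweighting alpha varied down

    # QCD estimate after subtracting each ttbar variation.
    qcd = {name: data - tt for name, tt in [
        ("nominal",   tt_nom),
        ("normUp",    tt_norm_up),  ("normDown",  tt_norm_dn),
        ("shapeUp",   tt_shape_up), ("shapeDown", tt_shape_dn)]}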

L306 So by saying that "top pt reweighting will be allowed to float",
are you actually saying that the pink line in Fig.11 will float
between the dashed lines? (This is related to the fact that
there is no clear fit description...)

Yes, the ttbar uncertainty's small effect will provide additional QCD templates (and of course, also templates for the ttbar itself). We'll detail this better in the next note.

Text/style:
~~~~~~~~~~~

Please make all figures bigger!

L52 I would get into the habit of writing Higgs rather than higgs.
This will help once you start writing the paper.

L56 higg-like --> Higgs-like

Questions from Jennifer Ngadiuba on v2 of note

- you say that you look at m_a < 105 GeV; however, you fit and search a much wider jet mass spectrum that includes other diboson signatures, in particular ZZ and HH to 4b. What is your sensitivity to those, and to what extent is there an overlap? Would a ZZ or HH signal bias your background estimation?

There may be some overlap, but the analysis is optimized for a broad range of masses. Dedicated ZZ and HH searches should do better since they can pick a fixed mass window. We will set limits on HH as a cross-check.

- trigger section 2.2: can you also provide the plots of the trigger efficiency vs mjj and mjet to validate your statements?

Yes.

- signal MC section 3.1: is there a reference for the parameters and cross sections of Figure 3? Can you explain a little what those parameters are and why f/NmX = 1 is chosen? I guess those parameters control the width? Would you consider looking at wider resonances too? Since you fit both the jet mass and the dijet mass, the jet mass would help here as additional information when the resonance becomes very broad?

No... there is no reference yet. f/NmX is the number of new particles that can enter a loop for the production of X (i.e. if there were T's and B's, there would be a larger cross section for the production since there are options other than a top loop for producing the X). We have found so far that we are not sensitive to the width within the bounds of the theory that we are using, but if another signal can be found that naturally provides larger widths we can use it as well.

- what’s the reason for using the double-b tagger and not the more performant DeepDoubleB version? This also comes in a mass decorrelated version and the analysis could benefit from it. see: http://cds.cern.ch/record/2630438/files/DP2018_046.pdf

We are switching to this now!

- section 5.1 about the selection optimization: you say that the tau21 leads to moderate gains in S/sqrt(B). Is that because you tested it on top of the double-b tagger? You mention also undesirable mass sculpting effects, but have you looked into tau21_DDT? Could you also provide the results of this optimization? This would help understanding what was done and clarify how the final decision about the selection has been made. Furthermore, adding signal distributions to the various control plots would help.

DDT variables are decorrelated at specific background acceptance values, and it's unlikely that the previously used N2 or tau21 DDTs coincide with an ideal cut for us, especially given that most of our discrimination power comes from the double-b tagger. We can include our optimization studies in an appendix and we can add the signal to the control plots.

- lines 231-232: I guess you mean here “free from signal contamination” rather than background contamination?

Yes. We will fix this typo.

- section 5.3: is the control region also free from ZZ and HH signals? I guess the jet mass cut has a lower cut at 25 GeV but not an upper cut?

Actually the jet mass has an effective lower cut at 12.5 GeV (25 GeV is just the lowest signal mass that we consider). There should be no contamination from ZZ or HH, since they should still fall in the mass-asymmetry region defined for the SR. Anomalous ZH production might be visible, but as that's not SM (and the SM ZH production is so small) we aren't considering it.

- systematics: the impact of some of the listed systematics is not quantified (e.g. JES/JER shape). Furthermore, why would the JMS not affect the jet mass shape too? And how about the jet mass resolution? Which values are you using for these JMS/JMR systematics?

We are iterating with the JetMET contact to get this right. At the moment the jet mass uncertainties are just taken as flat uncertainties (placeholders). We can provide the nuisance pulls to see which effects are largest; however, the signal is the only shape that is affected by these (everything else is data-driven), so it's unlikely that the limit setting will constrain them very much.

Questions from Alberto Zucchetta on v2 of note

L95-96: not sure which alternative you plan to test, but probably you can gain quite a lot with b-tag triggers and/or AK8 jet + substructure triggers

  • We did not find a large gain from the substructure triggers or from the b-tagged triggers.

L264: could you please include a 2D scatter plot of the two variables to demonstrate that they are effectively uncorrelated?

  • We will include this in the next iteration of the note.

Section 6.3: isn't this procedure equivalent to the top pT reweighting? After all, the gen top pT is correlated with the HT, so why not apply the top pT reweighting directly?

  • The top pT factor is e^(a pT); since we have two tops, the combined factor is e^(a pT1) * e^(a pT2) = e^(a (pT1 + pT2)) = e^(a HT), so this is identical to the top pT reweighting.

Figure 11: where does this 20% come from? Is it an arbitrary value?

  • Yes :-) ... it is meant to be large enough to cover whatever uncertainty is in the ttbar. Since this just produces the pre-fit templates that Combine uses to set the limit (and Combine will change the uncertainties if it thinks they are too big), this shouldn't affect the limit. We can change this to something more reasonable (use the post-fit uncertainty in the CR as the starting point for the SR?), but for now we prefer to give it room to morph.

Figure 12: it looks like the average jet mass broadens the signal distributions, and it's really a pity not having the peak in the jet mass(es). I think the 2D fit is really motivated if one couldn't find a variable in which the signal peaks.

  • I think you're thinking of the ttbar? The average mass is just as peaky as the individual jet masses.

Figure 12: it's not clear what the difference between these two plots is: besides the type of QCD sample, shouldn't the data distribution be the same?

  • These are MC closure tests of those two samples; the "data" is the MC in the SR.

L342: I agree with Jennifer that the jet mass scale (and the jet mass resolution, to be added) should be a shape uncertainty.

  • We can't add this with the way we currently process the data, but we can add it in the next iteration.

Figures 12 and 14: it looks like the background estimation fluctuations are much larger than the associated uncertainties. I asked this question during the meeting, and it seems that some uncertainties are not propagated: can you confirm?

  • Yes, these need to be fixed: they currently don't include all the systematics (in some cases) and suffer from crashing fits (in other cases). The next version will have safeguards in place to avoid crashed fits and will include all the systematics on the background.

Section 8: which variable are you fitting to derive the limits? (The dijet mass, I guess.) My understanding is that you'll move to a 2D fit soon, so it would be good to have a more detailed description of the strategy (which variables will be fitted, how to deal with 2D uncertainties, etc.) in the next AN version.

  • The limits in the current version of the note (v2) are set on the a-mass, where the signal peaks sharply and sits on the falling part of the background distribution. We are indeed moving to a 2D implementation, and this will be detailed in the next version of the note.
-- MarcOsherson - 2019-10-08