Questions and answers for Higgs → 2 photons anomalous couplings to gluons with full Run2 dataset

General remarks

Key to color-coded answers:

  • Answer to a question, no follow-up seems needed
  • We agree, implemented in analysis/text.
  • We disagree, prefer not to implement.
  • We did not totally understand what was asked and request clarification before we can answer.
  • Under discussion by the authors, OR, in the case of FR, to be discussed during FR before modification. (Open item.)
  • We agree, changes/studies in progress. Responsible person is indicated. (Open item.)

Documentation and important talks

  • CADI line: xxxx

  • AN: xxx

  • Pre-approval: xxx

  • Approval: xxx

Fabrice's comments (v0)



Received: 8th of April 2024

  1. Discriminating variables & MC (Section 3)
  • Fig. 3: ggH fa3 = 0 and ggH fa3 = 1 have exactly the same colour. Could you update this, since these are your main samples (the ones you are trying to distinguish)?
Done. The plots now include the following VBF samples: fa3 = 1, fa3 = 0.5, SM JHUGen (fa3 = 0), SM amcatnlo.

  • Dint and DCPggH seem to be the same thing. Could you define it as either DCP or Dint and stick to it? Using both is sometimes confusing.
We added the specific definition of D_CP for the sake of clarity. D_int is only used a few lines later, when we say that it (and D_CP, as a consequence) is between -1 and +1.

  • L. 224-225: you say “In addition, the dedicated CP-odd ggH classifier is found to have a significantly better discrimination power between different signal hypotheses than the analogous MELA discriminant DCP as explained in 7.3”. I did not really find the explanation in Section 7.3. This MELA CP discriminant seems to be related to the interference (which seems to be confirmed by Fig. 3: the interference is indeed shifted with respect to fa3_ggH = 0 or 1). In your BDT, you do not use the simulation with the interference, do you? So in what sense do these discriminants compare?
Indeed, we do not use the interference term in the BDT. This sentence was incorrect and it has been removed.

  • You say that JHU VBF SM is identical to the amc@nlo simulation. Could you add an appendix showing this, i.e. a series of plots with all the variables of interest (including the vertex probability)?
The two SM VBF contributions (JHUGen and amcatnlo, not powheg; that was a typo) are now shown from Fig. 4 onwards (including the vertex probability).

  2. Background description section 7.2
This section would deserve some improvements in the description
  • L. 266-268: you mention that you have 2 ways of simulating gj and jj bkg events (Pythia or MadGraph+Pythia). Which of these simulations is used in Section 7.2?
Only Pythia samples are used for gj and jj bkg events. This has been fixed in the text, and a list of the backgrounds has been added.

  • Do you apply an ad hoc treatment of the double counting between the different samples, i.e. gg removal from the g-jet and jet-jet simulation? If you use MadGraph for g-jet, you also need to remove g-j events from the multi-jet Pythia simulation. Could you add this information?
No MadGraph samples are used, only Pythia (see the point above). To remove the potential double counting of non-prompt photons in the control regions, we apply the prompt-fake photon filter, which requires exactly one truth-matched photon in simulation (i.e. if both leading photons or neither of them is matched, the event is rejected). This ensures that the contributions of non-prompt photons from the gamma-jet and jet-jet samples do not overlap and can be summed in each control region.
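
For illustration only, the filter logic described above can be sketched as follows. The flag names `lead_is_prompt` and `sublead_is_prompt` are hypothetical and stand for the truth-matching result of each photon; the events are toy values, not analysis data.

```python
# Sketch of a prompt-fake filter; the field names are hypothetical.

def passes_prompt_fake_filter(lead_is_prompt: bool, sublead_is_prompt: bool) -> bool:
    """Keep an event only if exactly one of the two leading photons is truth-matched."""
    return lead_is_prompt != sublead_is_prompt


# Toy simulated events (illustrative flags only).
events = [
    {"lead_is_prompt": True, "sublead_is_prompt": True},    # both matched: rejected
    {"lead_is_prompt": True, "sublead_is_prompt": False},   # exactly one matched: kept
    {"lead_is_prompt": False, "sublead_is_prompt": True},   # exactly one matched: kept
    {"lead_is_prompt": False, "sublead_is_prompt": False},  # none matched: rejected
]

kept = [e for e in events if passes_prompt_fake_filter(**e)]
```

Only the two events with a single matched photon survive in this toy list.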

  • Do you re-scale the gg background simulated with Sherpa by any additional k-factor? Other analyses typically apply a factor of 1.3 on top of the Sherpa cross section, based on a fit to the phoID score.
No, we do not apply any k-factor correction, given that we obtain the correct overall normalization of the data-driven QCD background as shown in the validation plots.

  • L. 454-455: you define P = BDT > -0.2 and F = BDT < -0.2. Therefore, by construction, there is no FF in your PF or FP regions. Then, at L. 490-492, you say that there are some FF in the PF and FP regions, which is impossible according to your P/F definition at L. 454-455. Could you clarify? Maybe change the notation to distinguish the prompt-fake regions (based on the BDT value) from the prompt-fake origin of the object (based on the truth matching in the simulation).
Indeed, the description of the CR (BDT,g1 < -0.2 && BDT,g2 < -0.2) was confusing. We changed the notation to PASS-FAIL (PF), FAIL-PASS (FP) and FAIL-FAIL (FF) regions, based on the photon MVA ID. The CRs are defined on the basis of the BDT cut, while the prompt-fake filter, which is defined at the level of truth matching, is applied in all CRs. The description of the prompt-fake filter in the text has been improved.
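
As a minimal sketch of the PASS/FAIL notation, assuming the -0.2 threshold quoted above is applied to the photon MVA ID score of each photon, the region label of an event could be derived as below. The function name and the treatment of a score exactly at the threshold are assumptions made for the example.

```python
PHOTON_ID_CUT = -0.2  # threshold quoted at L. 454-455 of the note


def photon_id_region(lead_id: float, sublead_id: float, cut: float = PHOTON_ID_CUT) -> str:
    """Return 'PP', 'PF', 'FP' or 'FF' from the leading and subleading photon ID scores."""
    # A score exactly at the cut is counted as FAIL here (an assumption of this sketch).
    lead = "P" if lead_id > cut else "F"
    sublead = "P" if sublead_id > cut else "F"
    return lead + sublead


# Toy (leading, subleading) score pairs, illustrative values only.
toy_scores = [(0.6, 0.4), (0.6, -0.7), (-0.5, 0.3), (-0.8, -0.9)]
regions = [photon_id_region(lead, sub) for lead, sub in toy_scores]
```

The label depends only on the reconstructed ID scores, which keeps it separate from the truth-level prompt or fake origin of each photon.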

  • I honestly did not manage to follow the reasoning after these lines, I suppose due to the prompt-fake confusion. A few examples:
    • What does “exclusive” mean in the Eq. at L. 491?
It denotes the contribution of the FP region excluding the migration of di-jet events from the FF region, in which one jet is mis-reconstructed as a photon, so that the event enters either the FP or the PF region depending on whether that jet is leading or subleading. The text has been improved.
    • The f(gammai) factors seem to be related to the “FF” origin, but then in the Eq. at L. 502 they are applied to FP?
The f(gamma) transfer factors are actually applied to all CRs in data: not only FP, but also PF and FF (see the equation at L. 505).
    • You are attributing a per-event weight, so should we find that Sum_weights = N_datadriven? This does not seem to be the case. If it is not, what is done?
These reweighting factors are independent of each other, so the sum of the event weights does not need to equal the total yield.
    • I am sorry to be so confused by this section…
  3. Category optimisation section 7.1

In the note you do not really describe how the category definitions are optimised (I think you mentioned brute-force but this is not so clear)

A section has been added to explain the binning optimization

  4. Multiclassifier BDT section 7.3
The use of the multiclassifier is questionable in your description for several reasons
  • You say that the node discriminating CP-even and CP-odd is not good (L. 629), so the obvious question for the reader is: why do you have a multiclassifier (3 nodes) rather than just a regular signal (ggH+2j) / background (everything else) classifier? More on this later.
That statement was wrong and has now been corrected. The ggH CP-even vs ggH CP-odd discriminant has a similar, though slightly lower, performance to the D_0- discriminant in terms of AUC (more on this later). We verified that the model performance, based on the ROC curve, is equivalent for a classical binary classifier (ggH vs inclusive background) and for the three-class multiclassifier used in the analysis (ggH SM, ggH CP-odd and background). In fact, no additional confusion is introduced in the model by having two separate signal classes instead of one merged signal class (ggH 0- and 0+), as can be seen from the equivalent AUC values of the ROC curves. In addition, although we do not apply a cut on the dedicated ggH CP-even classifier (it brings no improvement in expected sensitivity), a multiclassifier with a separate ggH 0+ signal class is still preferable to a binary classifier (ggH 0- vs bkg). Firstly, we want to retain the information on the ggH CP-even topology in order to validate the multi-class BDT performance by comparing the ggH 0+ classifier discriminant against the MELA-based discriminant. Secondly, it is necessary to suppress the inclusive background and enhance the ggH CP-odd component without simultaneously suppressing the ggH SM signal, which must therefore enter the training as a separate class. For instance, in the three-class classifier, using mjj as a training input does improve the separation between different signal hypotheses, but the dominant effect is a reduction of the ggH SM yield, which in turn reduces the sensitivity to fa3_ggH.

  • You do not have a VBF node, while VBF is, as you mention later, the dominant source of resonant bkg. General question: would you benefit from having a VBF-related category, which would help in fixing the VBF-related profiled parameters in the fit and hence reduce this dependence? This would be a nice addition to your multiclassifier (mjj seems like a natural variable that you are not using for now), especially if one node is really useless (which I am not so sure about).
1) Indeed, the possibility of using a four-class classifier with a separate node for VBF was investigated. However, while performing the brute-force optimization of the sensitivity, we found that a separate VBF node would actually worsen the constraints on fa3_ggH slightly, by 7-10% depending on the binning. One of the reasons is that the main improvement in sensitivity comes from applying a hard threshold on the background classifier, which in the three-class case suppresses both the non-Higgs and the VBF background simultaneously. 2) Initially, mjj was tested together with other variables in a larger set (21 variables), which was later trimmed to only 15 discriminating variables by removing variables with higher cross-correlation and low discrimination power. In the ranking based on the separation power of the individual variables between the different classes, mjj was ranked in the middle (11th out of 21 variables). However, in the importance ranking of the BDT it was found to be the least discriminating variable, with p < 0.017, i.e. it was used as a cut variable less than 2% of the time in the BDT; hence it was removed from the training set.

  • Fig. 19: could you explain which samples are considered as “bkg” and “sig” for each of the ROC curves? The caption of the figure is confusing: the AUC for “pseudo-scalar” is very good, which would mean a very good discrimination, but in that case this is probably comparing the CP-even signal to background, which is not really what we want. It would be interesting to have more explicit ROCs with separated bkgs:
    • VBF only vs ggh+2jets SM (use whatever node is the best for this and say which one it is)
    • CP-odd signal vs CP-even ggH signals (maybe this is Fig. 19a)
    • Non-resonant vs ggH+2jets SM (use whatever node is the best for this and say which one it is)
We agree that it is better to show 1-vs-1 classifier ROC curves rather than 1-vs-rest (sum of all other classifiers), as in the previous version of the AN. Given that we do not have a separate VBF classifier, we have added the following ROC curves to the AN:
    • ggH CP-even vs ggH CP-odd classifier for the ggH CP-even sample
    • ggH CP-odd vs non-resonant+bkg for the inclusive background sample

Bkg node related questions

  • What is the overall normalisation between non resonant and resonant backgrounds ?
The QCD background (gamma-jet and jet-jet) is estimated from data, so it has the correct normalization by construction, i.e. it corresponds to the luminosity of the data-taking year. The VBF and non-prompt di-photon backgrounds are estimated from MC, hence they are normalised to the total luminosity of the corresponding data-taking period. This is equivalent to a cross-section reweighting of the non-resonant and resonant background contributions. This has been added to the text.
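
A minimal sketch of the usual luminosity normalisation of a simulated sample is given below, assuming the standard per-event weight w = sigma x L x w_gen / sum(w_gen). The cross section, luminosity and generator weights are illustrative placeholders, not the values used in the analysis, and the function name is hypothetical.

```python
def mc_event_weight(xsec_pb: float, lumi_pb_inv: float,
                    gen_weight: float, sum_gen_weights: float) -> float:
    """Per-event weight that scales a simulated sample to xsec * luminosity."""
    return xsec_pb * lumi_pb_inv * gen_weight / sum_gen_weights


# Illustrative placeholder inputs (not analysis values).
xsec_pb = 2.0                       # hypothetical cross section in pb
lumi_pb_inv = 1000.0                # hypothetical integrated luminosity in pb^-1
gen_weights = [1.0, 1.0, 1.0, 1.0]  # toy generator weights

weights = [
    mc_event_weight(xsec_pb, lumi_pb_inv, w, sum(gen_weights)) for w in gen_weights
]
expected_yield = sum(weights)  # equals xsec_pb * lumi_pb_inv before any selection
```

Before any selection, the weights sum to the cross section times the luminosity, which is what puts the simulated backgrounds on the same footing as the data-driven estimate for a given data-taking period.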

  • Do you train your multiclassifier with a cut on the diphoton MVA? If not, why don’t you apply the cut at 0.75 before training? That would help the net focus on the difficult part and change the balance between resonant and non-resonant backgrounds (given your dipMVA cut, the importance of the g-j bkg is probably very minor, while it is dominant if you do not apply any cut).
Although in principle applying a cut on the diphoton MVA should help to restrict the region of phase space probed by the BDT, there are two main reasons not to do so. Firstly, if we apply the cut diphoMVA > 0.75, the resulting QCD background sample would be severely statistically limited: there would be only 50k events for the training with the data-driven approach, and an order of magnitude fewer with the gj and jj simulated samples. Since the QCD background contribution to the SR is sub-dominant with respect to prompt gg, but not negligible, we prefer to apply a relaxed event pre-selection in the BDT training. Secondly, the data-driven method chosen to model the QCD background does not, by construction, map the diphoton MVA score in the extrapolation of the yield from CR to SR, since the reweighting procedure is based only on the leading (subleading) photon kinematics, unlike other QCD estimation methods, e.g. the one used in the mass measurement, which uses photon MVA reweighting. Applying a cut on this variable would therefore not be correct for this method, as the diphoton MVA is not well modelled as an input feature; a plot has been added to the AN. One would need to switch to another type of data-driven background estimation, with diphoMVA reweighting from SR to CR, in order to apply a threshold.

  • Again why no VBF node rather than mixing it with the “easier” backgrounds?
See explanation above.

CP-odd vs CP-even nodes

  • L. 629: you say the AUC is 0.5, so useless compared to D0-… Is the AUC actually 0.55? Is it really worse than the D0- from MELA, which also seems to have a very mild discrimination (because it is hard to distinguish)?
It's true, the performance of the D0- MELA is similar and the statement has been corrected.

  • Could you actually compare the ROCs ggH CP-even vs ggH CP-odd for the MELA D0- and your CP-odd node ?
Yes. The D0- discriminant performs slightly better than the multi-class BDT ggH CP-even classifier in distinguishing between the ggH CP-even and CP-odd states: AUC(D0-) ~0.53 > AUC(ggH 0+) ~0.48.
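
For reference, a 1-vs-1 AUC of this kind can be computed directly from the scores of the two samples. The sketch below uses the rank (Mann-Whitney) definition of the AUC and unweighted toy scores; it is not the code used in the analysis, and event weights are deliberately ignored.

```python
def auc(signal_scores, background_scores):
    """Probability that a random signal event scores above a random background event.

    Ties count as one half. Events are unweighted (an assumption of this sketch).
    """
    wins = 0.0
    for s in signal_scores:
        for b in background_scores:
            if s > b:
                wins += 1.0
            elif s == b:
                wins += 0.5
    return wins / (len(signal_scores) * len(background_scores))


# Toy discriminant values for two hypotheses (illustrative only).
cp_odd_scores = [0.2, 0.6, 0.7]
cp_even_scores = [0.3, 0.5, 0.8]
toy_auc = auc(cp_odd_scores, cp_even_scores)
```

With this definition, a value below 0.5 means that the score orders the two classes the other way round, so 1 - AUC is the figure to compare with another discriminant.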

  • Would there be any benefit in combining the 2 (what is the correlation between the 2 discriminants)?
Antonio: The D0- MELA discriminant was found to be weakly correlated with the ggH CP-even classifier, with a correlation below 20% in absolute value in all samples: the coefficient is 0.13 for ggH SM, -0.16 for ggH CP-odd and -0.19 for the inclusive background. Given the low correlation, in principle they could be combined, but we found that adding the CP-even classifier as an additional discriminant leaves the sensitivity unchanged. In addition, the D0- MELA discriminant was found to have a moderate negative correlation of 45% with the dedicated ggH+2j CP-odd classifier for all processes, but both variables are already used as discriminants in the 3D categorization. The correlation plots have been added to the Hgg talks.
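
As an illustration of how such a coefficient can be obtained, the sketch below computes the linear (Pearson) correlation between two discriminants evaluated on the same events. That the quoted numbers are Pearson coefficients is an assumption of this example, and the input values are toy numbers; event weights are ignored.

```python
import math


def pearson(x, y):
    """Unweighted linear correlation coefficient between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy per-event values of two discriminants (illustrative only).
d0minus = [0.1, 0.4, 0.5, 0.9]
bdt_score = [0.8, 0.7, 0.3, 0.2]
rho = pearson(d0minus, bdt_score)
```

The coefficient would be evaluated separately for each sample (ggH SM, ggH CP-odd, inclusive background), as in the numbers quoted above.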

  5. Results
  • Fig. 24: why is 2016PreVFP so different from the others? In particular, why is the impact of the systematics so large for this dataset, while it has much lower statistics than 2017 and 2018?
This is no longer the case in the latest LL scan: the impact of the systematics shows a similar trend across all datasets.

  • You are profiling fa3 from HVV as well as mu_qqH. Since the physics that affects fa3_ggH and fa3_hvv can be very different (e.g. the top CP-odd coupling would affect only ggH), it would also be interesting to have the results with fa3_hvv constrained.
Yes, we have produced the scan as requested by setting fa3_hvv to its SM value and added it to the AN.

  • Would you benefit in having a VBF category (even just based on the mjj) to constrain mu_qqH (even if fa3_hvv is floated)?
We initially investigated the possibility of binning in mjj, [0, 300] and [300, ∞[ GeV, to better constrain the VBF background process. However, we noticed that mjj was ranked as a weakly discriminating variable in the multi-class BDT classifier, both with and without a VBF node, so we expect no improvement in the sensitivity to fa3_ggH from an mjj binning. Furthermore, we prefer to leave the possibility of constraining fa3_hvv from a VBF-enriched high-mjj bin for further work after pre-approval.

Topic revision: r13 - 2024-05-16 - LindaFinco