Questions and answers for Higgs → 2 photons anomalous couplings with Full Run2 dataset

Color code

For the answers, the following color code is used:

* green - comment answered
* orange - work in progress
* red - comment to be implemented
* purple - comment considered, but we decided to do differently and justify the choice
* pink - question left pending, awaiting a response from the section's reference

Documentation

CADI line: N/A yet

Analysis note: https://cms.cern.ch/iCMS/user/noteinfo?cmsnoteid=CMS%20AN-2022/049

CMS talk: N/A yet

Review

Comments from HiggsGammaGamma group conveners on AN v4 (29/3/2024)

* General comments:

I think it will be difficult for someone who did not follow the analysis from the beginning (ARC members, for instance) to understand how the many discriminants are defined and employed. A section (table? figure as Fig. 8 in HIG-20-007?) summarizing the discriminant definitions and the final categorisation in each channel would help clarify the overall picture.

A table that summarizes all the discriminants is added in the AN (Table 10)

In the VH leptonic part sometimes it is not clear when you talk about the channel in which W and Z bosons decay to muons and electrons and when about the case in which they decay to neutrinos. Sometimes you use "VH leptonic" for both the decay to muons/electrons and to MET (which is in my opinion the correct thing to do, since neutrinos are leptons too), while sometimes you use it just to indicate the former. I think it would be useful to define a way to refer to these two sub-channels and use it all along the AN.

Solution: The channels will be WH lep, ZH lep and VH MET and together they will be called VH Leptonic.

A few different values of the total luminosity are used. I think (but I might be wrong, to be checked) that the correct value is 137.6 (36.3 + 41.5 + 59.8) fb-1.

Changed in the text and in the VBF plots.
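The luminosity sum quoted in this comment can be checked with a few lines. This is only an arithmetic check of the three values given above; the mapping of each value to a data-taking year is the standard Run 2 assignment and is not stated in the comment itself.

```python
# Per-year integrated luminosities in fb^-1, as quoted in the comment.
# The year labels are the usual Run 2 assignment (not stated in the comment).
lumi_by_year = {"2016": 36.3, "2017": 41.5, "2018": 59.8}

# Round to one decimal to avoid floating-point noise in the sum.
total_lumi = round(sum(lumi_by_year.values()), 1)
print(f"Full Run 2 luminosity: {total_lumi} fb^-1")
```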

Showing some validation plots for the input variables used in the different discriminants (in the VBF and VH lep channels) will be nice to see (you said it is on-going, so I am writing here just not to forget).

Added some validation plots to the analysis. The AC distributions are reweighted.

It would be interesting to show the impact of the different channels on the AC parameters measurement (this is I think also one of the points which are still work-in-progress).

TO DO

* Line by line comments:

L69-L70: This is no more true, right?

Right, Changed in the AN

L90: Shouldn't it be powheg + miNLO (if you use miNLO samples)?

Fig 1-2: I think only the figures before the reweighting are shown. Do you have the plots after reweighting? I remember you showed them in the past, if I am not mistaken.

Right, we use powheg + miNLO; changed in the AN. The plots are reweighted.

L101: Is the reweighting done also for VH leptonic samples? If not, please clarify why.

The VH Lep team didn’t find any significant deviations between the LO JHUgen & NLO Powheg in the AC BDT shape that is trained on the LO samples. This is to be expected as none of the VHLep channels depend on the number of jets in the final state for reconstructing this state. There can be a small impact on the momentum of the vector boson but that is a “higher” order effect.

L103 (see also L430): In Chapter 6.2 the reweighting is not described, but instead it refers to this section. Please add the reweighting procedure for VH hadronic either here or in 6.2.

Add some lines and plots about it

L111: Which other mass points do you use? Do you use those in the training of all discriminants? I think it is not specified when you describe the different trainings.

The mass point used in the training is 125 GeV. Specified in the AN

L117: Please add for which production modes you have those AC samples and with which values of the different AC parameters they have been created.

Specified in the AN: The Monte Carlo data related to \fB = 0.5 for the WH process is missing, as well as the Monte Carlo data related to the WH process \fLZg = 1, 0.5. All the other Monte Carlo samples related to different anomalous couplings and production methods are included in the analysis.

Fig. 3 and the following ones: it seems to me that ggH and VBF f_a2=0.5 are both drawn with a black line. Probably it would be better to choose a different color.

Fixed

L205: Where are these two categories of jets used?

This definition of categories is used for the STXS categories dominated by ggH, which we leverage in the analysis.

Fig. 7: Are these distributions reweighted by powheg? What is the selection applied? The VBF preselection?

Fixed

L280: According to L268, omega has 5 variables, not 6. What is the 6th observable you refer to here?

In L268 the variables refer only to the VBF production process, not to the other production and decay processes. Adjusted in the text.

Fig. 9: Here you reported many MELA discriminants, but the only one used in the analysis is HVV D0- (with fa3=1), right? I would write it in the text, otherwise it is confusing. Do you explain why the other discriminants are not used?

Specified in the AN, moreover adding the table 10 makes it more clear.

Fig. 10: Which AC (BSM) hypothesis is tested here?

All the anomalous couplings belong to the VBF BSM class. The text has been adjusted to clarify this.

In the VBF channel, did you study the correlation between the different discriminants (D_0- and the DNN discriminants)?

We haven't conducted an in-depth study on this matter, but even if there were a correlation, the analysis wouldn't be affected because we perform a simultaneous 3D scan to find the best boundaries.

Fig. 15: VH had here is SM, right? Is it the reweighted JHUGen or is it powheg? What is the selection applied to the data?

The VHhad samples in Fig. 15 are the POWHEG SM signal. The selection is the pre-selection described in Tab. 8.

L429: Do you use any reweighting for the background simulated samples?

No, because they were already generated with POWHEG, as described in section 2.2.1.

L430-L431 (see also L103); "using the procedure described in section 2": actually in Section 2 there is written that the procedure is described here. Could you please add the details?

Added in the AN

Fig. 16: Do you understand why the GJet sample in the plot in the middle has an up-and-down distribution? Is it only due to MC sample statistics? VH SM is JHUGen reweighted by powheg?

These are indeed statistical fluctuations: because of the selections on the photon ID and the number of jets, the MC statistics of the GJet sample are reduced. VH SM is the powheg simulation.

L464-L470: How did you optimize the binning?

See section 7.3 about the binning optimization

Fig. 18: Here bkg is gamma+jet, gamma+gamma, ggH, VBF and ttH? If yes, why does it seem there is a lack of statistics (curves are not smooth)?

The curves are not smooth due to statistical fluctuations in the GJet samples (for high rejection, when there are fewer GJet events, the fluctuations become smaller).

L482: Is the requirement of opposite charge not asked for? Why?

Adding an opposite-charge requirement could give a higher S/B for the ZH leptonic signal extraction upstream of the BDT categorization. However, since a lepton's charge can be mis-assigned, the signal efficiency could be reduced. Considering that the ZH leptonic signal has limited statistics, the analysis directly takes the two leading charged leptons that pass ID/Iso and satisfy 60 GeV < M_ll < 120 GeV. This gives more statistics for the BDT training, and the BDT then drives the S/B for both data and MC.

Some clarification has been added to the text.
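The dilepton selection described in this answer can be sketched as follows. This is a minimal illustration, not the analysis code: the `Lepton` class, its fields and the function names are hypothetical, the invariant mass uses the massless-lepton approximation, and lepton-flavour matching is left out for brevity. The only points taken from the answer are the ID/Iso requirement, the choice of the two leading leptons, the 60 GeV < M_ll < 120 GeV window, and the absence of a charge requirement.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Lepton:
    """Hypothetical minimal lepton record (not the analysis data format)."""
    pt: float            # transverse momentum in GeV
    eta: float
    phi: float
    charge: int
    passes_id_iso: bool


def dilepton_mass(l1: Lepton, l2: Lepton) -> float:
    """Invariant mass in GeV, treating both leptons as massless."""
    m2 = 2.0 * l1.pt * l2.pt * (math.cosh(l1.eta - l2.eta) - math.cos(l1.phi - l2.phi))
    return math.sqrt(max(m2, 0.0))


def select_z_candidate(
    leptons: List[Lepton], m_min: float = 60.0, m_max: float = 120.0
) -> Optional[Tuple[Lepton, Lepton, float]]:
    """Take the two leading ID/Iso leptons; keep them if M_ll is in the window.

    No opposite-charge requirement is applied, as described in the answer.
    """
    good = sorted((l for l in leptons if l.passes_id_iso), key=lambda l: l.pt, reverse=True)
    if len(good) < 2:
        return None
    l1, l2 = good[0], good[1]
    mass = dilepton_mass(l1, l2)
    return (l1, l2, mass) if m_min < mass < m_max else None


# Two back-to-back same-charge leptons (e.g. one charge mis-assigned) plus one failing ID/Iso.
event = [
    Lepton(pt=45.0, eta=0.0, phi=0.0, charge=+1, passes_id_iso=True),
    Lepton(pt=45.0, eta=0.0, phi=math.pi, charge=+1, passes_id_iso=True),
    Lepton(pt=80.0, eta=1.0, phi=1.0, charge=-1, passes_id_iso=False),
]
candidate = select_z_candidate(event)
```

In this toy event the pair is kept even though both leptons carry the same charge, which is the behaviour the answer argues for: an opposite-charge cut would have rejected it.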

L489: is the Hgg preselection not applied to the ZH case?

Yes, it is applied. Added in the text: “Di-photon pre-selection which mimics the diphoton trigger criterion”. NOTE: There is a difference between the AN2019_259 preselections and the vanilla flashgg version.

L496: Do you use MC for gamma-gamma and gamma-jet in the non-MET categories?

Yes, we use the gamma-gamma and gamma-jet MC for WH-lep and ZH-lep. The MET Tag uses a data-driven fake-photon background.

From L498: I think it would be clearer if you introduced the general description of the MET category in the first paragraph (from L472 to L480): you first present the three VH leptonic cases and then you describe the three preselections applied. Please also clarify here what you mean by "VH leptonic" and be coherent along the text: as I wrote in the general comments, sometimes you refer to it only for the muon and electron decays, some others you also include the MET category.

Yes, we will include the suggestion. Thanks

TODO: Add a general description of the VH MET in first paragraph

TODO: Fix VH Leptonic name discrepancy

L505: Which both channels? Z(MET)H and W(MET)H? One has a lepton in addition, so I think it is not correct to say that they have the same final state. You can say that both have a large fraction of MET and, because of this, you consider them together.

Yes, Fixed in the text

L526: Why don't you consider other H production modes in the training?

Higgs production processes not targeted by the VH Leptonic categories, such as ggH, VBF and ttH, are also treated as background during the STXS MVA training. Therefore, for the SM Higgs vs. AC Higgs training, we think only VH leptonic events after the STXS MVA selection should be considered.

Fig. 25-26: Do you have plots showing the validation of these input variables with data?

1) Yes, we do. We omitted them for simplicity's sake.

Does f_a3 in the legend mean f_a3 = 1?

2) That is correct! Clarification in the AN: write fa3=1 instead of fa3, and do the same for fa2 and fL1. Link to the sample description: https://twiki.cern.ch/twiki/bin/viewauth/CMS/Run2MC2017ProductionforHiggsProperties#JHUGen_WH_production_JHUGen_H_ZZ

L560-L574: Specify which one is used in the current results.

Keep the hyperparameters used.

Fig. 28-29: Plot in the middle: What does it mean "fa2 = 2"? Shouldn't it be at max 1?

This is a typo, thanks for pointing it out

Fig. 35-36: The name of some variables is cut, please fix it.

Fixed

L626-L627: Are the boundaries scanned simultaneously?

Yes, Added in the AN

L629: Please define Si^a3. Is it the fraction of VBF H events where H is produced by AC interactions in case f_a3 = 0.07? If yes, do you compute sigma_stat for different hypotheses of f_a3 or only for f_a3 = 0.07? The sigma_stat obtained in Fig. 39 and 40, with other AC parameters considered, is used anywhere in the analysis?

Si^a3 is the number of BSM events with f_a3 = 1. Adjusted in the text. Sigma is computed only for f_a3 = 0.07. No, sigma is just our benchmark to understand the optimal boundaries and to check that the behaviour is much the same for all the different ACs.

L635: In line 629 it is said that only the AC component is considered in Si, while here it seems you also consider the non-AC fraction of VBF H events. Could you please clarify?

Yes it was an incorrect definition of Si, Clarified in the AN

L654: Please write that Fig. 39 is done considering as signal AC VBF component with f_lambda1 = 0.35 (probably I would switch figure 39 with 38, which describes the case f_a3 = 0.07 as explained in 628-629).

Correct and Added to the AN

L659: Do you use different boundaries for different BSM couplings? Or the boundaries reported in 651-653 are the ones used in the whole analysis? Please clarify it in the text.

The boundaries are the same for all the different ACs. As you can see from Figures 40 and 41, the boundaries found also work well for the other ACs. Clarified in the AN.

Fig 41: Same as in Fig. 9: do you use all these discriminants anywhere? Aren't you just using the top-left one? if you do not consider the others, please clarify it in the text (or remove those distributions), otherwise I think it is confusing.

Yes, but the good agreement between data and Monte Carlo in the various distributions is used as a sanity check. Note that adding Table 10 to the AN makes the concept clearer.

L714: The category optimization is different in all the three channels (VBF, VH had, VH lep) and probably it is too late to define a common strategy, but maybe at least the mgg range where the signal is defined could be aligned? I am wondering just how we can justify this during the ARC review, if anyone asks.

It's not very convenient to change the Mgg range, but note that by taking a signal region around 3 sigma or 5 sigma, the difference is less than 1%
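The size of the effect quoted in this answer can be illustrated with a back-of-the-envelope check. The sketch below assumes a purely Gaussian signal peak, which is a simplification (the actual mgg signal model is not described in this exchange), so the numbers are only indicative of why the choice between a 3 sigma and a 5 sigma window changes the contained signal by well under 1%.

```python
import math


def gaussian_fraction_within(n_sigma: float) -> float:
    """Fraction of a Gaussian contained in [-n_sigma, +n_sigma] around its mean."""
    return math.erf(n_sigma / math.sqrt(2.0))


frac_3 = gaussian_fraction_within(3.0)
frac_5 = gaussian_fraction_within(5.0)
difference = frac_5 - frac_3

print(f"within 3 sigma: {frac_3:.5f}")
print(f"within 5 sigma: {frac_5:.7f}")
print(f"difference:     {difference:.5f}")
```

Under this assumption the two windows differ by roughly 0.3% of the signal, consistent with the "less than 1%" statement above; non-Gaussian tails in the real signal shape would change the exact value.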

Table 10: I think there is something wrong in how this table is organized.

Correct in the AN

Table 11, 12, 13: Too many significant digits for the number of categories, can you fix it?

Tables adjusted and made understandable.

Table 11: It is confusing to have an entry on VH MET here in this table. I would add these numbers later, when the VH MET optimization is described.

Created separate tables for VH-MET and for WH-lep + ZH-lep.

L771: To make it clearer I would write "Fig. 48 (WH leptonic) and Fig. 50 (ZH leptonic)".

DONE

772-773: What does "the derived categories for fa3 was applied on the anomalous samples for fa2 and fΛ1" mean? Do you use the categories optimized while considering as signal the AC VH process with fa3 =1 also when you scan the other AC parameters?

We have trained the BDT and have derived the boundaries optimized for each AC parameter separately, however for the combined analysis with VBF & VH-Had, due to added complexity and the lack of time, it was decided to use the AC BDT & boundaries derived for the fa3 sample for all the AC parameters for the PAS. We are carefully checking the loss in sensitivity on fa2 and fL1 as a result of this choice and may choose to revert to the appropriate strategy

Table 12-13: Please clarify in the text what you mean by "Bkg Evt" (SM non-Higgs background?) and "STXS Bkg" (SM Higgs background?).

STXS Bkg :: SM non-Higgs background

Bkg Evt :: SM Higgs background from JHUGen

Tables have been rewritten to make them clearer

Fig 49, 51: Is "BKG ANOM" the SM VH sample? If yes, I would say so (otherwise I think it is really confusing).

Added description of BKG ANOM to the caption

Why is the region STXS MVA > 0.75 not populated?

That region is dominated by SM non-Higgs background and is ignored for computational efficiency. See Fig. 47

Do you consider the contamination of other H SM production mechanisms in the training and optimization?

We haven't considered other production mechanisms, similar to the L526 question. A check was done by adding non-Higgs background to the category optimization procedure as background, and we found that the boundaries did not change.

7.4.2: I think this part is a bit confusing. Some of the information described from L789 to L802 seem to be repeated in the paragraph below. It is not clear to me when you talk about the training and when you talk about the optimization procedure, or when you talk about the STXS BDT or the ANOM one.

To make it clearer, we used the same definitions as in the WH-lep and ZH-lep tag part.

L795: is the background taken from MC? Please clarify it in the text.

Added some clarification in the AN

L807-L813: Do I understand correctly that you optimize the STXS BDT boundary by also considering as signal the VH AC processes?

Yes. After defining the lowest STXS BDT boundary, the AC process is considered to check the change in the other boundaries.

L826-L827: Which procedure? To obtain the number of signal and background events?

Fixed in the text.

L828-L830 and L819-L821: Don't these two sentences say the same thing? Or is it something else? Please clarify, the text is confusing.

Fixed in the text.

Figure 54: It is not clear what signal sample is used here. fa2, fa3 = 1 (top) and flambda1 = 1 (bottom), but with the STXS MVA boundaries optimized using fa3 = 1 as signal?

Adding the description to the plots’ discrimination, here the background means SM Higgs, and the background means non-Higgs events.

Fig. 54, 55 etc.: It would be good if these Figures were self explanatory, in the sense that they should have written in the plot what is signal and what is background, for instance.

Modified the plot legends to label the samples, and added information to the figure captions.

L841-L849: This paragraph is quite confusing. Probably you have to force Table 15 and 16 to be right after L843.

Corrected in the AN

Table 15-16: In the caption you write "expected number of signa events", but here instead you show the number of ggH or top events, which are not the signal of this analysis, right? Please define what S and B are here (same for Table 17 and 18). Are these numbers compatible with the STXS analysis?

Corrected in the AN. Yes, they are compatible.

Table 17-18: It would help if the analysis category names a la STXS would be also described in these tables by the correspondent discriminant boundaries defined in the optimization section, otherwise it is difficult to understand what category is what.

Work in progress. We have already devised a more understandable nomenclature for the categories, and we will soon adopt it in both the plots and the tables to make the analysis more comprehensible.

L860: "five non-empty": this should be updated with the VH cat also included, no?

Right, Corrected in the AN

L882-L883: If the mass is fixed to its best measurement in the fit, why does one need to get the signal parametrization obtained using the 120 and 130 GeV samples?

The reasons for choosing this strategy are primarily twofold. Firstly, there is no Monte Carlo sample available with a mass at 125.38 GeV. Secondly, the mass is free to vary within its uncertainty. In fact, the mass value is a systematic of the fit
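One common way to obtain a signal model at a mass where no sample exists is to interpolate the fitted shape parameters between the available mass points. The sketch below illustrates this with piecewise-linear interpolation; whether the analysis uses linear interpolation, and which parameters it interpolates, is an assumption here. The mass points 120, 125 and 130 GeV and the target of 125.38 GeV come from this exchange and the L111 answer; the parameter values are placeholders, not analysis results.

```python
from typing import Dict


def interpolate_linear(x: float, points: Dict[float, float]) -> float:
    """Piecewise-linear interpolation of y(x) from a {x: y} table (no extrapolation)."""
    xs = sorted(points)
    if not xs[0] <= x <= xs[-1]:
        raise ValueError(f"x={x} is outside the sampled range [{xs[0]}, {xs[-1]}]")
    for lo, hi in zip(xs, xs[1:]):
        if lo <= x <= hi:
            t = (x - lo) / (hi - lo)
            return points[lo] + t * (points[hi] - points[lo])
    return points[xs[-1]]  # only reached if the table has a single point


# Placeholder fitted peak positions (GeV) at each simulated mass point.
# These numbers are illustrative only.
peak_mean_by_mass = {120.0: 119.9, 125.0: 124.9, 130.0: 129.9}

# Evaluate the model parameter at the best-measured Higgs mass.
mean_at_best_fit = interpolate_linear(125.38, peak_mean_by_mass)
print(f"interpolated peak position: {mean_at_best_fit:.2f} GeV")
```

Because the parameter becomes a continuous function of the mass, the same table also lets the mass move within its uncertainty during the fit, which is the second reason given in the answer.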

L1134: Here you talk about mu_ggH, qqH, VH and top but later you consider mu_F and mu_V (but you anyway get expected estimates for the former mus): could you please clarify?

We want to see the two types of anomalous couplings, varying either HVV or Hgg (i.e., mu_F), coherently. The plots of the signal strengths for the different production modes are sanity checks.

Fig 73 is missing, please fix it.

Right, Corrected in the AN

L1148: "Five analysis categories": this is only for the VBF channel, no? It would be nice to add figures also for the VH channels.

Right, Corrected in the AN

L1160, 1174: Please fix the figure reference.

Right, Corrected in the AN

L1161-L1162: Do you know why the expected uncertainty of mu_ggH and mu_qqH is sensibly lower than the one in Fig. 16 of HIG-19-015?

The issue was that we were using the option --fastScan in combination, which fixed the unprofiled POIs to their initial values. This significantly improved the result's sensitivity. However, during these checks, we noticed some systematic effects to add, which have been incorporated.

Attached plots: profile1D_syst_xsec_r_VBFxsec.png, profile1D_syst_xsec_r_VHxsec.png, profile1D_syst_xsec_r_ggHxsec.png, profile1D_syst_xsec_r_topxsec.png

L1161-L1164: Why are the central values different with respect to the values in Fig. 75?

The fit strategies used in the two plots are quite different, which is why the numbers differ. However, after correcting the error in the likelihood scan (by removing --fastScan), the discrepancies decrease. To avoid confusion between the two sets of numbers, we removed the fit result from the impact plots.

L1169-L1170 "using the VBF phase space": this is only true for the VBF channel, right? Here you show the combination of VBF+VH, no?

Right, Corrected in the AN

10.2: I would repeat here that in each fit the values of the other anomalous coupling parameters are set to zero.

Added in the AN

Would it be possible to produce the fit of f_a3 when f_a3^ggH is left unconstrained? This is the approach followed in HIG-20-007 (H->tau tau AC analysis) and in the ggH(->gamma gamma) channel.

This is already done. We will add the details in the specific AN when merging the analysis with that of the ggH, so that we can introduce the terms correctly.

Fig. 76: would it be possible to write the value of the parameters in the plots multiplied by 10^-4, as in Table 19?

Added

Fig. 77: Could you please explain in the text why most of the systematics have asymmetric impacts? Is it due only to the fact that you get the parameters from 0 to 1 and then you symmetrize it later on with the phase?

We have noticed this and are conducting a study to understand the reasons behind it

Additional work in WH-lep and ZH-lep

TODO: Add fa2 and fL1 boundary results to the table for the WH and ZH Lep

TODO: Add fa2 and fL1 boundary plots to appendix

-- EmanueleDiMarco - 2024-04-02

Topic revision: r10 - 2024-05-23 - FedericaDeRiggi
 