Q&A for VBS Semileptonic

Color code for answers
Green led - Comment is acknowledged and answered
Orange led - Authors are working on answering the comment
Red led - Comment requires further work to be addressed or needs attention from the internal reviewer regarding a specific issue
Blue led - We do not agree with the comment and arguments are given

Documentation and additional material

Event display:

Additional checks during the review:

  • Residual bias checks for DeltaEta VBS and Zlep: slides
  • DeltaEta VBS importance study: slides
  • GEN level checks for the signal (Pythia dipole recoil and Herwig comparison): slides

Comments from the journal referees

2nd round of comments from the journal referees

Comments before FR

FR review by Dick Loveless

  • The Abstract is too long. The CMS limit is 200 words, and a rough estimate indicates you have around 220 words. Please shorten the Abstract a bit.

Orange led Removed a phrase about the signal discriminator. To reduce it a little more one can remove from the abstract the definition of the resolved and boosted category. We can discuss this further during the FR.

  • You need to prepare a HEPData file for all the plots that give the data numbers. Please work with Claude Charlot for details. For the text of the paper you must add the following sentence at the end of the Summary (or somewhere relevant): "Tabulated results are provided in the HEPData record for this analysis [nn].". Then add the reference: "[nn] HEPData record for this measurement, 2021, doi:10.17182/hepdata.NNNN".

Orange led Thanks to Claude we opened a new HEPData entry and we are preparing the submission. The citation has been added in the paper text.

  • Surely ATLAS has published a paper describing the ATLAS detector. Please add the reference. Green led Added

  • Is there a reference to "grooming algorithm"?

Green led It was still referred to the SoftDrop algorithm. The two phrases have been changed to make it clear they are referring to the same algorithm.

  • lines 213,214: Can you define "cross-entropy"?
Blue led The binary cross-entropy loss is a standard loss function for binary classification, so we do not think an explicit definition is needed. We have included a reference for its definition.
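
For readers of this page, a minimal sketch of the binary cross-entropy loss mentioned above, in plain Python. The function name and the clipping constant `eps` are illustrative assumptions and are not taken from the analysis code.

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy between labels (0 or 1) and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        # Clip the prediction so that log(0) is never evaluated (eps is an assumed value).
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)
```

A classifier that outputs 0.5 for every event gives a loss of ln 2, which is a convenient sanity check for any implementation.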

  • Ref.67: Only the first word of the title should be capitalized. The correct reference is journal = "Eur. Phys. J. C", doi = "10.1140/epjc/s10052-021-09538-2", archivePrefix = "arXiv", sprint = "2104.01927".
Orange led Following a similar comment in CWR, and the review during FR, we have updated all the results and plots to the post-paper 2016 luminosity value and uncertainty. Only the postfit plot of the mjjVBS variable is still not updated: we will have the plot ready shortly after the FR meeting (we expect negligible impact on the plot).

FR review by Dezso Horvath

  • L1: "The evidence for electroweak (EW) vector boson scattering in the semileptonic decay lνqq channel plus two jets is reported." Decay of what? A pair of weak bosons? Are there four jets or two, two weak bosons or three? Of course, this will be clear from the paper, but the first sentence is totally confusing. It should sound better as "An evidence is reported for electroweak (EW) vector boson scattering in the semileptonic decay channel lνqq of two weak bosons WV (V=W or Z) produced in association with two parton jets."

Green led Agreed, the sentence has been adjusted as suggested.

  • L7: "hadronic decay of a W/Z boson. " => "hadronic decay of a W/Z (with a common notation V) boson. " (V is used later)

Green led The definition of V is already included in the first sentence.

  • L13: There are confusing parentheses, I propose to put the results in a list like : "The measured and expected EW WV production cross sections are 1.90+0.54−0.47 pb and 2.23+0.08− 0.11 (scale) ± 0.05(PDF) pb, respectively, in a ..." PDF is not defined here, see L334 of the Summary. Green led Done

  • L19: Is it not a misnomer to call "QCD production" the WV pair coming from VBS of quarks or from gluon fusion? They are not really produced by QCD, just associated with QCD processes. I propose "diboson QCD production" => "QCD-associated diboson production", also in LL 45, 282, and 339. The abbreviation "QCD WV" should be OK, of course, for the associated production.
Green led Agreed

  • L13: What do you mean by "EW coupling α_EW"? There are electromagnetic, weak, SU(2) and U(1) couplings in the EW sector of the SM. Please define α_EW, e.g. as the neutral weak current coupling if you mean that one. Green led Agreed

  • L21: "lepton l" => "lepton $\ell$" Green led Agreed

  • L45: V for W and Z is used all over the paper, but defined in the Summary only. Please define it in L45 like e.g. "W and Z boson" => "W and Z boson (together V boson)", and then you can write V instead of repeating "W and Z" as well.

Green led In L28 in the introduction the use of V for vector boson is already introduced.

  • L74: "α_S α^5" Is this correct? Which α is it, α_em?

Green led Corrected

  • L120: "AK4 jets are considered..." What about AK8 ones? Maybe this sentence should be joined to the next paragraph where AK8 is treated.

Green led Added AK8 and adjusted the phrase.

  • L157: "the transverse mass of the leptonically decaying W is required to be MWT < 185 GeV" Why so large, more than twice the W mass? In L166 you select mV< 115 GeV.

Blue led The cut at MtW < 185 GeV has been imposed as a very loose selection, to exclude a region containing only background. We preferred to avoid a stricter selection on this observable in the signal region, leaving a larger phase space for the MVA discriminator to analyze.

  • Fig. 3: Too pale colours, the curves are hard to decipher. Also, the arrangement of Figs. 2 and 4-5 is much better, with thick continuous lines.

Blue led During the previous review we decided to use different line styles to make the plot readable in B&W. We have now increased the line thickness and chosen better colors.

  • L245: "a nuisance parameter that morphs" This verb is used mostly in biology and computer animation for gradual, smooth transformation from one definite form into another. I should rather use a simpler "changes" here. If you insist on "morphing" then you should at least add "... processes into each other, or"

Green led The use of “morph” comes from the jargon used in the Combine package for the implementation of the systematic variation. We agree that a simple "changes" is clearer in this sentence.

  • L282: Is it not a misnomer to call "QCD production" the WV coming from VBS of quarks or from gluon fusion? They are not really produced by QCD, just associated with QCD processes. I propose "diboson QCD production" => "QCD-associated diboson production". We can omit " (QCD-W Z)" as it was already defined in L45 and not used further.

Green led Agreed and changed.

  • Table 2: It is a bit confusing to have the total experimental and theoretical uncertainties first and the various contributions after. It should look better to leave those places empty and have a "Total" with a separator line underneath, below the contributions.

Green led Agreed

FR review by Philippe Bloch

  • It is pretty awful to call semileptonic the decay of two different particles, one of them going to leptons, the other to hadrons. I know that this was used by ATLAS in their paper [13]. Is it a good reason to continue? In fact, it is only used 3 times in the paper: title, introduction, summary. We had a similar discussion for a recent B2G paper (20-007) where the Higgs would decay into two Ws. And eventually, the wording semileptonic was removed from the final version at the request of Bob Brown and the FR board.

Green led We removed the word "semileptonic" from the paper, replacing it with an explicit definition of the decay mode. The title seems a little long to us, but we can discuss this further during the FR.

  • Abstract: Is it necessary to give the technique used (machine learning etc.) in the Abstract? Green led Agreed, removed to reduce the length of the abstract.

  • L48 the term VBF has not been introduced. For the first occurrence, one should write Vector Boson Fusion. Green led Agreed

  • I think it would be good to have a uniform naming of jets Green led Agreed

  • For example, in Table 1 one has both leading Vhad jet and leading jet from Vhad. Green led Agreed

  • L156 leading VBS tag jet, but in Table 1 VBS leading tag jet. Green led Agreed, everything is consistent now.

  • L183 and 185 trailing tag jet, but in Table 1 VBS trailing tag jet Green led Agreed

  • L188 Not sure I understand perfectly the closer to or farther. Could you be more precise?

Green led Made explicit the subdivision of the W+jets control region: "\mvhad closer to (i.e. $[50,65] U[105,150] \GeV$ for the resolved category) or farther (i.e. $[40, 50] U [150,+\infty] \GeV$) from the V resonance..."
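
The subdivision quoted above can be sketched as a small classifier. The mass windows are the ones given in the answer; the function name, the half-open treatment of the boundaries, the assumption that the "farther" windows also refer to the resolved category, and the `None` return for masses outside the listed windows are assumptions made only for illustration.

```python
import math

# Windows in GeV, as quoted in the answer (assumed here to be for the resolved category).
CLOSER_WINDOWS = [(50.0, 65.0), (105.0, 150.0)]
FARTHER_WINDOWS = [(40.0, 50.0), (150.0, math.inf)]

def wjets_cr_subregion(m_vhad):
    """Return 'closer', 'farther', or None for a hadronic V candidate mass in GeV."""
    # Boundaries are treated as half-open [low, high); this convention is an assumption.
    for low, high in CLOSER_WINDOWS:
        if low <= m_vhad < high:
            return "closer"
    for low, high in FARTHER_WINDOWS:
        if low <= m_vhad < high:
            return "farther"
    return None  # mass outside the control-region windows listed above
```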

  • L229 normalised distributions OF THE DNN DISCRIMINATOR .. Green led Agreed

  • Fig.3. Could one use different line styles for the yellow and the pink lines ? Impossible to distinguish in b&w. Green led Improved

  • Fig.5. The Tot uncertainty band is difficult to see in the lower panels (actually even more on my screen than on the printer ..). It is very important to see that at large DNN the experimental points are above the dashed area. Green led Improved

FR review by Chiara Rovelli

  • Abstract, L12: "EW WV production" => "V" has not been defined yet. I propose to rephrase as "EW WW or WZ production..." or to define "V" a few lines before, e.g. in L7 where you could write "... hadronic decay of a W/Z (V) boson".

Green led Improved following a similar suggestion from Dezso.

  • Abstract, L12-13: I found the parentheses confusing. In my opinion this sentence would be clearer if rewritten e.g. as "the measured and expected EW WV production cross sections are 1.90+0.54-0.47 pb and 2.23... " Green led Agreed

  • L124: please add a reference for the grooming algorithm

Blue led The grooming algorithm mentioned there is the Soft Drop algorithm described in the following lines. Improved the phrasing for clarity.

  • L288: please add a reference for the PDF uncertainty estimate

Green led Included clearer explanation and reference.

  • L309: I'd rephrase this sentence separating in a clearer way measured and expected values, e.g. "the measured and expected cross-sections are respectively 16.6+3.4-2.9 pb and 16.9+2.9-2.1(scale) +- 0.5(PDF)..." Green led Agreed

  • L334: if I'm not wrong, this is the first time the "(PDF)" notation is explained. This explanation should be moved much earlier, when it's used for the first time

Blue led It is also introduced in the Systematic uncertainties section.

  • Fig5 bottom: can you mark in a clearer way the uncertainty band? Green led Improved

FR review by Katarzyna Wichmann

  • L275: please explain in the text which variation is taken for the scale uncertainty (I assume the largest, not a sum?)

Green led The largest variation is taken: added in the text.
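
As an illustration of "the largest variation is taken", a minimal sketch of an envelope over scale variations. The function name and the yields are hypothetical and chosen only to show the arithmetic; they are not analysis numbers.

```python
def scale_uncertainty(nominal, variations):
    """Return the largest absolute deviation from the nominal yield among the varied yields."""
    return max(abs(v - nominal) for v in variations)

# Hypothetical yields, for illustration only.
nominal_yield = 100.0
varied_yields = [97.0, 104.0, 101.5, 95.5]
envelope = scale_uncertainty(nominal_yield, varied_yields)
```

With these made-up inputs the envelope is set by the 95.5 variation. A sum in quadrature over the same variations would give a larger value, which is the distinction the comment asks about.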

  • L288: please add how the PDF uncertainties are calculated or at least add a reference to the "method paper".

Green led Included clearer phrase and reference.

  • figure 2: please make the legend larger; please place data as first in the legend; does "Syst." in the legend refers to the data? if yes, then it should be placed directly under the data in the legend; please make "MjjVBS" larger on the x axis; please add to the caption info on the vertical bars on the data points and the grey band.

Green led Improved the plot

  • figure 3: please make the legend larger; is it possible to choose more "bold" colors? these are quite pale; in any case, the two green and two blue lines are hard to tell apart - please change one to a different color.

Green led Improved the plot

  • figure 4 and 5top: please make the legend larger; please place data as first in the legend; does "Syst." in the legend refers to the data? if yes, then it should be placed directly under the data in the legend; please make "DNN (DNN boosted)" larger on the x axis; please add to the caption info on the vertical bars on the data points.

Green led Improved the plot

  • figure 5bottom: please make "DNN boosted" larger on the x axis; please add to the caption info on the vertical bars on the data points; y-axis label: i suggest to put parenthesis around (Data - Bkg).

Green led Improved the plot

After CWR

ARC comments

  1. There were several comments on the title. You state correctly that this will be discussed and possibly changed in the FR. I think it is still a good idea in your responses to give your new proposed title (the one in v10). I think this title will be acceptable to most, although it may be tweaked further at the FR. Green led

  1. There were some questions about the treatment of W->tau nu and Z->tau tau decays in the signal. You answered these comments well, but the fact that it came up multiple times makes me think it should be clarified in the paper as well, even though it is a very small effect. You could add a sentence at the end of the paragraph ending on line 60 in draft 10, for example. Green led Added.

CCLE comments

  • Title: I find "semileptonic l nu qq " not very good, since this is the final state, not the semileptonic decay. What about writing " ... in semileptonic decay with l nu qq final states ..." ? Green led

  • Abstract: 6th line " ... and a signature consistent with the hadronic decay of a W/Z boson." Green led

  • 42 " .. in more detail in Sec.~5),... " Green led
  • 52 "... in perturbative quantum chromodynamics (QCD)." Green led
  • 53 Question: weighted to what ?

Green led Those samples are corrected using gen-level information using common recipes. Phrase changed to read as: "The tt component [..] are also weighted using generator level information to improve the agreement to data of the simulated pt ...[26-28]" More details about the correction method are given in the references.

  • 55 One should say, what the brackets mean: " ... process (the brackets give the decay modes), ... Green led
  • 69 "... MC sample. The HERWIG .... " Green led
  • 107 " ... analysis by triggers for isolated single leptons with ... " Green led
  • 129 " ... 0.45 and a groomed AK8 jet ... " Green led
  • 131 " .... vector bosons, WV, in association ... " Green led
  • 149 " ... VBS tag jet selection ... " Green led
  • 150 " ... is chosen as the ... " Green led
  • eq 1 Question: should there be also a vector symbol above "l" ? or should it be p_T(\vec{l}) in delta phi ?

Green led I changed it to p_T(\vec{l}) as suggested for consistency.

  • 151 is "ML-based" defined ? I would just write it out " ... machine learning based ... " Green led
  • Table 1: Would it make sense to order the items according to ranking ?

Green led Since for the two categories the ordering is not the same, we think that it would be better to keep the current logical order based on observable type (lepton, VBS jet, Vhad jet observables).

  • 279 "strength ... " Green led
  • 281 ".. signal strength ... " Green led
  • 303 "where .... , respectively with an expectation of 1+- ..." Green led
  • 332 " .. where PDF is the uncertainty coming from the parton distribution function." Green led
  • 340 " ... with an expectation of 1+- .. , is measured ..." Green led

  • References: there should be no "no. .." in the bibliography in lines:473, 486, 489, 492, 495, 499, 508, 524 Green led

CWR

Comments from Andrea Carlo Marini (cds)

  • L20-22. What about W->tau->lep ? How it is accounted? As signal or background?

Green led The W decay into taus is considered part of the signal, and W decays in all leptonic final states are generated. However, the analysis has been tuned to look for electron and muon final states.

  • L57-60. Is this statement true differentially in the bins of the DNN discriminator? Isn't a bias of 3% (over a <20% uncertainty) a bit large?

Green led We checked differentially in the DNN bins and the contribution in the most sensitive bins is ~2% both in the resolved and boosted category. The 3% reported contribution is related to the inclusive contribution of the interference in the signal region. Boosted plot, Resolved plot

  • Table 1. Did you apply the same QGL weights and/or uncertainties to pythia and herwig?

Green led Yes we applied the same QGL weights for all the samples, computed from Pythia based W+jets and Top MC samples. However the Herwig parton shower has been used only for the VBF-V contribution, which is rather small.

  • ln 230. I think you should describe more and clearer how the fit is done, describing how the uncertainties are incorporated and how uncertainties on the parameter of interests are derived.

Green led The fit strategy is described in the Sec5 (L184) and the uncertainty incorporation at L194. Sec6 then gives a full description of the uncertainties included in the analysis and their effect on the parameter of interest.

  • ln 231. did you add the uncertainties on the prediction for VBS-QCD when you present the result of VBS-EW subtracted?

Green led Yes, when the EW-only fit is performed, scale uncertainties on the QCD-WV process are inserted in the fit, including their rather large normalization effect (~25% in the signal region).

  • Fig.6 do you have a 3% uncertainty post fit in the most sensitive bin (bottom row)? how much are all uncertainties constrained?

Green led The systematic uncertainty band in the data-background plot (bottom row) is related only to the background estimation. It corresponds to an ~8% uncertainty on the most sensitive bin as in the top plot. We don’t observe significant constraints in the analysis uncertainties.

  • What uncertainties do you have in the extrapolation from the CR to the SR of the MC predictions?

Green led About the W+jets estimation: the W+jets MC sample has been split into sub-samples, with rate parameters for each component correlated in all CR and SR regions. The W+jets CR is used directly in the fit to constrain the normalization of each sub-sample. No additional extrapolation uncertainties have been added since it has been checked that this data-driven method corrects in a satisfactory way the data/MC discrepancies both in the W+jets CR and in the signal region. Some uncertainties, like the b-tagging scale factor ones, closely correlate the CR with the SR since a b-veto is imposed in the SR.

  • Why the bands are qualitatively different from the top to the bottom rows (10% vs 3%)?

Green led Because in the top row the uncertainty bars are on the ratio data/MC, then showing relative uncertainty, while in the bottom row it's data-MC, then showing absolute uncertainty.

Comments from Albert De Roeck (cds)

  • line 54: "dipole recoil scheme is used" say for what it is used (for PS?) and if there was a special reason for using it, then give it in the text. Also a reference to the dipole recoil scheme would be useful to add.

Green led The reason, as given in the text, is “to improve the description of the additional jet emissions [27, 28]”.

  • line 67: Add "The CMS detector" in the title

Green led Done.

  • line 93: We do not say anywhere what triggers we used in this paper? I assume isolated lepton triggers but I am not sure we explicitly spelled that out anywhere, as we should (unless I missed it) I would expect we would discuss that here before going to the offline objects.

Green led Added triggers definition in the text.

  • line 132: give the efficiency/mistag probability for b-tagging with the loose working point

Green led The efficiency of the b-tagging loose working point is 85% for b-matched jets, the mistag probability for non-b jets is 20%. Added in the text.

  • Figure 2 is nice but also simple: the only difference is in the boosted/resolved definition, so I don't think this figures is really essential for the paper.

Green led We still have to decide whether to remove Figure 2 (the ARC chair also proposed it before CWR).

  • line 148: I assume contributions from weak processes with real leptons are subtracted by MC from this QCD enriched sample?

Green led Added in the text: “The contribution from EW processes with a real lepton is subtracted from this QCD enriched phase space region by means of W+jets and DY MC events“.

  • line 175: "...explainable machine learning.." is that not something used mostly in life science? What are the advantages to use it here for our problem at hand? Verifying "the matching with our physics intuition"?

Green led The SHAP technique is a general-purpose method to explain a ML model behaviour. In this analysis it has been used indeed to verify the importance of the input variables and their interplay in defining the signal rich region used in the fit. It verifies our physics intuition giving a lot of importance to the mjj and Zlep variables.

  • line 214: by how much are the unclustered energies changed for the systematic uncertainty test?

Green led They have been changed according to the JETMET recipe [https://twiki.cern.ch/twiki/bin/viewauth/CMS/MissingETRun2Corrections#Uncertainty_related_to_Uncluster] and their impact on the signal strength is < 0.3%

  • line 215: pile-up: systematic via the change in cross section as usual? Be more definite…

Green led Added to the text: the uncertainty on the pileup is assessed on all the relevant MC samples by varying the minimum bias cross section used to generate the pileup distribution by $\pm 1\sigma$

  • line 225: say in 1/2 sentence how you include the parton shower uncertainty or give a reference. Otherwise this is not very useful information.

Green led Added to the text: by using the weights corresponding to variations of $\alpha_S^{ISR}$ and $\alpha_S^{FSR}$ computed by the parton shower programs

  • line 226-228: same for these statements, although here the effect is negligible so one can probably get away with this somewhat vague statement as is now.
Green led Indeed, we did not expand this section since the effect on the measurement is below the 1% level, as it is not included for the main background.

Comments from Greg Landsberg (cds)

  • Title: Search for vector boson scattering in semileptonic ℓνqq final states in proton-proton collisions at √s = 13 TeV. [We do not put the experiment name in the title or refer to Run 1 or Run 2!]

Orange led We propose to change the title to "Evidence for vector boson scattering in semileptonic decays with lnu qq final states in proton-proton collisions at \sqrt{s}=13\TeV". We will discuss further the title in the final reading, since different CWR comments offer various alternatives.

  • L2: please, give paper references to the Higgs boson discovery, which are the two PLBs and the CMS JHEP papers, and not your Ref. [1]!

Green led Done

  • LL5-6: the sentence "Particular interest ..." is completely unnecessary here, as you only talk about the Higgs sector; please delete it as it only breaks the flow of the text

Green led Agreed, the sentence has been deleted.

  • Section 2: mention the PDF sets and tunes used in the simulation.

Green led Done

  • LL41-42: the Wγ and Zγ production is not radiative! Please rephrase properly: W and Z boson production in association with a photon (Wγ and Zγ)

Green led Done

  • L54: give full PYTHIA version here, v8.2xy,

Green led Done

  • L55: ditto for HERWIG++.

Green led Done

  • L67: Event reconstruction, selection, and categorization.

Green led Done

  • LL68-75: the detector description does not belong to this section. Please, move it to a separate section before Section 3.

Green led A separate detector description section has been added.

  • LL86-92: move the trigger description to the detector section as well.

Green led Done

  • L126: ... and the transverse mass ... [transverse mass is not a Lorentz invariant!].

Green led Done

  • Figure 2: W+jets CR; Signal region; Top quark CR [twice, in the lower row].

Green led The figure will probably be removed.

  • L147: define ΔR here.

Green led Done

  • LL174-176: how exactly do you test that the dependence on the input variables matches physics intuition. What does intuition tell you about this dependence? Please, expand with some examples.

Green led The SHAP technique is a general-purpose method to explain a ML model behaviour. In this analysis it has been used indeed to verify the importance of the input variables and their interplay in defining the signal rich region used in the fit. It verifies our physics intuition giving a lot of importance to the mjj and Zlep variables. The text has been improved.

  • LL184-185: the sentence makes little sense as the "SM signal strength" is 1 by definition, so there is nothing to extract; please rephrase properly: "The target of the analysis is the extraction of the WV VBS production signal strength and significance ..."

Green led Agreed

  • Figure 3: remove "Preliminary" from the plots; the upper legend should say "137 fb−1 (13 TeV)", per CMS Style. The x axis label should match the variable introduced in the text: " mVBSjjj (GeV)". The legend key for the signal should read "VBS-W(ℓν)V(jj)". The other entries should read: "top quark" and "Nonprompt". Finally, the VBS-ZllVjj background was not introduced at all and should be added to Section 2. It should also be typeset as "VBS-Z(ℓℓ)V(jj)". Make the shading on the ratio plots darker.

Green led Figures improved. Added VBS ZV background definition in section 2.

  • Figures 4-5: remove "Preliminary" from the plots; the upper legend should say "137 fb−1 (13 TeV)", per CMS Style. The legend key for the signal should read "VBS-W(ℓν)V(jj)". The other entries should read: "top quark" and "Nonprompt". Finally, the VBS-ZllVjj background was not introduced at all and should be added to Section 2. It should also be typeset as "VBS-Z(ℓℓ)V(jj)". Make the shading on the ratio plots darker for Fig. 5.

Green led Figures improved. Added VBS ZV background definition in section 2.

  • L213: the term "uncertainty in the residual pmissT " is incomprehensible for an outside reader * you probably meant the uncertainty in pmiss T due to unclustered energy, so you need to explain this properly in the paper.

Green led Improved the text.

  • LL217,220,222: the renormalization and factorization scales are not "QCD scales"; they are scales of the QCD RGE evolution, which is too detailed for the paper; just drop "QCD" as misleading in all three occurrences.

Green led Done

  • Figure 6: remove "Preliminary" from the plots; the upper legend should say "137 fb−1 (13 TeV)", per CMS Style. The legend key for the signal should read "VBS-W(ℓν)V(jj)". The other entries should read: "top quark" and "Nonprompt". Finally, the VBS-ZllVjj background was not introduced at all and should be added to Section 2. It should also be typeset as "VBS-Z(ℓℓ)V(jj)". Make the size of the axis labels similar between the upper and lower plots; in the lower panels capitalize "Syst. unc.".

Green led Figures improved.

  • Figure 7: the axis labels should match the text: μEW and μQCD TODO (plots)

Green led Done

Comments from Joscha Knolle (cds)

- You use the pre-paper integrated luminosity of 137/fb but the post-paper luminosity uncertainty of 1.2% (for 2016), where “paper” refers to CMS-LUM-17-003. Either use 36.3/fb (138/fb) with an uncertainty of 1.2% (1.6%), or 35.9/fb (137/fb) with an uncertainty of 2.5% (1.8%), for 2016 (Run-2).

Green led Agreed. We corrected the uncertainty for total Run 2 to be 1.8%, since we used the pre-paper uncertainties in the analysis. Text: "The integrated luminosities of the 2016, 2017, and 2018 data-taking periods are individually known with uncertainties in the 2.3--2.5\% range~\cite{...}, while the total Run~II (2016--2018) integrated luminosity has an uncertainty of 1.8\%, the improvement in precision reflecting the (uncorrelated) time evolution of some systematic effects."

- Line 198: Please add citations for CMS-PAS-LUM-17-004, CMS-PAS-LUM-18-002.

Green led Done

Comments from Francisco Matorras (statcom) (cds)

  • One thing I’d like to ask is about the use of the signal strength as central result. I understand it is useful to quickly check the compatibility with SM, but it is not very useful if you do not state very clearly which is sigma_sm. I would recommend presenting in the abstract and summary the cross section (or both cross section and mu). Then, how do you treat the sigma_sm errors? Strictly speaking your mu should include those, while your total cross section (and your significance for having a signal above zero) should not. I’m also curious to understand why your error in eq 2 is asymmetric, while that on L242 is symmetric. Why having only one digit in this error?

Green led We have added the cross section in the abstract (in the summary it is already fully included). The errors on sigma_SM are not included in the total cross section and significance calculation, but are included in the fit for the signal strength. We have added one more digit to the cross section result (and an asymmetric error).

  • Can you get a significance on the 2D fit? If the departures from SM are expected to be unrelated for the two contributions, this seems to be the most relevant number.

Blue led The EW and QCD WV processes are considered unrelated only at the leading order of the cross-section computation and they are tightly linked by higher order EW and QCD corrections. This analysis is targeting the EW WV process and the acceptance for the QCD WV one is rather small due to the mjjVBS and deltaVBS cuts. Therefore our main goal is the extraction of the 1D significance for the EW WV process, keeping the QCD WV strength fixed to the SM prediction. As a secondary and complementary measurement we also provide the 2D fit of the EW and QCD WV components.

  • Can you clarify how you get the 2D contours? I might be confused by the figure, but comparing the 1D and 2D results, it seems you get the 68% contour from 2*delta(Log)=1, while being a 2 dof, it should be from 2*delta(log)=2.3

Green led Contours are extracted as 2*delta(NLL) = 2.3 (68%) and 5.99 (95%).
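
The thresholds quoted above follow from the chi-square distribution with two degrees of freedom, whose cumulative distribution is 1 - exp(-x/2). A minimal sketch that reproduces them is given below; the function name is illustrative, and the asymptotic (Wilks) approximation is assumed.

```python
import math

def two_dof_threshold(confidence_level):
    """Threshold on twice the log-likelihood difference for a 2-parameter contour.

    Inverts the chi-square CDF for 2 degrees of freedom, F(x) = 1 - exp(-x / 2).
    """
    return -2.0 * math.log(1.0 - confidence_level)

threshold_68 = two_dof_threshold(0.6827)  # approximately 2.30
threshold_95 = two_dof_threshold(0.95)    # approximately 5.99
```

For a single parameter of interest the corresponding 68% threshold is 1, which is the value the comment suspected had been used for the 2D contour.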

  • L252, how do you check they are “consistent with each other”, I think you can skip such a statement, but if you want to have it you need to properly account for the (large) correlation of the different fits.

Green led This was just a loosely defined comment about the different fit results. We have removed the statement about their consistency.

  • Looking to your figure 7, I have some mixed feelings about your fit assuming muQCD=1. What you do is correct in the sense that you fit under some clear assumptions, however since your data prefers a muqcd about one sigma higher you get a confidence interval significantly smaller than what you would expect. In fact, I’m a bit surprised that your expected precision is similar, again looking to the 2D plot, I would expect it to be about twice wider. An alternative fit that prevents all these problems is defining the profiles “MINOS-style” confidence interval.

Blue led The analysis has been optimized to extract the EW WV component of the VBS process, not the QCD-WV contribution, which has a small acceptance in our signal region. As the 2D analysis confirms, we do not have a strong handle on the QCD-WV contribution, therefore for the extraction of the EW WV component we think that fixing muQCD=1 is a consistent choice. The theoretical uncertainties on the QCD-WV normalization are already included and quite large (~25%) in our signal regions.

  • Figure 3, left, there are some sizeable discrepancies, especially being a postfit plot. Did you check the gof of the fit? BTW, we recommend not to use horizontal error bars to represent bin size

Green led The mjj_VBS distribution is not used directly in the fit but it is one of the DNN inputs. A GOF test has been performed with the recommended saturated model on the full likelihood in the fit, resulting in a p-value of 75%. Therefore this small discrepancy in the mjj distribution does not strongly affect the description of the DNN in the signal region, as confirmed by the postfit distributions in the signal region.

  • Figure 5 top-left ratio shows a clear tendency, sort of a slope. Did you analyze it? Where it comes from? Does it affect your result?

Green led During the review we have isolated the effect as correlated to the jet pt observable. The VBS trailing jet pt, used as one of the inputs of the DNN, is badly described by the MC, but the data-driven strategy implemented for the W+jets background estimation is able to correct the trend. In the control region, a 2% discrepancy is still visible, but covered by the JES uncertainties. In the signal region we don't have the power to constrain the JES, and the uncertainty is covering for the residual discrepancy.

  • Figure 6, bottom ones are misleading, since your data and background are not independent, especially on the first bins. It makes it appear biased with such a big overconsistency. I would recommend to properly calculate the error of the difference accounting for the correlation (and combine stat and systematic). It is a very similar case as the one described in the pulls in https://twiki.cern.ch/twiki/bin/view/CMS/DataMCComparison

Blue led We have improved the legend of the plot to clarify that the uncertainty band represents the total uncertainty on the sig+background estimation. That is compared on one side with the number of signal events (without uncertainty since it is already included in the band) and with the (data-bkg) pull. The error on the data points is only the Poissonian error on data, to be compared with the total bkg+sig uncertainty band.

Comments from Wolfgang Adam (IR HEPHY Vienna) (cds)

  • Style guide: Run 2 should not be used in title or abstract; no need to put CMS in the title

Green led We propose to change the title to "Evidence for vector boson scattering in semileptonic decays with lnuqq final states in proton-proton collisions at \sqrt{s}=13\TeV". We will discuss further the title in the final reading, since different CWR comments offer various alternatives.

  • "semi-leptonic lnuqq final state" - the two fwd jets are also part of the final state. Maybe "Search for vector boson scattering with decays to lnu and qq"?

Green led We propose to change the title to "Evidence for vector boson scattering in semileptonic decays with lnuqq final states in proton-proton collisions at \sqrt{s}=13\TeV". We will discuss further the title in the final reading, since different CWR comments offer various alternatives.

  • Abstract, L2: Style guide: Run 2 should not be used in title or abstract

Green led Done

  • Abstract, last line: drop "within uncertainty" - "agreement" should be sufficient and no need to make the abstract even longer

Green led Done

  • L2-14: This paragraph is a bit cumbersome to read because of its convoluted sentence structures. A reader who is new to this topic will find it hard to grasp the core ideas upon first read. Shorter sentences and fewer concepts/ideas per sentence would help in conveying the motivation for this work more effectively.

Green led Agreed and improved the paragraph.

  • L14: Only in the recent years the dataset collected by the LHC experiments has become --> Only in recent years has the dataset collected by the LHC experiments become

Green led Done

  • L28: "charged and isolated electron or muon": as electrons and muons are always charged, this sounds weird; replace e/mu by "lepton" or reformulate in some other way by the presence of a single isolated electron or muon,

Green led Done

  • L44: Can one say that a generator "calculates" a process? Changed to “simulated”

Green led Done

  • L86: Open a new paragraph for the description of the trigger.

Green led Included a paragraph for detector description and triggers

  • L93/94: in "of (|η| < 2.5 for electrons, |η| < 2.4 for muons)." the parentheses seem unmotivated; also, for clarity, the order of muon/electron (which is quoted first) should be kept the same in both parts of the sentence and also later on.

Green led Improved the text and kept the same order of electron and muons in all the text.

  • L97: ak4 / ak8 : suggest to move to capitals as in other paper (AK4, AK8).

Green led Done

  • L101: Better move "to AK8 jets" to the end of the sentence

Green led Done

  • L102: "These" is ambiguous since the previous sentence only refers to the jet mass.

Green led The text has been taken directly from the official twiki https://twiki.cern.ch/twiki/bin/viewauth/CMS/Internal/PubDetector “When using a standard top quark tagging algorithm, add the following:“

  • L106-107: More precise would be "The fraction of VBS events in the sample is enhanced". The sentence in L125 is very similar - they could be combined.

Green led Removed the first sentence and improved the second one.

  • L109: "coming from" -> ", associated with" or similar (in data this is just an assumption)

Green led Improved clarity.

  • L122-124: Suggest to rearrange to "In the resolved category, out of the remaining jets, the jet pair ... is then chosen as the decay ..."

Green led Done

  • L129: "top" -> "tt" (try to stick to one naming scheme)

Green led We moved to the notation top → top quark in all the text

  • L133: “in order to have a good reconstructed on-shell W boson in the hadronic final state” -> “in order to keep events with well reconstructed hadronically decaying on-shell W boson:”

Green led Improved for clarity

  • L137: "as W+jets" -> "as part of the W+jets"

Green led Done

  • L143: In this context, phase space is jargon

Green led Improved

  • L149/150: "normalization, that" --> "normalization, which"

Green led Done

  • Fig. 6 caption: remove "combining the full Run 2 statistics"

Green led Done

  • L240: This sentence is quite convoluted and long, and thus it is hard to read. Consider rephrasing.

Green led Rephrased

  • L264++: Put all expected values in parentheses

Blue led We prefer to avoid as much as possible the parentheses structure for the observed/expected numbers.

  • L239: Put expected numbers in parentheses (as above)

Blue led We prefer to avoid as much as possible the parentheses structure for the observed/expected numbers.

Comments from Livio Fano' (IR Perugia) (cds)

  • L20 How do you treat the W->tau nu decays? Are W->tauhad considered as W->qq ? Are W->tau nu >lnunu considered as W->lnu (At least for the second question the answer seems to be true)

Green led The W decay into taus is considered part of the signal, and W decays in all leptonic final states are generated. However, the analysis has been tuned to look for the electron and muon final states.

  • L28 As electron/muon is charged by definition, dropping “charged” may be the case

Green led Done

  • L30 Is there a threshold on dijet invariant mass imposed in the event simulation? If yes, which?

Green led It is mqq>100 GeV. This fiducial region is described in the results section when the cross-section measurement is reported.

  • L36 from this phrase, it seems (at least to me) that they’re estimated in a totally data-driven way. However, it’s not like this as far as I understood, they reweight the simulations based on the CRs in different way for each background. Maybe this can be made a little bit more clear. Also, ttbar is nominated at line 40 as a MC simulated background, leading to a little bit more of confusion, whilst W+jets is not

Green led While simulations for these backgrounds exist, in both cases an approach based on control samples in data is applied in order to improve the description of these backgrounds in the signal region: the W+jets contribution from MC is corrected differentially exploiting the events in the dedicated control region (as described in more detail in Sec.\ref{sec:background}), while the top quark background shape is taken from MC but its normalization is measured from data in the dedicated control region.

  • L97-103: It could be reasonable to mention the SoftDrop mass algorithm right after introducing AK8 jets, and then to state how W/Z tagging is performed. It could be useful to add a statement about N-subjettiness, not a trivial quantity for “newcomers”, right before too. Consider also to mention PUPPI algorithm together with SoftDrop, as SD, to work at best, is usually applied to PUPPI-ed AK8s.

Green led We have reordered the phrase to improve the clarity as suggested. We included the application of PUPPI to AK8 jets, but we prefer not to add much detail on the N-subjettiness definition, since we have already included a reference to the definition of those variables.

  • L116: “on the W boson decay topology” may sound appropriately

Green led We prefer "reconstruction regime" since the use of the word "decay" would add a repetition.

  • L119 since in 121-122 a VBS-based selection for AK4s is mentioned, it should be replaced with “together with corrected at least two...

Green led Agreed

  • L126 Is this request applied before or after selecting the VBS tag-jets? If after, have you considered applying it before?

Green led As written in the text it is applied after the VBS tag jet selection choosing a jet pair from the "remaining jets". This choice has been studied and discussed in detail during the review.

  • L146: pT_miss and MTW thresholds are quite tighter than the one usually imposed (40-50 GeV) to define QCD-enriched region with fake estimation purposes. Is there a particular motivation to prefer these thresholds against the usual ones? Moreover, why do you want to have a very leptonically-isolated jet in this region? More explanations about the used fake estimation method (also only reporting dedicated works made by others) and possibly closure test could be useful to countercheck the estimation itself.

Green led The ptmiss threshold has been chosen in order to be orthogonal to the analysis phase space (which has a quite low MET requirement of >30 GeV). The MtW threshold has also been chosen to be quite small to reduce the EWK prompt contribution in this single-lepton QCD-enriched phase space. The isolated jet is used in the non-prompt estimation as a "recoil jet". By choosing different thresholds for the recoil jet pT we span different flavour compositions of the fakes, hence the fake rates are also computed varying the recoil jet pT in order to build systematic variations. Closure tests have been performed during the review process.

  • L152-154: Any reference reporting this experimental fact?

Green led We don't have one, but it is quite clear from CMS internal checks on this MC. (https://twiki.cern.ch/twiki/bin/viewauth/CMS/MCKnownIssues#WJetsToLNu_HT_and_DYJets_HT_LO_M)

  • L166: Is the “QCD” WV production treated differently in the DNN from the “EW” one?

Green led Yes, it is considered as a background.

  • L176 Are DNN input variables preprocessed?

Green led They are standardized to mean 0 and standard deviation 1, but this is a rather standard procedure and we do not think it is necessary to add a phrase about it in the text.
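
A minimal sketch of this kind of preprocessing is given below, assuming that each input is shifted and scaled with the mean and standard deviation measured on the training sample. The variable names and values are illustrative only, and the function names are hypothetical.

```python
import statistics

def fit_standardizer(columns):
    """Return the per-variable (mean, std) computed on the training sample."""
    return {name: (statistics.fmean(values), statistics.pstdev(values))
            for name, values in columns.items()}

def standardize(columns, params):
    """Shift each variable to mean 0 and scale it to standard deviation 1."""
    out = {}
    for name, values in columns.items():
        mean, std = params[name]
        out[name] = [(x - mean) / std for x in values]
    return out

# Illustrative values only, not taken from the analysis.
train = {
    "mjj_vbs": [500.0, 800.0, 1200.0, 2000.0],
    "deta_vbs": [2.5, 3.0, 4.5, 6.0],
}
params = fit_standardizer(train)
scaled = standardize(train, params)
```

The parameters are derived once from the training sample and then reused unchanged on any other sample, so that the network always sees inputs on the same scale.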

  • L190-191 Is the fit performed simultaneously or separately for the two different light lepton flavours?

Green led Electrons and muons are combined in the fit.

  • Tab1 Out of curiosity, the maxima and minima reported are referred to the absolute value of eta or to eta itself?

Green led They are defined with signed eta.

Comments from Prasanna Siddireddy (IR California Santa Barbara) (cds)

  • Has the relation between the ttbar normalization vs number of b jets been checked?

Green led The ttbar normalization (rateparam) is independent of the number of reconstructed b-jets. Uncertainties related to the b-jet reconstruction efficiency have been taken into account, as well as the theoretical uncertainty related to the "top" definition (single top and double-top production).

  • For the W+jet control sample, could the normalization be affected by a background that has a larger contribution in the 65<m_V<105 compared to outside of 65<m_V<105? (For example Drell-Yan?)

Green led All background contributions have been considered in the estimation of the W+jets correction factors.

  • Could there be a background that has a similar DNN shape as the signal?

Green led As shown in Fig. 4, all the backgrounds have a quite different DNN shape with respect to the signal. The most similar one is the VBF-V EW production in the boosted category, given its similar kinematics in that region.

  • The title refers to Run 2 while the abstract refers to Run-II. These should be consistent. Or even better, I would use Run 2 in the abstract and not include this term in the title. But a bigger issue is that I don’t think that the title really fits the result of the paper. I suggest “Evidence for the vector boson scattering process in proton-proton collisions at sqrt s = 13 TeV in the semi-leptonic lnuqq final state”.The term “search for” currently in the title is used when no significant signal is observed, and that is not the case here!

Green led We propose to change the title to "Evidence for vector boson scattering in semileptonic decays with lnuqq final states in proton-proton collisions at \sqrt{s}=13\TeV". We will discuss further the title in the final reading, since different CWR comments offer various alternatives.

  • The sentence with the colon...(5.1 expected): the first evidence of ....at LHC. [need to add “the” before LHC] seems awkward. I would end the sentence at the colon and then start another one.
Suggest ...(5.1 expected). This result is the first evidence of vector boson scattering in the semileptonic channel at the LHC.

Green led Agreed

  • But I think that it would be helpful to the reader if added the information that VBS has already been observed in the fully leptonic final state. So, suggest ...(5.1 expected). This result is the first evidence of vector boson scattering in the semileptonic final state at the LHC, confirming the result in the fully leptonic final state. I think that this would add helpful context for many readers.

Blue led In principle this is a quite different analysis and channel than the fully-leptonic final state ones. We are not sure that it is correct to state in the paper that this work is confirming the results of the fully-leptonic analysis.

  • Introduction: I think that it would be very helpful to the reader to characterize the results of the measurements in the fully leptonic channel so far. You say that there is renewed interest in the theory community about this process, but the discussion is quite vague. Is there a CMS result that you can state for the fully leptonic final state?

Green led We have included the references to the fully leptonic papers.

  • Line 40: Here, the statement refers to the W boson; however, both W and Z boson processes are mentioned in the brackets. Is the QCD-ZV process really considered? There is no mention of the QCD-ZV process in the rest of the paper. If it is really considered, then the statement can be changed to "non-resonant W and Z boson pair production (QCD-VV)."

Green led The QCD-ZV process is considered in the analysis and part of the VV background in the plots. The phrase has been fixed as suggested

  • Line 41: How do the radiative W and Z boson productions contribute? Does the photon get misidentified as a jet? What is the percentage contribution of these processes to the total background?

Green led The photon is misidentified as a jet or fails the acceptance cuts. The contribution of Vgamma processes is < 1% of the total background in the signal region.

  • Line 48: Is this sentence referring to the top quark pT reweighting and Z pT reweighting? Please clarify. Both the reweighting are included in the analysis.

Green led Gen level top pt reweighting and DY ptll calibration. We feel that we do not need more detail about this in the text. We have included the necessary references.

  • Line 59-60: Question and suggestion. The phrase “, and therefore it is neglected in the final measurement.” suggests that it is taken into account at stages preceding the “final measurement.” But this is probably not what you mean...or is it? My guess is that you should simply say “...of the analysis and is neglected.” By the way, in this construction, you don’t need the comma because the phrase after the conjunction “and” is not an independent clause (the subject “it” has been removed). So this is just as grammatically correct but omits needless words.

Green led Agreed. The interference has been evaluated but never included in the analysis.

  • Line 109: “...coming from the W boson decay…” You don’t know where the lepton comes from, so this phrase is not really appropriate here.

Green led We changed it to “associated with the W boson leptonic decay”.

  • Line 127: The MT upper bound might seem surprisingly high for some readers. It might be worth considering mentioning why the cut is so high.

Green led We do not have a strong reason why this cut is “high”. We impose it to remove a region of the phase space where the signal is completely absent, without being too tight on that requirement.

  • Line 133: It is not just the W boson that is reconstructed on-shell. The sentence should read "a good reconstructed on-shell V (W or Z) boson in the hadronic final state". Is that correct?

Green led Agreed, added the Z boson.

  • Line 140: I think that this section would benefit from an introductory sentence laying out the main background categories and giving their rough fractions or importance. Right now, it seems a bit like a random walk and it begins with “minor” backgrounds. Normally, it is best to put the less important information at the end, rather than highlighting it by putting it at the beginning.

Green led The order of the paragraph has been improved. The importance of the top and W+jets backgrounds is already mentioned at the beginning of Section 2, but we have added a reminder at the beginning of Section 4.

  • Line 162: "into two sub-regions, closer or farther with respect to the W/Z resonance." The definition of the two sub-regions is not clear from the sentence. A more detailed description would be beneficial.

Green led Added clarification

  • Line 176: “...matches with the physical intuition.” This is an unusual phrase for a scientific paper. How about “...behaves correctly.” or “...behaves as expected.” This certainly seems like a useful tool, but given that it does not affect the result, other than helping you to avoid bugs, I am not sure what you can really say about it. Or does it also help to avoid some kind of instability or pathological behavior of the NN?

Green led The tool has been used to cross-check the dependence of the DNN model on the input variables and to rank them. We have improved the phrase, but we think it is valuable to cite its use. Moreover, we have included the ranking of the variables by the SHAP technique in Table 1.

  • Line 176: The sentence refers to Table 1, but it is very far away in the paper. Suggest that you move it to the following page.

Green led Now they are closer

  • Figure 3: Could you please comment on the excess observed in the region mjj^VBS above 2 TeV?

Green led More data are needed to define the possible excess.

  • Figure 5: The ratio in the upper left plot shows a trend. Is this understood?

Green led During the review we have isolated the effect as correlated to the jet pt observable. The VBS trailing jet pt, used as one of the inputs of the DNN, is badly described by the MC, but the data-driven strategy implemented for the W+jets background estimation is able to correct the trend. In the control region, a 2% discrepancy is still visible, but covered by the JES uncertainties. In the signal region we don't have the power to constrain the JES, and the uncertainty is covering for the residual discrepancy.

  • Figure 5: In the lower left plot, there is a significant excess in the CR for DNN>0.75. Could you please comment on this?

Green led More data are needed to characterize the possible excess, which is not significant.

  • Table 1: The variables should have been arranged according to their importance.

Blue led We considered a logical ordering of the variables more important, so as not to distract the reader and to ease the reading. We have added in the text a sentence about the most relevant variables and we have included a new column in the table to state the SHAP ranking.

  • Line 204: "varying the lepton momenta within their average uncertainties" Average uncertainties of what? The uncertainty estimation needs a bit more clarification here.

Green led Agreed.

  • Line 218: Regarding the theory uncertainty, it is not completely clear in this paragraph whether you are referring to the impact on the signal efficiency or background determination, or both. Maybe you could say uncertainty in what.

Green led It is in general the uncertainty in the signal and background yield. Improved the text.

Comments from Celso Martinez Rivero (IR Santander) (cds)

  • L4 odd sentence “still at its infancy with respect to…”

Blue led We prefer to keep this sentence. The introduction has been rephrased in some points.

  • It might be worth stating in the Abstract that the signal is EW-produced W(lnu) + W/Z(qq) + qq.

Blue led We find this syntax a bit cumbersome, but we have added the clarification of "plus 2 jets" in the abstract.

  • Reference [12] should go when CMS is mentioned in line 23.

Green led Moved reference

  • L36: you prefer these backgrounds taken from data if simulations exist?

Green led The text has been improved to specify that these backgrounds are estimated from simulation but with data-driven corrections.

  • L40: they said in line 35 that the ttbar background is taken from data, and now they claim it is taken from MC. I guess it's because the shape comes from MC and the normalization from data, but then it could be better explained.

Green led An explanation paragraph has been added.

  • L48: is the recommendation really to apply the top pt reweighting? In case where ttbar MC is used to model the background, for BSM searches, using the official weight might be dangerous according to the official twiki.

Green led The application of top pt reweighting in the phase space of the analysis has been studied in collaboration with the HWW group, which shares the same set of datasets for their semileptonic analyses, and has been proven to be helpful in correcting the top pt in the MC.

  • L60: why neglect it and not include it as a systematic?

Blue led Including the interference effect as a systematic is not the most correct approach, since it also affects the QCD-VV production. Given its really small contribution, we preferred to neglect it for this study.

  • L92. Hasn't the HLT bandwidth increased from 2016 to 2018?

Blue led We think that it would be too much detail for this paper: we used the recommended text from the Twiki https://twiki.cern.ch/twiki/bin/viewauth/CMS/Internal/PubDetector.

  • L93-103 -> The pt and eta requirements for the AK4 jets are explicitly given but nothing is said about the AK8 jets. I would suggest adding this information.

Green led Added the eta requirement and improved in general the section.

  • Lines 104-115 -> Selection criteria on the VBS jets are mentioned here although the criteria to tag a jet as VBS are not given until the next paragraph. I would avoid making statements on objects that were not defined yet.

Green led The order of paragraphs in Section 3 has been improved.

  • L108. In line 105 you wrote tag-jets and here you've written tag jets.

Green led Fixed

  • L109: why not put an upper bound on met to remove some additional backgrounds if only a limited amount is expected from the signal?

Blue led We did not consider this cut necessary, since there is no upper limit on the momentum of the neutrino in the selected signal process.

  • L111. It might be necessary to write at least a sentence about the pileup.

Green led Added

  • Eq 1, a parenthesis is missing, maybe the eq would gain readability with some []

Green led Fixed

  • L116-127 -> It's not clear to me how the selection and tagging of the jets is done. In particular the question comes when there are overlapping situations like for example one jet that gives together with another jet the highest mass (so it would be VBS) but with another jet the closest to the W mass (so it would be tag-jet). How are these situations resolved? Please add this information to the text so other people can reproduce the procedure.

Green led As the text says the pair of jets closer to mW/Z “out of the remaining jets” is selected as the Vhad jet pair. Added “out of the remaining jets after the VBS tag jet selection” to be more explicit but it may sound redundant.

  • L132: I find a bit odd that the MV are changed by just 5 and 10 GeV These intervals have been optimized to have the mass peak of the hadronically decaying vector boson centered both in the resolved and boosted categories.

Green led They are different because of the differences between the AK8 jet mass calibration and invariant mass built with 2 AK4 jets.

  • L133. Why not "to have a good reconstructed on-shell W/Z boson"?

Green led Improved

  • L140-141 -> The sentence states that "minor backgrounds" such as DY, VBF-V, VVV, Vgamma processes are estimated from MC. However, in figure 4, particularly for the boosted signal region, some of these backgrounds are absolutely dominant for the bins of greatest sensitivity. In the rest of the text, nothing or very little is said about which crosschecks have been done to validate this MC. Could you please elaborate on how you build the confidence that these MC simulations are correctly describing the data.

Green led The only background that has a non-negligible contribution in the high-score bins of the boosted category is VBF-V, which is quite difficult to isolate and will profit from dedicated measurements. Given this, its contribution is estimated by means of the most accurate MC predictions, and all the theoretical and experimental uncertainties have been considered.

  • L151-164 -> What are the purities of W+jets and ttbar in the corresponding W+jets and ttbar control regions? Is it safe to normalize these backgrounds using these regions or there are substantial contaminations from other backgrounds.

Green led In the W+jets CR the W+jets fraction is 55% in the resolved and 60% in the boosted category, and the second-largest background is nonprompt (30%), which has a large flat normalization uncertainty included in the fit. The top contribution in the W+jets CR is only 3% (5%) in the resolved (boosted) category. Therefore it is safe to use that region to measure the W+jets normalization. In the top CR, instead, the top quark background fraction is 90%, so the region is quite pure for measuring the top contribution.

  • L155: Explain a bit more "in a differential way"

Green led It is differential in the sense that the W+jets MC sample is corrected by splitting it into sub-categories using a 2D binning in two observables, as explained in the text.
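
The idea of a 2D-binned correction can be sketched as follows. This is a simplified illustration under stated assumptions: the two observables, the bin edges and the yields are hypothetical, and a simple per-bin ratio stands in for the per-category normalizations, which in the analysis are measured in the fit.

```python
from bisect import bisect_right

# Hypothetical lower bin edges for the two observables; the last bin is an overflow bin.
X_EDGES = [0.0, 100.0, 200.0]
Y_EDGES = [30.0, 60.0, 120.0]

def bin_index(edges, value):
    """Index of the bin containing value, with underflow merged into the first bin."""
    return max(bisect_right(edges, value) - 1, 0)

def category(x, y):
    """Map an event to its 2D sub-category."""
    return bin_index(X_EDGES, x), bin_index(Y_EDGES, y)

def scale_factors(data, wjets, other):
    """Per-category factor (data - other backgrounds) / W+jets MC."""
    sf = {}
    for key, n_wjets in wjets.items():
        if n_wjets > 0:
            sf[key] = (data.get(key, 0.0) - other.get(key, 0.0)) / n_wjets
    return sf

# Hypothetical control-region yields in one sub-category.
key = category(150.0, 45.0)
sf = scale_factors({key: 120.0}, {key: 80.0}, {key: 20.0})
```

Each W+jets MC event would then be weighted by the factor of the sub-category it falls into.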

  • L162-163 -> I am not sure I fully understood the method of dividing the W+jets and ttbar in two regions to compare the corresponding scale factors. The fact that the numbers are the same in the two regions makes the method consistent but not necessarily correct. Could you please elaborate a bit further on this?

Green led Firstly, only the W+jets CR has been split for this closure test of the W+jets correction. No need to perform this test for the ttbar. The rationale of the closure test is to cross-check that the correction factors for W+jets extracted from the region farther from the signal region (the resonance in terms of mVhad), can be applied in the region close to the signal, without looking directly at the signal region. Some text has been added to improve the clarity.

  • Sec 5, do you train against all the backgrounds or only to the main one?

Green led We train the signal against all backgrounds, weighted by their relative importance (cross-section*SF).
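
One possible reading of this weighting is sketched below, assuming that every simulated event of a process carries the weight cross-section*SF divided by the number of simulated events, so that the summed weight of each process is proportional to its expected importance. The cross sections, scale factors and sample sizes are placeholders, not the values used in the analysis.

```python
def training_weights(samples):
    """Per-event weight for each process: cross-section * SF / number of MC events."""
    return {name: s["xsec"] * s["sf"] / s["n_events"] for name, s in samples.items()}

# Placeholder numbers, for illustration only.
samples = {
    "wjets": {"xsec": 1000.0, "sf": 1.1, "n_events": 50000},
    "top": {"xsec": 800.0, "sf": 0.95, "n_events": 40000},
    "vbf_v": {"xsec": 10.0, "sf": 1.0, "n_events": 20000},
}
weights = training_weights(samples)

# The summed weight per process reproduces cross-section * SF.
totals = {name: weights[name] * samples[name]["n_events"] for name in samples}
```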

  • L172-174 -> The SHAP method is mentioned. Even if I found the method and its use very very interesting, no conclusion, statement, or even reference is made to it in the rest of the paper. Therefore I would suggest removing the sentence about its usage, or additionally to add a sentence somewhere explaining what information was extracted from this method, or what were the benefits of using it.

Green led We expanded the sentence about SHAP, describing the 3 most important variables as identified by the technique. We have also included a column in Table 1 with the SHAP ranking for all the variables.

  • L175-176 It is very interesting to use this method, but what do you get from it?

Green led We get a cross-check of the model dependence on the input variables and ranking of their importance.
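
As an illustration of how such a ranking is built, the sketch below uses a toy linear model as a stand-in for the DNN. For a linear model with independent inputs the SHAP value of a variable is its weight times the deviation of the input from its mean, and the variables are ranked by the mean absolute SHAP value. The weights and input values are hypothetical.

```python
def linear_shap(weights, columns):
    """SHAP values of a linear model: phi_j = w_j * (x_j - mean(x_j))."""
    phi = {}
    for name, w in weights.items():
        values = columns[name]
        mean = sum(values) / len(values)
        phi[name] = [w * (x - mean) for x in values]
    return phi

def rank_by_mean_abs(phi):
    """Rank variables by mean absolute SHAP value, most important first."""
    importance = {name: sum(abs(v) for v in vals) / len(vals)
                  for name, vals in phi.items()}
    ranking = sorted(importance, key=importance.get, reverse=True)
    return ranking, importance

# Toy model and inputs, for illustration only.
weights = {"mjj_vbs": 2.0, "deta_vbs": 0.5, "zlep": -1.0}
columns = {name: [0.0, 1.0, 2.0, 3.0] for name in weights}
ranking, importance = rank_by_mean_abs(linear_shap(weights, columns))
```

For a real DNN the SHAP values have to be estimated numerically, but the ranking step is the same.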

  • Table 1 caption, it would be clearer if you move the definition of the variables to the text

Green led Fixed

  • very little is said about the systematics applied on the backgrounds taken from MC. The variations of the renormalization and factorizations scales are mentioned however I am missing a single number giving some information about the size of the systematic for these MC-based background estimations. Would it be possible to add it?

Green led All the lepton and jet systematic uncertainties applied in the fit have been described. They are applied to the backgrounds and signal taken from MC. More details have been added in the text as suggested.

  • L225 it would be useful if you explain at the beginning what you intend as shape or normalization effect

Green led At the beginning of Section 6 there is already an explanation of shape and normalization effects: “In the signal extraction fit, each uncertainty is represented by a nuisance parameter that morphs the shapes of the distributions for the signal and background processes or scales their total normalization.”
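
The two kinds of effect can be sketched as follows, assuming a log-normal scaling for normalization uncertainties and a piecewise-linear bin-by-bin interpolation between the nominal and the ±1 sigma templates for shape uncertainties. This is a simplified illustration, not the interpolation scheme of the fitting tool, and all templates are hypothetical.

```python
def apply_lognormal(template, kappa, theta):
    """Normalization effect: scale every bin by kappa**theta."""
    scale = kappa ** theta
    return [b * scale for b in template]

def morph_shape(nominal, up, down, theta):
    """Shape effect: move each bin linearly towards the up or down template."""
    morphed = []
    for n, u, d in zip(nominal, up, down):
        shift = theta * (u - n) if theta >= 0 else -theta * (d - n)
        morphed.append(max(n + shift, 0.0))  # keep bin contents non-negative
    return morphed

# Hypothetical templates, for illustration only.
nominal = [10.0, 20.0, 5.0]
up = [12.0, 19.0, 6.0]
down = [9.0, 21.0, 4.5]

half_sigma = morph_shape(nominal, up, down, 0.5)
scaled = apply_lognormal(nominal, 1.25, 1.0)  # a 25% normalization uncertainty at +1 sigma
```

With theta = 0 both functions return the nominal template, and theta = ±1 reproduces the ±1 sigma variations.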

  • Figure 3 and others. It might be necessary to write if the uncertainty in the plots is statistical+systematic, or just statistical.

Green led The uncertainty is the total systematic one. Legend improved.

  • Figure 4 and others. It would help adding inside the plot canvas the category boosted or resolved.

Green led The category resolved/boosted is always clear from the x-title.

  • Figure 5. It cannot be easily extracted from the distributions. What's the fraction of top and W+jets in their respective control regions? What is the signal contamination in such control regions?

Green led The top contribution in the top CR is 90%. The W+jets contribution in the W+jets CR is 55% (60%) in the resolved (boosted) category. The signal yield in the top and W+jets CRs is < 0.5% in both.

  • Figure 7 clarity suggestion. Use red for expected and black for measured, including central points.

Green led Done

  • L261. The W+jets data driven estimation is a bit complex. Is it possible to add some plot / number to support the goodness of the estimation?

Blue led Plots in the W+jets CR have been included in Fig. 5. They are not a direct handle on the W+jets data-driven estimation method, since the DNN distribution is not used at all in the computation of the W+jets MC correction factors. Nevertheless, they show a rather good data/MC ratio in the control region. Additional checks have been done during the review, but we do not think that additional space should be dedicated in the paper to details about them.

Comments from Sijin Qian (IR Peking) (cds)

  • L45 and L50: Are you sure the version of MG is same for all three years(2016, 2017, and 2018)?

Green led Confirmed.

  • L57-60: Is it less than 3% of the signal including the effect of the QCD scale and the PDF variations, or only the central value is less than 3%?

Green led The central value is less than 3%.

  • L93-103: There do exist ML hadronic vector boson taggers like DeepAK8, which has been approved. So, maybe for my own education, why ML W/Z taggers are not used in this analysis?

Green led We know about the existence of the DeepAK8 ML tagger, but since the analysis has been developed from the beginning using the groomed mass and tau21 kinematic selections, we preferred to stick to this simpler selection, which already has a good efficiency on our signal process.

  • L108-114: You should introduce how to get pTmiss firstly and then introduce the event selection.

Green led Improved that section

  • L131: Maybe you can explain more for the DeepCSV such as which the style of it (cutBased or ML), and which value you used to define a jet as b jets or not.

Green led We stated that it is ML based, made explicit that we are using the loose WP, and gave a reference. We do not think that stating the value of the cut is necessary in the paper. We have added the efficiency/mistag rate of the working point to the text.

  • L132: the V here should be the hadronically decaying boson, please mention it.

Green led Done

  • L142-148: For the QCD enriched region, if you require at least one lepton, it means that it can exist more than one lepton in your fake region. This will indicate more than one jet is identified as lepton and I don't understand how to evaluate fake rate and furthermore extrapolate the rate to signal region in this multiple jets fake leptons situation.

Green led As everywhere in the analysis, a second-lepton veto is also applied for the fake rate determination. Only events with exactly one loose/tight lepton are considered when computing the fake rate.

  • L147: Why would you require DeltaR > 1 in the QCD enriched region? Is it not too large?

Green led The isolated jet is used in the non-prompt estimation as a "recoil jet" with respect to the lepton. Choosing different thresholds for the recoil jet pT spans different flavour compositions of the fakes; hence the fake rates are also computed with varied recoil jet pT thresholds in order to build systematic variations.
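As an illustration of the two answers above, the following is a minimal sketch of a tight-to-loose fake rate computed for several recoil jet pT thresholds. The `Event` fields, the threshold values and the toy events are hypothetical and are not taken from the analysis code; the sketch only shows the second-lepton veto and the threshold scan described in the text.

```python
from dataclasses import dataclass


@dataclass
class Event:
    # All fields are hypothetical names chosen for this illustration.
    n_loose_leptons: int    # number of loose leptons in the event
    lepton_is_tight: bool   # does the loose lepton also pass the tight selection?
    recoil_jet_pt: float    # pT (GeV) of the jet separated from the lepton


def fake_rate(events, min_recoil_pt):
    """Tight-to-loose ratio using only events with exactly one loose lepton."""
    loose = [e for e in events
             if e.n_loose_leptons == 1 and e.recoil_jet_pt > min_recoil_pt]
    if not loose:
        return None  # no events left: the ratio is undefined
    n_tight = sum(1 for e in loose if e.lepton_is_tight)
    return n_tight / len(loose)


# Toy events, invented for the example.
events = [
    Event(1, True, 40.0),
    Event(1, False, 40.0),
    Event(1, False, 25.0),
    Event(1, True, 60.0),
    Event(2, True, 50.0),   # rejected by the second-lepton veto
]

# Scan hypothetical recoil jet pT thresholds; the spread between the
# resulting rates is what a systematic variation would be built from.
rates = {thr: fake_rate(events, thr) for thr in (20.0, 35.0, 50.0)}
```

In a real measurement the rate would additionally be binned in lepton pT and eta; that is omitted here to keep the sketch short.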

  • L151: In the paragraph, do you mean that the normalization of W+jets is also obtained from the W+jets control region? And do you mean that the different W+jets components get their own normalizations, which are included in the final fit?

Green led Yes, correct. The W+jets CR is used to measure the normalization of the different W+jets sub-components. In practice the W+jets CR is split into many subregions corresponding to the sub-categories defined for the W+jets MC. All these regions are included in the fit to measure the normalizations of the W+jets categories.
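The idea of one normalization per W+jets sub-category can be illustrated with a deliberately simplified sketch. In the analysis the normalizations are free parameters of the simultaneous fit; the function below instead computes a back-of-the-envelope scale factor per control-region subregion, (data - other backgrounds) / W+jets MC, which is an assumption made only for illustration. The subregion names and yields are invented.

```python
def wjets_scale_factors(data, other_bkg, wjets_mc):
    """Per-subregion scale factor: (data - other backgrounds) / W+jets MC.

    Simplified stand-in for the free normalization parameters of a fit.
    All arguments are dicts keyed by a (hypothetical) subregion name.
    """
    factors = {}
    for name, mc_yield in wjets_mc.items():
        if mc_yield <= 0:
            continue  # skip empty subregions, the ratio is undefined
        factors[name] = (data[name] - other_bkg[name]) / mc_yield
    return factors


# Invented yields for two hypothetical subregions.
data = {"subregion_0": 1200.0, "subregion_1": 500.0}
other_bkg = {"subregion_0": 200.0, "subregion_1": 100.0}
wjets_mc = {"subregion_0": 800.0, "subregion_1": 500.0}

factors = wjets_scale_factors(data, other_bkg, wjets_mc)
```

Unlike this ratio, a likelihood fit also propagates the statistical uncertainty of each subregion and the correlations with the other nuisance parameters into the signal region.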

  • L166-178: (a) How do you deal with those backgrounds estimated by data-driven methods in your DNN? The DNN is trained with MC events for W+jets and top quark background, since the MC is anyway used to estimate their shape and the data is used to correct their normalization (in several bins for the W+jets MC). (b) You use data-driven methods for several backgrounds, and it is not clear how or if you consider these backgrounds in the training, since you cannot use the same events twice. Please clarify in the text.

Green led We have improved the text to explain that the MC samples for W+jets and top quark are used for the shape (for W+jets the shape is used in each sub-category), but their normalization is corrected with data. Therefore the MC events can be used for the training.

  • L191: Is it only the total yield that is used in the W+jets control region? Around line 155 you write about splitting this control region in W pT, and say that "their normalization is left unconstrained and uncorrelated in the final fit".

Blue led No, it is not the total yield. As described in Sec. 4, the W+jets normalization is estimated in the fit independently for each sub-category. We do not feel that L191 needs to specify more about the splitting of the W+jets normalization.

  • L194-228: It seems that uncertainties from several data-driven methods are not introduced in this draft.

Blue led The uncertainties in the data-driven estimation of the W+jets MC (and also of the top one) are automatically included in the fit. No additional systematic uncertainty has been added, since the data-driven correction for the W+jets MC has been shown to work well with a closure test. The uncertainty from the normalization estimation in the control region is automatically propagated to the signal region by the fit.

  • L204-208: Varying lepton momenta will lead to MET variation as varying jet momenta, which is however not mentioned in the paper draft.

Green led The lepton and jet momenta variations are propagated to MET. This is not mentioned because the effect is very small given the loose MET cut applied in the analysis.

  • L217: Do you consider the QCD scales uncertainty for all MC?

Green led As stated in the rest of the paragraph, the QCD scale variations have been considered for all the MC samples; for W+jets and top only the shape effect has been included.

  • L240-L243: Most recent CMS analyses use particle level for fiducial cross sections, since it is much less generator dependent than parton level.

Blue led As discussed during the approval process, defining a particle-level cross section for this many-jet final state poses several difficulties. Reporting the gen-level generation cuts has been the preferred option in the SMP PAG for this kind of measurement.

Before CWR

Comments on paper v5

Comments from ARC Chair Darien Wood to Paper v5

  • All style and wording comments:

Green led implemented

  • Title: I think the current title could be misinterpreted to say that the search is for vector boson pairs (though they have been found previously). Maybe it is clearer to say "Search for vector boson scattering production of vector boson pairs in the semi-leptonic decay channel at 13 TeV". Also, should we include "in pp collisions" in the title? It would be nice, but the title is already long.

Green led We would change the title to "Search for vector boson scattering at the LHC Run 2 with CMS data in the semi-leptonic lvqq final state".

  • line 96: You mention "PF candidates" without defining them. Can you add a sentence earlier, just after the detector description, to define particle flow?

Green led Added the standard reference phrase about PF from PubCom

  • line 126-129: The description of the QCD multijet background estimate method is vague. Perhaps you could say more about the orthogonal phase space where the fake probabilities are measured?

Green led Included the cuts defining the QCD enriched region for the non-prompt estimation.

  • Figure 3: Different line types should be used so that the figure can still be deciphered if printed in black and white. Green led Done

  • Figures 4 and 5: I think it would help to reverse the order of the legends, so that the item at the top of the histogram stack corresponds to the first legend entry, and the histogram at the bottom corresponds to the last legend entry.

Green led We feel that the important thing is to sort the samples and keep the same sorting consistent between all the plots in the paper, as is done. This style has also been used in the past in HIG PAS-19-017.

  • Figure 5: From the caption, it sounds like all uncertainties are included, but the legend indicates systematic only. "Systematics" in the legend is jargon. Use "systematic uncertainties", or use "syst. unc." in the legend and spell it out in the caption.

Green led Improved the legend as suggested.

  • General: I see in some places "signal strength" and in others "signal strength modifier". Maybe these should be reviewed for consistency. While the mu parameters are themselves "signal strength modifiers" (or maybe "signal strength multipliers" is more explicit), when we fit for mu we can still say that we are measuring signal strength. I think you only need to use "modifier" or "multiplier" when you are specifically talking about a value for mu. I leave it to the authors to decide how they want to handle this.

Green led We changed the text to use only the "signal strength" phrase, avoiding the "modifier" attribute.

  • It is important to see the DNN distributions because these are the source of the final result. But the paper contains no kinematic distribution, so the reader has nothing to look at to tell them why VBS looks different from the background. The Shapley values tell you that mjj_vbs is the most important variable, so I would suggest including the postfit distributions of this variable for the resolved and boosted cases (fifth plot in Figures 124 and 142 from AN_239_V14). Since the paper is targeted for a letter journal (PLB), you might have space constraints. I suggest that Fig. 2 could be dropped. I like this figure, but all of the essential information about the CR and SR selections is already in the text.

Green led We agree on including the postfit Mjj VBS distributions. We would also prefer to keep the analysis regions graph in the paper to increase the clarity of the event selection: we have reduced the size of the graph and we only reach 12 pages, which we think is not near the limit for PLB.

Comments from CCLE Hannes Jung to Paper v5

Comments in HN: https://hypernews.cern.ch/HyperNews/CMS/get/SMP-20-013/40/1.html

  • Green led All comments implemented

Comments on paper v3

  • [Kenneth] For the MC sample size uncertainty: Can you update the name in the table or split the uncertainty breakdown into MC sample size and nonprompt sample size so this is clear? It's an important distinction for exactly the reason you mention in your conclusions. Green led Unfortunately it is not trivial to compute the impact of the non-prompt alone since the MC stat uncertainty is built by combine summing the contributions of all the samples. We support the idea of calling it "limited sample size" in the uncertainty breakdown table.

  • [Kenneth] Did you try fitting the EW+QCD with two separate signal strengths rather than fixing the ratio? Green led The 2D fit has been included in paper v5. It is compatible with the results of the EW-only fit and EW+QCD joint fit.

  • [Kenneth] Can you add a discussion of the results to the abstract? A similar discussion to the conclusions would be reasonable. In the summary it would be good to put a little more emphasis on the strength of the technique and result to give it a bit more bite Green led

  • [Kenneth] Ln 154: I really prefer to say that the target of analysis is the measurement of a cross section or signal strength which is then quantified by the significance. Green led

  • [Kenneth] Ln 201: Similar it makes more sense to give the signal strength modifier and then quantify its significance (since this is the thing you measure and the significance is a test you make for it) Green led

  • [Kenneth] Maybe you can add a sentence or two about the agreement of the results with the SM, as it stands it ends a bit abruptly before the summary. Green led

  • [Kenneth] Take a look at some of the style guidelines to pre-empt the language editor: https://twiki.cern.ch/twiki/bin/viewauth/CMS/Internal/PubGuidelines. For example, stat. and syst. should be in roman font and not capitalised. I can't imagine W^{pt}_{lep} is what pubcomm would recommend, it would more likely be something like \pt^{W,\ell}. Green led

  • [Kenneth] What's going on with the table 1 caption for centrality []? I think you should try hard to define variables and then use them everywhere, it's a bit sporadic whether you use a variable that's given a symbol (like centrality here). Green led We are looking for the correct citation to the ATLAS paper which introduced the centrality variable.

  • [Kenneth] Pubcom would also recommend not to have so many vertical bars in the table. And maybe there's a better choice of marker than X to indicate that something is used. Can you make a check mark instead? Green led

Approval presentation comments

Major points before raising the phys-app flag:

  • Green led Share pointers to the data/MC distributions for 2017 and clarify the impacts and constraints related to forward jets uncertainties in 2017
  • Green led Check which is the MC sample(s) that is giving a large contribution to the MC statistical uncertainty, and if the uncertainty could be reduced
  • Green led As a simple test, evaluate the cross section measurement using the EW WVjj samples with dipole recoil off
  • Green led Share the fractions for each of the signal processes (WW, WZ, ZZ, splitting in leptonic and hadronic decay modes).
  • Green led Consider the VBS Z(ll)Z(qq) and Z(ll)W(qq) components as part of the background given its small fraction after the selection (exact values to be provided, see previous item).
  • Green led Make a consistent definition of the EW and EW+QCD fiducial definitions and cross sections. Consider the processes with W(lv)V(qq) as part of the signal, and the ones with Z(ll)V(qq) as backgrounds. The theoretical values should be quoted with uncertainties.

These items have been addressed by the authors and answers can be found in these slides. An updated version of the paper and AN is in preparation.

Other comments and discussion during the approval:

  • Q AndreaR 2017 how did you handle forward jets ? Green led We applied the tight PU 30-50 GeV nanoAODv7; we also normalize the data in each year

  • Q Slide 25. Did you check the compatibility among years? Green led We didn't explicitly. The signal strength plot shown in the approval displays 68% uncertainty bands.

  • Q AndreaR s25 chi2 prob for this ? (biased) Issues with 2017 ? Green led agreement was better in 2018 but still covered by uncertainties.

  • Q AndreaR Jet uncertainties not visible in the impacts ? Are they different in different years ? Green led Plot on s36 similar trends in each year for the W+Jets data-driven corrections. The impacts of the jet-related uncertainties have been added in the approval Q&A slides pag5.

  • Q Guillelmo plots in the AN ? Green led Prefit and postfit plots related to jets eta shown in approval Q&A slides pag3,4,5.

  • Q AndreaR different corrections depending of the type of jets? (pileup jets and real jets) Green led Yes, it's also split by years

  • Q Kenneth s23 MC samples uncertainty : what sample ? Green led The leading uncertainty does not come from VBF-V, as discussed at the approval meeting, but from the non-prompt sample (see Q&A slides pag7). The non-prompt weights come from the data-driven correction based on the fakeable method, therefore it cannot be improved easily.

  • Q Kenneth impact of signal PS model Green led s40. Expected significance improved. But also the change in showering for the VBF-V sample has an impact on the results. Updated expected results with VBS no-dipole model (but using the correct VBF-V sample) are shown in the Approval Q&A slides pag8

  • Q Guillelmo use the standard pythia signal MC and distribute the signal strength Green led The signal strength is 1.2 +- 0.3. SM significance 4.9 (expected 4.6).

  • Q KostasT still signal modelling. Herwig PS ? Green led A in backup the herwig trend is not understood. (PietroG) We need a better understanding of herwig usage in this case

  • Q KostasT s23 table is not in the paper yet Green led A will be included after the approval

  • Q KostasT s26 fiducial xsection definition can be done at particle level ? Green led A Paolo usually done like this in other VBF/S measurements: at LO parton level. This one would be very difficult to define at particle level with the additional jets.

  • Q Guillelmo Slide 22. For the expected fiducial cross section, is there an uncertainty? Green led A The uncertainties (QCD scale and PDFs) on the expected fiducial cross section have been computed and are shown in slide 11 of the Q&A slides (and in the next version of the paper)

  • Q Guillelmo on the signal : is ZV included ? Q all VBS processes part of the signal also in the reported sigma ? Green led Recomputed results with consistent splitting of VBS components. ZV processes considered as background. Since the fraction of ZV events in the signal region is <4% of the total VBS signal events, the results change minimally. The signal strength is now 0.85 (it was 0.86) and the observed sigma is 4.4 (it was 4.5). Full results shown in Q&A slides pag 11,12,13.

  • Q AndreaR still did not see JER or PU impacts on the AN Green led A ok will look and check why it is ranked so down. The impacts of the jet-related uncertainties have been added in the approval Q&A slides pag5.

  • Q Darien Dipole recoil just for curiosity to turn it off or we need to revisit the uncertainties ? Green led A Guillelmo no just curiosity

  • Q Darien would particle level decrease the syst uncertainties ? Green led A not much difference probably

  • Q Kenneth combined EW+QCD separate fit ? Q can you try fitting both ? Green led A We have performed a EW,QCD simultaneous 2D fit: results are shown in slide 15 of the Approval Q&A slides

  • Q Paolo s26. Is the analysis acceptance fully contained in the gen level parton level cuts ? in particular for the VV+QCD bkg ? Green led A. they should be loose enough

  • Q How was the QCD/EW interference considered? Green led A. negligible contributions, so ignored

Paper draft

21/06/2021 Comments from ARC Konstantinos Theofilatos Paper v3

General comments:

  • I would naively expect to see limits on anomalous couplings coming out from this analysis. Yet, this is completely absent from being discussed. Is there a plan for making such an interpretation of the data in the signal region ?

Green led For this analysis we focused on the search for the SM signal, but it is indeed already planned to proceed in a separate analysis with an aQGC/EFT interpretation, together with the ZV channel.

  • A partonic level cross section is quoted with large uncertainty e.g, 1.9 ± 0.5 pb , with the expected theoretical cross section 2.2 pb coming out without any uncertainty. I am not convinced on the usefulness of this, i.e., on defining the fiducial space on parton level and having a cross section measurement with so large uncertainty. In any case, what prohibits quoting a particle level cross section ? Green led In paper v4 we have quoted the partonic cross section with theoretical uncertainties. As discussed during the approval meeting, it would be very difficult to define a particle-level cross section with the additional jets.

  • abstract: can it be made more interesting ? What is the conclusion of this paper ? We've been unlucky with the observed significance can we say something about what we learned out of doing this analysis ? Green led

  • Tabularized systematic uncertainties, by reading the paper I had no direct feeling of what are the most important systematic uncertainties on the quoted signal strength. Can we make a table listing the major sources of uncertainties ordered by their importance on the measured signal strength, e.g., for Eq.3 the breakdown of +0.23 - 0.16 ?

Green led A table summarizing the uncertainty breakdown into the main components has been added to paper v4

  • Signal modeling, could you please summarize (or point me to the summary) on what is the conclusion regarding usage of the dipole recoil scheme on Pythia and the signal modeling with Herwig ? Do I understand correctly that Herwig is only used for the background modeling and not for providing an alternative signal model ?

Green led The final signal modeling used for the paper is Madgraph + Pythia8 with the dipole recoil scheme. A summary and comparison of the different models can be found in ANv14 Appendix D. A Herwig generation has been attempted, but the GEN-level distributions showed an unexpected trend: the issue has been discussed in a GEN meeting (slides) and no bugs were found in the configuration. The experts' response is that the semileptonic channel needs more investigation for the Herwig simulation. For the electroweak single V production background simulation (called VBF-V in the analysis), the Herwig showering has been used to improve the dipole recoil showering modelling: VBF-Z from central samples, VBF-W from private production.

Other comments by line of appearance:

  • Fig 2: the diagram shows both W/Z leptonic decays, while the analysis is for W->lv. Please fix the diagram such as to reflect the signal sought Green led

  • L3: Interactions -> interactions, or even consider erasing "of fundamental Interactions" Green led

  • L25: one of the two top quarks decays hadronically -> I guess we should specify that this refers to the decay of the W that originates from the top Green led

  • L39: spell out what QCD VV includes. V=W or Z or also photons ? Green led

  • L39: erase extra comma after gluon induced Green led

  • L40: what is tt+tW? Are you talking about two process top-pair production and single top in the tW channel or really tt+tW produced out of the same hard scatter ? Avoid the "+" between tt + tW, if this is not what is meant to be said Green led

  • L44: expand on what are the VBF-V processes implied Green led

  • L92: explain how overlap is treated, if both an Ak8 jet is found and 2 Ak4 jets are reconstructed, is priority given to the resolved or to the fat-jet for classifying the event? Green led

  • L95: discuss if there is any bias due to the preference of the dijet closest to 85 GeV. Are non-resonant backgrounds, e.g., W+jets or QCD, featuring peaking shape as result of this choice in the region near the W/Z masses ?

Green led The non-resonant backgrounds feature a peak in the Mjj Vhad spectrum induced by the selection, but the bias is reduced by selecting the VBS jets before the jets coming from the hadronically decaying V boson. cSigVsBkg_res_sig_mu_mjj_vjet_res.png
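The ordering of the two jet assignments discussed above can be sketched as follows. This is only an illustration: it assumes that the VBS pair is the jet pair with the largest invariant mass (an assumption of this sketch, not stated in this exchange), and then picks the V candidate among the remaining jets as the pair with mass closest to 85 GeV. Function names and the toy jets are hypothetical, and jets are treated as massless.

```python
import math
from itertools import combinations


def four_vector(pt, eta, phi, mass=0.0):
    """Return (E, px, py, pz) from pt, eta, phi and mass (GeV)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return (energy, px, py, pz)


def invariant_mass(a, b):
    """Invariant mass of the sum of two four-vectors."""
    e, px, py, pz = (a[i] + b[i] for i in range(4))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def pair_jets(jets, target_mass=85.0):
    """Return (vbs_pair, v_pair) as tuples of jet indices.

    Step 1 (assumed criterion): VBS pair = pair with the largest mass.
    Step 2: V pair = pair of remaining jets with mass closest to target.
    """
    if len(jets) < 4:
        raise ValueError("need at least four jets")
    indices = range(len(jets))

    def mass(pair):
        return invariant_mass(jets[pair[0]], jets[pair[1]])

    vbs_pair = max(combinations(indices, 2), key=mass)
    remaining = [i for i in indices if i not in vbs_pair]
    v_pair = min(combinations(remaining, 2),
                 key=lambda p: abs(mass(p) - target_mass))
    return vbs_pair, v_pair


# Toy event: two forward jets and three central jets (invented values).
jets = [
    four_vector(80.0, 3.0, 0.0),
    four_vector(60.0, -2.5, 3.0),
    four_vector(50.0, 0.3, 1.0),
    four_vector(40.0, -0.2, -1.0),
    four_vector(30.0, 0.0, 2.5),
]
vbs_pair, v_pair = pair_jets(jets)
```

Because the V candidate is chosen only among the jets left after the VBS assignment, fewer combinations are available to be pulled towards the target mass, which is the mechanism the answer refers to.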

  • L113: expand on what is VBF-V, VVV and Vγ Green led Expanded the description in the MC samples section

  • L127: it's fine to leave the normalization unconstrained, do the extracted normalizations follow some kind of smooth ~monotonic dependence as a function of the pt ?

Green led The W+jets normalization factors follow a quite smooth trend along Wlep Pt and VBS trailing jet pt. The trend is shown in Section 9.2 of ANv14 (W+jets post-fit normalizations summary) Figures 87,88

  • L133-136: I don't understand what is done, please improve the text Green led

  • Table: Azymuthal -> Azimuthal Green led

  • L173: Is this analysis sensitive to the ECAL prefiring? I guess this has been studied, please point me to the twiki/an-note, but I wonder if a line should be added in the paper summarizing the size of this infamous effect in the SMP-20-013 analysis

Green led The effect of ECAL prefiring in 2016 and 2017 has been included as an additional nuisance following the standard recipe. The impact on the signal strength is < 1%.

  • Fig5: we are not obliged to include pre-fit, but I wonder why not ? Green led We prefer to show only the post-fit distribution for brevity.

  • Fig5: y-label, explain what is the Expected in the Data/Expected. Is expected the expected S+B Green led

  • Fig 6: Instead of background subtraction plot, have you tried a background-only template fit on the data i.e., fixing the signal cross section to 0 and letting all background components free to float/morph as in the nominal analysis fit on data ? Orange led

16/06/2021 Comments from CCLE Hannes Jung to Paper v3

Comments in HN: https://hypernews.cern.ch/HyperNews/CMS/get/SMP-20-013/32/2.html

  • Green led All comments implemented

16/06/2021 Comments from ARC Chair Darien Wood to Paper v3

  • Ak8 and Ak4 are not defined, and I think you need a sentence or two identifying and describing the jet algorithm, especially since you are using both resolved and boosted jets. Green led

  • Line 28: jets. "Two of these jets are required to have large invariant mass". It would be more clear to say "One pair of jets is required to have a large invariant mass" since you are talking about the mass of the pair and not the individual jet masses. Green led

  • Line 152: "A reasonable agreement is shown". The Pubcomm discourages applying adjectives to describe quality of agreement. I think you can simply say that "The predictions and the data agree within uncertainties in both cases, ..." Green led

  • The "All MC" label is still appearing in the legends of Figures 4 and 5. This should be removed or replaced with something more meaningful. Green led

  • Since you report a fiducial cross section, it would be nice to reiterate the definition of the fiducial region in the Summary. But perhaps you are very tight on space. Green led

  • A colon is used in places to separate independent clauses. This should be replaced with a full stop and start of a new sentence, or possibly with a semicolon. (line 6, line 182). Green led

  • The present tense is used fairly consistently but there are a few places where past tense should be replaced with present. (lines 36, 123, 139, 140, 147, 206). Green led

22/02/2021 Comments from ARC to Paper v2 link , by Darien Wood

General points:

  • The paper lacks any discussion of systematic uncertainties. Do you plan to add these? If so, where? Green led

  • I presume "Results" and "Summary" will be updated and expanded after unblinding. Green led

  • The "All MC" label that appears in Figures 3 and 4 needs to be relabeled or removed. I do not see this at all in Fig. 3 and in Fig. 4 this seems to refer to the uncertainty that appears in the ratio plot. Green led

  • In the figures with stacked histograms (4,5, 6), it is very helpful if the order of the legend matches that order of the stacking (e.g. on Fig 4, start with data, then top, W_jets, non-prompt, VBF_V+Vgamma, DY, VV+VVV, VBS) Blue led In the legend the lines are filled in top-bottom-left-right order, from the sample at the bottom of the stack to the top one.

Specific suggestions:

  • Title: It might be useful to specify that the production is of pair of vector bosons, e.g. "Search for pairs of vector bosons produced by vector boson scattering in the semileptonic.." Green led

  • Abstract: "The data sample corresponds to the full Run-II CMS dataset of proton-proton collisions at 13 TeV corresponding to 137 fb-1". To avoid the repetition of "correspond", this could be reworded. "The search uses the full Run-II CMS dataset..." Green led

  • Abstract: "Events are selected requiring one lepton (electron or muon), two jets with large pseudorapidity separation and dijet mass, separated in two categories: either the hadronically decaying W/Z boson is reconstructed as one large-radius jet, or it is identified as a pair of jets with dijet mass close to W/Z mass". Shouldn't you also mention the requirement on missing transverse momentum? I suggest splitting this into two sentences: "Events are selected requiring one lepton (electron or muon), moderate missing transverse momentum, two jets with large pseudorapidity separation and dijet mass, and an additional jet system consistent with the hadronic decay of a W/Z boson. Events are separated into two categories: either the hadronically decaying W/Z boson is reconstructed as one large-radius jet, or it is identified as a pair of jets with dijet mass close to W/Z mass". Green led

  • Line 3: "Standard Model (SM) of Fundamental Interactions" -> "standard model (SM) of fundamental interactions" Green led

  • Line 7: "its unitarity is granted" -> "the violation of its unitarity is avoided" Green led

  • Line 9: "other" -> "other diagrams" Green led

  • Line 17: "It is therefore of mandatory importance" sounds awkward. Maybe "It is therefore compelling"? Green led

  • Figure 1: Strictly speaking, the q and q' labels are not consistent. The q and q' in the W+ decay do not need to be the same as the q and q' in the incoming and outgoing quark lines. You could use q'' and q''' in the decay, or, since it is an example, you could replace the incoming q's with u's and the outgoing (q')'s with d's, and leave the q and q' in the W decay. Green led

  • Line 31. "will give rise" -> "gives rise" Line 31-32: "it will disintegrate into two jets" -> "it is resolved into two jets" Green led

  • Line 35 "heavy quarks" -> "top quarks" (If I understand correctly, you are only referring top quarks here, and not b's and c's.) Green led

  • Line 36: "on both cases" -> "in both cases" Green led

  • Line 41-43: I interpret this to say that Wgamma is generated with MadGraph and Zgamma is generated with POWHEG. Is this correct? If so, why are different generators used for these similar processes? Blue led Correct, those were the available samples at the time of the finalization of the analysis.

  • Line 53: "This analysis target" -> "The target of this analysis" Green led

  • Line 53: "signal observation significance" -> "signal significance" Green led

  • Line 57: "models, that" -> "models that" Green led

  • Line 59-60: "The signal significance extraction is performed in sub-regions of the phase space, where the signal-over-background ratio is more favourable." Are you just talking about the signal regions here, or something else? Why are these "sub-regions" and not just "regions"? I would not hesitate to use the terms "signal region" and "control region" since these appear in the figure and elsewhere in the text. Green led

  • Lines 61-63: suggest moving "(usually called tag jets)" to directly follow "two jets originating from incoming partons" Green led

  • Line 64-65: "smoking gun" is jargon, and probably an overstatement in this instance. Maybe "which are a strong indication of top quark decays". Also, this is part of a very long sentence that is difficult to parse. You could consider breaking it into several sentences. Green led

  • Line 67: "First of all," -> "First," Green led

  • Line 68: "them all" -> "them" Green led

  • Line 69: "The leptons are reconstructed requiring to have" -> "The reconstructed leptons are required to have" Green led

  • Line 71: "jets are considered if p_T is greater than" => "The jets are considered if they have a p_T greater than" Green led

  • Line 75-76: "If the previous condition is not fulfilled while at least four AK4 jets are found the event is classified..." -> "If the previous condition is not fulfilled and at least four AK4 jets are found, the event is" Green led

  • Line 79: add parentheses around "the average between m_W and m_Z" Green led

  • Line 80: "within" -> "between" Green led

  • Line 79-83: I could not follow this long sentence all of the way through. Somewhere around "namely" I got lost. Please consider restructuring this. Green led

  • Line 83: "rquired" -> "required" Green led

  • Line 85: "m_V far from the W/Z resonance range..." Do you really mean "far from" or just "not within"? If I understand correctly, there is no gap in m_V between the signal region and the control region. Green led

  • Line 89: I don't think "phase spaces" is the best term to use here, because this usually refers to kinematic properties and not to lepton flavor. I think it would be more clear to say "Finally, all of the signal and control regions are split according to.." Green led

  • Line 91: "sophisticated" -> "complex" (sophistication usually implies some sort of willful complexity) Green led

  • Line 97, add comma after "Fig. 3", "distribution" -> "distributions" Green led

  • Figure 3: It is good to vary the line type as well as the color to distinguish the different histograms. The label on the ordinate is missing. Also, why are they normalized to 0.08 instead of to unity? Blue led Distributions are normalized to unity. We can vary the line style for the signal to distinguish it from the background. It will be included in the next version of the paper.

  • Line 107-112: Can this sentence be simplified? e.g. "The QCD multijet background, which may enter the signal region with non-prompt leptons, is estimated in a fully data-driven way by measuring in [XXX] the probability for a loosely defined reconstructed lepton ..." Note the placement of commas and the substitution of "which" for "that". The [XXX] says "in dedicated phase space" but I do not think it will be clear to the reader what this means. Green led

  • Line 113 "top enriched phase space" -> "top enriched control region" ? If it is the top CR that you are referring to, it is more precise to use this name that is already defined. Green led

Unblinding

Unblinding procedure agreed with ARC and conveners:

  • 0 - unblind DNN inputs
  • 1 - unblind GoF
  • 2 - unblind nuisances
  • 3 - unblind DNN output and fit results

Results step 0:

Comments:

The shape of the leptonic W pT, the variable used to correct the normalization of the W+jets categories from the control regions, is corrected in the fit by the data-driven procedure and shows good data/MC agreement also in the signal region. The GoF result shows compatibility with the data. Out of 29 variables, those with the worst agreement are Δη VBS and the trailing VBS jet pt, mainly in the resolved category. The Δη VBS observable is only weakly correlated with the DNN distribution (see the slides Δη interpretation). The authors proposed to slightly change the data-driven strategy by adding subcategories in trailing VBS jet pt, in order to also correct the data/MC trend for this observable. Agreed with the ARC and conveners.

New unblinding results:

ARC questions about the unblinding

  • [Mariarosaria D'Alfonso]: Are these postfit VBS-jets discrepancies (Deta and jetpt1) more pronounced in one particular year ? we know that the 2016 is different than your 2017-2018 to start with.

Green led The discrepancies in deltaeta and jetpt are similar for all the years. Ref. Pag 13 and 18

  • [Mariarosaria D'Alfonso]: Also how those two variables performed in the closure region (the near mass window as described in Section 7.3 of ANv9)

Green led The discrepancy is similar in both the regions, so we are confident to be able to correct the discrepancy in the signal region by correcting the W+jets subcategory normalization in the control region.

  • [Mariarosaria D'Alfonso]: Something more general: The datasets are split into 80% training and 20% for validation. The 100% signal/MC bkg is used in the final fit. Is this correct ? Usually the training samples are orthogonal to the one used for the validation and also final signal extraction.

Green led As in many other analyses, once agreement between the training and testing samples is achieved, the full sample is used. In addition, DNN training options such as dropout and batch normalization have been used to keep overtraining under control.
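As an illustration of the answer above, the following is a minimal sketch of an 80/20 split and of a two-sample Kolmogorov-Smirnov distance that could be used to compare the DNN output on the training and validation samples. It is not the analysis code: the function names, the fixed random seed and the use of unweighted events are assumptions made for this example.

```python
import bisect
import random


def split_train_validation(events, train_fraction=0.8, seed=0):
    """Shuffle the events and split them into training and validation parts."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)  # fixed seed: reproducible split (assumption)
    n_train = int(round(train_fraction * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov distance: max |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    max_distance = 0.0
    for x in a + b:
        # Empirical CDFs evaluated at x (fraction of entries <= x).
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        max_distance = max(max_distance, abs(cdf_a - cdf_b))
    return max_distance
```

A small distance between the DNN score distributions of the two samples is what one would expect in the absence of overtraining; converting the distance into a p-value requires the sample sizes and is left out of this sketch.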

Preapproval talk comments slides

  • [Darien] s27 SF 6 W+jets categories, why using WpT to bin and scale ? Green led Was known not to agree with data. Independent variable still correlated with the W+jets kinematics of interest. Leptonic observable less sensitive to QCD effects, rather than jet-related variables. Corrects for normalisation.

  • [Guillelmo] s8 do you apply any special selection for electrons ? Green led The related issues were fixed with proper fake rate estimates, no special selections needed

  • [Guillelmo] s8 how do you exclude Ak4 jets pointing to ak8 ? Green led Ak4 collection cleaned with DR>0.8 requirement wrt ak8

  • [Guillelmo] s11 did you study the eff of correct pairing ? Green led shown in s57-58 : measured the matching eff with partons. Determined in a fiducial region where 4 partons are assigned to 4 jets.

  • [Paolo] what is the eff of fiducial well matched 4 jets Green led Checked offline: 42%

  • [Guillelmo] s15 Do you have the data/MC comparison for these distributions? (CRs and SRs?) Green led A yes many in backup

  • [Guillelmo] How is the number of jets data/MC ? Green led not in backup but not bad in CRs, all in the AN

  • [Guillelmo] s18 What do these plots mean? Green led Each event is given an S/B weight to indicate if they are more signal-like or background-like

  • [Guillelmo] s27 Better to show DNN distributions weighting by the bin width Green led We will do it.

  • [Guillelmo] s27 How did you choose the variable to correct the MC? Green led We use a variable correlated to the W pt, we prefer

  • [Guillelmo] s28 Uncorrelate muon and electron nonprompt rate uncertainty Green led OK

  • [Guillelmo] QCD scale only shape variations ? Green led Yes

  • [Guillelmo] s39 why are not either prefit or postfit =1 ? Green led on 38 prefit is the W+jets SFs (as from s23)

  • [Guillelmo] start with prefit SFs or without and see if you converge to the same values Green led OK

  • [Darien] s10 Did you consider the control region that has bjets and is off-shell? Could be useful for single top, for example Green led Plots in the top off-shell control region added in ANv9

  • [Philip] s18-19 did not really understand the plots how does high detajj get bkg like at high values ? Green led it is true and interesting

  • [Philip] is the y-axis ordered in some way ? Green led A ordered in decreasing importance

  • [Philip] QGL is important ? Green led A quite important: adding QGL improved exp significance from 4.2 to 4.6

  • [Darien] inv mass of diboson systems ? Green led next step/publication with BSM studies

  • [Paolo] s33 prefit vs postfit expected significance follows opposite trends for resolved and boosted categories, why ? Green led It depends on the different postfit background normalizations, not clear

  • [Paolo] s39 Why are SFs so different in 2016 w.r.t. 2017/8? Green led A data/MC distributions are significantly different, visible on s53

  • [Paolo] As your leading uncertainty is signal QCD, would it be possible to produce a NLO signal ? Green led A Difficult. Maybe could try to evaluate some NLO effect.

  • [Paolo] Can you show a table with the data, signal, and background yields? combining all channels? maybe most significant bins? Green led Added in ANv9

  • [Paolo] What was the procedure to decide the DNN input variables? Green led A. Started from a larger number of them, checking the importance, and reducing the number of variables without any degradation.

  • [Yacine] How do you check the overtraining in the DNN? Green led A Checked with a Kolmogorov test between test and training samples.

  • [Guillelmo] Besides quoting the signal strength, you should quote a cross section. Either a looser or a tight selection, this is something you may Green led yes. We will quote the cross section with the generator level cuts.

  • [Paolo] s41 how will you treat interference contributions in the fit ? Green led The interference contribution in the signal region is less than 4% of the signal yield. Therefore it has been neglected in the final fit.

  • [Paolo] s5 with MJJ>100 in the inclusive signal definition you might have triboson processes contributing. Is there any relevant contribution from tribosons after the event selection ? Green led if on shell VVV is 0.1% events then VVV offshell is <<< 0.1%
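The AK4/AK8 overlap removal mentioned in the answers above (AK4 collection cleaned with DR>0.8 with respect to AK8) can be sketched as follows. This is only an illustration: the dictionary-based jet representation and the function names are assumptions, not the analysis framework's interface.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance, with the azimuthal difference wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def clean_ak4(ak4_jets, ak8_jets, min_dr=0.8):
    """Keep only the AK4 jets farther than min_dr from every AK8 jet."""
    return [
        jet for jet in ak4_jets
        if all(
            delta_r(jet["eta"], jet["phi"], fat["eta"], fat["phi"]) > min_dr
            for fat in ak8_jets
        )
    ]
```

With no AK8 jet in the event the AK4 collection is returned unchanged, which is the behaviour one would expect for the resolved category.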

Analysis Note

13/01/2021 Changelog for analysis AN v7 link

Collected answers to the questions before the freeze for preapproval in the ANv7. Synchronized with content shown in the preapproval talk.

  • Added W+jets categories closure-test factors comparison plot.
  • Added W+jets post-fit and pre-fit scale factors comparison.
  • Added preliminary EWK+QCD combined signal strength measurement.
  • Removed Pythia to Herwig factor from signal uncertainties. An alternative parton shower for signal will be evaluated once the sample is ready.
  • Split the QGL morphing uncertainty components.

Comments and discussion on ANv6 link

  • Have you considered how you can make a combined EW+QCD cross section measurement? This has always been well-received by theorists and shouldn't be a complicated addition (you can use a simplified version of the fit, with both the EW and QCD components treated as signal)

Green led The EWK+QCD measurement resulted in an expected total signal strength of $1^{+0.230}_{-0.199}$. The result and likelihood scan will be added to ANv7.

  • Also a related question, does the PythiaToHerwig line in Fig 74 and 75 have the flattening using the nominal DNN distribution from Pythia? I think it would be great if you could show at least one DNN distribution before the flattening procedure is applied

Green led Yes, the Fig 74 and 75 have the flattening included. The transformation can be seen just as a binning choice. We can show a DNN distribution with a regular binning.

  • Do you have an explanation on why the PythiaToHerwig uncertainty is one-sided?

Green led At the moment we do not have a Herwig signal sample ready to use, but it has been requested. To approximately model the difference between the two parton showers, we applied a reweighting based on the number-of-jets observable, following a gen-level study performed for the same-sign fully leptonic channel by A. Ballestrero et al. (“Precise predictions for same-sign w-boson scattering at the lhc”, http://dx.doi.org/10.1140/epjc/s10052-018-6136-y). 1) It is a proxy, since the paper covers only same-sign WW. 2) It is one-sided because it comes from an alternative sample, as was formerly done in CMS for Pythia vs Herwig, where the nuisance was settled as one-sided (the discussion dates back to Run 1). 3) We have requested the official Herwig CMSSW sample.

  • Similar question, do you know why the QGL-morph uncertainty is over-constrained?
Green led Only 1 parameter per year was used in ANv6 for the QGL morphing uncertainty. A better approach is to uncorrelate the different components used in the morphing: gluon and quark, low-eta and high-eta functions. Therefore the QGL uncertainty has been split into 4 components, correlated for 2017-2018 and uncorrelated for 2016, and the over-constraint is reduced. Only the uncertainty on the gluon low-eta morphing is still constrained, because the initial uncertainty inserted in the fit is quite conservative and, moreover, the effect of this uncertainty is strongly correlated with other shape nuisances.

  • Follow-up to question "...Can you show in the MC that the shape of the W pt is expected to be about the same...". You're right that the assumption is only that the corrections are the same, not that the shape is the same necessarily. And thanks for pointing me to sec 7.3, I guess I didn't fully understand this before, this is definitely a useful check. Could you plot just the corrections vs. each other so the trends are easier to see?

Green led The prefit W+jets category normalization SFs for the far-from-signal and near-signal regions are plotted for the resolved and boosted categories for all the years. The plots, including the total uncertainty band, can be found here and will be added to ANv7.

  • I still think that if the shapes are about the same in the signal region, that gives some confidence that the corrections should be the same. Can you just plot the W pt distribution in the signal and in the background region together + their ratio? It would be interesting to see in any case.

Green led The plots are available here. The WlepPt distribution for the W+jets background in the signal region and in the W+jets control region is compared, and the two are found to be compatible. Please remember, though, that the data-driven technique does NOT rely on the two shapes being compatible, since the simulation is used to translate scale factors determined in the control region to the signal region. The reliability of the method is confirmed by a closure test, built by splitting the control region into two parts and verifying that corrections calculated in one of the two work well in the other. The two regions are defined as far-from-signal and near-signal.

  • Follow-up to question: "There is always the possibility of kinks when you bin a continuous distribution like this...". The point is that you derive corrections to relatively large bins, e.g., the right plots in Fig. 56. If you apply them to the left plot in Fig. 56, do you see any visible kinks in the distributions? Could there be any issues if so?

Green led The effect of the rescaling of the W+jets categories on the full WlepPT distribution is shown in Fig. 64 pag. 86 for all years and categories. The first line shows the distribution before the correction, the next line with the rescaling. No large kinks are observed.

  • Follow-up to question "It's true that the statistical procedures rely on knowledge of the signal shape...". I mostly agree with you that LO is the best solution we have, and it's not valuable to do a big scan of predictions at LO unless we're sure that the generators are all configured properly. The one point I disagree on is the statement about how the signal uncertainties are accounted for in the fit. Signal uncertainties are special in that they play next to no role in the distribution of the test statistic used to measure the significance. You can confirm this by running your fit with/without any theory uncertainties on the signal. To that extent, uncertainties in the signal modeling may be better treated as alternative models. I'd like to see a test of the robustness of the signal strength and significance using a different model of the signal as the expectation and the pseudo data. You could do this using your Herwig alternate sample, for example.

Green led Done in the final version of the analysis

03/01/2021 Changelog for analysis AN v6 link

  • Split the parton shower uncertainty into FSR and ISR components
  • Normalized theory uncertainties on top and W+jets samples in order to disentangle the effect of the rateParameters. More details in Sec.8.5.50
  • Added W+jets categories closure-test also for 2016 and 2017. Fig 63
  • Added correlation matrices of DNN input variables. Fig.43
  • Added likelihood scan with breakdown of uncertainties. Fig.80
  • Added W+jets categories post-norm normalizations plot. Sec.9.5
  • Added comparison between QGL morphing and reweighting. Fig.103

Comments and discussion on ANv5 link

  • Can you contact the object contacts of electron, muon, and JetMET to review your usage of objects in the analysis? You can find the contact info here (please put us in cc): https://twiki.cern.ch/twiki/bin/view/CMS/TWikiSMP

Green led Done

  • Can you also upload your combine cards to a public area and notify Pietro Vischia (with us in cc) to take a look?

Green led The repository for the datacards has been created: https://gitlab.cern.ch/cms-hcg/cadi/SMP-20-013. The latest version of them with the fit checks has been uploaded and P. Vischia has been notified.

  • What is the logic behind the binning of the DNN distribution? (paper Fig. 4, and AN Fig. 46 and 71) You talk about shaping it to a flat distribution in the AN line 266. Is this procedure applied in the boosted case? if so why is the signal distribution not flat? Did you do the rebinning after the flattening?

Green led The DNN output is transformed to make the signal distribution flat (separately for each year) in both the categories. In the boosted category, given the low number of events in the tail of the DNN distribution, we rebin after the flattening.
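The flattening described above can be viewed as a choice of bin edges at the quantiles of the signal DNN score. The following is a minimal sketch under that assumption: the function name, the score range [0, 1] and the convention that each bin includes its upper edge are choices made for this example, not taken from the analysis code.

```python
def flat_signal_edges(signal_scores, n_bins, weights=None):
    """Bin edges in DNN score such that each bin holds about the same signal yield.

    Bins are meant as (low, high], i.e. an event sitting exactly on an
    edge belongs to the bin below it (convention assumed for this sketch).
    """
    if weights is None:
        weights = [1.0] * len(signal_scores)
    pairs = sorted(zip(signal_scores, weights))
    total = sum(weight for _, weight in pairs)
    step = total / n_bins

    edges = [0.0]          # assumes the DNN score lies in [0, 1]
    running = 0.0
    next_target = step
    for score, weight in pairs:
        running += weight
        # Close a bin each time the cumulative signal yield reaches the target.
        while len(edges) < n_bins and running >= next_target:
            edges.append(score)
            next_target += step
    edges.append(1.0)
    return edges
```

For a category with few events in the tail, as in the boosted case, adjacent bins from this list could then be merged, which corresponds to rebinning after the flattening.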

  • We think the switch to fitting the leptonic W pt is a good one, since it's a more physical correction than the jet binning before, and it's nice to use an independent variable for the correction. Still, some validation of this procedure is needed. Can you show in the MC that the shape of the W pt is expected to be about the same in your signal and control regions? Do you apply any additional uncertainty for this extrapolation?

Green led Because the signal region and the Wjets bins are fitted simultaneously, Combine automatically calculates the appropriate corrections in the signal region, depending on the Wjets cross section in the Wjets bins. This means that the W pT shape is not required to be the same in the signal region and in the control regions, because what is relevant is that the correction calculated by the fit is the same. This is verified by the inner/outer closure test (section 7.3 of the Analysis note), where a fraction of the control region is treated as a signal region for the sake of the test, while another subset is used as control region. We will add the trends of the correction factors in the two regions to the new version of the AN. Since the signal region is blind, the correction factors there will be accessible only after unblinding.

  • How is the binning of the w-lepton-pt chosen?

Green led Since the trend in the data/MC discrepancy for WlepPt is quite smooth, a small number of regularly spaced bins has been chosen. The binning in the resolved and boosted categories is different, to account for the different distribution of the observable.
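To make the binned correction concrete, here is a minimal sketch of how a per-bin W+jets normalization factor could be looked up from the WlepPt of an event. In the analysis the factors are determined by the fit; the function name and the clamping of out-of-range values to the first or last bin are assumptions for illustration only.

```python
import bisect


def wjets_scale_factor(w_lep_pt, bin_edges, scale_factors):
    """Return the scale factor of the WlepPt bin that contains w_lep_pt.

    bin_edges has len(scale_factors) + 1 entries; values outside the
    range are assigned to the first or last bin (assumption of this sketch).
    """
    index = bisect.bisect_right(bin_edges, w_lep_pt) - 1
    index = min(max(index, 0), len(scale_factors) - 1)
    return scale_factors[index]
```

Because the correction is a step function of WlepPt, any discontinuity in the corrected distribution can only appear at the bin edges, which is where the kinks discussed in the following question would show up.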

  • There is always the possibility of kinks when you bin a continuous distribution like this. Do you see them when binning more finely? Did you consider an analytic function for the correction, or forcing some smoothing across bins?

Green led Since the bins of the analysis are rather large, we do not expect to suffer from any kinks. As a matter of fact, the corrections are effective as they are (the data/MC ratio shows good agreement), and smaller bins would suffer from low statistics in the high-energy tails.

  • You had some really nice plots of the corrections across channels in AN-19-239_v4_appendix that are no more. Can you remake those for the new approach?

Green led Added in v6

  • It's really hard to follow the change to the nonprompt background estimate, because the previous version of the note had very little documentation, and you don't really refer to what changed here. Can you add a description of the changes (I think it was based on the trigger you use in the fake rate derivation region)? Also add plots from the regions where you derive the fake rates and where you apply them.

Green led We have derived the fake rates with exactly the same trigger as in the analysis, with the method described in AN-2010/261. The trigger that was used before was not suitable for the lepton WP we are using. The fake rate has been applied in a closure region orthogonal to the analysis, and plots are shown in Sec.5.3.1 of ANv5, Fig. 40 and 41.

  • Can you add a few more plots of the scan of the likelihood with individual uncertainties frozen? It's kind of hard to tell what are the leading uncertainties with all the rate parameters in the impacts. Similarly, is there a way to show the impacts without the rate parameters that still has physical meaning? It's hard to rectify the fact that they are always leading in the impacts with the likelihood scan that shows they are not so important.

Green led More likelihood scans with different groups of uncertainties frozen have been added in ANv6. The uncertainty has been broken down into normalization, theory, experimental and statistical components. Moreover, the final overall impact plot has been added in ANv6, splitting the rateParameters impacts from the others to improve readability.

  • Did you make any comparison of the central scale factors and your approach for the QGL distribution? Ideally this would also be signed off by JetMET

Green led We have compared the morphing with the official correction for the signal sample in 2016. Results have been added in ANv6. The QGL morphing method has been presented to JetMET (talk). Comments: 1. JetMET is ok with our morphing procedure, and welcomes the possibility of having it available for other analyses. 2. Clearly the morphing needs to be calculated for each parton shower program in a consistent way, which is what is done in the analysis. 3. JetMET sees as a plus the fact that the morphing has been done for all the years, whereas the official corrections exist only for 2016.

  • Is the QGL included as a new variable in the DNN? Sorry, I don't find this clearly stated anywhere

Green led Yes they are included. The detailed list of the variables used as inputs for the DNNs can be found in Table.11 Pag 59 of ANv5.

  • Fig 85 and 86: How are you defining a quark and gluon jet? This is a really nasty definition itself, and surely there should be some uncertainty associated with the predicted fractions and distributions from Pythia and the specific tunes. Do you show the results for the two tunes separately at least?

Green led The parton flavour associated to the Jet is taken from the official recipe using the Jet_partonFlavour branch in the NanoAOD. Effects of the parton shower uncertainties are considered on all observables as well as QGL variations. More plots including 2016, for the different tunes, added in ANv6.

  • I guess you want to submit this to PRL? It would be good to give your justification to Boaz ASAP so we can confirm whether or not this will be supported by pubcomm. My preference would be to have the longer PLB format so we can describe the analysis more clearly.

Blue led The authors do not have a strong preference towards PRL or PLB. Given the promising expected result for the first observation in this channel, a short paper on PRL can be interesting, but also a longer description of the analysis in PLB format is perfectly fine. Given the longer format, some more time would be needed for the typesetting of the paper draft, but we don't think that this can be a delaying factor for the preapproval of the analysis.

  • It's true that the statistical procedures rely on knowledge of the signal shape. But, from a physics perspective, it's entirely reasonable to ask just how sensitive you are to the knowledge of the signal shape. This can be done by, for example, testing the signal strength and its significance with one model acting as the signal prediction and another acting as the data. For example, the ATLAS Z VBF paper does this with three models: https://arxiv.org/pdf/2006.15458.pdf. In CMS, we've done this with multiple generators in the past, or in the WZ+WW VBS analysis, comparing the impact of applying or not the EW corrections to the signal. I think such a study in this analysis would be very valuable, given that the shape of the signal prediction is quite leveraged by the DNN. One can very reasonably ask whether your observed significance of WV VBS production would be robust if the true WV VBS process is not fully described by the WV VBS MadGraph LO simulation. I suggest thinking of ways to test this. A few ideas: 1) Calculate WZ or WW at NLO QCD with POWHEG, and apply these corrections to your signal. 2) Apply the WW or WZ EW corrections in mjj to your signal. 3) Try another parton shower and apply the (gen-level) corrections to your signal. Note that we see such questions from ATLAS reviewers very often. Doing this study now will make your work more convincing.

Green led Any effects that modify the signal shape and are clearly identified as sources of uncertainty in the data analysis are accounted for as shape systematic uncertainties in the fit performed by Combine, which we think gives solidity to the result of the analysis. The precision with which the modelling of the signal is known theoretically is of course a source of concern in general. In this particular case, the generation used is the most precise existing: NLO EW and QCD corrections have never been calculated for the semi-leptonic final state, and according to theoreticians (we contacted our theoretician colleague M. Pellen after your request to ask for advice) their achievement would deserve a paper on its own. Therefore, any approximate exercise using corrections calculated for fully-leptonic final states would not add to the precision of the analysis, because there is no way to assess how realistic such an approximation would be. Choosing to use an alternative LO generator, like Sherpa, exposes us to two risks: the first is that in CMS there is not enough experience in the use of such a generator, which may lead to an incorrect use of it; the second, even more relevant, is that the ATLAS Collaboration clearly showed that, for the VBS case in particular, it is the generator which is further away from data. In more general terms, the comparison studies performed in final states like the EW production of Z + 2 jets happen in a case where the generation is known much better and the statistical uncertainty of the analysis is far smaller (5 sigma is largely achieved with the data available in 2016), two conditions that are not met in this case.

  • For the fit with rate parameters, you should not have the same set of uncertainties as without, because any uncertainties that are purely normalization should cancel out. It's not really clear at the moment if this is the case.

Ans: Since the convolution of a flat prior with a Gaussian prior is a flat prior, effects that change the normalization on top of a flat prior (a.k.a. rateParam) have no effect. This effect, which is a purely didactic test, was already studied in Run 1 with what was then called the lnU prior, and at the beginning of Run 2 in the WW analysis.

Follow-up: I think I follow that the normalization uncertainties on backgrounds will not add additional uncertainties wrt the rate parameters. But does this mean that I can't really look at the impact plots and tell how much of the uncertainties impact the result? It would be really helpful to provide an explanation that I can follow on a physics level here that helps me understand the results.

Green led The overall normalization effect of QCD scale and parton shower uncertainties has been removed from samples in which rateParameters are included in order to disentangle the normalization effect. The normalization has been implemented in order to have the same event yields for nominal and varied shapes summing all the channels in which the sample rateParameter is included.
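The normalization described above can be sketched as follows: a single factor is computed so that the varied templates, summed over all the channels where the rateParameter acts, have the same yield as the nominal ones, while the shape differences within and across channels are kept. The data layout (a dictionary from channel name to a list of bin contents) and the function name are assumptions made for this example.

```python
def normalize_variation(nominal, varied):
    """Rescale the varied templates to the nominal total yield.

    nominal, varied: dict mapping channel name -> list of bin contents.
    One common factor is applied to all channels, so only the overall
    normalization effect of the variation is removed.
    """
    nominal_total = sum(sum(bins) for bins in nominal.values())
    varied_total = sum(sum(bins) for bins in varied.values())
    factor = nominal_total / varied_total
    return {
        channel: [factor * content for content in bins]
        for channel, bins in varied.items()
    }
```

After this rescaling the variation can still move events between bins and between channels, but its total yield matches the nominal one, so it no longer competes with the freely floating rateParameter.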

08/12/2020 Changelog for analysis AN v5 link

  • New non-prompt estimation and fixed trigger efficiency : corrects for discrepancy in electron MC. Closure test for non-prompt estimation added in the AN (Sec.5.3).
  • Brand new binning for data-driven W+jets estimation using leptonic W pt. (Sec.7.2)
  • Closure test for the data-driven estimation (Sec.7.3) and added plots showing the effect of the W+jets subcategories rescaling on prefit distributions (Sec.7.4).
  • Updated fit results and impact plots (Sec.9)

General questions and discussion on V4 of the AN

  • Relevant to AN v4: Follow up to previous discussion: Have you considered any study of dividing the W+jets samples based on gen-level jet pt and eta rather than reco level?

Green led No significant differences have been observed when splitting the W+jets sample into RECO-based or GEN-based categories

More details: The GenJet collection has been analyzed to extract the GenJet pair with the highest mjj. The DeltaEta of the pair and the pt of the second selected GenJet have been used to split the W+jets sample, using the same categorization as the data-driven method on reco objects. We have observed very little difference with respect to the categorization done on reco-level jets, as shown in the second plot, where the bins correspond to reco-level categories.

[Figure: Deltaeta VBS jets in resolved ele W+jets CR]

[Figure: Deltaeta VBS / trailing VBS pt bins for data-driven estimation in resolved ele W+jets CR]
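The gen-level selection described above (highest-mjj GenJet pair, then DeltaEta of the pair and pt of the trailing jet) can be sketched as follows. This is an illustration only: jets are represented as dictionaries with pt, eta, phi and mass keys, and the function names are assumptions, not the analysis code.

```python
import math
from itertools import combinations


def four_vector(jet):
    """(E, px, py, pz) from pt, eta, phi, mass."""
    px = jet["pt"] * math.cos(jet["phi"])
    py = jet["pt"] * math.sin(jet["phi"])
    pz = jet["pt"] * math.sinh(jet["eta"])
    energy = math.sqrt(px**2 + py**2 + pz**2 + jet["mass"] ** 2)
    return energy, px, py, pz


def invariant_mass(jet1, jet2):
    """Invariant mass of the two-jet system."""
    e, px, py, pz = (a + b for a, b in zip(four_vector(jet1), four_vector(jet2)))
    return math.sqrt(max(e**2 - px**2 - py**2 - pz**2, 0.0))


def vbs_pair_observables(jets):
    """Pick the pair with the highest mjj; return (delta_eta, trailing_pt)."""
    jet1, jet2 = max(combinations(jets, 2), key=lambda pair: invariant_mass(*pair))
    delta_eta = abs(jet1["eta"] - jet2["eta"])
    trailing_pt = min(jet1["pt"], jet2["pt"])
    return delta_eta, trailing_pt
```

The two returned values are the quantities on which the W+jets sample is split, so the same function could be run on GenJets and on reco jets to compare the two categorizations.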

Follow-up by Kenneth: Have you performed the full expected signal+CR fit with this setup? I'm not very surprised that the categorization doesn't change dramatically, but the behaviour of the fit should change. For one, in the RECO binning most uncertainties that affect the normalization cancel out. For the gen binning, you should still have RECO uncertainties and only the GEN uncertainties cancel out. I would also think that a more granular binning of the etajj/ptj distribution you fit makes sense in the GEN case, so the fit has a bit more tension and the uncertainties are more meaningful. If you then look at the post-fit results, you should be able to separately look at the pulls for the RECO uncertainties and the shift in the GEN distributions, which at least to some extent tell you the extent to which the discrepancies are from modeling or from detector/reco issues.

Green led The question is superseded by the new binning chosen for the analysis. Since the binning at gen level has no effect for our analysis, W+jets being a background with a data-driven component, we stay with the reco-level binning. In the new implementation of the analysis (see AN v5) the binning is performed in bins of reconstructed W transverse momentum, a choice motivated both by experimental evidence that this variable is poorly described at the current accuracy of MC generators, and by theory calculations shown in gen meetings.

  • For the fit with rate parameters, you should not have the same set of uncertainties as without, because any uncertainties that are purely normalization should cancel out. It’s not really clear at the moment if this is the case.

Green led Since the convolution of a flat prior with a Gaussian prior is a flat prior, effects that change the normalization on top of a flat prior (a.k.a. rateParam) have no effect. This effect, which is a purely didactic test, was already studied in Run 1 with what was then called the lnU prior, and at the beginning of Run 2 in the WW analysis.
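The statement about the priors can be written out explicitly. As a sketch, assume a normalization parameter with an (improper) flat prior $\pi_{\mathrm{flat}}(x) = c$ over the real line and an additional Gaussian-constrained shift $\theta$ of width $\sigma$ acting additively on it; the symbols $r'$, $\theta$, $\sigma$ and $c$ are introduced here for illustration only. The prior on the combined normalization $r'$ is then

```latex
\pi(r') \;=\; \int_{-\infty}^{+\infty} \pi_{\mathrm{flat}}(r' - \theta)\,
          \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\theta^{2}/(2\sigma^{2})}\, \mathrm{d}\theta
\;=\; c \int_{-\infty}^{+\infty}
          \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\theta^{2}/(2\sigma^{2})}\, \mathrm{d}\theta
\;=\; c
```

so the combined prior is again flat and independent of $\sigma$. With a flat prior restricted to a finite range, the same holds only away from the boundaries of that range.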

  • If you are including the normalization component of uncertainties on the data-corrected backgrounds, why?

Green led Until now the normalization component of the uncertainties on the data-corrected backgrounds has been included. Although it is tidier to keep shape and normalization separate, it has been shown (and it is mathematically equivalent) that keeping both shape and normalization effects gives the same result. This topic was first discussed in Run 1, where we talked about lnU, since rateParam had not yet been implemented, in the context of the H>WW search and top/WW control regions.

  • Fig: 8. In general you have good consistency across bins and across channels, which is good to see. But there is a consistent offset in electron vs. muon scale factor, where the electron scale factor is always ~10-20% larger. This makes me think that you’re missing something in the electron efficiency estimation. Could it be the trigger efficiency, for example? If the corrections for the electron and muon channels can be unified, this would give more confidence that the data/MC corrections are related to the jet modeling and reconstruction only

Green led The data/MC disagreements in the electron case have been understood and fixed by means of a more proper calculation of the fake rate. In AN v5 the discrepancy is no longer there.

  • Appendix A.5: I don’t find this study exceptionally convincing. If you just move the mu for the expected signal, the fit can find perfect agreement by just pulling the mu, and I would never expect the background to get pulled. I think you need to make a study that introduces some shape variations, perhaps drawing toys per bin from background+signal+unc. Then compare the normalization per bin to the normalization from the toy and see if there is any bias that is evident over many toys.

Blue led We do not understand the rationale behind the request. As in any other CMS analysis, the likelihood ratio is built on the knowledge of the signal shape and the bias tests are meant to show that the analysis is able to distinguish the background from the signal. An artificial signal built as a mixture of signal and background will by construction show a bias in the result, since it is constructed to partially behave like the background (a simple RooFit exercise can show it). The background variations within the post-fit uncertainties, on the other hand, are accounted for by Combine.

03/06/2020 Comments from Conveners on v4 of the AN-19-239

*Link to AN appendix with details about fit tests and answers to questions*

  • Table 2: for single electron, in principle you could have used HLT_Ele32_WPTight_L1DoubleEG (with just a precaution concerning the L1 filter) and lower the offline pT threshold to 35 GeV (maybe it wasn't possible in the Latinos framework?) Do you know how much acceptance (and sensitivity) you are giving up with an offline cut of 38 GeV? Looking at final distributions (e.g. Fig. 46), the signal yield in the 2017 electron channel seems significantly lower than in 2016. By the way, also for single muon in 2017 you could use HLT_IsoMu24 OR HLT_IsoMu27 (HLT_IsoMu24 was only pre-scaled for a short time, ~3/fb). Clearly this is pointless if you cut at 30 GeV offline. But have you considered lowering the muon pT threshold to 27 GeV for all years? Would it help?

Green led The detailed answer can be found in Appendix B of the AN (see link above). A test has been performed on the 2017 MC without trigger efficiencies. Lowering the electron pT threshold from 38 to 35 GeV would increase the signal yield by ~5%, but it would also increase the W+jets background by 5%. Lowering the muon pT threshold by 3 GeV (the test was done from 33 GeV to 30 GeV, since pT > 30 GeV is a preselection of the ntuple skim) would increase the signal by 8%, but the W+jets background would increase by 10%. In our opinion, given that including the prescaled muon trigger or the electron trigger with two L1 seeds would require additional technical work as well as a re-derivation of the lepton ID/isolation scale factors, the increase in signal efficiency relative to the W+jets background does not justify changing the thresholds.
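
As a rough cross-check of the numbers quoted above, the sketch below converts the signal and W+jets increases into a relative change of S/sqrt(B). Two assumptions are made here that are not in the AN: S/sqrt(B) is used as the figure of merit, and W+jets is treated as the only background. The function name `relative_gain` is hypothetical.

```python
import math

def relative_gain(sig_increase, bkg_increase):
    """Relative change of S/sqrt(B) when S and B grow by the given fractions."""
    return (1.0 + sig_increase) / math.sqrt(1.0 + bkg_increase) - 1.0

# Electron: 38 -> 35 GeV, signal +5%, W+jets +5% (numbers from the answer above)
gain_ele = relative_gain(0.05, 0.05)
# Muon: 33 -> 30 GeV, signal +8%, W+jets +10% (numbers from the answer above)
gain_mu = relative_gain(0.08, 0.10)

print(f"electron channel: {100 * gain_ele:.1f}% change in S/sqrt(B)")
print(f"muon channel:     {100 * gain_mu:.1f}% change in S/sqrt(B)")
```

Under these simplifications the change is about 2.5% for electrons and about 3% for muons, before accounting for the other backgrounds and for systematic uncertainties.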

  • Table 4: the leptonic decays include taus, right? (It seems so from the cross sections)
Green led Yes, tau decays are included.

  • Table 5: the 2016 DYToLL-M50 sample seems to be missing (I only see the DYToTauTau one). Is it just an oversight, or is there a reason for not including it?
Green led The sample list is now updated in AN v5.

  • L 242: do you know how much signal efficiency you lose by removing this 2.5-3.0 region? Could you recover it with a tighter PU jet ID (only in |eta|=2.5-3.0, also for pT>50 GeV)?
Green led L242 needs to be updated: already in the current plots (v4), the tight PU jet ID is used to exclude events with tagged jets falling in that eta region, instead of removing the region completely. Around 7% of the signal in the resolved category and 5% in the boosted category is removed by applying the tight jet PU ID in the horns region. This cut was applied a posteriori, but it is now included as a preselection for the jet tagging in v5 of the AN to gain back some efficiency.

  • Figs. 6-21: what uncertainties are included in the error band (~20% everywhere)?
Green led All the nuisances included in the analysis until now (listed in chapter 8) are included in these plots.

  • Figs. 6-21: is the nonprompt background from data in these plots? Is it clear why it is so much smaller in 2016 than in 2017-18 in the electron channel?
Green led The nonprompt background is estimated in a data-driven way using the fakeable-object technique. The nonprompt estimate has been updated and recomputed correctly in v5 of the AN.

  • Sec. 7.3.1: do you use the non-closures in this test to evaluate some additional uncertainties on the W+jets background? The overall normalization in each subregion seems very good after the fit. But the jet pT distributions still show some larger discrepancies, which can have an impact on DNN. Can you add the DNN distributions in the W+jets CRs after the correction? (You have them in some presentations.)

Green led Updated post-fit plots of the DNN in the control region for AN v5 will be available ASAP.

  • Figs. 40-44: I guess these will be updated with the new binning of L 534-539. Also, please add plots of the DNN spectrum before and after the correction.

Green led OK - superseded by v5

  • L 621: what is the initial value of these uncertainties, before the fit?
Green led The effect of the QCD scales on W+jets samples is ~ 20% in both signal and control regions.

  • Fig. 45: clarify in the caption that these are DNN distributions (x axes have no titles and the labels are very small to read)

Green led OK

  • L 687-688: fix the references

Green led OK

  • Table 17: do you know what drives the lower sensitivity in 2017? The main differences I see are (1) the higher electron pT thresholds, (2) the exclusion of jets with |eta|=2.5-3.0, and (3) the larger nonprompt background compared to 2016 (also true in 2018). Points (1) and (2) could be improved in principle (see comments above).

Green led The higher lepton pT thresholds should account for a 5% signal yield loss, and the tighter jet PU ID for a further 7% signal loss. The nonprompt contribution should be correct (the 2016 estimate needs to be updated). These factors should explain the different performance.

  • Fig. 49 (impact plot): should we ignore this, and just look at Fig. 50?

Green led OK

  • Fig. 50: all the parameters related to the W+jets normalization have similar impacts. Maybe it would be clearer to have a version of this impact plot where all these parameters are grouped into a single systematic category, to see the cumulative effect of the W+jets normalization procedure. This detailed version could be left here or moved to an appendix.
Green led Likelihood profile plots with and without these systematics have been produced. They will also be added in v5.

  • Figs. 51-53: very nice! Just a couple of comments: - as I said before, it would be useful to add the post-fit DNN distributions in the W+jets CR, either here or (better) in Sec. 7.3.1, to check the W+jets agreement also in the high DNN region; - what uncertainties are included here? In L 717 you say they are from the fit. But why are they so much smaller than the ~20% you show in the pre-fit plots? Are they really constrained this much?

Green led All the pre-fit shapes, including those in the control regions, have been added in AN v5. We will provide post-fit plots for dedicated distributions in the next presentation.

28/04/2020, K. Long private email, on this version of the AN

Green led it will be done.

  • For example Higgs Boson → Higgs boson, Standard Model → standard model, Vector Boson Scatter → vector boson scattering, CMS collaboration → CMS Collaboration
Green led done.

  • There are also a handful of typos
Green led fixed

  • If you’re going to have the “Section X describes” text you should probably describe all the sections
Green led part removed

  • Ln 80: This isn’t a complete sentence and I’m not sure what it’s meant to say
Green led fixed

  • Why are you using NanoAOD v5 for 2016 and 2017 but v6 for 2018? I assume the intention is to move fully to v6 (and possibly v7 when it’s available)?
Green led Yes, now v7 in the latest AN

  • Ln 103: The table ref is broken
Green led fixed

  • Table 4: Does “in production” mean you are using private MC for the time being?
Green led For the time being we are using 2018 signal

  • Ln 110: I don’t know whether v51 is a typo or a reference to your internal post processed samples
Green led fixed

  • Ln 115: Add the pileup profile
Green led sentence removed

  • Ln 149: I don’t find the details in section 5
Green led It will be described briefly, but the correction is a standard one dating back to 2015.

  • Ln 190: Booted → Boosted
Green led fixed

  • Ln 202: So you would rather have a pair of jets with mjj = 85 GeV than with mjj = 90 GeV or 80 GeV? Why not just collect (mjj-mZ, mjj - mW) for each pair of jets in the event? I expect the ambiguity is very small but the latter definition is more logical
Green led The impact on the analysis is negligible. We studied this in the past and decided to go with the algorithmically simplest solution.
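
The two definitions discussed in this exchange can be compared with a short sketch. It assumes, as the comment suggests, that the current choice is the jet pair with mjj closest to 85 GeV; the alternative picks the pair closest to either the W or the Z mass. The function names and the example masses are hypothetical, and the boson masses are rounded values.

```python
M_W = 80.4      # GeV, rounded W boson mass
M_Z = 91.2      # GeV, rounded Z boson mass
M_REF = 85.0    # GeV, single reference mass assumed for the current choice

def pick_closest_to_reference(mjj_candidates, m_ref=M_REF):
    """Index of the jet pair whose invariant mass is closest to one reference value."""
    return min(range(len(mjj_candidates)),
               key=lambda i: abs(mjj_candidates[i] - m_ref))

def pick_closest_to_w_or_z(mjj_candidates):
    """Index of the jet pair whose invariant mass is closest to either m_W or m_Z."""
    return min(range(len(mjj_candidates)),
               key=lambda i: min(abs(mjj_candidates[i] - M_W),
                                 abs(mjj_candidates[i] - M_Z)))

# Hypothetical event with two candidate pairs, at 85 GeV and 90 GeV
candidates = [85.0, 90.0]
idx_ref = pick_closest_to_reference(candidates)
idx_wz = pick_closest_to_w_or_z(candidates)
print(candidates[idx_ref], candidates[idx_wz])
```

In this example the two definitions disagree: the single-reference choice takes the 85 GeV pair, while the W-or-Z choice takes the 90 GeV pair, which is 1.2 GeV from the Z mass. Events of this kind are the ambiguity that the answer reports as having a negligible impact.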

  • Ln 199, and more: try to be consistent with notation. V should be in roman. GeV should always be in Roman (e.g., Ln 206, 207)
Green led Fixed. These questions will be consistently addressed when writing the paper draft for publication

  • Ln 201, 204: tag jets, b jets
Green led done

  • Ln 211: In principle the normalization of everything is derived by the fit. If you mean that the normalization is unconstrained in the fit, please say that
Green led done

  • Ln 216: betweeh → between
Green led done

  • Ln 230: I’m not really sure what you mean by “instabilities”
Green led changed to "effects"

  • Section 4.2: Please provide a minimal definition of your lepton selections rather than pointing 100% to another AN. It seems the muon ID is basically just the POG tight CB ID, but the electron ID is quite a bit different. Can you also justify why you go with this ID vs. a POG ID?
Green led We can add a few lines but, as done in previous documentation, it is more useful for the reader to refer to something already published/used, so that we do not all spend time checking things that have already been checked.

  • Can you also make a table summarizing all the cuts in your signal regions?

Green led OK

  • Ln 276-277: I think mV (rather than M_W) would be clearer here, unless I’m missing something
Green led agreed

  • Ln 290, 305: Again I’m confused. Do you really mean W? Can’t it be a W or Z, so mV?
Green led agreed

  • Ln 294: Reference the figures explicitly rather than “in the next pages”
Green led fixed

  • Ln 296-297: This is not really convincing. How can a 15% contribution from W+jets be the source of up to 40% discrepancies in lepton eta/pt and ptj, for example?
Green led We have improved (next iteration of the AN) the purity of the top control region. The discrepancies decrease, as expected.

  • In many cases, the data/MC agreement is really not good, and specific comments on these distributions should be made. I spent some time going through the plots in the VBF H mumu analysis (AN-19-205) and the Higgs invisible VBF analysis (AN-19-243) to try to understand if the disagreement you see is consistent with their results. The VBF Hmm analysis seems to generally look a bit better, they use the NLO DY sample for the Z+2j background. The VBF Hinv uses the same LO samples in their control regions, and sees similar for the mjj and etajj, but with more aggressive MET cuts.
Green led The total number of jets entering the analysis in this case is larger than in those final states, so the comparison between analyses is not expected to show consistency. A data-driven technique is implemented to cure these discrepancies, be they in the LO or NLO simulation. The level of agreement in the control regions, on the other hand, is similar (beware of the range in the ratio plots and the number of bins shown).

  • More investigation is needed, also with the other analysis studying this process in the SMP-VV group. I understand you are still waiting on some samples to finish to use NLO W+jets, but you could at least make some test plots for some years using the jet-binned samples at NLO.
Green led The NLO samples will be used as soon as the processing is over.

  • Ln 324: Again, you can leave the details to the reference, but at least some words on the technique are needed. I would also suggest using “Nonprompt” in the plots rather than “Fake”
Green led Done in the text. The plots will be modified when updated

  • Ln 327: Comprehends → comprises
Green led done

  • Section 6: What framework do you use for the DNN? TMVA, Keras? It’s useful to say.
Green led Added

  • How have you decided on your network architecture? How long does training take and on what machine?
Green led We have observed that, in general, large, well-regularized models perform better than a shallow network with fewer parameters. Therefore we started with a medium-sized model with around 4 layers and 50 nodes and tried to increase its size. When the performance increased, we made the model larger; in case of overtraining, the model size was decreased. A procedure for automatic optimization of the network hyperparameters is under development.

  • Ln 389-392: What is the fractional split used to form the test/train datasets?
Green led It is 80% training and 20% validation.
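
A minimal sketch of an 80%/20% split of this kind, using only the Python standard library. The function name, the fixed seed and the use of a random shuffle are assumptions for illustration; the AN may split the events differently.

```python
import random

def split_train_validation(events, train_fraction=0.8, seed=42):
    """Shuffle the events reproducibly and split them into training and validation sets."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)   # fixed seed, so the split is reproducible
    n_train = int(round(train_fraction * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical sample of 1000 event indices
train, validation = split_train_validation(range(1000))
print(len(train), len(validation))
```

Keeping the seed fixed ensures that the same events end up in the validation set each time the training is repeated.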

  • Ln 410 and 411: < and > signs have some issues
Green led Fixed

  • Fig 20: The caption has some spurious text.
Green led Fixed

  • Fig 20: Can you describe more clearly in the caption why it is a clear overtrain?
Green led The validation loss is stable or slightly increasing, while the training loss keeps going down: the network is learning too much about the training sample.
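
The criterion described in this answer can be written as a simple check on the two loss histories. This is a sketch under assumptions: the function name, the window length and the example loss values are hypothetical, and it is not the procedure used to produce Fig. 20.

```python
def looks_overtrained(train_loss, val_loss, window=3):
    """True if, over the last `window` epochs, the training loss kept
    decreasing while the validation loss stayed flat or increased."""
    if len(train_loss) <= window or len(val_loss) <= window:
        return False  # not enough epochs to judge
    train_drop = train_loss[-window - 1] - train_loss[-1]
    val_drop = val_loss[-window - 1] - val_loss[-1]
    return train_drop > 0.0 and val_drop <= 0.0

# Hypothetical loss histories (one value per epoch)
train_history = [1.00, 0.80, 0.60, 0.50, 0.40, 0.30, 0.20]
val_overtrained = [1.00, 0.85, 0.70, 0.70, 0.71, 0.72, 0.73]
val_healthy = [1.00, 0.82, 0.63, 0.52, 0.43, 0.34, 0.25]

print(looks_overtrained(train_history, val_overtrained))
print(looks_overtrained(train_history, val_healthy))
```

The first call flags overtraining because the validation loss rises from 0.70 to 0.73 while the training loss falls; the second does not, because both losses keep decreasing.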

  • Table 11: I don’t find the definition of quite a few of these variables
Green led done

  • Ln 505: Is the mW here computed with lepton+MET? It’s really mass and not transverse mass?
Green led Mass, from jet(s), "W_had"

  • Ln 509: Don’t you fit the rate parameters for the W simultaneously with the signal region?
Green led Not in this closure and validation test, otherwise we would be unblinded. This is a closure test of the method

  • Fig. 27 (Fig 29): Are the right (top right) plots the distribution you actually fit? Why not use more reco bins than GEN?
Green led Yes. They are reco bins. The more bins the more degrees of freedom you give to the system. The proposed number of bins makes sure to correct for the discrepancies observed in control regions (the data-driven method for Wjets, LO or NLO) without spoiling the analysis performance

  • Section 8: Please show the distributions you use in the fit for some illustrative processes (signal, major background) for the leading backgrounds, especially for the JES and theory uncertainties

Green led OK

  • Ln 611-616: If you don’t have final implementations, you should at least add a dummy uncertainty that is likely to capture the effect
Green led OK. The effect of the PU reweighting is expected to be small: the effect on leptons is negligible, as it is already taken into account by the scale factors, and the effect on jets is included in the nuisances.

  • Ln 636: “Initial fit on data” What does this mean? Isn’t there one simultaneous fit?
Green led There are two kinds of Asimov toys that can be performed: the "data Asimov" (a.k.a. post-fit Asimov) and the "MC Asimov" (a.k.a. pre-fit Asimov). The difference is that the nuisances can either be pre-fitted, with their best-fit values used in the Asimov toy, or simply taken at their nominal values. The difference between the two approaches is small if the analysis is dominated by lnN priors and has only a few rateParam parameters (flat prior) with negligible effect. The difference can be visible if the major nuisances are rateParams, which are meant to be fitted in the final fit. Using the "data Asimov" is then more reasonable, as it is closer to the description of the background distributions we expect. In Run 1 we used to normalize some of the backgrounds by hand to take this effect into account. In Run 2, with all these new tools available in Combine, this is no longer needed, and we can perform this scaling, in an even more correct way, on top of the final datacards.
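
The difference between the two Asimov datasets can be sketched in a few lines. This is an illustration only and not the Combine implementation: the two-bin templates, the function `asimov_yields` and the fitted W+jets scale of 0.9 are hypothetical values chosen for the example.

```python
def asimov_yields(signal, backgrounds, rate_params=None, mu=1.0):
    """Expected yield per bin: mu * signal plus each background template
    scaled by its rate parameter (1.0 if none is given)."""
    rate_params = rate_params or {}
    return [
        mu * signal[i]
        + sum(rate_params.get(name, 1.0) * template[i]
              for name, template in backgrounds.items())
        for i in range(len(signal))
    ]

signal = [2.0, 8.0]                                          # hypothetical signal template
backgrounds = {"Wjets": [100.0, 40.0], "top": [50.0, 20.0]}  # hypothetical backgrounds

# "MC Asimov" (pre-fit): all rate parameters at their nominal value of 1
prefit = asimov_yields(signal, backgrounds)

# "Data Asimov" (post-fit): W+jets scaled by a value fitted to data (0.9 is made up)
postfit = asimov_yields(signal, backgrounds, rate_params={"Wjets": 0.9})

print(prefit)
print(postfit)
```

When the dominant background carries a freely floating rate parameter, as W+jets does in this sketch, the two datasets differ by the full shift of that parameter, which is why the post-fit version gives the more realistic expected result.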

  • Section 9.2: Where is the distribution of the DNN score over the full 0-1 range?
Green led It will be added in the next iteration of the AN.

  • Please show more information from the fit, especially impacts of the uncertainties and the postfit corrections (rate params for W normalization)
Green led All impact plots will be provided in the next iteration of the AN

-- Davide Valsecchi - 2020-07-20

Topic attachments
| Attachment | History | Size | Date | Who | Comment |
| PLB_Review-SMP-20-013_response_v12.doc | r1 | 889.0 K | 2022-05-02 - 19:13 | DavideValsecchi | |
| PLB_Review-SMP-20-013_response_v12am_dl.doc | r1 | 888.0 K | 2022-05-03 - 12:07 | DavideValsecchi | |
| PLB_Review_SMP-20-013.doc | r1 | 787.5 K | 2022-03-18 - 10:44 | DavideValsecchi | PLB Review answers |
| PLB_Review_SMP-20-013_DWcomments.doc | r1 | 778.5 K | 2022-03-20 - 23:11 | DarienWOOD | referee reply draft with ArC chair comments |
| PLB_Review_SMP-20-013_authors_v2.doc | r1 | 791.0 K | 2022-03-21 - 10:44 | DavideValsecchi | |
| PLB_Review_SMP-20-013_authors_v4.doc | r1 | 792.0 K | 2022-03-23 - 18:06 | DavideValsecchi | |
| PLB_Review_SMP-20-013_authors_v6.doc | r1 | 792.0 K | 2022-03-24 - 10:09 | DavideValsecchi | |
| PLB_Review_SMP-20-013_response_v11.doc | r1 | 883.5 K | 2022-04-28 - 07:23 | DavideValsecchi | |
| PLB_Review_SMP-20-013_secondRound_authors_v3.docx | r1 | 53.5 K | 2022-08-15 - 11:02 | DavideValsecchi | |
| PLB_Review_SMP-20-013_secondRound_v1.doc | r1 | 54.0 K | 2022-08-13 - 11:10 | DavideValsecchi | |
| PLB_Review_SMP-20-013_v7.doc | r1 | 779.0 K | 2022-03-30 - 15:57 | DavideValsecchi | |
| PLB_Review_SMP-20-013_v8_3rd_reviewer.doc | r1 | 881.0 K | 2022-04-23 - 17:15 | DavideValsecchi | |
| PLB_Review_SMP-20-013_v9.doc | r1 | 888.5 K | 2022-04-27 - 12:31 | DavideValsecchi | |
| SMP-20-013_response2_v2.docx | r1 | 43.7 K | 2022-08-15 - 10:22 | DavideValsecchi | |
| SMP-20-013_response2_v6.docx | r1 | 53.5 K | 2022-08-18 - 12:28 | DavideValsecchi | |
| SMP-20-013_response2_v7.docx | r1 | 53.5 K | 2022-08-18 - 18:31 | DavideValsecchi | |
| SMP-20-013_response2_v8.docx | r1 | 53.6 K | 2022-08-29 - 09:50 | DavideValsecchi | |
Topic revision: r89 - 2022-08-29 - DavideValsecchi
 