Skip to content
Merged
Show file tree
Hide file tree
Changes from 88 commits
Commits
Show all changes
90 commits
Select commit Hold shift + click to select a range
ff6d9a0
Various scripts for validation
bryates Aug 15, 2025
5baa4ce
Fix lint issues
bryates Aug 15, 2025
07171f3
More lint
bryates Aug 15, 2025
78f454a
Missed a few
bryates Aug 15, 2025
762ce56
Missed white space
bryates Aug 15, 2025
744fa0b
A few more
bryates Aug 15, 2025
26ff6ae
White space and unused modules
bryates Aug 15, 2025
9209333
Should be the last one
bryates Aug 15, 2025
3c49497
Added more docstrings
bryates Aug 15, 2025
616207f
More details
bryates Aug 15, 2025
8ba744b
Fix typo
bryates Aug 15, 2025
8c5b513
Some improvements and docstring
bryates Aug 15, 2025
a184589
Added code to normalize to same lumi
bryates Aug 15, 2025
a967468
Option to ignore lumi
bryates Aug 15, 2025
ca88b80
More docstring
bryates Aug 15, 2025
f8a2f23
Fix lint
bryates Aug 15, 2025
d71ec42
Draw third normalization if using a fixed point sample
bryates Aug 15, 2025
40e3709
Update README
bryates Aug 15, 2025
3976079
Added notes to README
bryates Aug 15, 2025
d236808
NOTE
bryates Aug 15, 2025
5c34766
note -> memo
bryates Aug 15, 2025
60140c7
Fix some outdated stuff
bryates Aug 15, 2025
ee8fd26
Fix printout
bryates Aug 15, 2025
4ebd7b3
Minor fixes
bryates Aug 19, 2025
92ec2a9
Setting for fixed point
bryates Aug 21, 2025
a2746d1
Avoid running processors directly
bryates Aug 21, 2025
54d3668
Match things like `2023BPix` and `UL16APV`
bryates Aug 21, 2025
55fa4d2
More robust options (scale by SM with `--scale` rather than always 10%)
bryates Aug 26, 2025
2dac74d
Add scale to name
bryates Aug 26, 2025
8d0bb97
Update make json script and add new, "faster" one
bryates Aug 26, 2025
db39755
Fix lint error
bryates Aug 26, 2025
15ee24a
Missing import
bryates Aug 26, 2025
331c722
Add `--central` option to compare 2 central samples
bryates Oct 9, 2025
c66efc4
`--small` divides by 10
bryates Oct 9, 2025
614d0e8
More instructions, add `--abs` flag
bryates Oct 9, 2025
d0627f8
Add inv mass of system (tops + heavy bosons)
bryates Oct 9, 2025
f6a56fe
Change invm range, just H boson for now
bryates Oct 9, 2025
fdc9d61
Update tX
bryates Oct 9, 2025
72b38e3
Added ptZ
bryates Oct 10, 2025
d1ba793
Fix `--abs` flag
bryates Oct 14, 2025
ae0e2d0
Update invm, add Higgs pT
bryates Oct 14, 2025
6974158
Reorganize `norm`
bryates Oct 15, 2025
9ba83cd
Use all tops for tX (so ttt is correct)
bryates Oct 15, 2025
7303c9f
Use `ak.firsts`
bryates Oct 15, 2025
a731752
Fix flag logic
bryates Oct 15, 2025
f9f7390
Fix year bug, add `--info`
bryates Nov 11, 2025
2a27705
More robust `--no-lumi` option
bryates Nov 13, 2025
bae1e78
Fix lint duplicate `invm` error
bryates Nov 13, 2025
7efa43e
Various fixes
bryates Nov 24, 2025
81f805e
Add `--zero`, fix help strings
bryates Nov 24, 2025
161434f
Smaller t_pt range, dynamic lhe based on pdgId
bryates Nov 24, 2025
b3d8b8e
Use `np.argmax`
bryates Nov 24, 2025
02a1f6b
Back to `ak.max`
bryates Nov 24, 2025
4cadf7e
Fix logic
bryates Nov 25, 2025
5b656f8
Fix zero flag bug
bryates Nov 25, 2025
6c49e61
Less dra bins, no lhe_t for now
bryates Dec 1, 2025
ad2aee9
Fixed lhe_t
bryates Dec 2, 2025
84ef9ad
Fix StPt None typo, re-enabled lhe_t_pt
bryates Dec 2, 2025
26d209f
Fix another typo
bryates Dec 2, 2025
6bde68f
Actually fix this time
bryates Dec 2, 2025
45aea6f
Adding DJR and EFT weights scripts
bryates Dec 3, 2025
a2915dd
Update central 2022 ttHnobb xsec
bryates Dec 3, 2025
53c023b
Fix lint errors
bryates Dec 3, 2025
e42b6cc
Add docstring
bryates Dec 3, 2025
3d6a48f
Add link to starting points json
bryates Dec 4, 2025
d609129
Adding gridpack validation scripts
bryates Dec 4, 2025
4cefc89
Fix typo
bryates Dec 4, 2025
8536d29
Explain `InputCards`
bryates Dec 4, 2025
3ad3c5c
Fix lint
bryates Dec 4, 2025
1954597
More lint
bryates Dec 4, 2025
fbfd36f
Lint again
bryates Dec 4, 2025
24a7e24
Fix ttlnu hist name
bryates Dec 4, 2025
b1d4eab
Inv mass, change fin2 to dash-dot
bryates Dec 5, 2025
e89c3ff
Different invm plots
bryates Dec 5, 2025
6a3644c
Plot all invm hists
bryates Dec 5, 2025
465b83e
More invm_4t bins
bryates Dec 5, 2025
e1f1e70
Increase invm_ttX bins
bryates Dec 5, 2025
2830ab3
Update plotting style
bryates Dec 8, 2025
7ee762e
Dynamic sqrt(s)
bryates Dec 8, 2025
d15e740
Update lumi decimal places
bryates Dec 8, 2025
bb064e1
Update EFT label
bryates Dec 8, 2025
6c5684c
Adjusting which tops are saved
bryates Dec 9, 2025
14b6a5f
Different ttX invm bins
bryates Dec 9, 2025
a062205
Change label when using `--small`
bryates Dec 9, 2025
76aaea5
More ttX bins
bryates Dec 9, 2025
796ead4
Smaller tX bins
bryates Dec 10, 2025
be5f08e
Move shebang to top
bryates Dec 11, 2025
b4095be
Move shebang
bryates Dec 11, 2025
c4fbb03
Fix typo
bryates Dec 11, 2025
b8c1a0d
Code cleanup
bryates Dec 11, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 12 additions & 0 deletions analysis/mc_validation/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,5 +16,17 @@ This directory contains scripts from the validation studies of the FullR2 privat
- Should be run on the output of the topeft processor
- Was used during the June 2022 MC validation studies (for TOP-22-006 pre approval checks)

:memo: The scripts below were updated in August of 2025
* `gen_processor.py`:
- This is an updated script to produce gen level histograms for comparison of various samples

* `gen_hist_eventweights_processor.py`:
- This script produces distributions of MG weights (if they are saved in the samples)

* `gen_hist_eventweights_plotter.py`:
- This script produces plots of MG weights
- Should be run on the output of the gen_hist_eventweights_processor.py processor

* `comp_norm.py`:
- This script plots two pkl files for different variables for comparison of shapes and normalizations
- Should be run on the output of the gen_processor.py processor
454 changes: 454 additions & 0 deletions analysis/mc_validation/comp_norm.py

Large diffs are not rendered by default.

115 changes: 115 additions & 0 deletions analysis/mc_validation/djr.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,115 @@
#!/usr/bin/env python3
'''
This script produces the DJR plots from nanoGEN files
(assuming the DJR were also saved)
Example Run
python djr.py \
--input /cms/cephfs/data/store/user/byates/tttt/nanoGEN_Run3/2022/tttt_LO_EFT/crab_tttt_nanoGEN_Run3/250715_223705/0000/ \
--output /users/byates2/afs/www/EFT/tttt_Run3/weights/weights.pdf
'''

import uproot
import os
import hist
import awkward as ak
import numpy as np
np.seterr(invalid='ignore')
import matplotlib.pyplot as plt
import mplhep as hep
import warnings
plt.style.use(hep.style.CMS)

if __name__ == '__main__':
'''
good example:
root://cmseos.fnal.gov//store/user/dspitzba/EFT/qcut30.root

bad example:
root://cmseos.fnal.gov//store/user/cmsdas/2023/short_exercises/Generators/wjets_2j/w2jets_qcut10.root
'''

import argparse

argParser = argparse.ArgumentParser(description = "Argument parser")
argParser.add_argument('--input', action='store', default='root://cmseos.fnal.gov//store/user/dspitzba/EFT/qcut80.root', help="Input file")
argParser.add_argument('--output', action='store', default='./djr.pdf', help="Output file")
argParser.add_argument('--nevents', action='store', default=50e3, help="Number of events generated")
args = argParser.parse_args()

n_events_in = int(args.nevents)

djr_axis = hist.axis.Regular(40, -0.5, 3.5, name="djr", label=r"$\Delta JR$")
parton_axis = hist.axis.Integer(0, 4, name="n", label="Number of partons")
#parton_axis = hist.axis.Regular(5, 0, 4, name="n", label="Number of partons")
transition_axis = hist.axis.Integer(0, 6, name="t", label="DJR X->Y")
djr = hist.Hist(djr_axis, parton_axis, transition_axis)

files = [f for f in os.listdir(args.input) if '.root' in f and 'COPYING' not in f]
#n_events_in *= len(files)
tot = 0
for fin in files:
#print(f"Loading input file {args.input}/{fin}")
fin = uproot.open(args.input+fin)
events = fin["Events"]
#ar = events["GenEventInfoProduct_generator__GEN./GenEventInfoProduct_generator__GEN.obj"].arrays()
#djr_values = ar['GenEventInfoProduct_generator__GEN.obj']['DJRValues_']
#nMEPartons = ar['GenEventInfoProduct_generator__GEN.obj']['nMEPartons_']
djr_values10 = ak.Array(events['LHEWeight_DJR10'].array())
djr_values21 = ak.Array(events['LHEWeight_DJR21'].array())
djr_values32 = ak.Array(events['LHEWeight_DJR32'].array())
#djr_values = ak.concatenate([events['LHEWeight_DJR10'].array(), events['LHEWeight_DJR21'].array(), events['LHEWeight_DJR32'].array()])
#nMEPartons = events['LHEWeight_nMEPartons'].array()
nMEPartons = events['Generator_nMEPartons'].array()

djr.fill(
djr = np.log10(ak.flatten(djr_values10, axis=0)),
n = ak.values_astype(ak.flatten(ak.ones_like(djr_values10)*nMEPartons, axis=0), np.int32),
#n = ak.flatten(ak.ones_like(djr_values10)*nMEPartons, axis=0),
t = ak.flatten(ak.local_index(djr_values10), axis=0),
)
djr.fill(
djr = np.log10(ak.flatten(djr_values21, axis=0)),
n = ak.values_astype(ak.flatten(ak.ones_like(djr_values10)*nMEPartons, axis=0), np.int32),
#n = ak.flatten(ak.ones_like(djr_values21)*nMEPartons, axis=0),
t = ak.flatten(ak.local_index(djr_values21), axis=0),
)
djr.fill(
djr = np.log10(ak.flatten(djr_values32, axis=0)),
n = ak.values_astype(ak.flatten(ak.ones_like(djr_values10)*nMEPartons, axis=0), np.int32),
#n = ak.flatten(ak.ones_like(djr_values32)*nMEPartons, axis=0),
t = ak.flatten(ak.local_index(djr_values32), axis=0),
)
n_events = ak.num(djr_values10, axis=0)
print(f"Efficiency is {n_events/n_events_in}, assuming {n_events_in} where simulated")

print("Plotting...")
with warnings.catch_warnings():
warnings.simplefilter('ignore')
fig, axs = plt.subplots(3,2, figsize=(15,21))

for i in range(3):
for j in range(2):
transition = 2*i+j
djr[:, :, transition].plot1d(
overlay='n',
ax=axs[i][j],
label= [f'{k} partons' for k in range(4)]
)
djr[:, :, transition][{'n':sum}].plot1d(
ax=axs[i][j],
label = ['total'],
color = 'gray',
)

axs[i][j].set_xlabel(r'$DJR\ %s \to %s$'%(transition, transition+1))
axs[i][j].set_yscale('log')
axs[i][j].set_ylim(0.3,n_events*1000)
axs[i][j].legend(
loc='upper right',
bbox_to_anchor=(0.03, 0.88, 0.90, .11),
mode="expand",
ncol=2,
)

fig.savefig(args.output)
print(f"Figure saved in {args.output}")
135 changes: 135 additions & 0 deletions analysis/mc_validation/eft_weights.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,135 @@
'''
This script produces the distributions of weights from nanoGEN files
(assuming the MG weights were also saved)
Example Run
python eft_weights.py \
--input /cms/cephfs/data/store/user/byates/tttt/nanoGEN_Run3/2022/tttt_LO_EFT/crab_tttt_nanoGEN_Run3/250715_223705/0000/ \
--output /users/byates2/afs/www/EFT/tttt_Run3/weights/weights.pdf
'''

#!/usr/bin/env python3

import hist
import os
import awkward as ak
from coffea.nanoevents import NanoEventsFactory, NanoAODSchema
import numpy as np
import matplotlib.pyplot as plt
import mplhep as hep
import topcoffea.modules.utils as utils
plt.style.use(hep.style.CMS)

NanoAODSchema.warn_missing_crossrefs = False

if __name__ == '__main__':

import argparse
# 'EFTrwgt66_ctW_0.0_ctq1_0.0_cQq81_0.0_ctZ_0.0_cQq83_0.0_ctG_0.0_ctq8_0.0_cQq13_0.0_cQq11_0.0'

argParser = argparse.ArgumentParser(description = "Argument parser")
argParser.add_argument('--input', action='store', default='root://cmseos.fnal.gov//store/user/dspitzba/EFT/nanogen_small.root', help="Input file")
argParser.add_argument('--output', action='store', default='./weights.pdf', help="Output file")
args = argParser.parse_args()

path = args.input
#files = ['nanogen_12.root', 'nanogen_49.root', 'nanogen_51.root', 'nanogen_54.root', 'nanogen_5.root', 'nanogen_82.root', 'nanogen_13.root', 'nanogen_4.root', 'nanogen_52.root', 'nanogen_55.root', 'nanogen_63.root', 'nanogen_9.root', 'nanogen_2.root', 'nanogen_50.root', 'nanogen_53.root', 'nanogen_56.root', 'nanogen_7.root']
files = [f for f in os.listdir(path) if '.root' in f]
#files = [files[0]] # Only process a single file
#files = [files[x] for x in range(10)] # Only process the first 10 files
#files = ['nanogen_123_220.root']

weight_axis = hist.axis.Regular(22, -20, 2, name="weight_ax", label="weight", underflow=True, overflow=True)
#weight_axis = hist.axis.Regular(100, -1, 1e1, name="weight_ax", label="weight", underflow=True, overflow=True)
h_SM = hist.Hist(weight_axis)
events = NanoEventsFactory.from_root(
#args.input,
path+files[0],
schemaclass=NanoAODSchema,
).events()

w = events.LHEWeight
eft_weight_names = [ x for x in w.fields if x.startswith('EFTrwgt') ]
#h_ttG = hist.Hist(weight_axis)
h_ttG = []
eft_coeffs = ak.to_numpy(events["EFTfitCoefficients"]) if hasattr(events, "EFTfitCoefficients") else None
#h_ttG_rwgt = HistEFT(weight_axis, wc_names=['ctZ', 'cpt', 'cpQM', 'cpQ3', 'ctW', 'ctp', 'ctG'], label=r"Events")
for weight in eft_weight_names:
h_ttG.append(hist.Hist(weight_axis))

print('[', end='')
for fin in files:
events = NanoEventsFactory.from_root(
#args.input,
path+fin,
schemaclass=NanoAODSchema,
).events()

w = events.LHEWeight
eft_weight_names = [ x for x in w.fields if x.startswith('EFTrwgt') ]
if not eft_weight_names:
eft_weight_names = utils.get_list_of_wc_names(path+fin)
eft_weight_names = ['_0.0_'.join(eft_weight_names)]
print(fin, eft_weight_names)

sm_wgt = getattr(events.LHEWeight, eft_weight_names[-1])
print('.', end='')
#print(f'Smallest non-zero SM weight: {np.min(sm_wgt[sm_wgt != 0.0])}')
h_SM.fill(weight_ax=np.log10(getattr(events.LHEWeight, eft_weight_names[-1])))
#h_SM.fill(weight_ax=getattr(events.LHEWeight, eft_weight_names[-1]))

#h_ttG = [(w,hist.Hist(weight_axis)) for w in ttG]
#h_ctg1 = hist.Hist(weight_axis)
#h_ctg1.fill(weight_ax=getattr(events.LHEWeight, 'EFTrwgt66_ctW_0.0_ctq1_0.0_cQq81_0.0_ctZ_0.0_cQq83_0.0_ctG_0.0_ctq8_0.0_cQq13_0.0_cQq11_0.0'))

# EFTrwgt10_ctGRe_2.0_ctGIm_0.0_ctWRe_0.0_ctWIm_0.0_ctBRe_0.0_ctBIm_0.0_cHtbRe_0.0_cHtbIm_0.0_cHt_0.0
#h_ctg2 = hist.Hist(weight_axis)
#h_ctg2.fill(weight_ax=getattr(events.LHEWeight, 'EFTrwgt0_ctW_-1.722436_ctq1_1.171197_cQq81_1.34397_ctZ_-6.408086_cQq83_1.555205_ctG_0.2893_ctq8_-0.625025_cQq13_-1.305265_cQq11_1.762244'))
#for (weight,h) in h_ttG:
# h.fill(weight_ax=getattr(events.LHEWeight, weight))


#h_ctg1.plot1d(ax=ax, label=r'$C_{tG}=1$')
#h_ctg2.plot1d(ax=ax, label=r'$C_{tG}=2$')
#for i,(_,h) in enumerate(h_ttG):
# h.plot1d(ax=ax, label=f'wgt_{i}')

for nw,weight in enumerate(eft_weight_names[:-1]):
h_ttG[nw].fill(weight_ax=np.log10(getattr(events.LHEWeight, weight)))
#h_ttG[nw].fill(weight_ax=getattr(events.LHEWeight, weight))
#wgts = getattr(events.LHEWeight, 'EFTrwgt_ctZ_0.0_cpt_0.0_cpQM_0.0_cpQ3_0.0_ctW_0.0_ctp_0.0_ctG_0.0')
#h_ttG_rwgt.fill(weight_ax=wgts, eft_coeff=eft_coeffs)
print(']')

for iweight,weight in enumerate(eft_weight_names):
fig, ax = plt.subplots()

hep.histplot(h_SM, ax=ax, label=r'$SM=0$', flow='show', histtype='errorbar', yerr=False)
label = '$tt\gamma$'
label = 'EFT'
hep.histplot(h_ttG[iweight], ax=ax, label=label, flow='show', histtype='errorbar', yerr=False)
# 'EFTrwgt66_ctW_0.0_ctq1_0.0_cQq81_0.0_ctZ_0.0_cQq83_0.0_ctG_0.0_ctq8_0.0_cQq13_0.0_cQq11_0.0'
eft_weight = weight.split('_')[1:]
wcs = eft_weight[:-1:2]
vals = eft_weight[1::2]
vals = [float(v) for v in vals]
eft_pt = dict(zip(wcs,vals))
#hep.histplot(h_ttG_rwgt.as_hist(eft_pt), ax=ax, label=label+' reweight', flow='show', histtype='errorbar', yerr=False)
ax.set_ylabel(r'# Events')
ax.set_xlabel(r'log(weight)')

#ax.set_xscale("log")
ax.set_yscale("log")
#ax.set_xlim([1e-2, 1e2])
#ax.set_xlim([1e-6, 1e2])
#ax.set_xlim([1e-2, 1e2])

plt.legend()

fig.savefig(args.output)
fig.savefig(args.output.replace('.pdf', '_rwgt{}.pdf'.format(iweight)))
fig.savefig(args.output.replace('.pdf', '_rwgt{}.png'.format(iweight)))
print(f"Figure saved in {args.output.replace('.pdf', '_rwgt{}.pdf'.format(iweight))}")
#fig.savefig(args.output.replace('.pdf', '_{}.pdf'.format(eft_weight_names[iweight])))
#fig.savefig(args.output.replace('.pdf', '_{}.png'.format(eft_weight_names[iweight])))
#print(f"Figure saved in {args.output.replace('.pdf', '_rwgt{}.pdf'.format(iweight))}")
plt.close()
66 changes: 66 additions & 0 deletions analysis/mc_validation/gen_hist_eventweights_plotter.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
'''
This script plots weights produced by `gen_hist_eventweights_processor.py`
Example:
python gen_hist_eventweights_plotter.py 2022_tllq_NewStPt4.pkl.gz /users/byates2/afs/www/EFT/tllq_NewStPt4_Run3/weights/weights.pdf
'''
import os
import pickle
import gzip
#import numpy as np
import matplotlib.pyplot as plt
import argparse
from topeft.modules import axes
BINNING = {k: v['variable'] for k,v in axes.info.items() if 'variable' in v}

#Load hists from pickle file created by TopCoffea
hists={}

parser = argparse.ArgumentParser(description='You can select which file to run over')
parser.add_argument('fin' , default='analysis/topEFT/histos/mar03_central17_pdf_np.pkl.gz' , help = 'File to run over')
parser.add_argument('output' , default='/users/byates2/afs/www/EFT/tllq_NewStPt4_Run3/weights/' , help = 'Output path')
args = parser.parse_args()
fin = args.fin

#hin = pickle.load(gzip.open(fin))
#for k in hin.keys():
# if k in hists: hists[k]+=hin[k]
# else: hists[k]=hin[k]
with gzip.open(fin) as fin:
hin = pickle.load(fin)
for k in hin.keys():
if isinstance(hin[k], dict):
continue
if k in hists: hists[k]+=hin[k]
else: hists[k]=hin[k]

for h_name in hists:
ls = '-'
if 'coeff' in h_name: continue
if 'efth' in h_name: continue
if 'SM' in h_name and False:
label = 'SM'
elif 'neg' in h_name:
ls = '--'
elif 'abs' in h_name:
ls = '-.'
if 'pt' in h_name:
label = 'EFT' + h_name.split('_')[1]
else:
label = h_name.split('_')[1]
hists[h_name].plot1d(label=label, yerr=False, ls=ls, flow='show')
#(hists[h_name]/np.sum(hists['weights_SMabs_log'].values(flow=True))).plot1d(label=label, yerr=False, ls=ls, flow='show')
#plt.gca().set_ylabel('log(weights) / sum(SMpos)')
if 'coeff' in h_name: continue
#if 'coeff' in h_name: hists[h_name].plot1d(label=label, yerr=False)#, flow='show', ls='--')
elif 'efth' in h_name: continue
#elif 'efth' in h_name: hists[h_name].plot1d(label=label, yerr=False, flow='show', ls='-.')
#else: hists[h_name].plot1d(label=label, yerr=False)#, flow='show')

plt.legend(ncol=3)
plt.gca().set_yscale('log')
plt.gca().set_xlabel('log(event weights)')
plt.tight_layout()
os.makedirs(f'{args.output}', exist_ok=True)
plt.savefig(f'{args.output}/weights.pdf')
plt.savefig(f'{args.output}/weights.png')
#plt.savefig(args.output.replace('.pdf', '.png'))
Loading
Loading