Merged
90 commits
ff6d9a0
Various scripts for validation
bryates Aug 15, 2025
5baa4ce
Fix lint issues
bryates Aug 15, 2025
07171f3
More lint
bryates Aug 15, 2025
78f454a
Missed a few
bryates Aug 15, 2025
762ce56
Missed white space
bryates Aug 15, 2025
744fa0b
A few more
bryates Aug 15, 2025
26ff6ae
White space and unused modules
bryates Aug 15, 2025
9209333
Should be the last one
bryates Aug 15, 2025
3c49497
Added more docstrings
bryates Aug 15, 2025
616207f
More details
bryates Aug 15, 2025
8ba744b
Fix typo
bryates Aug 15, 2025
8c5b513
Some improvements and docstring
bryates Aug 15, 2025
a184589
Added code to normalize to same lumi
bryates Aug 15, 2025
a967468
Option to ignore lumi
bryates Aug 15, 2025
ca88b80
More docstring
bryates Aug 15, 2025
f8a2f23
Fix lint
bryates Aug 15, 2025
d71ec42
Draw third normalization if using a fixed point sample
bryates Aug 15, 2025
40e3709
Update README
bryates Aug 15, 2025
3976079
Added notes to README
bryates Aug 15, 2025
d236808
NOTE
bryates Aug 15, 2025
5c34766
note -> memo
bryates Aug 15, 2025
60140c7
Fix some outdated stuff
bryates Aug 15, 2025
ee8fd26
Fix printout
bryates Aug 15, 2025
4ebd7b3
Minor fixes
bryates Aug 19, 2025
92ec2a9
Setting for fixed point
bryates Aug 21, 2025
a2746d1
Avoid running processors directly
bryates Aug 21, 2025
54d3668
Match things like `2023BPix` and `UL16APV`
bryates Aug 21, 2025
55fa4d2
More robust options (scale by SM with `--scale` rather than always 10%)
bryates Aug 26, 2025
2dac74d
Add scale to name
bryates Aug 26, 2025
8d0bb97
Update make json script and add new, "faster" one
bryates Aug 26, 2025
db39755
Fix lint error
bryates Aug 26, 2025
15ee24a
Missing import
bryates Aug 26, 2025
331c722
Add `--central` option to compare 2 central samples
bryates Oct 9, 2025
c66efc4
`--small` divides by 10
bryates Oct 9, 2025
614d0e8
More instructions, add `--abs` flag
bryates Oct 9, 2025
d0627f8
Add inv mass of system (tops + heavy bosons)
bryates Oct 9, 2025
f6a56fe
Change invm range, just H boson for now
bryates Oct 9, 2025
fdc9d61
Update tX
bryates Oct 9, 2025
72b38e3
Added ptZ
bryates Oct 10, 2025
d1ba793
Fix `--abs` flag
bryates Oct 14, 2025
ae0e2d0
Update invm, add Higgs pT
bryates Oct 14, 2025
6974158
Reorganize `norm`
bryates Oct 15, 2025
9ba83cd
Use all tops for tX (so ttt is correct)
bryates Oct 15, 2025
7303c9f
Use `ak.firsts`
bryates Oct 15, 2025
a731752
Fix flag logic
bryates Oct 15, 2025
f9f7390
Fix year bug, add `--info`
bryates Nov 11, 2025
2a27705
More robust `--no-lumi` option
bryates Nov 13, 2025
bae1e78
Fix lint duplicate `invm` error
bryates Nov 13, 2025
7efa43e
Various fixes
bryates Nov 24, 2025
81f805e
Add `--zero`, fix help strings
bryates Nov 24, 2025
161434f
Smaller t_pt range, dynamic lhe based on pdgId
bryates Nov 24, 2025
b3d8b8e
Use `np.argmax`
bryates Nov 24, 2025
02a1f6b
Back to `ak.max`
bryates Nov 24, 2025
4cadf7e
Fix logic
bryates Nov 25, 2025
5b656f8
Fix zero flag bug
bryates Nov 25, 2025
6c49e61
Less dra bins, no lhe_t for now
bryates Dec 1, 2025
ad2aee9
Fixed lhe_t
bryates Dec 2, 2025
84ef9ad
Fix StPt None typo, re-enabled lhe_t_pt
bryates Dec 2, 2025
26d209f
Fix another typo
bryates Dec 2, 2025
6bde68f
Actually fix this time
bryates Dec 2, 2025
45aea6f
Adding DJR and EFT weights scripts
bryates Dec 3, 2025
a2915dd
Update central 2022 ttHnobb xsec
bryates Dec 3, 2025
53c023b
Fix lint errors
bryates Dec 3, 2025
e42b6cc
Add docstring
bryates Dec 3, 2025
3d6a48f
Add link to starting points json
bryates Dec 4, 2025
d609129
Adding gridpack validation scripts
bryates Dec 4, 2025
4cefc89
Fix typo
bryates Dec 4, 2025
8536d29
Explain `InputCards`
bryates Dec 4, 2025
3ad3c5c
Fix lint
bryates Dec 4, 2025
1954597
More lint
bryates Dec 4, 2025
fbfd36f
Lint again
bryates Dec 4, 2025
24a7e24
Fix ttlnu hist name
bryates Dec 4, 2025
b1d4eab
Inv mass, change fin2 to dash-dot
bryates Dec 5, 2025
e89c3ff
Different invm plots
bryates Dec 5, 2025
6a3644c
Plot all invm hists
bryates Dec 5, 2025
465b83e
More invm_4t bins
bryates Dec 5, 2025
e1f1e70
Increase invm_ttX bins
bryates Dec 5, 2025
2830ab3
Update plotting style
bryates Dec 8, 2025
7ee762e
Dynamic sqrt(s)
bryates Dec 8, 2025
d15e740
Update lumi decimal places
bryates Dec 8, 2025
bb064e1
Update EFT label
bryates Dec 8, 2025
6c5684c
Adjusting which tops are saved
bryates Dec 9, 2025
14b6a5f
Different ttX invm bins
bryates Dec 9, 2025
a062205
Change label when using `--small`
bryates Dec 9, 2025
76aaea5
More ttX bins
bryates Dec 9, 2025
796ead4
Smaller tX bins
bryates Dec 10, 2025
be5f08e
Move shebang to top
bryates Dec 11, 2025
b4095be
Move shebang
bryates Dec 11, 2025
c4fbb03
Fix typo
bryates Dec 11, 2025
b8c1a0d
Code cleanup
bryates Dec 11, 2025
12 changes: 12 additions & 0 deletions analysis/mc_validation/README.md
@@ -16,5 +16,17 @@ This directory contains scripts from the validation studies of the FullR2 private
- Should be run on the output of the topeft processor
- Was used during the June 2022 MC validation studies (for TOP-22-006 pre approval checks)

:memo: The scripts below were updated in August 2025
* `gen_processor.py`:
- An updated script that produces gen-level histograms for comparing various samples

* `gen_hist_eventweights_processor.py`:
- Produces distributions of the MG weights (if they are saved in the samples)

* `gen_hist_eventweights_plotter.py`:
- Produces plots of the MG weights
- Should be run on the output of the `gen_hist_eventweights_processor.py` processor

* `comp_norm.py`:
- Plots two pkl files of different variables to compare shapes and normalizations
- Should be run on the output of the `gen_processor.py` processor
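Both `gen_hist_eventweights_plotter.py` and `comp_norm.py` consume gzipped pickle files whose top level is a plain dict mapping histogram names to histogram objects. A minimal round-trip sketch of that container format (a list stands in for a `hist.Hist` here):

```python
import gzip
import io
import pickle

# A stand-in payload: the real files map histogram names to hist.Hist objects
payload = {'weights_SM_log': [0.1, 0.2]}

# Write the gzipped pickle to an in-memory buffer instead of a file on disk
buf = io.BytesIO()
with gzip.open(buf, 'wb') as f:
    pickle.dump(payload, f)

# Read it back the same way the plotters do
buf.seek(0)
with gzip.open(buf) as f:
    hists = pickle.load(f)

print(sorted(hists.keys()))  # ['weights_SM_log']
```

In the real outputs the values are scikit-hep `hist` objects, so they additionally support `+=` for merging across files and `.plot1d()` for drawing.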
430 changes: 430 additions & 0 deletions analysis/mc_validation/comp_norm.py

Large diffs are not rendered by default.

66 changes: 66 additions & 0 deletions analysis/mc_validation/gen_hist_eventweights_plotter.py
@@ -0,0 +1,66 @@
'''
This script plots weights produced by `gen_hist_eventweights_processor.py`
Example:
python gen_hist_eventweights_plotter.py 2022_tllq_NewStPt4.pkl.gz /users/byates2/afs/www/EFT/tllq_NewStPt4_Run3/weights/weights.pdf
'''
import os
import pickle
import gzip
import argparse
import matplotlib.pyplot as plt
from topeft.modules import axes

BINNING = {k: v['variable'] for k, v in axes.info.items() if 'variable' in v}

# Load hists from the pickle file created by the processor
hists = {}

parser = argparse.ArgumentParser(description='You can select which file to run over')
parser.add_argument('fin', default='analysis/topEFT/histos/mar03_central17_pdf_np.pkl.gz', help='File to run over')
parser.add_argument('output', default='/users/byates2/afs/www/EFT/tllq_NewStPt4_Run3/weights/', help='Output path')
args = parser.parse_args()
fin = args.fin

with gzip.open(fin) as f:
    hin = pickle.load(f)
for k in hin.keys():
    if isinstance(hin[k], dict):
        continue
    if k in hists:
        hists[k] += hin[k]
    else:
        hists[k] = hin[k]

for h_name in hists:
    ls = '-'
    # Skip the coefficient and efth cross-check hists
    if 'coeff' in h_name or 'efth' in h_name:
        continue
    if 'neg' in h_name:
        ls = '--'
    elif 'abs' in h_name:
        ls = '-.'
    if 'pt' in h_name:
        label = 'EFT' + h_name.split('_')[1]
    else:
        label = h_name.split('_')[1]
    hists[h_name].plot1d(label=label, yerr=False, ls=ls, flow='show')

plt.legend(ncol=3)
plt.gca().set_yscale('log')
plt.gca().set_xlabel('log(event weights)')
plt.tight_layout()
os.makedirs(args.output, exist_ok=True)
plt.savefig(f'{args.output}/weights.pdf')
plt.savefig(f'{args.output}/weights.png')
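The string matching that drives the labels and line styles in the plotting loop above can be sketched as a standalone helper (`style_for` is a name invented for illustration; it is not part of the script):

```python
# Sketch of the label / line-style selection used in the plotting loop:
# "neg" hists are dashed, "abs" hists dash-dot, and "pt" hists get an EFT prefix
def style_for(h_name):
    ls = '-'
    if 'neg' in h_name:
        ls = '--'
    elif 'abs' in h_name:
        ls = '-.'
    if 'pt' in h_name:
        label = 'EFT' + h_name.split('_')[1]
    else:
        label = h_name.split('_')[1]
    return label, ls

print(style_for('weights_pt3_log'))    # ('EFTpt3', '-')
print(style_for('weights_SMneg_log'))  # ('SMneg', '--')
print(style_for('weights_SMabs_log'))  # ('SMabs', '-.')
```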
220 changes: 220 additions & 0 deletions analysis/mc_validation/gen_hist_eventweights_processor.py
@@ -0,0 +1,220 @@
#!/usr/bin/env python
'''
This script produces histograms of the log of the event weights.
It assumes the MG weights are saved in the nanoGEN file.

Example:
python run_gen_hist_eventweights_processor.py ../../input_samples/sample_jsons/signal_samples/private_UL/2022_tllq_NewStPt4_nanoGEN.json -o 2022_tllq_NewStPt4 -x futures -r file:///cms/cephfs/data/ -p gen_hist_eventweights_processor.py
'''
import awkward as ak
import numpy as np
np.seterr(divide='ignore', invalid='ignore', over='ignore')
import hist
from hist import Hist
import topcoffea.modules.eft_helper as efth

from coffea import processor

# Get the lumi for the given year
def get_lumi(year):
    lumi_dict = {
        "2016APV": 19.52,
        "2016": 16.81,
        "2017": 41.48,
        "2018": 59.83,
    }
    if year not in lumi_dict:
        raise Exception(f"ERROR: Unknown year \"{year}\".")
    return lumi_dict[year]

# Clean the objects
def is_clean(obj_A, obj_B, drmin=0.4):
    objB_near, objB_DR = obj_A.nearest(obj_B, return_metric=True)
    mask = ak.fill_none(objB_DR > drmin, True)
    return mask

# Create list of wc values in the correct order
def order_wc_values(wcs, ref_pts):
    '''Returns list of wc values in the same order as the list of wc names based on a dictionary'''
    return [ref_pts[wc] for wc in wcs]

# Calculate event weights from wc values and eft fit coefficients
def calc_event_weights(eft_coeffs, wc_vals):
    '''Returns an array that contains the event weight for each event.
    eft_coeffs: Array of eft fit coefficients for each event
    wc_vals: Wilson coefficient values for the event weight calculation, listed in the same
        order as the wc names so that the multiplication with eft_coeffs is correct.
        The correct ordering can be achieved with the order_wc_values function.
    '''
    # Cross terms wcs[j]*wcs[k] for k <= j, with a leading 1 for the constant term
    wcs = np.hstack((np.ones(1), wc_vals))
    wc_cross_terms = []
    for j in range(len(wcs)):
        for k in range(j + 1):
            wc_cross_terms.append(wcs[j] * wcs[k])
    return np.sum(np.multiply(wc_cross_terms, eft_coeffs), axis=1)

class AnalysisProcessor(processor.ProcessorABC):
    def __init__(self, samples, wc_names_lst=[], hist_lst=None, dtype=np.float32, do_errors=False):
        self._samples = samples
        self._wc_names_lst = wc_names_lst

        self._dtype = dtype
        self._do_errors = do_errors

        print("self._samples", self._samples)
        print("self._wc_names_lst", self._wc_names_lst)

        # Create the histograms with scikit-hep hist
        self._low = -12
        self._high = 3
        self._histo_dict = {
            "weights_SM_log": Hist(hist.axis.Regular(bins=120, start=self._low, stop=self._high, name="weights_SM_log", label='log(event weights at the SM)'), storage="weight"),
            "weights_SMneg_log": Hist(hist.axis.Regular(bins=120, start=self._low, stop=self._high, name="weights_SMneg_log", label='log(event weights at the SMneg)'), storage="weight"),
            "weights_SMabs_log": Hist(hist.axis.Regular(bins=120, start=self._low, stop=self._high, name="weights_SMabs_log", label='log(event weights at the SMabs)'), storage="weight"),
            "weights_SMefth_log": Hist(hist.axis.Regular(bins=120, start=self._low, stop=self._high, name="weights_SMefth_log", label='log(event weights at the SMefth)'), storage="weight"),
            "weights_SMcoeff_log": Hist(hist.axis.Regular(bins=120, start=self._low, stop=self._high, name="weights_SMcoeff_log", label='log(event weights at the SMcoeff)'), storage="weight"),
            "weights_pt1_log": Hist(hist.axis.Regular(bins=120, start=self._low, stop=self._high, name="weights_pt1_log", label='log(event weights at the pt1)'), storage="weight"),
            "weights_pt2_log": Hist(hist.axis.Regular(bins=120, start=self._low, stop=self._high, name="weights_pt2_log", label='log(event weights at the pt2)'), storage="weight"),
            "weights_pt3_log": Hist(hist.axis.Regular(bins=120, start=self._low, stop=self._high, name="weights_pt3_log", label='log(event weights at the pt3)'), storage="weight"),
            "weights_pt4_log": Hist(hist.axis.Regular(bins=120, start=self._low, stop=self._high, name="weights_pt4_log", label='log(event weights at the pt4)'), storage="weight"),
            "weights_pt5_log": Hist(hist.axis.Regular(bins=120, start=self._low, stop=self._high, name="weights_pt5_log", label='log(event weights at the pt5)'), storage="weight"),
        }

        # Set the list of hists to fill
        if hist_lst is None:
            self._hist_lst = list(self._histo_dict.keys())
        else:
            for h in hist_lst:
                if h not in self._histo_dict.keys():
                    raise Exception(f"Error: Cannot specify hist \"{h}\", it is not defined in self._histo_dict")
            self._hist_lst = hist_lst

        print("hist_lst: ", self._hist_lst)

    @property
    def columns(self):
        return self._columns

    def process(self, events):

        # Dataset parameters
        dataset = events.metadata['dataset']
        hist_axis_name = self._samples[dataset]["histAxisName"]

        year = self._samples[dataset]['year']
        xsec = self._samples[dataset]['xsec']
        sow = self._samples[dataset]['nSumOfWeights']

        # Extract the EFT quadratic coefficients; eft_coeffs is never jagged,
        # so convert immediately to numpy for ease of use
        eft_coeffs = ak.to_numpy(events['EFTfitCoefficients']) if hasattr(events, "EFTfitCoefficients") else None

        # Event weights at the SM point, computed three ways as a cross-check
        wc_lst_SM = np.zeros(len(self._wc_names_lst))
        event_weights_SM = calc_event_weights(eft_coeffs, wc_lst_SM)
        event_weights_SMcoeff = eft_coeffs[:, 0]
        event_weights_SMefth = efth.calc_eft_weights(eft_coeffs, np.zeros(len(self._wc_names_lst)))

        # The EFT reweight points saved in the sample, if any (keep the first 10)
        eft_weight_names = [x for x in events['LHEWeight'].fields if x.startswith('EFTrwgt')][:10]

        # Event weights at five non-SM points: the saved reweight points if present,
        # otherwise randomly drawn WC values
        event_weights_pt = {}
        for i in range(1, 6):
            if not eft_weight_names:
                wc_lst = np.random.uniform(-10, 10, len(self._wc_names_lst))
                event_weights_pt[i] = calc_event_weights(eft_coeffs, wc_lst)
            else:
                # Names look like EFTrwgt<n>_<wc>_<value>_<wc>_<value>_...,
                # so the values are every other token after the first
                wc_vals = eft_weight_names[i - 1].split('_')[1:]
                wc_vals = np.array([float(wc) for wc in wc_vals[1::2]])
                event_weights_pt[i] = efth.calc_eft_weights(eft_coeffs, wc_vals)

        counts = np.ones_like(event_weights_SM)

        ######## Fill histos ########
        hout = self._histo_dict
        variables_to_fill = {
            "weights_SM_log": np.nan_to_num(np.log10(event_weights_SM), nan=self._low - 1),
            "weights_SMneg_log": np.nan_to_num(np.log10(event_weights_SM[event_weights_SM < 0]), nan=self._low - 1),
            "weights_SMabs_log": np.nan_to_num(np.log10(np.abs(event_weights_SM)), nan=self._low - 1),
            "weights_SMefth_log": np.nan_to_num(np.log10(event_weights_SMefth), nan=self._low - 1),
            "weights_SMcoeff_log": np.nan_to_num(np.log10(event_weights_SMcoeff), nan=self._low - 1),
        }
        for i in range(1, 6):
            variables_to_fill[f"weights_pt{i}_log"] = np.nan_to_num(np.log10(event_weights_pt[i]), nan=self._low - 1)

        for var_name, var_values in variables_to_fill.items():
            if var_name not in self._hist_lst:
                print(f"Skipping \"{var_name}\", it is not in the list of hists to include")
                continue

            # The "neg" hist is filled only from the negative-weight events
            if 'neg' in var_name:
                hout[var_name].fill(var_values, weight=counts[event_weights_SM < 0])
            else:
                hout[var_name].fill(var_values, weight=counts)

        return hout

    def postprocess(self, accumulator):
        return accumulator


if __name__ == '__main__':
    raise Exception('Please use `run_gen_hist_eventweights_processor.py` to run this processor!')
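Two pieces of the weight machinery above lend themselves to quick standalone checks: the quadratic cross-term expansion in `calc_event_weights`, and the parsing of WC values out of an `EFTrwgt` weight name (the coefficient values and the weight name below are made-up examples):

```python
import numpy as np

# Same cross-term construction as calc_event_weights above:
# terms are wcs[j] * wcs[k] for k <= j, with wcs = [1, wc_vals...]
def calc_event_weights(eft_coeffs, wc_vals):
    wcs = np.hstack((np.ones(1), wc_vals))
    terms = [wcs[j] * wcs[k] for j in range(len(wcs)) for k in range(j + 1)]
    return np.sum(np.multiply(terms, eft_coeffs), axis=1)

# One Wilson coefficient c: the weight is the quadratic s0 + s1*c + s2*c**2,
# stored per event as the coefficients [s0, s1, s2]
coeffs = np.array([[2.0, 3.0, 0.5]])  # one event, made-up values
w = calc_event_weights(coeffs, np.array([2.0]))
print(w[0])  # 2 + 3*2 + 0.5*4 = 10.0

# Parsing WC values from a weight name of the form
# EFTrwgt<n>_<wc>_<value>_<wc>_<value>_... (made-up name)
name = 'EFTrwgt0_ctG_1.0_ctW_-2.0'
tokens = name.split('_')[1:]             # ['ctG', '1.0', 'ctW', '-2.0']
wc_vals = np.array([float(t) for t in tokens[1::2]])
print(wc_vals.tolist())  # [1.0, -2.0]
```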