Commits
37 commits
f205bfe
First commit;
ecole41 Jun 20, 2025
bf1e7d8
data central
ecole41 Jun 23, 2025
68e203d
Added kinematics
ecole41 Jun 25, 2025
d1adbcc
Started uncertainties
ecole41 Jun 25, 2025
43cf998
Artificial uncertainties
ecole41 Jul 9, 2025
d8e6aec
fixes with utila nad metadata
ecole41 Jul 9, 2025
8ef8942
symmetrise errors
ecole41 Jul 10, 2025
0580b11
symmetrise errors
ecole41 Jul 10, 2025
15e98dd
Systematics
ecole41 Jul 10, 2025
dbef8c6
Fixed asymm errors
ecole41 Aug 4, 2025
3f24200
Separated into separate observables
ecole41 Aug 12, 2025
6bde456
Added kinematics labels to metadata
ecole41 Aug 22, 2025
cc395a8
Replaced Hepdata tables and updated metadata.yaml with optimal set of…
enocera Aug 27, 2025
421a5bb
Streamlined data and kinematic file generation
enocera Aug 27, 2025
6019a3a
Deleted redundant files: corrected Hepdata tables; restructured and s…
enocera Aug 27, 2025
109f27a
Corrected metadata syntax
enocera Aug 27, 2025
202b0e2
Corrected number of points
enocera Aug 27, 2025
de68a7d
filter cleanup
enocera Aug 28, 2025
18eda5c
Cleanup filter
enocera Aug 28, 2025
3d6dce2
Corrected wrong luminosity label
enocera Oct 27, 2025
21 changes: 21 additions & 0 deletions nnpdf_data/nnpdf_data/commondata/ATLAS_WCHARM_13TEV/data.yaml
Collaborator Author

@enocera Do we want to have all of the channels ((W−+D+), (W++D−), (W−+D∗+), (W++D∗−)) put together into one dataset like this?

Contributor

@ecole41 Yes we want a single data set with all the channels. Two remarks.

  1. One must be cautious w.r.t. the treatment of correlations: there may be uncertainties that are correlated across all the channels and uncertainties that are not. But this is something that you surely know better than me, because you've gone through the paper. Also, when defining the theory part of the metadata, one should have four entries, with four different names, one for each channel.
  2. I realise that this measurement is only at particle level, which may complicate the way in which we will have to make theoretical predictions, but this is something that we can discuss later.

@@ -0,0 +1,21 @@
data_central:
- 12.27
- 11.57
- 10.41
- 9.09
- 6.85
- 11.87
- 11.55
- 10.09
- 8.6
- 6.25
- 12.18
- 11.77
- 10.61
- 8.85
- 7.22
- 12.52
- 12.14
- 10.29
- 8.38
- 6.55
35 changes: 35 additions & 0 deletions nnpdf_data/nnpdf_data/commondata/ATLAS_WCHARM_13TEV/filter.py
@@ -0,0 +1,35 @@
"""
When running `python filter.py` the relevant data yaml
file will be created in the `nnpdf_data/commondata/ATLAS_WCHARM_13TEV` directory.
"""

import yaml
from filter_utils import get_data_values, get_kinematics
from nnpdf_data.filter_utils.utils import prettify_float

yaml.add_representer(float, prettify_float)


def filter_ATLAS_WCHARM_13TEV_data_kinematic():
    """
    This function writes the central data values and kinematics to yaml files.
    """

    central_values = get_data_values()

    kin = get_kinematics()

    data_central_yaml = {"data_central": central_values}

    kinematics_yaml = {"bins": kin}

    # write central values and kinematics to yaml files
    with open("data.yaml", "w") as file:
        yaml.dump(data_central_yaml, file, sort_keys=False)

    with open("kinematics.yaml", "w") as file:
        yaml.dump(kinematics_yaml, file, sort_keys=False)


if __name__ == "__main__":
    filter_ATLAS_WCHARM_13TEV_data_kinematic()
84 changes: 84 additions & 0 deletions nnpdf_data/nnpdf_data/commondata/ATLAS_WCHARM_13TEV/filter_utils.py
@@ -0,0 +1,84 @@
"""
This module contains helper functions that are used to extract the data values
from the rawdata files.
"""

import yaml
import pandas as pd
import numpy as np


def get_data_values():
    """
    Returns the central data values in the form of a list.
    """

    data_central = []

    for i in range(19, 23):
        hepdata_table = f"rawdata/HEPData-ins2628732-v1-Table_{i}.yaml"

        with open(hepdata_table, 'r') as file:
            input_data = yaml.safe_load(file)

        values = input_data['dependent_variables'][0]['values']

        for value in values:
            # store the central data value (no unit conversion or correction factor is applied)
            data_central.append(value['value'])

    return data_central
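For reference, HEPData YAML tables keep the central values under `dependent_variables[0]['values']` and the bin edges under `independent_variables[0]['values']`. A minimal self-contained sketch of that extraction, with an inline sample table (the two central values are taken from data.yaml above; the bin edges are illustrative, not the measured ones):

```python
import yaml

# inline snippet mimicking the structure of a HEPData table
# (values from the first two bins of data.yaml; bin edges are made up)
sample = """
independent_variables:
- header: {name: 'ABS(ETA(LEP))'}
  values:
  - {low: 0.0, high: 0.5}
  - {low: 0.5, high: 1.0}
dependent_variables:
- header: {name: 'DSIG/DABS(ETA(LEP))', units: PB}
  values:
  - {value: 12.27}
  - {value: 11.57}
"""

table = yaml.safe_load(sample)
# central values, as in get_data_values()
central = [v["value"] for v in table["dependent_variables"][0]["values"]]
# bin midpoints, as in get_kinematics()
mids = [0.5 * (b["low"] + b["high"]) for b in table["independent_variables"][0]["values"]]
print(central)  # [12.27, 11.57]
print(mids)     # [0.25, 0.75]
```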


def get_kinematics():
    """
    Returns the kinematics in the form of a list of dictionaries.
    """
    kin = []

    for i in range(19, 23):
        hepdata_table = f"rawdata/HEPData-ins2628732-v1-Table_{i}.yaml"

        with open(hepdata_table, 'r') as file:
            input_data = yaml.safe_load(file)

        # note: the inner loop variable must not shadow the outer table index `i`
        for M in input_data["independent_variables"][0]['values']:
            kin_value = {
                'abs_eta': {'min': None, 'mid': 0.5 * (M['low'] + M['high']), 'max': None},
                'm_W2': {'min': None, 'mid': 6.46046213e03, 'max': None},  # m_W^2 in GeV^2
                'sqrts': {'min': None, 'mid': 13000.0, 'max': None},
            }
            kin.append(kin_value)

    return kin

def decompose_covmat(covmat):
    """Given a covmat, return an array sys with shape (ndat, ndat)
    giving ndat correlated systematics for each of the ndat points.
    The original covmat is recovered as sys @ sys.T."""

    lamb, mat = np.linalg.eig(covmat)
    sys = np.multiply(np.sqrt(lamb), mat)
    return sys
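As a sanity check on this decomposition: contracting the artificial systematics with themselves should reproduce the original covariance matrix. A minimal self-contained sketch with a toy 2x2 covmat (the toy matrix is illustrative, not data from the tables):

```python
import numpy as np


def decompose_covmat(covmat):
    """Eigendecompose covmat = V diag(lamb) V^T and return V * sqrt(lamb),
    i.e. one column of correlated systematics per eigenvector."""
    lamb, mat = np.linalg.eig(covmat)
    return np.multiply(np.sqrt(lamb), mat)


# toy symmetric, positive-definite covariance matrix
cov = np.array([[4.0, 1.0],
                [1.0, 3.0]])
art_sys = decompose_covmat(cov)

# the decomposition reproduces the original covmat: art_sys @ art_sys.T ~ cov
reconstructed = art_sys @ art_sys.T
print(np.allclose(reconstructed, cov))  # True
```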

def get_uncertainties():
    """
    Returns the uncertainties.
    """

    ndat = 5
    # Produce covmat of form [[W-/W+],[0],
Collaborator Author
@ecole41 ecole41 Jun 25, 2025

@enocera We are given the covariance matrices for W-/W+ and for W-*/W+*. I am assuming that means there are no correlations between the excited (*) and non-excited channels. I wanted to check what should be done to construct the systematics with this.

We are given the systematics for each point, but I am not sure whether these include the correlations from this covmat. The covariance matrices are also combined statistical and systematic uncertainty covariance matrices, so does this mean that decomposing this covmat will give the final systematics with correlations accounted for?

Contributor
@enocera enocera Jun 27, 2025

Dear @ecole41 after looking into this data set a little, I suggest implementing two variants insofar as uncertainties are concerned. This means that you have to generate two uncertainties.yaml files, with different names of your choice (e.g. uncertainties.yaml and uncertainties_covariances.yaml).
1 - The first variant, associated to the uncertainties.yaml file, is the one in which you implement the breakdown of 280 uncertainties provided, for each bin, in the last column of Tables 19-22. These uncertainties, which have the same names across all channels, can be correlated all over the place.
2 - The second variant, associated to the uncertainties_covariances.yaml file, is the one in which you implement the information that you've got from Tables 15-18. The way I'd do this is as follows: first construct a block-diagonal covariance matrix; then generate N_dat artificial systematics from this covariance matrix. The function that generates artificial systematics from a given covariance matrix is here.

Collaborator Author
@ecole41 ecole41 Jul 9, 2025

Thank you, that makes a lot of sense.

I just wanted to check the function used to symmetrise the errors. The function in: nnpdf_data/nnpdf_data/filter_utils/uncertainties.py, shows that the se_delta is equal to the average of the two errors and the se_sigma is related to their difference. Should this be the other way around?

Collaborator Author
@ecole41 ecole41 Jul 11, 2025

I also wanted to check how we should treat the se_delta and se_sigma values. Do we need to add the se_delta onto the central data values and use these in the data.yaml file?

Contributor
@enocera enocera Jul 29, 2025

I just wanted to check the function used to symmetrise the errors. The function in: nnpdf_data/nnpdf_data/filter_utils/uncertainties.py, shows that the se_delta is equal to the average of the two errors and the se_sigma is related to their difference. Should this be the other way around?

You are totally right, that function looks wrong, it should be the other way around to be consistent with Eqs. (23)-(24) and (27) of https://arxiv.org/pdf/physics/0403086. We should check how many data sets have been affected by that typo. Thanks.

Contributor

I also wanted to check how we should treat the se_delta and se_sigma values. Do we need to add the se_delta onto the central data values and use these in the data.yaml file?

You have to compute se_delta for each data point, sum them up, shift the exp central value and dump that into the data.yaml file. And likewise for the uncertainty. We are essentially using Eqs.(27)-(28) of https://arxiv.org/pdf/physics/0403086. The function above computes se_delta and se_sigma for a single data point i, but then you have to sum over all points, manipulate central values and uncertainties, and finally write the manipulated central values and uncertainties in the corresponding .yaml files. Hope that this clarifies the issue.
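Following the corrected convention agreed above (the shift comes from the difference of the asymmetric errors and the symmetrised width from their sum, to leading order in Eqs. (23)-(24)/(27)-(28) of physics/0403086), a minimal sketch of a single-point symmetrisation. The function name `symmetrize` and its signature are illustrative, not the actual `filter_utils` API:

```python
def symmetrize(plus, minus):
    """Symmetrise an asymmetric error pair (illustrative helper, not the NNPDF one).

    `plus` is the upper error (>= 0) and `minus` the lower error (<= 0).
    Returns (se_delta, se_sigma): the shift to add to the central value
    and the symmetrised uncertainty.
    """
    se_delta = (plus + minus) / 2  # half the signed difference of the magnitudes
    se_sigma = (plus - minus) / 2  # half the sum of the magnitudes
    return se_delta, se_sigma


# example: a measurement of 10.0 with asymmetric errors +0.8 / -0.4
se_delta, se_sigma = symmetrize(0.8, -0.4)
shifted_central = 10.0 + se_delta  # the shifted value is what goes into data.yaml
```

Per point, the shifted central value and `se_sigma` are what get dumped into data.yaml and uncertainties.yaml respectively.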

    # [0],[W-*/W+*]]
    covmat = np.zeros((4 * ndat, 4 * ndat))  # multiply by 4 because of W+/W- and D/D*

    def edit_covmat(filename, offset):
        with open(filename) as f:
            data = yaml.safe_load(f)
        flat_values = [v["value"] for v in data["dependent_variables"][0]["values"]]
        matrix = np.array(flat_values).reshape((2 * ndat, 2 * ndat))
        covmat[offset:offset + 2 * ndat, offset:offset + 2 * ndat] = matrix

    edit_covmat("rawdata/HEPData-ins2628732-v1-Table_16.yaml", offset=0)
    edit_covmat("rawdata/HEPData-ins2628732-v1-Table_18.yaml", offset=2 * ndat)

    sys = decompose_covmat(covmat)
    return sys
49 changes: 49 additions & 0 deletions nnpdf_data/nnpdf_data/commondata/ATLAS_WCHARM_13TEV/metadata.yaml
@@ -0,0 +1,49 @@
setname: ATLAS_WCHARM_13TEV

nnpdf_metadata:
  nnpdf31_process: DY CC
  experiment: ATLAS

arXiv:
  url: https://arxiv.org/abs/2302.00336
  journal: Phys. Rev. D 108 (2023) 032012
iNSPIRE:
  url: https://inspirehep.net/literature/2628732
hepdata:
  url: https://www.hepdata.net/record/ins2628732
  version: 1

version: 1
version_comment: Implementation

implemented_observables:
- observable_name:
  observable:
    description:
    label: ATLAS $W^-+c$ 13 TeV
    units: '[pb]'
  process_type:
  tables: [19, 20, 21, 22] # 5/19 (W−+D+), 6/20 (W++D−), 9/21 (W−+D∗+), 10/22 (W++D∗−)
Collaborator Author

@enocera Should these four channels ((W−+D+), (W++D−), (W−+D∗+), (W++D∗−)) be separated into separate observables or be kept as one?

Contributor

These should be four different observables of the same data set, as, e.g., different differential distributions for top pair production.
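To make that concrete, a hypothetical sketch of how four observables of one data set could be listed in metadata.yaml. The observable names and the one-table-per-observable split are placeholders for illustration, not the final NNPDF choices:

```yaml
implemented_observables:
- observable_name: WM-DP    # hypothetical name: W- + D+ channel, Table 19
  tables: [19]
- observable_name: WP-DM    # hypothetical name: W+ + D- channel, Table 20
  tables: [20]
- observable_name: WM-DSTP  # hypothetical name: W- + D*+ channel, Table 21
  tables: [21]
- observable_name: WP-DSTM  # hypothetical name: W+ + D*- channel, Table 22
  tables: [22]
```

Each entry would then carry its own kinematics, data and uncertainties files.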

Collaborator Author

Ok, and should these all have separate data, uncertainties and kinematics yaml files? So four of each?

  ndata:
  plotting:
    dataset_label: ATLAS $W^-+c$ 13 TeV
    y_label: 'Differential fiducial cross-section times the single-lepton-flavor W boson branching ratio' # In LaTeX terms?
Collaborator Author

I'm not sure how to put this in LaTeX terms.

    x_label: $|\eta^\ell|$
    plot_x: abs_eta
  kinematic_coverage:
  kinematics:
    variables:
      abs_eta:
        description:
        label:
        units: ''
      m_W2:
        description:
        label:
        units:
    file:
  data_uncertainties:
  data_central:
  variants:
    legacy:
      data_uncertainties: