
Commit

feat: histogram inputs (#289)
* added support for histogram inputs to workspace building
* breaking change: refactored template_builder and template_postprocessor
   - high-level functionality now provided via new templates module
   - lower-level functionality available in submodules
* breaking change: SamplePaths config option renamed to SamplePath
* breaking change: contrib.histogram_creation.from_uproot renamed to contrib.histogram_creator.with_uproot
* extended config schema to support specifying histogram inputs
* added templates.collect API to support handling histogram inputs
* added documentation for histogram input handling
alexander-held authored Oct 8, 2021
1 parent 2b9edf8 commit f84dcc9
Showing 40 changed files with 1,453 additions and 662 deletions.
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -4,7 +4,7 @@ repos:
hooks:
- id: black
- repo: https://github.com/pre-commit/mirrors-mypy
rev: v0.910
rev: v0.910-1
hooks:
- id: mypy
name: mypy with Python 3.7
4 changes: 2 additions & 2 deletions README.md
@@ -47,10 +47,10 @@ import cabinetry
config = cabinetry.configuration.load("config_example.yml")

# create template histograms
cabinetry.template_builder.create_histograms(config)
cabinetry.templates.build(config)

# perform histogram post-processing
cabinetry.template_postprocessor.run(config)
cabinetry.templates.postprocess(config)

# build a workspace
ws = cabinetry.workspace.build(config)
10 changes: 5 additions & 5 deletions config_example.yml
@@ -2,7 +2,7 @@ General:
Measurement: "minimal_example"
POI: "Signal_norm"
HistogramFolder: "histograms/"
InputPath: "ntuples/{SamplePaths}"
InputPath: "inputs/{SamplePath}"

Regions:
- Name: "Signal_region"
@@ -13,17 +13,17 @@ Regions:
Samples:
- Name: "Data"
Tree: "pseudodata"
SamplePaths: "data.root"
SamplePath: "data.root"
Data: True

- Name: "Signal"
Tree: "signal"
SamplePaths: "prediction.root"
SamplePath: "prediction.root"
Weight: "weight"

- Name: "Background"
Tree: "background"
SamplePaths: "prediction.root"
SamplePath: "prediction.root"
Weight: "weight"

Systematics:
@@ -36,7 +36,7 @@ Systematics:

- Name: "Modeling"
Up:
SamplePaths: "prediction.root"
SamplePath: "prediction.root"
Tree: "background_varied"
Down:
Symmetrize: True
4 changes: 2 additions & 2 deletions docs/advanced.rst
@@ -60,11 +60,11 @@ When no user-defined function matches a given histogram that has to be produced,
return hist
cabinetry.template_builder.create_histograms(
cabinetry.templates.build(
cabinetry_config, method="uproot", router=my_router
)
The instance of ``cabinetry.route.Router`` is handed to ``cabinetry.template_builder.create_histograms`` to enable the use of ``build_data_hist``.
The instance of ``cabinetry.route.Router`` is handed to ``cabinetry.templates.build`` to enable the use of ``build_data_hist``.

The function ``build_data_hist`` in this example always returns the same histogram.
Given that the dictionaries in the function signature provide additional information, it is for example possible to return different yields per region:
41 changes: 32 additions & 9 deletions docs/api.rst
@@ -27,17 +27,34 @@ cabinetry.histo
:members:


cabinetry.template_builder
--------------------------
cabinetry.templates
-------------------

.. automodule:: cabinetry.templates
:members:

cabinetry.templates.builder
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automodule:: cabinetry.templates.builder
:members:

.. automodule:: cabinetry.template_builder
cabinetry.templates.collector
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automodule:: cabinetry.templates.collector
:members:

cabinetry.templates.postprocessor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automodule:: cabinetry.templates.postprocessor
:members:

cabinetry.template_postprocessor
--------------------------------
cabinetry.templates.utils
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automodule:: cabinetry.template_postprocessor
.. automodule:: cabinetry.templates.utils
:members:


@@ -112,8 +129,14 @@ cabinetry.contrib

.. automodule:: cabinetry.contrib

cabinetry.contrib.histogram_creation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cabinetry.contrib.histogram_creator
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automodule:: cabinetry.contrib.histogram_creator
:members:

cabinetry.contrib.histogram_reader
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automodule:: cabinetry.contrib.histogram_creation
.. automodule:: cabinetry.contrib.histogram_reader
:members:
4 changes: 3 additions & 1 deletion docs/config.rst
@@ -1,3 +1,5 @@
.. _config:

Configuration schema
====================

@@ -34,7 +36,7 @@ Common options:

.. jsonschema:: ../src/cabinetry/schemas/config.json#/definitions/samples_setting

.. jsonschema:: ../src/cabinetry/schemas/config.json#/definitions/samplepaths_setting
.. jsonschema:: ../src/cabinetry/schemas/config.json#/definitions/samplepath_setting

.. jsonschema:: ../src/cabinetry/schemas/config.json#/definitions/regions_setting

147 changes: 121 additions & 26 deletions docs/core.rst
@@ -1,44 +1,52 @@
Core concepts
=============

Input file path specification
-----------------------------
Inputs to cabinetry: ntuples or histograms
------------------------------------------

Paths to input files for histogram production are specified with the mandatory ``InputPath`` setting in the ``General`` config section.
``cabinetry`` supports two types of input files when building a workspace: ntuples containing columnar data and histograms.
When using ntuple inputs, ``cabinetry`` needs to know not only where to find the input files for every template histogram it needs to build, but also what selections to apply, which column to extract and how to weight every event.
The configuration schema lists the required options; see :ref:`config` for more information.
Less information is required when using histogram inputs: only the path to each histogram needs to be specified in this case.

Input file path specification for ntuples
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Paths to ntuple input files for histogram production are specified with the mandatory ``InputPath`` setting in the ``General`` config section.
If everything is in one file, the value should be the path to this file.
It is common to have multiple input files, split across phase space regions or samples.
For this purpose, the ``InputPath`` value can take two placeholders: ``{RegionPath}`` and ``{SamplePaths}``.
For this purpose, the ``InputPath`` value can take two placeholders: ``{RegionPath}`` and ``{SamplePath}``.

RegionPath
^^^^^^^^^^
""""""""""

When building histograms for a specific region, the ``{RegionPath}`` placeholder takes the value specified in the ``RegionPath`` setting of the corresponding region.
The value of ``RegionPath`` has to be a string.

SamplePaths
^^^^^^^^^^^
SamplePath
""""""""""

The ``{SamplePaths}`` placeholder takes the value given by ``SamplePaths`` of the sample currently processed.
The ``{SamplePath}`` placeholder takes the value given by ``SamplePath`` of the sample currently processed.
This value can either be a string or a list of strings.
If it is a list, multiple copies of ``InputPath`` are created, and in each of them the ``{SamplePaths}`` placeholder takes the value of a different entry in the list.
If it is a list, multiple copies of ``InputPath`` are created, and in each of them the ``{SamplePath}`` placeholder takes the value of a different entry in the list.
All input files are processed, and their contributions are summed together.
The histogram created by ``SamplePaths: ["a.root", "b.root"]`` is equivalent to the histogram created with ``SamplePaths: "a_plus_b.root"``, where ``a_plus_b.root`` is produced by merging both files.
The histogram created by ``SamplePath: ["a.root", "b.root"]`` is equivalent to the histogram created with ``SamplePath: "a_plus_b.root"``, where ``a_plus_b.root`` is produced by merging both files.
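The placeholder substitution described above can be sketched in a few lines of plain Python. This is an illustration only, not the code ``cabinetry`` uses internally: ``expand_input_path`` is a hypothetical helper, and the paths are taken from the example in this document.

```python
from typing import List, Union


def expand_input_path(
    input_path: str, region_path: str, sample_path: Union[str, List[str]]
) -> List[str]:
    """Return one concrete path per SamplePath entry (hypothetical helper)."""
    # a single string is treated like a list with one entry
    sample_paths = [sample_path] if isinstance(sample_path, str) else sample_path
    # one copy of InputPath per entry, each with both placeholders filled in
    return [
        input_path.replace("{RegionPath}", region_path).replace("{SamplePath}", sp)
        for sp in sample_paths
    ]


paths = expand_input_path(
    "inputs/{RegionPath}/{SamplePath}",
    "signal_region",
    ["signal_1.root", "signal_2.root"],
)
print(paths)
```

Every path in the returned list would be read, with the resulting contributions summed into a single histogram.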

Systematics
^^^^^^^^^^^
"""""""""""

It is possible to specify overrides for the ``RegionPath`` and ``SamplePaths`` values in systematic templates.
It is possible to specify overrides for the ``RegionPath`` and ``SamplePath`` values in systematic templates.
If those settings are specified in the ``Up`` or ``Down`` template section of a systematic uncertainty, then the corresponding values are used when building the path to the file used to construct the histogram for this specific template.

An example
^^^^^^^^^^
""""""""""

The following configuration file excerpt shows an example of specifying paths to input files.

.. code-block:: yaml
General:
InputPath: "ntuples/{RegionPath}/{SamplePaths}"
InputPath: "inputs/{RegionPath}/{SamplePath}"
Regions:
- Name: "Signal_region"
@@ -49,37 +57,124 @@ The following configuration file excerpt shows an example of specifying paths to
Samples:
- Name: "Data"
SamplePaths: "data.root"
SamplePath: "data.root"
- Name: "Signal"
SamplePaths: ["signal_1.root", "signal_2.root"]
SamplePath: ["signal_1.root", "signal_2.root"]
Systematics:
- Name: "Signal_modeling"
Up:
SamplePaths: "signal_variation_up.root"
SamplePath: "modeling_variation_up.root"
Down:
SamplePaths: "signal_variation_down.root"
SamplePath: "modeling_variation_down.root"
Samples: "Signal"
The following files will be read to create histograms:

- for *Signal_region*:

- *Data*: ``ntuples/signal_region/data.root``
- *Signal*: ``ntuples/signal_region/signal_1.root``, ``ntuples/signal_region/signal_2.root``
- *Data*: ``inputs/signal_region/data.root``
- *Signal*: ``inputs/signal_region/signal_1.root``, ``inputs/signal_region/signal_2.root``

- systematic uncertainty:

- *up*: ``inputs/signal_region/modeling_variation_up.root``
- *down*: ``inputs/signal_region/modeling_variation_down.root``

- for *Control_region*:

- *Data*: ``inputs/control_region/data.root``
- *Signal*: ``inputs/control_region/signal_1.root``, ``inputs/control_region/signal_2.root``

- systematic uncertainty:

- *up*: ``inputs/control_region/modeling_variation_up.root``
- *down*: ``inputs/control_region/modeling_variation_down.root``

Input file path specification for histograms
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The specification of paths to histograms works very similarly to the ntuple case.
The ``InputPath`` setting in the ``General`` config section is still mandatory.
It can again take placeholders: ``{RegionPath}``, ``{SamplePath}``, and ``{VariationPath}``.
The ``VariationPath`` setting will default to an empty string if not specified, but it can be set to another value (such as ``"nominal"``) in the ``General`` block.

A major difference from the ntuple path construction is that the histogram path must include not only the path to the file containing a given histogram, but also the path to the histogram within that file.
This is achieved by using a colon ``:`` to distinguish between both parts of the path: ``folder/file.root:abc/h1`` points to a histogram called ``h1`` located in a folder called ``abc`` which itself exists within a file called ``file.root`` which can be found in a folder called ``folder``.
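As a rough illustration of the colon convention, the following sketch separates such a path into its two parts. ``split_histogram_path`` is a hypothetical helper and not part of the ``cabinetry`` API; splitting at the last colon is an assumption made here so that colons in the file part are tolerated.

```python
from typing import Tuple


def split_histogram_path(full_path: str) -> Tuple[str, str]:
    """Split "file:histogram" into its two parts (hypothetical helper)."""
    # assumption: the last colon separates the file from the histogram within it
    file_path, _, histogram_name = full_path.rpartition(":")
    if not file_path:
        raise ValueError(f"no colon found in histogram path: {full_path}")
    return file_path, histogram_name


file_path, histogram_name = split_histogram_path("folder/file.root:abc/h1")
print(file_path, histogram_name)
```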

When using histogram inputs, use ``cabinetry.templates.collect`` instead of ``cabinetry.templates.build`` (which is used for ntuple inputs).

RegionPath
""""""""""

This works in the same way as it does for ntuples: the ``RegionPath`` setting in each region sets the value for the ``{RegionPath}`` placeholder.
Note that the value cannot be overridden on a per-systematic basis in the histogram case.

SamplePath
""""""""""

The ``SamplePath`` setting sets the value for the ``{SamplePath}`` placeholder.
In contrast to the ntuple case, this value cannot be a list of strings.
It also cannot be overridden on a per-systematic basis, just like ``RegionPath``.

VariationPath
"""""""""""""

Each systematic template can set the value for the ``{VariationPath}`` placeholder via the ``VariationPath`` setting.
``RegionPath`` and ``SamplePath`` settings cannot be overridden.

An example
""""""""""

The following shows an example, similar to the ntuple example.

.. code-block:: yaml
General:
InputPath: "inputs/{RegionPath}.root:{SamplePath}_{VariationPath}"
VariationPath: "nominal"
Regions:
- Name: "Signal_region"
RegionPath: "signal_region"
- Name: "Control_region"
RegionPath: "control_region"
Samples:
- Name: "Data"
SamplePath: "data"
- Name: "Signal"
SamplePath: "signal"
Systematics:
- Name: "Signal_modeling"
Up:
VariationPath: "modeling_variation_up"
Down:
VariationPath: "modeling_variation_down"
Samples: "Signal"
The following histograms will be read:

- for *Signal_region*:

- *Data*: ``inputs/signal_region.root:data_nominal``
- *Signal*: ``inputs/signal_region.root:signal_nominal``

- systematic uncertainty:

- *up*: ``ntuples/signal_region/signal_variation_up.root``
- *down*: ``ntuples/signal_region/signal_variation_down.root``
- *up*: ``inputs/signal_region.root:signal_modeling_variation_up``
- *down*: ``inputs/signal_region.root:signal_modeling_variation_down``

- for *Control_region*:

- *Data*: ``ntuples/control_region/data.root``
- *Signal*: ``ntuples/control_region/signal_1.root``, ``ntuples/control_region/signal_2.root``
- *Data*: ``inputs/control_region.root:data_nominal``
- *Signal*: ``inputs/control_region.root:signal_nominal``

- systematic uncertainty:

- *up*: ``ntuples/control_region/signal_variation_up.root``
- *down*: ``ntuples/control_region/signal_variation_down.root``
- *up*: ``inputs/control_region.root:signal_modeling_variation_up``
- *down*: ``inputs/control_region.root:signal_modeling_variation_down``
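
The interplay between the ``VariationPath`` default in the ``General`` block and a per-template override can be sketched as follows. This is a minimal illustration under stated assumptions: ``histogram_path`` and its parameter names are hypothetical and do not correspond to ``cabinetry`` functions.

```python
from typing import Optional


def histogram_path(
    input_path: str,
    region_path: str,
    sample_path: str,
    general_variation_path: str = "",
    template_variation_path: Optional[str] = None,
) -> str:
    """Fill all three placeholders of InputPath (hypothetical helper)."""
    # a VariationPath set on the systematic template wins over the General default
    if template_variation_path is not None:
        variation_path = template_variation_path
    else:
        variation_path = general_variation_path
    return (
        input_path.replace("{RegionPath}", region_path)
        .replace("{SamplePath}", sample_path)
        .replace("{VariationPath}", variation_path)
    )


input_path = "inputs/{RegionPath}.root:{SamplePath}_{VariationPath}"
nominal = histogram_path(input_path, "signal_region", "signal", "nominal")
varied = histogram_path(
    input_path, "signal_region", "signal", "nominal", "modeling_variation_up"
)
print(nominal)
print(varied)
```

With the settings from the example configuration, the first call corresponds to the nominal *Signal* histogram in *Signal_region* and the second to its *up* variation.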
8 changes: 4 additions & 4 deletions example.py
@@ -8,7 +8,7 @@
cabinetry.set_logging()

# check whether input data exists
if not os.path.exists("ntuples/"):
if not os.path.exists("inputs/"):
print("run utils/create_ntuples.py to create input data")
raise SystemExit

@@ -17,12 +17,12 @@
cabinetry.configuration.print_overview(config)

# create template histograms
cabinetry.template_builder.create_histograms(config, method="uproot")
cabinetry.templates.build(config, method="uproot")

# perform histogram post-processing
cabinetry.template_postprocessor.run(config)
cabinetry.templates.postprocess(config)

# visualize systematics templates
# visualize systematic templates
cabinetry.visualize.templates(config)

# build a workspace and save to file
3 changes: 1 addition & 2 deletions src/cabinetry/__init__.py
@@ -6,8 +6,7 @@
import cabinetry.route # noqa: F401
import cabinetry.smooth # noqa: F401
import cabinetry.tabulate # noqa: F401
import cabinetry.template_builder # noqa: F401
import cabinetry.template_postprocessor # noqa: F401
import cabinetry.templates # noqa: F401
import cabinetry.visualize # noqa: F401
import cabinetry.workspace # noqa: F401
