UncertaintySet extension #312

Posting the proposal started by @lenzip here for further discussion. We could define a new top-level field, alongside `Correction` and `CompoundCorrection`, called `UncertaintySet`, with the following schema:

```python
from pydantic import BaseModel, ConfigDict


class Model(BaseModel):
    model_config = ConfigDict(extra="forbid")


class FrozenInputSpec(Model):
    """A single frozen input value for a correction, for use in uncertainty sources"""

    name: str
    value: str | int | float


class UncertaintySpec(Model):
    """An uncertainty source specification"""

    nuisance_name: str
    """Name of the nuisance parameter associated with this uncertainty source

    (i.e. the parameter name used in the Combine datacard)
    """
    correction: str
    """The correction object (in this file) which defines this uncertainty source"""
    inputs: list[FrozenInputSpec]
    """Frozen input values to use in the correction evaluation"""


class UncertaintySet(Model):
    """A set of uncertainty sources"""

    name: str
    """Uncertainty set name"""
    uncertainties: list[UncertaintySpec]
    """List of uncertainty sources in this set"""
```

with examples such as:
```python
# would be included in /cvmfs/cms-griddata.cern.ch/cat/metadata/BTV/Run2-2016postVFP-UL-NanoAODv9/latest/ak8_xbbcc_tagging.json.gz
btv_example = UncertaintySet(
    name="btagDDBvLV2_L_comb",
    uncertainties=[
        UncertaintySpec(
            nuisance_name="CMS_Xbbtag_comb_correlated",
            correction="btagDDBvLV2_comb",
            inputs=[
                FrozenInputSpec(name="working_point", value="L"),
                FrozenInputSpec(name="systematic", value="up_correlated"),
            ],
        ),
        UncertaintySpec(
            nuisance_name="CMS_Xbbtag_comb_uncorrelated_2016postVFP",
            correction="btagDDBvLV2_comb",
            inputs=[
                FrozenInputSpec(name="working_point", value="L"),
                FrozenInputSpec(name="systematic", value="up_uncorrelated"),
            ],
        ),
    ],
)

# would be included in /cvmfs/cms-griddata.cern.ch/cat/metadata/JME/Run2-2016postVFP-UL-NanoAODv9/latest/jet_jerc.json.gz
jme_example = UncertaintySet(
    name="JESFull_AK4PFchs",
    uncertainties=[
        UncertaintySpec(
            nuisance_name="CMS_scale_j_AbsoluteMPFBias",
            correction="Summer19UL16_V7_MC_AbsoluteMPFBias_AK4PFchs",
            inputs=[],
        ),
        UncertaintySpec(
            nuisance_name="CMS_scale_j_AbsoluteScale",
            correction="Summer19UL16_V7_MC_AbsoluteScale_AK4PFchs",
            inputs=[],
        ),
        # ... several more uncertainties would follow
    ],
)
```
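
To make the intended use concrete, here is a minimal sketch of how a consumer might evaluate such a set. It is not part of the proposal and makes several assumptions: the set is given as plain dicts mirroring the schema above, real correctionlib objects are replaced by stand-in Python callables taking keyword arguments, and `evaluate_uncertainty_set`, `toy_btag_comb` and the toy numbers are all hypothetical.

```python
from typing import Callable, Union

Value = Union[str, int, float]


def evaluate_uncertainty_set(
    uncertainty_set: dict,
    corrections: dict[str, Callable[..., float]],
    **runtime_inputs: Value,
) -> dict[str, float]:
    """Evaluate every source in the set, keyed by nuisance name.

    Frozen inputs from each spec are merged with the inputs supplied by the
    caller; passing an input that the set already freezes is an error.
    """
    results = {}
    for spec in uncertainty_set["uncertainties"]:
        frozen = {item["name"]: item["value"] for item in spec["inputs"]}
        clash = frozen.keys() & runtime_inputs.keys()
        if clash:
            raise ValueError(f"Inputs frozen by the set were also passed: {sorted(clash)}")
        correction = corrections[spec["correction"]]
        results[spec["nuisance_name"]] = correction(**frozen, **runtime_inputs)
    return results


def toy_btag_comb(working_point: str, systematic: str, pt: float) -> float:
    """Stand-in for the real correction; the numbers are invented."""
    toy_values = {"up_correlated": 1.05, "up_uncorrelated": 1.02}
    return toy_values[systematic]


# Plain-dict form of the BTV example above
btv_set = {
    "name": "btagDDBvLV2_L_comb",
    "uncertainties": [
        {
            "nuisance_name": "CMS_Xbbtag_comb_correlated",
            "correction": "btagDDBvLV2_comb",
            "inputs": [
                {"name": "working_point", "value": "L"},
                {"name": "systematic", "value": "up_correlated"},
            ],
        },
        {
            "nuisance_name": "CMS_Xbbtag_comb_uncorrelated_2016postVFP",
            "correction": "btagDDBvLV2_comb",
            "inputs": [
                {"name": "working_point", "value": "L"},
                {"name": "systematic", "value": "up_uncorrelated"},
            ],
        },
    ],
}

variations = evaluate_uncertainty_set(
    btv_set, {"btagDDBvLV2_comb": toy_btag_comb}, pt=450.0
)
```

In this sketch the caller only supplies the remaining per-object inputs (here `pt`) and gets back one value per nuisance parameter, so the analysis code never has to know which `systematic` strings a given correction expects.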

A separate tool (outside correctionlib proper) could additionally validate these sets, checking that their names follow the patterns established in https://cms-analysis.docs.cern.ch/guidelines/uncertainty_digest/
A sketch is hidden below:
<details>
<summary>Click to expand code</summary>

```python
def validate_btag_uncertainty_set(uncertainty_set: UncertaintySet) -> None:
    """Check names of the form btagDDBvLV2_<working point>_<calibration method>."""
    parsed_name = uncertainty_set.name.split("_")
    match parsed_name:
        case ["btagDDBvLV2", working_point, calibration_method]:
            if working_point not in {"L", "M", "T"}:
                raise ValueError(f"Invalid working point: {working_point}")
            if calibration_method not in {"comb"}:
                raise ValueError(f"Invalid calibration method: {calibration_method}")
        case _:
            raise ValueError(f"Invalid btag uncertainty set name: {uncertainty_set.name}")


def validate_jes_uncertainty_set(uncertainty_set: UncertaintySet) -> None:
    """Check names of the form JESFull_<jet type>."""
    parsed_name = uncertainty_set.name.split("_")
    match parsed_name:
        case ["JESFull", jet_type]:
            if jet_type not in {"AK4PFchs", "AK8PFchs", "AK4PFPuppi", "AK8PFPuppi"}:
                raise ValueError(f"Invalid jet type: {jet_type}")
        case _:
            raise ValueError(f"Invalid JES uncertainty set name: {uncertainty_set.name}")


def validate_uncertainty_set(uncertainty_set: UncertaintySet) -> None:
    """Dispatch to the validator matching the first component of the set name."""
    set_class, *_ = uncertainty_set.name.split("_")
    match set_class:
        case s if s.startswith("btag"):
            validate_btag_uncertainty_set(uncertainty_set)
        case s if s.startswith("JES"):
            validate_jes_uncertainty_set(uncertainty_set)
        case _:
            raise ValueError(f"Unknown uncertainty set class: {set_class}")
```
</details>

A few design elements are still open:
 - [ ] Define the nominal correction spec (similar in structure to `UncertaintySpec`)
 - [ ] Define how a given uncertainty relates to the nominal correction. Does it:
     - shift the nominal (additive),
     - scale the nominal (multiplicative), or
     - replace the nominal (full alternative evaluation)?
 - [ ] Design a binwise uncertainty spec (e.g. for statistical errors, related to #286)
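
As a starting point for the second item, the three candidate relations could be expressed as an enumerated field on the uncertainty spec. The following is only a sketch of one option, with nothing settled: the names `Relation` and `apply_uncertainty` are hypothetical and do not exist in correctionlib.

```python
from enum import Enum


class Relation(str, Enum):
    """How an uncertainty's evaluated value relates to the nominal (hypothetical)."""

    SHIFT = "shift"      # additive: nominal + value
    SCALE = "scale"      # multiplicative: nominal * value
    REPLACE = "replace"  # full alternative evaluation: value alone


def apply_uncertainty(nominal: float, value: float, relation: Relation) -> float:
    """Return the varied result given the nominal and the uncertainty evaluation."""
    if relation is Relation.SHIFT:
        return nominal + value
    if relation is Relation.SCALE:
        return nominal * value
    if relation is Relation.REPLACE:
        return value
    raise ValueError(f"Unknown relation: {relation!r}")


# The same evaluated value gives different varied results under each relation
shifted = apply_uncertainty(0.5, 0.25, Relation.SHIFT)
scaled = apply_uncertainty(0.5, 0.25, Relation.SCALE)
replaced = apply_uncertainty(0.5, 0.25, Relation.REPLACE)
```

Making the enum a `str` subclass keeps the field trivially serializable to JSON, since `Relation("shift")` round-trips from the plain string.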
