
UncertaintySet extension #312

@nsmith-


Posting the proposal started by @lenzip here for further discussion: we could define a new top-level field, UncertaintySet, alongside Correction and CompoundCorrection, with the following schema:

```python
from pydantic import BaseModel, ConfigDict


class Model(BaseModel):
    # Common base for all schema objects: reject unknown fields
    model_config = ConfigDict(extra="forbid")


class FrozenInputSpec(Model):
    """A frozen input value to a correction, for use in uncertainty sources"""

    name: str
    value: str | int | float


class UncertaintySpec(Model):
    """An uncertainty source specification"""

    nuisance_name: str
    """Name of the nuisance parameter associated with this uncertainty source

    (i.e. the nuisance name used in the Combine datacard)
    """
    correction: str
    """The correction object (in this file) which defines this uncertainty source"""
    inputs: list[FrozenInputSpec]
    """Frozen input values to use in the correction evaluation"""


class UncertaintySet(Model):
    """A set of uncertainty sources"""

    name: str
    """Uncertainty set name"""
    uncertainties: list[UncertaintySpec]
    """List of uncertainty sources in this set"""
```

with examples such as:

```python
# would be included in /cvmfs/cms-griddata.cern.ch/cat/metadata/BTV/Run2-2016postVFP-UL-NanoAODv9/latest/ak8_xbbcc_tagging.json.gz
btv_example = UncertaintySet(
    name="btagDDBvLV2_L_comb",
    uncertainties=[
        UncertaintySpec(
            nuisance_name="CMS_Xbbtag_comb_correlated",
            correction="btagDDBvLV2_comb",
            inputs=[
                FrozenInputSpec(name="working_point", value="L"),
                FrozenInputSpec(name="systematic", value="up_correlated"),
            ],
        ),
        UncertaintySpec(
            nuisance_name="CMS_Xbbtag_comb_uncorrelated_2016postVFP",
            correction="btagDDBvLV2_comb",
            inputs=[
                FrozenInputSpec(name="working_point", value="L"),
                FrozenInputSpec(name="systematic", value="up_uncorrelated"),
            ],
        ),
    ],
)

# would be included in /cvmfs/cms-griddata.cern.ch/cat/metadata/JME/Run2-2016postVFP-UL-NanoAODv9/latest/jet_jerc.json.gz
jme_example = UncertaintySet(
    name="JESFull_AK4PFchs",
    uncertainties=[
        UncertaintySpec(
            nuisance_name="CMS_scale_j_AbsoluteMPFBias",
            correction="Summer19UL16_V7_MC_AbsoluteMPFBias_AK4PFchs",
            inputs=[],
        ),
        UncertaintySpec(
            nuisance_name="CMS_scale_j_AbsoluteScale",
            correction="Summer19UL16_V7_MC_AbsoluteScale_AK4PFchs",
            inputs=[],
        ),
        # ... several more uncertainties would follow
    ],
)
```

A separate tool (not part of correctionlib proper) could further validate these sets, for example checking that their names follow the patterns established by https://cms-analysis.docs.cern.ch/guidelines/uncertainty_digest/
A sketch:

```python
def validate_btag_uncertainty_set(uncertainty_set: UncertaintySet) -> None:
    # Expected form: btagDDBvLV2_<working point>_<calibration method>
    parsed_name = uncertainty_set.name.split("_")
    match parsed_name:
        case ["btagDDBvLV2", working_point, calibration_method]:
            if working_point not in {"L", "M", "T"}:
                raise ValueError(f"Invalid working point: {working_point}")
            if calibration_method not in {"comb"}:
                raise ValueError(f"Invalid calibration method: {calibration_method}")
        case _:
            raise ValueError(f"Invalid btag uncertainty set name: {uncertainty_set.name}")


def validate_jes_uncertainty_set(uncertainty_set: UncertaintySet) -> None:
    # Expected form: JESFull_<jet type>
    parsed_name = uncertainty_set.name.split("_")
    match parsed_name:
        case ["JESFull", jet_type]:
            if jet_type not in {"AK4PFchs", "AK8PFchs", "AK4PFPuppi", "AK8PFPuppi"}:
                raise ValueError(f"Invalid jet type: {jet_type}")
        case _:
            raise ValueError(f"Invalid JES uncertainty set name: {uncertainty_set.name}")


def validate_uncertainty_set(uncertainty_set: UncertaintySet) -> None:
    # Dispatch on the prefix of the first "_"-separated token of the set name
    set_class, *_ = uncertainty_set.name.split("_")
    match set_class:
        case s if s.startswith("btag"):
            validate_btag_uncertainty_set(uncertainty_set)
        case s if s.startswith("JES"):
            validate_jes_uncertainty_set(uncertainty_set)
        case _:
            raise ValueError(f"Unknown uncertainty set class: {set_class}")
```
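The validators above only check the set name. A similar, equally hypothetical check could cover the individual entries, for example requiring that nuisance names are unique within a set and share a common prefix. The `CMS_` prefix used here is an assumption for illustration (both examples above follow it), not a rule quoted from the guidelines; the function takes the JSON form of a set so it runs without pydantic.

```python
import re
from typing import Any

# Assumed naming pattern, for illustration only: "CMS_" followed by word characters
NUISANCE_PATTERN = re.compile(r"CMS_\w+")


def validate_nuisance_names(uncertainty_set: dict[str, Any]) -> None:
    """Raise ValueError on malformed or duplicated nuisance names in one set."""
    seen: set[str] = set()
    for spec in uncertainty_set["uncertainties"]:
        name = spec["nuisance_name"]
        if not NUISANCE_PATTERN.fullmatch(name):
            raise ValueError(f"Nuisance name does not match pattern: {name}")
        if name in seen:
            # Two entries with one name would refer to the same nuisance parameter
            raise ValueError(f"Duplicate nuisance name in {uncertainty_set['name']}: {name}")
        seen.add(name)


# JSON form of the two entries shown in jme_example above
jme_set = {
    "name": "JESFull_AK4PFchs",
    "uncertainties": [
        {
            "nuisance_name": "CMS_scale_j_AbsoluteMPFBias",
            "correction": "Summer19UL16_V7_MC_AbsoluteMPFBias_AK4PFchs",
            "inputs": [],
        },
        {
            "nuisance_name": "CMS_scale_j_AbsoluteScale",
            "correction": "Summer19UL16_V7_MC_AbsoluteScale_AK4PFchs",
            "inputs": [],
        },
    ],
}
validate_nuisance_names(jme_set)  # passes without raising
```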

A few design elements are still open:

  • Define the nominal correction spec (with a structure similar to UncertaintySpec)
  • Define how a given uncertainty relates to the nominal correction. Does it:
    • shift the nominal (additive)
    • scale the nominal (multiplicative)
    • replace the nominal (full alternative evaluation)
  • Design a binwise uncertainty spec (e.g. for statistical errors; related to "Access binning information" #286)
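For the second open point, the three options differ only in how a source's correction output is combined with the nominal value. A minimal sketch, assuming a hypothetical per-uncertainty mode field (VariationMode below is not part of the schema above) and scalar outputs:

```python
from enum import Enum


class VariationMode(str, Enum):
    """Hypothetical field describing how an uncertainty relates to the nominal."""

    SHIFT = "shift"      # additive: varied = nominal + output
    SCALE = "scale"      # multiplicative: varied = nominal * output
    REPLACE = "replace"  # full alternative evaluation: varied = output


def apply_variation(nominal: float, output: float, mode: VariationMode) -> float:
    """Return the varied value for one uncertainty source."""
    if mode is VariationMode.SHIFT:
        return nominal + output
    if mode is VariationMode.SCALE:
        return nominal * output
    if mode is VariationMode.REPLACE:
        return output
    raise ValueError(f"Unknown variation mode: {mode}")


nominal = 0.95
for mode, output in [
    (VariationMode.SHIFT, 0.02),
    (VariationMode.SCALE, 1.03),
    (VariationMode.REPLACE, 0.97),
]:
    print(mode.value, apply_variation(nominal, output, mode))
```

Under this sketch, only the shift and scale options need the nominal correction spec from the first bullet to interpret an uncertainty's output; the replace option is self-contained.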

Labels: enhancement, schema
