23 commits
5044e74
Add config parameter to set external path to the sample database
moritzmolch Dec 22, 2025
4853d30
Add function to define new columns from expressions, provided in the …
moritzmolch Dec 22, 2025
8d62c4d
Add column definitions to sample preselection
moritzmolch Dec 22, 2025
d470ec6
Add 2022 and 2023 luminosity weights
moritzmolch Dec 22, 2025
eb8ff40
Add selection for lepton flavor in Run 3 DY samples
moritzmolch Dec 22, 2025
467b0da
Set path to datasets file according to sample database path in config.
moritzmolch Dec 22, 2025
392149a
Add weight to rescale tt contributions estimated from simulation
moritzmolch Dec 23, 2025
d0ee4dc
Introduce normalization weight for ttbar backgrounds
moritzmolch Jan 12, 2026
8113415
Use logger instead of print for echoing the sample database path
moritzmolch Jan 12, 2026
0d3d2a4
Remove obsolete variable
moritzmolch Jan 12, 2026
459a98e
Add documentation for column definitions entry in preselection step
moritzmolch Jan 12, 2026
80a40bc
Clean formatting of column_definitions table entry and extend descrip…
moritzmolch Jan 12, 2026
10b2192
Fix typo
moritzmolch Jan 12, 2026
898c3a4
Use column_definitions to redefine columns in boosted NMSSM analysis
moritzmolch Jan 12, 2026
351b1e0
Fix wrong logger call and synchronize setup of preselection_boosted w…
moritzmolch Jan 12, 2026
1e7ce8f
Move handling of column_definitions entry to the run_sample_preselect…
moritzmolch Jan 12, 2026
dad202d
Allow the user to specify a list processes with an exclusive list of …
moritzmolch Jan 15, 2026
42f76b8
Add flag that allows to use the Redefine function in column definitions
moritzmolch Jan 15, 2026
38f9cba
Add allow_redefine flag to column definitions for the boosted NMSSM c…
moritzmolch Jan 15, 2026
a5c9d39
Update documentation of column_definitions section
moritzmolch Jan 15, 2026
db29f33
Fix wrong description in define_columns docstring
moritzmolch Jan 15, 2026
3da8157
Remove obsolete column renaming function for boosted fake factors
moritzmolch Jan 16, 2026
b14ed9e
Use correct process list for evaluating processes to exclude.
moritzmolch Jan 16, 2026
90 changes: 89 additions & 1 deletion configs/nmssm_boosted/2018/preselection_et.yaml
Expand Up @@ -69,6 +69,94 @@ processes:
- "EGamma_Run2018C-UL2018"
- "EGamma_Run2018D-UL2018"

column_definitions:
  njets:
    expression: njets_boosted
    allow_redefine: True
  nbtag:
    expression: nbtag_boosted
    allow_redefine: True
  metphi:
    expression: metphi_boosted
    allow_redefine: True
  met:
    expression: met_boosted
    allow_redefine: True
  pt_1:
    expression: boosted_pt_1
    allow_redefine: True
  q_1:
    expression: boosted_q_1
    allow_redefine: True
  pt_2:
    expression: boosted_pt_2
    allow_redefine: True
  q_2:
    expression: boosted_q_2
    allow_redefine: True
  mt_1:
    expression: boosted_mt_1
    allow_redefine: True
  iso_1:
    expression: boosted_iso_1
    allow_redefine: True
  mass_2:
    expression: boosted_mass_2
    allow_redefine: True
  tau_decaymode_2:
    expression: boosted_tau_decaymode_2
    allow_redefine: True
  deltaR_ditaupair:
    expression: boosted_deltaR_ditaupair
    allow_redefine: True
  m_vis:
    expression: boosted_m_vis
    allow_redefine: True
  fj_Xbb_pt:
    expression: fj_Xbb_pt_boosted
    allow_redefine: True
  fj_Xbb_eta:
    expression: fj_Xbb_eta_boosted
    allow_redefine: True
  fj_Xbb_particleNet_XbbvsQCD:
    expression: fj_Xbb_particleNet_XbbvsQCD_boosted
    allow_redefine: True
  bpair_pt_1:
    expression: bpair_pt_1_boosted
    allow_redefine: True
  bpair_pt_2:
    expression: bpair_pt_2_boosted
    allow_redefine: True
  bpair_btag_value_2:
    expression: bpair_btag_value_2_boosted
    allow_redefine: True
  bpair_eta_2:
    expression: bpair_eta_2_boosted
    allow_redefine: True
  extraelec_veto:
    expression: extraelec_veto_boosted
    allow_redefine: True
  gen_match_1:
    expression: boosted_gen_match_1
    allow_redefine: True
    exclude_processes:
      - data
  gen_match_2:
    expression: boosted_gen_match_2
    allow_redefine: True
    exclude_processes:
      - data
  btag_weight:
    expression: btag_weight_boosted
    allow_redefine: True
    exclude_processes:
      - data
  pNet_Xbb_weight:
    expression: pNet_Xbb_weight_boosted
    allow_redefine: True
    exclude_processes:
      - data

event_selection:
  # lep_pt: "boosted_pt_1 > 120"
  had_tau_pt: "boosted_pt_2 > 40"
Expand Down Expand Up @@ -130,4 +218,4 @@ output_features:
- "bpair_eta_2"
- "met"
- "mass_2"
- "tau_decaymode_2"
85 changes: 84 additions & 1 deletion configs/nmssm_boosted/2018/preselection_mt.yaml
Expand Up @@ -69,6 +69,89 @@ processes:
- "SingleMuon_Run2018C-UL2018_GT36"
- "SingleMuon_Run2018D-UL2018_GT36"

column_definitions:
  njets:
    expression: njets_boosted
    allow_redefine: True
  nbtag:
    expression: nbtag_boosted
    allow_redefine: True
  metphi:
    expression: metphi_boosted
    allow_redefine: True
  met:
    expression: met_boosted
    allow_redefine: True
  pt_1:
    expression: boosted_pt_1
    allow_redefine: True
  q_1:
    expression: boosted_q_1
    allow_redefine: True
  pt_2:
    expression: boosted_pt_2
    allow_redefine: True
  q_2:
    expression: boosted_q_2
    allow_redefine: True
  mt_1:
    expression: boosted_mt_1
    allow_redefine: True
  iso_1:
    expression: boosted_iso_1
    allow_redefine: True
  mass_2:
    expression: boosted_mass_2
    allow_redefine: True
  tau_decaymode_2:
    expression: boosted_tau_decaymode_2
    allow_redefine: True
  deltaR_ditaupair:
    expression: boosted_deltaR_ditaupair
    allow_redefine: True
  m_vis:
    expression: boosted_m_vis
    allow_redefine: True
  fj_Xbb_pt:
    expression: fj_Xbb_pt_boosted
    allow_redefine: True
  fj_Xbb_eta:
    expression: fj_Xbb_eta_boosted
    allow_redefine: True
  fj_Xbb_particleNet_XbbvsQCD:
    expression: fj_Xbb_particleNet_XbbvsQCD_boosted
    allow_redefine: True
  bpair_pt_1:
    expression: bpair_pt_1_boosted
    allow_redefine: True
  bpair_pt_2:
    expression: bpair_pt_2_boosted
    allow_redefine: True
  bpair_btag_value_2:
    expression: bpair_btag_value_2_boosted
    allow_redefine: True
  bpair_eta_2:
    expression: bpair_eta_2_boosted
    allow_redefine: True
  extramuon_veto:
    expression: extramuon_veto_boosted
    allow_redefine: True
  gen_match_2:
    expression: boosted_gen_match_2
    allow_redefine: True
    exclude_processes:
      - data
  btag_weight:
    expression: btag_weight_boosted
    allow_redefine: True
    exclude_processes:
      - data
  pNet_Xbb_weight:
    expression: pNet_Xbb_weight_boosted
    allow_redefine: True
    exclude_processes:
      - data

event_selection:
  # lep_pt: "boosted_pt_1 > 55"
  had_tau_pt: "boosted_pt_2 > 40"
Expand Down Expand Up @@ -130,4 +213,4 @@ output_features:
- "bpair_eta_2"
- "met"
- "mass_2"
- "tau_decaymode_2"
91 changes: 90 additions & 1 deletion configs/nmssm_boosted/2018/preselection_tt.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -69,6 +69,95 @@ processes:
- "JetHT_Run2018C-UL2018"
- "JetHT_Run2018D-UL2018"

column_definitions:
  njets:
    expression: njets_boosted
    allow_redefine: True
  nbtag:
    expression: nbtag_boosted
    allow_redefine: True
  metphi:
    expression: metphi_boosted
    allow_redefine: True
  met:
    expression: met_boosted
    allow_redefine: True
  pt_1:
    expression: boosted_pt_1
    allow_redefine: True
  q_1:
    expression: boosted_q_1
    allow_redefine: True
  pt_2:
    expression: boosted_pt_2
    allow_redefine: True
  q_2:
    expression: boosted_q_2
    allow_redefine: True
  mt_1:
    expression: boosted_mt_1
    allow_redefine: True
  iso_1:
    expression: boosted_iso_1
    allow_redefine: True
  mass_1:
    expression: boosted_mass_1
    allow_redefine: True
  mass_2:
    expression: boosted_mass_2
    allow_redefine: True
  tau_decaymode_1:
    expression: boosted_tau_decaymode_1
    allow_redefine: True
  tau_decaymode_2:
    expression: boosted_tau_decaymode_2
    allow_redefine: True
  deltaR_ditaupair:
    expression: boosted_deltaR_ditaupair
    allow_redefine: True
  m_vis:
    expression: boosted_m_vis
    allow_redefine: True
  fj_Xbb_pt:
    expression: fj_Xbb_pt_boosted
    allow_redefine: True
  fj_Xbb_eta:
    expression: fj_Xbb_eta_boosted
    allow_redefine: True
  fj_Xbb_particleNet_XbbvsQCD:
    expression: fj_Xbb_particleNet_XbbvsQCD_boosted
    allow_redefine: True
  bpair_pt_1:
    expression: bpair_pt_1_boosted
    allow_redefine: True
  bpair_pt_2:
    expression: bpair_pt_2_boosted
    allow_redefine: True
  bpair_btag_value_2:
    expression: bpair_btag_value_2_boosted
    allow_redefine: True
  bpair_eta_2:
    expression: bpair_eta_2_boosted
    allow_redefine: True
  extramuon_veto:
    expression: extramuon_veto_boosted
    allow_redefine: True
  gen_match_2:
    expression: boosted_gen_match_2
    allow_redefine: True
    exclude_processes:
      - data
  btag_weight:
    expression: btag_weight_boosted
    allow_redefine: True
    exclude_processes:
      - data
  pNet_Xbb_weight:
    expression: pNet_Xbb_weight_boosted
    allow_redefine: True
    exclude_processes:
      - data

event_selection:
  # met: "(met_boosted > 120)"
  had_tau_pt: "(boosted_pt_1 > 40) && (boosted_pt_2 > 40)"
Expand Down Expand Up @@ -132,4 +221,4 @@ output_features:
- "mass_1"
- "tau_decaymode_1"
- "mass_2"
- "tau_decaymode_2"
39 changes: 38 additions & 1 deletion docs/preselection.md
Expand Up @@ -8,6 +8,7 @@ The preselection config has the following parameters:
---|---|---
`channel` | `string` | tau pair decay channels ("et", "mt", "tt")
`processes` | `dict` | process parameters are explained below
`column_definitions` | `dict` | defines new columns from `ROOT` expressions. <br>Each key of the dictionary is the name of a column to define. Each value is itself a dictionary whose `expression` key holds the `ROOT` expression for that column. The optional entries `processes` and `exclude_processes` restrict the definition to specific processes, and the optional entry `allow_redefine` enables the use of the `ROOT.RDataFrame.Redefine` function to overwrite columns that already exist. A more detailed description is given below.
`event_selection` | `dict` | defines all selections that should be applied. <br>This is a dictionary of cuts where the key is the name of a cut and the value is the cut itself as a string, e.g. `had_tau_pt: "pt_2 > 30"`. The name of a cut is only used for output information in the terminal. A cut can only use variables that are in the ntuples.
`mc_weights` | `dict` | weight parameters are explained below
`emb_weights` | `dict` | defines all weights that should be applied for embedded samples. <br>As for `event_selection`, a weight can be specified directly and is then applied to all samples in the same way, e.g. `single_trigger: "trg_wgt_single_mu24ormu27"`
Expand All @@ -31,6 +32,42 @@ The `tau_gen_modes` have following modes:
`L` | `string` | lepton misidentified as a tau
`all` | `string` | if no split should be performed

In `column_definitions`, new columns can be added to the output `ntuples`
using `ROOT` expressions. An example entry could look like this:

```yaml
column_definitions:
  nbtag:
    expression: n_bjets
    processes:
      - ttbar
      - DY
  btag_weight:
    expression: id_wgt_bjet_pnet_shape
    exclude_processes:
      - data
    allow_redefine: True
  jj_deltaR:
    expression: ROOT::VecOps::DeltaR(jeta_1, jeta_2, jphi_1, jphi_2)
```

The key `expression` is required and can contain any valid `ROOT` expression.

The entries `processes` and `exclude_processes` are both optional. If
`processes` is given, the column is defined only for the processes in this list.
If `exclude_processes` is given, the column is defined for all processes except
the ones in this list. Both lists can contain the names from the `processes`
section of this configuration. Only one of the two entries can be set for a
column, not both at the same time. By default, new columns are defined for all
processes.

To write the new columns to the output file, you have to add them explicitly to
the `output_features` list.
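
The process filter described above can be sketched in a few lines of Python.
This is only an illustration of the documented rules, not the function used in
`preselection.py`: the helper name `applies_to_process` is hypothetical, and
raising a `ValueError` when both lists are given is an assumption about how the
conflict might be reported.

```python
from typing import Any, Dict


def applies_to_process(entry: Dict[str, Any], process: str) -> bool:
    """Decide whether one column definition applies to the given process.

    `entry` is one value of the `column_definitions` dictionary.
    (Hypothetical helper, written only to illustrate the documented rules.)
    """
    include = entry.get("processes")
    exclude = entry.get("exclude_processes")

    # The documentation allows only one of the two lists per column.
    # Raising here is an assumption, not the confirmed behavior of the tool.
    if include is not None and exclude is not None:
        raise ValueError("Set either 'processes' or 'exclude_processes', not both.")

    if include is not None:
        return process in include      # defined only for the listed processes
    if exclude is not None:
        return process not in exclude  # defined for everything but the listed ones
    return True                        # default: defined for all processes


nbtag = {"expression": "n_bjets", "processes": ["ttbar", "DY"]}
btag_weight = {"expression": "id_wgt_bjet_pnet_shape", "exclude_processes": ["data"]}

print(applies_to_process(nbtag, "ttbar"))       # listed in `processes`
print(applies_to_process(btag_weight, "data"))  # listed in `exclude_processes`
```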

If the key `allow_redefine` is set to `True` and a column with the same name
already exists in the `RDataFrame`, the `ROOT.RDataFrame.Redefine` function is
used. The values in this column are then overwritten by the expression given
for the new column.
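
As a rough sketch of this behavior, the choice between `Define` and `Redefine`
could look as follows. The function name `define_column` is hypothetical, and
raising a `ValueError` for an existing column without `allow_redefine` is an
assumption. The only `RDataFrame` methods used are `GetColumnNames`, `Define`
and `Redefine`.

```python
def define_column(rdf, name: str, expression: str, allow_redefine: bool = False):
    """Define `name` on `rdf`, or overwrite it if `allow_redefine` is set.

    `rdf` is expected to behave like a ROOT.RDataFrame node.
    (Hypothetical helper, not the project's actual implementation.)
    """
    # Convert to str so that the comparison also works for C++ string proxies.
    existing = {str(column) for column in rdf.GetColumnNames()}

    if name in existing:
        if not allow_redefine:
            # Assumption: fail early instead of relying on ROOT's own error.
            raise ValueError(
                f"Column '{name}' already exists; set allow_redefine to overwrite it."
            )
        return rdf.Redefine(name, expression)

    return rdf.Define(name, expression)
```

With the boosted configuration above, a call such as
`define_column(rdf, "pt_1", "boosted_pt_1", allow_redefine=True)` would overwrite
an existing `pt_1` column with the values of `boosted_pt_1`.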

In `mc_weights` all weights that should be applied for simulated samples are defined. <br>
There are two types of weights.

Expand All @@ -53,4 +90,4 @@ python preselection.py --config-file configs/PATH/CONFIG.yaml
Further there are additional optional parameters:

1. `--nthreads=SOME_INTEGER` to define the number of threads for the multiprocessing pool to run the sample processing in parallel. Default value is 8 (this should normally cover running all of the samples in parallel).
2. `--ncores=SOME_INTEGER` to define the number of cores that should be used for each pool thread to speed up the ROOT dataframe calculation. Default value is 2.
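
For illustration, the command-line options described above could be declared
with `argparse` roughly like this. The flag names and default values are taken
from this page; the function name `build_parser`, the help texts and marking
`--config-file` as required are assumptions, so the actual parser in
`preselection.py` may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the documented options (illustrative sketch only)."""
    parser = argparse.ArgumentParser(description="Run the sample preselection.")
    parser.add_argument(
        "--config-file",
        required=True,  # assumption: the config path has no default
        help="Path to the preselection config, e.g. configs/PATH/CONFIG.yaml",
    )
    parser.add_argument(
        "--nthreads",
        type=int,
        default=8,
        help="Number of threads of the multiprocessing pool.",
    )
    parser.add_argument(
        "--ncores",
        type=int,
        default=2,
        help="Number of cores per pool thread for the ROOT dataframe calculation.",
    )
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--config-file", "configs/PATH/CONFIG.yaml"])
print(args.config_file, args.nthreads, args.ncores)
```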