Merging local development with cf 0.3 release #14

anigamova · 2025-06-01T15:59:06Z

No description provided.


* module to calculate sceta for photon (for pre-v14 nanoAOD)
* adopt naming scheme from nanoAOD v14
* implemented photon calibrations for energy scale and resolution
* abstracted egamma energy scale uncertainties
* abstracted egamma resolution corrections
* add convenience function for electron supercluster eta
* add correction modules for electrons
* bug fix: central scale for resolution smearing is 1, not 0
* abstract object-level deterministic seeds, include seeds for electrons and photons
* implement resolution smearing with deterministic seeds
* fix linting
* bug fix: fix calculation of arctan per quadrant (verified with nanoAOD v14)
* stated type annotations for variables
* started unit tests for egamma producers
* test data for supercluster eta producer
* first test for photon sceta
* linting
* first comparison
* make with_uncertainties classproperty
* linting
* tests for initialization
* remove type hints, fix compatibility issues
* linting
* bug fix: fix typo in produces for electron_sceta
* remove producer for photon supercluster eta for now
* apply code review
---------
Co-authored-by: Philip Daniel Keicher <philip.daniel.keicher@cern.ch>
Co-authored-by: Philip Keicher <philip.keicher@cern.ch>


* reformat doc strings of InferenceModel class
* include HistHook mixin into inference task
* added tests for inference modules
* remove debugger
* fix unit tests for inference models
* fix linting
* remove empty __all__ lists for automatic sphinx docs
* Update columnflow/tasks/cms/inference.py
* set __all__ attribute to empty to suppress imports via *
* ignore __all__ attribute for automodule functionality
* add imported members to display all available types
* switch to automodule functionality to be in line with other pages
* Apply suggestions from code review (@riga)
Co-authored-by: Marcel Rieger <riga@users.noreply.github.com>
---------
Co-authored-by: Philip Daniel Keicher <philip.daniel.keicher@cern.ch>
Co-authored-by: Mathis Frahm <49306645+mafrahm@users.noreply.github.com>
Co-authored-by: Marcel Rieger <riga@users.noreply.github.com>


…low#594)
* Util functions and generic producers for delta-R matching.
* CMS NanoAOD-specific delta-R matching producers.
* Add `matching` modules to analysis template.
* Fix return value for `delta_r_match_multiple`.
* Add docstring for `delta_r_matcher`. Remove leftover debugger.
* Review comments (@riga): add type hints, minor cleanup.
* Review comments (@riga): put `src` before `dst` in matching functions.
---------
Co-authored-by: Marcel Rieger <riga@users.noreply.github.com>
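For readers unfamiliar with the delta-R matching named in the commit above, the idea can be sketched in a few lines of plain Python. This is an illustration only, not the columnflow implementation: the function names `delta_r` and `match_closest`, the `(eta, phi)` tuple layout, and the default cone size of 0.4 are all assumptions made for this example. The argument order follows the `src` before `dst` convention mentioned in the review comment.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance sqrt(deta^2 + dphi^2), with dphi wrapped to [-pi, pi)."""
    deta = eta1 - eta2
    # wrap the azimuthal difference so that phi = +3.1 and phi = -3.1 count as close
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(deta, dphi)


def match_closest(src, dst, max_dr=0.4):
    """For each (eta, phi) in src, return the index of the closest dst object
    within max_dr, or -1 if none qualifies. Hypothetical helper."""
    matches = []
    for s_eta, s_phi in src:
        best_idx, best_dr = -1, max_dr
        for j, (d_eta, d_phi) in enumerate(dst):
            dr = delta_r(s_eta, s_phi, d_eta, d_phi)
            if dr < best_dr:
                best_idx, best_dr = j, dr
        matches.append(best_idx)
    return matches


src = [(0.0, 3.1), (1.0, 1.0)]
dst = [(0.0, -3.1), (2.0, 0.0)]
print(match_closest(src, dst))
```

A columnar framework would perform the same comparison on whole arrays at once; the explicit loops here only keep the logic readable.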


* The st_tchannel_t_powheg dataset was renamed to st_tchannel_t_4f_powheg in cmsdb
* Add missing requirements for {Jet,Muon}.{eta,phi} ("coffea issue")
* Update the JSONPOG mirror tag to a working version
Co-authored-by: Marcel Rieger <riga@users.noreply.github.com>

* linting in cms_minimal template
* fix example and ml_tf sandbox
* fix muon Producer with behaviour attached
* add producer example using attach_coffea_behaviour
* Update analysis_templates/cms_minimal/__cf_module_name__/plotting/example.py
---------
Co-authored-by: Marcel Rieger <riga@users.noreply.github.com>


* docs: update README.md [skip ci]
* docs: update .all-contributorsrc [skip ci]
---------
Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com>
Co-authored-by: Marcel Rieger <riga@users.noreply.github.com>


* initial commit, fix to jer application to jec variations
* change smearing factor to a callable function and calculate smearing factor for each jec variation.
* update jets and mets definitions to ensure deep copy of original event array is taken
* add jec-specific columns to uses
* Vectorized jer application over jec variations (columnflow#92)
* Simplify jer init.
* Overhaul vectorized jer processing.
* Minor sources fix in jec.
* move jec_variations, jer_variations, and postfixes to jer_init. Also include jec_ prefix to jec_variations as jec_variations is only used for registering uses and produces and storing jer variations in a dictionary.
* change jer_random_normal variable name to random_normal
---------
Co-authored-by: juvanden <jules.vandenbroeck@cern.ch>
Co-authored-by: Marcel Rieger <riga@users.noreply.github.com>


* Fix task inheritance, adjust store parts.
* Typo.
* Revert stray changes.
* Add store_part_anchor.
* Re-purpose store part anchor for config part only.
* Define config_store_anchor on ConfigTask for subclasses.
* Fix inheritance order in datacard task.
* TAF init refactoring draft.
* Adapt template analysis.
* Add comment.
* Add review comments by @mafrahm.
* Start.
* Minor cleanup.
* Port ConfigTask and ShiftTask.
* Propagate ConfigTask changes to mixins and other tasks.
* Update inference interface and tasks for multi config inputs.
* Update hist hook handling.
* Fix hist hook lookup.
* Typo.
* Update docstring.
* Add union and intersection modes to default resolution.
* Overhaul find_config_objects.
* Update columnflow/tasks/framework/base.py
Co-authored-by: Mathis Frahm <49306645+mafrahm@users.noreply.github.com>
* Port config lookup to variables, review comment.
* Typo.
* Use dicts.
* Update readme, fix typo in config task.
* Update DatasetsProcessesMixin and ShiftSourcesMixin.
* Improve loop over configs in shift validation.
* Merge MultiConfigPlotting into refactor/taf_init (columnflow#630)
* implement MultiConfigTask
* disable TaskArrayFunction init when there is no config inst
* fix MLEvaluation reqs
* update template and CSPM parameter description
* fix tests
* add warning when cspm defaults are set in config_inst
* tmp
* fixes of PlotShiftMixin task
* fixes in PlotShiftMixin
* modify PlotVariables1d run method for multi config
* reintroduce MLModelsMixin to plotting
* add analysis_inst instead of config_inst in preparation_producer
* resolve processes and variables per config
* add PlotVariablesPerConfig wrapper tasks
* split between DatasetsProcessesMixin and MultiConfigDatasetsProcessesMixin + cleanup
* make ShiftSourcesMixin work with multiple configs
* add PlotShiftedVariablesPerConfig1D mixin
* fix bug when dataset is missing in first config
* fix mixins from CreateDatacards
* add default config to MultiConfigTask
* move defaults to analysis_inst in AnalysisTask
* move process and variable settings back to config inst
* fixes to the two previous commits...
* move multi config resolving function to AnalysisTask
* review comments
* set weight_producer in analysis inst in template
* fix lint in template
* decouple ShiftTask and ConfigTask and remove PlotShiftMixin
* remove checking shifts for all reqs
* move default categories, variables, and inference model to analysis inst
* cleanup in VariablesMixin
* remove config_inst from get_known_shifts signature
* resolve shifts per config
* store shift and category names instead of ids in histograms
* fix hist tests
* fix shifted variable plot func
* handle missing shift bins in plot_shifted_variables
* fix missing shift bins in plotting task
* fill category as int and transform to str later
* add growth to translated axis
* fix and extend hist_util tests
* loop over variables when switching to strcat
* same for cutflow (+lint)
* cleanup
* fix resolving of ml model insts
* fix process order in plotting
* fix HistogramsUser task class inheritance
* allow resolving selector step defaults
* move selector_steps default to analysis_inst
* fix bug in obtaining unique category ids
* fix MRO in CreateHistograms
* Update columnflow/tasks/plotting.py
Co-authored-by: Philip Keicher <26219567+pkausw@users.noreply.github.com>
* feature/MultiConfigPlotting
* cleanup and reintroduce ML mixins from MultiConfigPlotting
* fix murmuf_envelope Producer
* remove config_inst from get_known_shifts signature
* fix WeightProducerClassMixin inheritance and add MultiConfigDatasetsProcessesShiftSourcesMixin
* decouple ShiftTask from ConfigTask
* bugfixes and linting
* fix PlotShiftedVariables
* centralize definition of CSPW representations
* fixes in ML tasks
* fix inconsistencies after merging
* add tests for default and group resolving
* remove single_config tag from VariablesMixin
* move CSPM groups to analysis inst in template
* fix category/variable resolving and add resolving tests
* streamline tests
* cleanup and fix param resolving
* add tests for process resolving
* extend resolving tests and fix dataset/process resolving
* remove duplicate lines
* include shift inst tests
---------
Co-authored-by: localusers user <juvanden@m0.iihe.ac.be>
Co-authored-by: Philip Keicher <26219567+pkausw@users.noreply.github.com>
* Refactor/taf init simplified shift validation (columnflow#641)
* revert changes to ShiftSourcesMixin
* simplify shift resolving as much as possible
* streamline resolve_shifts function
* Refactor/taf init (reorganized resolution + fix ml pipeline) (columnflow#643)
* first draft for reordered TAF initialization and param resolution
* remove shift bins that were not requested from branch map
* fix single config tasks (yields) and cleanup
* reintroduce ML training pipeline
* switch to DatasetsProcessesMixin
* recreate dependencies in run_post_init
* Cleanup
* perform shift resolution only if not yet done
* fix single config datasets/processes resolving
* move DatasetsProcessesMixin to PlotProcessBase and fallback to nominal shift in reqs
* move logger messages into debug mode
* revert 5c515bb
* fallback branch to -1 if not existent
* move default CSPs back to config inst
* fix param resolving in wrapper_factory
* update resolution class and function names
* fix bug (pass shift name instead of inst in reqs)
* Apply suggestions from code review
Co-authored-by: Marcel Rieger <riga@users.noreply.github.com>
* add resolve_instances for InferenceModelMixin
* minor refactoring
---------
Co-authored-by: Marcel Rieger <riga@users.noreply.github.com>
* Improve known_shifts caching between workflow and branches.
* Fixes edge cases.
* Fix default resolution.
* Refactor/taf init fixes (columnflow#645)
* add missing MLEvaluation reqs
* add producer_inst to ProduceColumns.reqs in ML pipeline
* load ML columns in histograms and union tasks
* locate shift name instead of id in histograms
* Typo.
* Adjust inference model tests. (columnflow#646)
* Fix TAF post init order (columnflow#647)
* Correct taf post init order.
* Fix selector steps default.
* Fix typo.
* Add reducer interface. (columnflow#648)
* Add reducer interface.
* Additional reducer fallback to cf_default.
* Add hist producer interface. (columnflow#650)
* Cleanup top pt weight producer. (columnflow#625)
* Cleanup top pt weight producer.
* Add TopPtWeightConfig.
* Update columnflow/production/cms/top_pt_weight.py
Co-authored-by: Mathis Frahm <49306645+mafrahm@users.noreply.github.com>
---------
Co-authored-by: Mathis Frahm <49306645+mafrahm@users.noreply.github.com>
* Documentation update for refactoring (columnflow#652)
* Start docs update.
* Update README.
* Add TAF docs.
* Finish TAF docs, start transition.
* Finish tafs in transition guide.
* Finish changed task names docs.
* Add multi-config update instructions.
* Finish transition guide for reducers.
* Finish inference model transition docs.
* Finish transition docs.
* Lint.
* Systematic shift plotting (columnflow#649)
* Update shift plots.
* Fix id/name handling.
* Address review comments by @mafrahm.
* Update variable names, add comments.
* Update sandboxes.
* Update law.
* Code harmonization
* Apply review comment.
* Use process names in hist axes. (columnflow#657)
* Use process names in hist axes.
* Apply axes conversion to remaining spots.
* Add configurable string representations.
* Add missing docstring.
* Optimize hist filling, code alignment.
* Feature: Add mechanism to transform hist into version with equally spaced bins (columnflow#627)
* added mechanism to transform hist into version with equally spaced bins, also added keyword to rotate xticks label
* linter
* added forgotten keyword argument in the preparation of the config
* correct typo, add new arguments to kwargs and change default x_ticks
* Refactor rebinning function.
* Simplify axis settings.
* Feedback process and variable updates to style config.
* Move x axis transformations to 'apply_variable_settings'.
---------
Co-authored-by: Nathan Prouvost <nathan.prouvost@gmail.com>
Co-authored-by: Marcel Rieger <riga@users.noreply.github.com>
Co-authored-by: Marcel R. <github.riga@icloud.com>
* Add and use only_local_env decorator.
* Make lumi in normalization weight producer configurable.
* Minor fixes and consistency.
* Fix config lookup for taf classes mixins. (columnflow#669)
* CMS jet id producer (columnflow#661)
* Add cms-related jet id producer.
* Fix bit check.
* Allow subpaths in external files. (columnflow#663)
* Allow subpaths in external files.
* Minor de-nesting.
* Maintain subpaths type.
* Eager taf teardown when call function fails. (columnflow#662)
* Eager taf teardown when call function fails.
* Gracefully trigger teardown via decorator.
* minor fixes and streamlining (columnflow#671)
* Unambiguous hashing.
* fix plotting with single varied shift (columnflow#672)
* remove flag from MergeHistograms
* fix plotting single varied shift in PlotVariables1D
* Update columnflow/tasks/histograms.py
---------
Co-authored-by: Marcel Rieger <riga@users.noreply.github.com>
* Update law.
* adding dy weights producer (columnflow#622)
* adding dy weights producer
* redefining masks and adding dy_weights_init
* adding dy_order input
* adding order to DrellYanConfig
* adding order to DrellYanConfig
* adding check for dy order in cfg
* add missing self.dy_unc_corrector
* update dy weight producer
* linting dy recoil producer
* remove duplicate entry in dy recoil weights
* fix logic in DY recoil vis dilepton selection
* format with black
* passed flake8
* linting
* update recoil corrections by removing helper functions
* fix linter
* fix bug with import InsertableDict
* Apply suggestions from code review
Co-authored-by: Marcel Rieger <riga@users.noreply.github.com>
* add suggestions from review to DY producer
---------
Co-authored-by: Paul Philipp Gadow <paul.philipp.gadow@cern.ch>
Co-authored-by: philippgadow <philipp.gadow@mytum.de>
Co-authored-by: Marcel Rieger <riga@users.noreply.github.com>
* fix PlotCutflow task and requirements
* Update columnflow/tasks/selection.py
Co-authored-by: Marcel Rieger <riga@users.noreply.github.com>
* Shift-conform dy outputs.
* fix ml_model repr
* Rename recoil_corrections to recoil_corrected_met.
* Apply new recommendation for egamma calibration (columnflow#674)
* added more kwargs for config, that are necessary to handle run2 and run3 recommendation at the same time.
* added more variables for variable maps, switched application of smearing to a standardized version, that results in the same result but is more robust.
* removed comments
* removed version check
* change rand_func to separate normal_up, down variant
* rewrap docstring and point to EGammaPog recommendation and example file
* switched to concrete arguments in config and feedforward this change
* add example into docstring about how to use the calibrator in combination with the config
* Apply suggestions from code review
---------
Co-authored-by: Marcel Rieger <riga@users.noreply.github.com>
* append scale label when not passing placeholder
* implement own errorbar calculation (columnflow#675)
* implement own errorbar calculation
* make poisson error calculation independent of histogram shape
* Apply suggestions from code review
Co-authored-by: Marcel Rieger <riga@users.noreply.github.com>
* change function name
* Apply comments from review
Co-authored-by: Mathis Frahm <49306645+mafrahm@users.noreply.github.com>
---------
Co-authored-by: Marcel Rieger <riga@users.noreply.github.com>
Co-authored-by: Mathis Frahm <49306645+mafrahm@users.noreply.github.com>
* Fix flow handling for fake data in datacards.
* Allow skipping histogram checks.
* Fix used columns of btag weight producer.
* minor plotting fixes
* Fix norm_weight_producer_inst in MergeSelectionMasks.
* Improve transition guide.
* parton shower weights (columnflow#676)
* init commit. To see commit history check scalefactor-development branch in GhentAnalysis fork
* remove the cmsGhent folder and add parton_shower.py to production/cms
* Update columnflow/production/cms/parton_shower.py
Co-authored-by: Marcel Rieger <riga@users.noreply.github.com>
* Update columnflow/production/cms/parton_shower.py
Co-authored-by: Marcel Rieger <riga@users.noreply.github.com>
* add parton_shower to columnflow-cms specific production modules
---------
Co-authored-by: juvanden <jules.vandenbroeck@cern.ch>
Co-authored-by: Marcel Rieger <riga@users.noreply.github.com>
* Minor alignment.
* Minor cleanup of electron code.
* Fix typos in egamma calibrators.
* Enable jet_id producer for data.
* Hotfix ps weights when variations are missing.
* Add cf_remove_tmp tool.
* fix typo.
* Fix shift selection for plotting.
* Fixes for docs (pdf figures not displayed) (columnflow#679)
* docs: evince-previewer -> evince; evince-previewer is the print preview of evince
* added filter to upload svg files to lfs
* docs: converted all pdf plots to (additional) svg using `for f in *.pdf; do pdf2svg $f ${f%.pdf}.svg; done` uploaded to lfs
* docs: using wildcard extensions for plot file names such that the html generation uses svg, while others (e.g. latex) may still use pdf. Before, the image display in the browser was broken and only a link to the pdf file was shown (supposedly the alt text).
* Rename histograming -> histogramming. (columnflow#680)
* Add missing local_env check to BundleExternalFiles task.
* allow diverging producers in MLEvaluation (columnflow#681)
* hotfix: update producer_insts based on evaluation_producers
* hotfix: update hists with remove_residual_axis function
* allow passing mask to apply JER smearing only to a subset of jets
* update faulty import in cms_minimal template
* hotfix: allow running ml pipeline without preparation_producer
* fix padding when ak.max returns None
* update jer_horn_handling calibrator
* cast undefined_category_ids to str before raising the error to avoid TypeError
* Improve tmp file removal.
* Update law.
* Add preparation producer post init.
* Avoid full config copy in plotting.
* More verbose leaf category check exceptions.
* set scale_factor to 1 instead of 0 (columnflow#685)
* allow skipping preparation_producer in MLEvaluation (columnflow#686)
Co-authored-by: Marcel Rieger <riga@users.noreply.github.com>
* Save lepton pair pdg id in gen_dilepton producer.
* Add structure for category groups.
* Add warning.
* Typo.
* Add warn flag to CategoryGroup.
---------
Co-authored-by: Mathis Frahm <49306645+mafrahm@users.noreply.github.com>
Co-authored-by: localusers user <juvanden@m0.iihe.ac.be>
Co-authored-by: Philip Keicher <26219567+pkausw@users.noreply.github.com>
Co-authored-by: Bogdan-Wiederspan <79155113+Bogdan-Wiederspan@users.noreply.github.com>
Co-authored-by: Nathan Prouvost <nathan.prouvost@gmail.com>
Co-authored-by: Ana Andrade <99343616+aalvesan@users.noreply.github.com>
Co-authored-by: Paul Philipp Gadow <paul.philipp.gadow@cern.ch>
Co-authored-by: philippgadow <philipp.gadow@mytum.de>
Co-authored-by: Mathis Frahm <mathisfrahm@gmx.de>
Co-authored-by: Nathan Prouvost <49162277+nprouvost@users.noreply.github.com>
Co-authored-by: JulesVandenbroeck <93740577+JulesVandenbroeck@users.noreply.github.com>
Co-authored-by: juvanden <jules.vandenbroeck@cern.ch>
Co-authored-by: Johannes Lange <jolange@users.noreply.github.com>
Co-authored-by: Philip Daniel Keicher <philip.daniel.keicher@cern.ch>


riga and others added 30 commits January 20, 2025 08:28

Add option to skip selection hists.

da7e410

Flip stack plot order.

9a6e6d6

Update flag in law.cfg.

6230f3d

Remove forest merge.

60545a0

Revert selection.

1c62af3

Merge branch 'master' into feature/update_sel_ml_merging

4fbe585

Merge branch 'master' into feature/flip_stack_order

de895f3

Merge branch 'master' into feature/skip_selection_hists

12b5d0a

Apply suggestions from code review

1e50af1

Co-authored-by: Mathis Frahm <49306645+mafrahm@users.noreply.github.com>

Change default.

13af3ac

Merge pull request columnflow#606 from columnflow/feature/update_sel_…

1929f81

…ml_merging Remove ForestMerge from MergeMLStats

Merge branch 'master' into feature/skip_selection_hists

5b5bdf8

Merge pull request columnflow#608 from columnflow/feature/skip_select…

699fad0

…ion_hists Option to skip selection hists

Merge branch 'master' into feature/flip_stack_order

8e07f74

Remove flip_stack option.

7443290

Hotfix notifications from sandboxed tasks.

e521901

Merge branch 'master' into feature/flip_stack_order

98ddd50

Merge pull request columnflow#607 from columnflow/feature/flip_stack_…

841ff7d

…order Flip stack plot order.

hotfix: make dataset_inst in init_func optional for egamma modules

ecb6aec

hotfix: separate the dataset_inst checks

a730e2c

correct path only_missing looks for in MergeHistograms

20c98ab

Merge pull request columnflow#613 from columnflow/bugfix_merge_histog…

5fee9e4

…rams correct path only_missing looks for in MergeHistograms

Hotfix: properly forward and use deps_kwargs. (columnflow#614)

d9d8cff

Hotfix rounding error in datacard for fake data generation.

ca673f0

Add missing hist hook repr to datacard paths.

afdcfb1

fix bug where ReduceEvents workflow is submitted twice

6e40750

prioritise custom style config via command line

4b819ba

Style configs hard coded in the tasks were given priority over command line specified style configs (defined via the custom style config groups). Reversed this situation to give more control via command line.
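The precedence change described in this commit message can be pictured as a per-section dictionary merge in which command-line values are applied last. The sketch below only illustrates that ordering and is not the code from the commit: the function name `merge_style_configs` and the section keys `legend_cfg` and `ax_cfg` are assumptions chosen for the example.

```python
def merge_style_configs(task_style: dict, cli_style: dict) -> dict:
    """Merge two nested style configs section by section.

    Values from cli_style (given on the command line) override values
    hard coded in the task, which only act as defaults. Hypothetical helper.
    """
    merged = {}
    for section in task_style.keys() | cli_style.keys():
        # later entries win, so the command-line values are unpacked last
        merged[section] = {
            **task_style.get(section, {}),
            **cli_style.get(section, {}),
        }
    return merged


task_style = {"legend_cfg": {"ncols": 1, "fontsize": 16}, "ax_cfg": {"yscale": "linear"}}
cli_style = {"legend_cfg": {"ncols": 2}}
merged = merge_style_configs(task_style, cli_style)
print(merged["legend_cfg"])
```

Before the change, the situation corresponded to unpacking the two dictionaries in the opposite order, so a hard-coded `ncols` would have silently replaced the user's choice.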

Merge pull request columnflow#617 from GhentAnalysis/style_config_pri…

e91e148

…ority prioritise custom style config via command line

riga and others added 29 commits February 10, 2025 13:28

Hotfix use of fake data in inference model.

ed7220c

Update input files lepton SFs (columnflow#615)

0cbd2fb

* update lepton SFs producers such that json files can also be used as inputs for the correctionlib * add review comments * Externalize correction set loading. * Typo. --------- Co-authored-by: Marcel Rieger <riga@users.noreply.github.com>

Typo.

e19c23c

Hotfix weight producer info to CreateDatacards.

8cc9220

Correct input keys for electron Efficiencies (columnflow#620)

d8e3b30

* add correct keys for Efficiencies correction sets for electrons * Review comment. --------- Co-authored-by: Marcel Rieger <riga@users.noreply.github.com>

update sf variations names (or why you should test the code before st…

e18f1df

…arting a PR) (columnflow#623)

Fix lumi label precision to recommended digits.

da70a68

correct eta calculation trigger sf (columnflow#632)

4367c55

* correct eta calculation trigger sf * modify name class variable in electron weight producer * apply comments from review

Fix pu weight extraction. (columnflow#654)

0d48d70

Update law.

3b970bc

typo on inference task (columnflow#655)

ce79258

Co-authored-by: juvanden <jules.vandenbroeck@cern.ch>

docs: add aalvesan as a contributor for code (columnflow#659)

d97e494

* docs: update README.md [skip ci] * docs: update .all-contributorsrc [skip ci] --------- Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com>

add category uniqueness check in create_category_combinations (column…

1bd7fca

…flow#611)

Hotfix category id uniqueness check.

384da20

Add tmp dir checks, add cf_setup_post_install hook.

578d8b7

correct json file extension

db5f46b

Hotfix category flattening.

54191a5

Merge pull request columnflow#691 from GhentAnalysis/upstream/fix_sta…

c21f2c2

…ts_file_ext correct json file extension

normalize to the number of events, the weights are set to 1

c542511

high pileup weights

051fbe6

Merge pull request #16 from DesyTau/anigamova_cf0p3_weights

db9c146

Anigamova cf0p3 weights

Added support for specifying axis grid

ea29d08

Merge pull request #19 from hephysicist/st_feature_grid

34dd77c

Added support for specifying axis grid

11 participants
