-
Notifications
You must be signed in to change notification settings - Fork 17
Update to handle multiDim histograms #207
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request adds support for multi-dimensional (2D and 3D) histograms to the FLAF analysis framework and fixes a memory leak in the histogram merger. The changes enable linearization of 2D histograms into 1D representations with custom binning per y-bin range.
Changes:
- Added C++ function
rebinHistogramDictto rebin multi-dimensional histograms with custom binning per y-range - Extended Python histogram handling to support 1D, 2D, and 3D histograms throughout the analysis pipeline
- Fixed memory leak in histogram merger by setting
SetDirectory(0)on histograms during accumulation
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 16 comments.
Show a summary per file
| File | Description |
|---|---|
| include/HistHelper.h | Adds C++ rebinHistogramDict function for multi-dimensional histogram rebinning with custom y-range binning |
| Common/HistHelper.py | Extends RebinHisto and GetModel to support dict-based binning and multi-dimensional histograms; adds cache helper functions |
| Analysis/tasks.py | Adds variable flattening logic to extract individual variables from multi-dimensional variable definitions |
| Analysis/HistTupleProducer.py | Refactors to handle flattened variables and adds shifted tree processing for systematic uncertainties |
| Analysis/HistProducerFromNTuple.py | Updates bin counting logic to correctly handle 2D/3D histograms and adds dimension detection |
| Analysis/HistPlotter.py | Removes duplicate RebinHisto and getNewBins functions (now in HistHelper.py); adds 2D binning support |
| Analysis/HistMergerFromHists.py | Fixes memory leak by calling SetDirectory(0) when accumulating histograms and refactors file handling |
| all_output_edges.erase(std::unique(all_output_edges.begin(), all_output_edges.end()), all_output_edges.end()); | ||
|
|
||
| // Create output histogram with variable binning | ||
| TH1D* hist_output = new TH1D("rebinned", "rebinned", all_output_edges.size() - 1, all_output_edges.data()); |
Copilot
AI
Jan 16, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The histogram name and title are hardcoded as 'rebinned'. Consider accepting these as parameters or deriving them from the input histogram to make the function more flexible and preserve histogram metadata.
| TH1D* hist_output = new TH1D("rebinned", "rebinned", all_output_edges.size() - 1, all_output_edges.data()); | |
| TH1D* hist_output = new TH1D(hist_initial->GetName(), hist_initial->GetTitle(), all_output_edges.size() - 1, all_output_edges.data()); |
|
|
||
| TH1D* rebinHistogramDict(TH1* hist_initial, int N_bins, | ||
| const std::vector<std::pair<float, float>>& y_bin_ranges, | ||
| const std::vector<std::vector<float>>& output_bin_edges) { |
Copilot
AI
Jan 16, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function creates a new histogram with new but there's no clear ownership or deletion mechanism documented. Consider documenting that the caller is responsible for deleting the returned histogram, or use ROOT's memory management (e.g., adding the histogram to a directory).
| TH1D* rebinHistogramDict(TH1* hist_initial, int N_bins, | |
| const std::vector<std::pair<float, float>>& y_bin_ranges, | |
| const std::vector<std::vector<float>>& output_bin_edges) { | |
| /** | |
| * Rebin a 1D histogram using a dictionary of variable-width bin edges. | |
| * | |
| * The returned histogram is allocated with `new` and ownership is | |
| * transferred to the caller. The caller is responsible for managing | |
| * its lifetime, e.g. by: | |
| * - explicitly deleting it with `delete`, or | |
| * - adding it to a ROOT TDirectory/TFile so that ROOT manages | |
| * its deletion. | |
| */ | |
| TH1D* rebinHistogramDict(TH1* hist_initial, int N_bins, | |
| const std::vector<std::pair<float, float>>& y_bin_ranges, | |
| const std::vector<std::vector<float>>& output_bin_edges) { |
|
@cms-flaf-bot please test
|
|
pipeline#13793040 started |
|
pipeline#13793040 failed |
|
@cms-flaf-bot please test
|
|
pipeline#13793221 started |
|
I hate merge conflicts and apparently broke everything when I was trying to fix them 😢 |
|
pipeline#13793221 failed |
76b6877 to
0dddc08
Compare
|
@cms-flaf-bot please test
|
|
pipeline#13793622 started |
|
@cms-flaf-bot please test
|
|
pipeline#13793706 started |
|
pipeline#13793706 failed |
|
@cms-flaf-bot please test
|
|
pipeline#13793934 started |
|
pipeline#13793934 failed |
|
pipeline#13793622 failed |
|
@cms-flaf-bot please test
|
|
pipeline#13794271 started |
|
pipeline#13794271 failed |
|
@cms-flaf-bot please test
|
|
pipeline#13795444 started |
|
pipeline#13795444 passed |
|
@cms-flaf-bot please test
|
|
pipeline#13796018 started |
|
pipeline#13796018 failed |
|
Testing has to wait until cms-flaf/HH_bbWW#53 is merged |
|
@cms-flaf-bot please test
|
|
pipeline#13799766 started |
|
pipeline#13799766 failed |
|
@cms-flaf-bot please test
|
|
pipeline#13803335 started |
|
pipeline#13803335 passed |
|
@cms-flaf-bot please test
|
|
pipeline#13803427 started |
|
The formatting failure is dumb. The lxplus |
|
pipeline#13803427 passed |
|
@cms-flaf-bot please test
|
|
pipeline#13803678 started |
|
pipeline#13803678 passed |
| if dims == 1: | ||
| unit_hist = rdf.Filter(filter_to_apply).Histo1D( | ||
| unit_bin_model, *var_bin_list, weight_name | ||
| ) | ||
| elif dims == 2: | ||
| unit_hist = rdf.Filter(filter_to_apply).Histo2D( | ||
| unit_bin_model, *var_bin_list, weight_name | ||
| ) | ||
| elif dims == 3: | ||
| unit_hist = rdf.Filter(filter_to_apply).Histo3D( | ||
| unit_bin_model, *var_bin_list, weight_name | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if dims == 1: | |
| unit_hist = rdf.Filter(filter_to_apply).Histo1D( | |
| unit_bin_model, *var_bin_list, weight_name | |
| ) | |
| elif dims == 2: | |
| unit_hist = rdf.Filter(filter_to_apply).Histo2D( | |
| unit_bin_model, *var_bin_list, weight_name | |
| ) | |
| elif dims == 3: | |
| unit_hist = rdf.Filter(filter_to_apply).Histo3D( | |
| unit_bin_model, *var_bin_list, weight_name | |
| ) | |
| rdf_filtered = rdf.Filter(filter_to_apply) | |
| if dims >= 1 and dims <= 3: | |
| mkhist_fn = getattr(rdf_filtered, f'Histo{dims}D') | |
| unit_hist = mkhist_fn(unit_bin_model, *var_bin_list, weight_name) | |
| # Yes I know this is ugly | ||
| # Axis needs +2 for under/overflow, but only if the return is not 1!!! | ||
| # Was having issue with Z axis in 2D. We don't want to multiply by 3 if it's 2D | ||
| N_xbins = unit_hist.GetNbinsX() + 2 | ||
| N_ybins = unit_hist.GetNbinsY() if hasattr(unit_hist, "GetNbinsY") else 1 | ||
| N_ybins = N_ybins + 2 if N_ybins > 1 else N_ybins | ||
| N_zbins = unit_hist.GetNbinsZ() if hasattr(unit_hist, "GetNbinsZ") else 1 | ||
| N_zbins = N_zbins + 2 if N_zbins > 1 else N_zbins | ||
| N_bins = N_xbins * N_ybins * N_zbins | ||
| # If we use the THnD then we have 'GetNbins' function instead | ||
| N_bins = unit_hist.GetNbins() if hasattr(unit_hist, "GetNbins") else N_bins |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # Yes I know this is ugly | |
| # Axis needs +2 for under/overflow, but only if the return is not 1!!! | |
| # Was having issue with Z axis in 2D. We don't want to multiply by 3 if it's 2D | |
| N_xbins = unit_hist.GetNbinsX() + 2 | |
| N_ybins = unit_hist.GetNbinsY() if hasattr(unit_hist, "GetNbinsY") else 1 | |
| N_ybins = N_ybins + 2 if N_ybins > 1 else N_ybins | |
| N_zbins = unit_hist.GetNbinsZ() if hasattr(unit_hist, "GetNbinsZ") else 1 | |
| N_zbins = N_zbins + 2 if N_zbins > 1 else N_zbins | |
| N_bins = N_xbins * N_ybins * N_zbins | |
| # If we use the THnD then we have 'GetNbins' function instead | |
| N_bins = unit_hist.GetNbins() if hasattr(unit_hist, "GetNbins") else N_bins | |
| N_bins = unit_hist.GetNbins() if hasattr(unit_hist, "GetNbins") else unit_hist.GetNcells() |
| # Create ROOT vectors | ||
| # Call the C++ function which returns a new histogram | ||
| new_hist = ROOT.analysis.rebinHistogramDict( | ||
| hist_initial, N_bins, y_bin_ranges, output_bin_edges_vec |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| hist_initial, N_bins, y_bin_ranges, output_bin_edges_vec | |
| hist_initial, y_bin_ranges, output_bin_edges_vec |
| N_xbins = hist_initial.GetNbinsX() + 2 | ||
| N_ybins = hist_initial.GetNbinsY() if hasattr(hist_initial, "GetNbinsY") else 1 | ||
| N_ybins = N_ybins + 2 if N_ybins > 1 else N_ybins | ||
| N_zbins = hist_initial.GetNbinsZ() if hasattr(hist_initial, "GetNbinsZ") else 1 | ||
| N_zbins = N_zbins + 2 if N_zbins > 1 else N_zbins | ||
| N_bins = N_xbins * N_ybins * N_zbins | ||
| # If we use the THnD then we have 'GetNbins' function instead | ||
| N_bins = ( | ||
| hist_initial.GetNbins() if hasattr(hist_initial, "GetNbins") else N_bins | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| N_xbins = hist_initial.GetNbinsX() + 2 | |
| N_ybins = hist_initial.GetNbinsY() if hasattr(hist_initial, "GetNbinsY") else 1 | |
| N_ybins = N_ybins + 2 if N_ybins > 1 else N_ybins | |
| N_zbins = hist_initial.GetNbinsZ() if hasattr(hist_initial, "GetNbinsZ") else 1 | |
| N_zbins = N_zbins + 2 if N_zbins > 1 else N_zbins | |
| N_bins = N_xbins * N_ybins * N_zbins | |
| # If we use the THnD then we have 'GetNbins' function instead | |
| N_bins = ( | |
| hist_initial.GetNbins() if hasattr(hist_initial, "GetNbins") else N_bins | |
| ) |
| unit_bin_model = ROOT.RDF.TH1DModel( | ||
| "", "", model.fNbinsX, -0.5, model.fNbinsX - 0.5 | ||
| ) | ||
| print("nD histogram not implemented yet") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| print("nD histogram not implemented yet") | |
| raise RuntimeError("nD histogram not implemented yet") |
| unit_bin_Inputs = [model.fNbinsX, -0.5, model.fNbinsX - 0.5] | ||
| unit_bin_model = ROOT.RDF.TH1DModel("", "", *unit_bin_Inputs) | ||
|
|
||
| elif dims == 2: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code for 2D and 3D can be the same.
| }; | ||
|
|
||
|
|
||
| TH1D* rebinHistogramDict(TH1* hist_initial, int N_bins, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| TH1D* rebinHistogramDict(TH1* hist_initial, int N_bins, | |
| TH1D rebinHistogramDict(const TH1& hist_initial, int N_bins, |
| step = (out_max - out_min) / n_out_bins | ||
| for i in range(n_out_bins + 1): | ||
| out_edges.push_back(out_min + i * step) | ||
| output_bin_edges_vec.push_back(out_edges) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise an error if edges do not start from 0
| std::vector<float> all_output_edges; | ||
| float last_edge = 0.0; | ||
| for (const auto& edges : output_bin_edges) { | ||
| for (float edge : edges) { | ||
| all_output_edges.push_back(edge + last_edge); | ||
| } | ||
| last_edge = all_output_edges.back(); | ||
| } | ||
| // Sort and remove duplicates | ||
| std::sort(all_output_edges.begin(), all_output_edges.end()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
switch to bin numbering
| // Set bin content and error | ||
| if (global_bin >= 1 && global_bin <= (int)all_output_edges.size() - 1) { | ||
| hist_output->SetBinContent(global_bin, hist_output->GetBinContent(global_bin) + bin_content); | ||
| hist_output->SetBinError(global_bin, std::sqrt(std::pow(hist_output->GetBinError(global_bin), 2) + bin_error2)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Don't do sqrt & square for each iteration. first accumulate err2 and then set all at the end.
Add 2D linearizing, multi-dim histograms, and fixed memory leak in histmerger