
Add compare and ranking function for FEMaps #174

Open
jthorton wants to merge 4 commits into main from compair

Conversation

@jthorton
Contributor

Description

This PR adds a general compare-and-rank function for a collection of FEMaps that contain results for the same edges, testing for significant differences in performance. The FEMaps are compared on a user-chosen metric (the mean unsigned error by default) via the distribution of differences in that metric under a joint bootstrapping procedure. A two-sided p-value is then determined from the fraction of negative or positive differences in the metric (whichever is smaller), a multiple-test correction scheme is applied to the p-values, and the corrected p-values are used to rank the FEMaps using a compact letter display (CLD) assigned via the insert-absorb method.

The result is two pandas dataframes: the first contains all evaluated metrics for each FEMap with confidence intervals and the CLD rank, and the second contains the raw comparison data for the metric, including p-values.
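The joint bootstrap comparison described above can be sketched as follows. This is a minimal illustration, not the code added by this PR: the function name `bootstrap_metric_difference`, the default `n_boot`, and the p-value floor are all assumptions.

```python
import numpy as np

def bootstrap_metric_difference(errors_a, errors_b, n_boot=2000, seed=0):
    """Two-sided bootstrap test for a difference in mean unsigned error
    (MUE) between two paired sets of per-edge errors.

    Sketch only: a joint (paired) resample draws the same edges for both
    maps, so the difference distribution reflects correlated data.
    """
    rng = np.random.default_rng(seed)
    errors_a = np.asarray(errors_a, dtype=float)
    errors_b = np.asarray(errors_b, dtype=float)
    n = len(errors_a)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        # resample edge indices once and apply them to both maps
        idx = rng.integers(0, n, size=n)
        diffs[i] = np.abs(errors_a[idx]).mean() - np.abs(errors_b[idx]).mean()
    frac_neg = np.mean(diffs < 0.0)
    frac_pos = np.mean(diffs > 0.0)
    # two-sided p-value from the smaller tail, floored at 1/n_boot so a
    # fully one-sided difference does not report p = 0 (floor is an assumption)
    p = max(2.0 * min(frac_neg, frac_pos), 1.0 / n_boot)
    return min(p, 1.0)
```

The resulting per-pair p-values would then be passed through a multiple-test correction (e.g. `statsmodels.stats.multitest.multipletests`) before the CLD ranking step.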

Example stats dataframe

    Model       MUE  MUE_CI_Lower  ...  KTAU_CI_Lower  KTAU_CI_Upper  CLD
0  FE Map 1  9.326389      9.055143  ...       0.407202       0.724801    a
1  FE Map 2  9.326389      9.062688  ...       0.377350       0.713957    a
2  FE Map 3  9.326389      8.987990  ...       0.044224       0.456656    b

Example comparison dataframe

    Model 1   Model 2  Diff in rho  ...  p-value  significant  p-value corrected
0  FE Map 1  FE Map 2     0.002788  ...    0.666        False              0.666
1  FE Map 1  FE Map 3     0.321357  ...    0.002         True              0.006
2  FE Map 2  FE Map 3     0.318568  ...    0.002         True              0.006
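The CLD column in the stats dataframe can then be built from the corrected significance flags. The PR uses the insert-absorb method; the sketch below uses a simpler brute-force clique cover over the graph of "not significantly different" pairs, which reproduces the letters in the example above but is illustrative only, not the PR's algorithm.

```python
from itertools import combinations

def compact_letter_display(models, significant_pairs):
    """Assign compact letters so that two models share a letter iff they
    belong to a common group containing no significant pairwise difference.

    Simplified clique-cover construction (brute force, fine for a handful
    of FEMaps); not the insert-absorb method used in the PR.
    """
    sig = {frozenset(p) for p in significant_pairs}
    groups = []
    n = len(models)
    # enumerate maximal groups of mutually indistinguishable models,
    # largest first so smaller groups absorbed by a larger one are skipped
    for size in range(n, 0, -1):
        for combo in combinations(models, size):
            if any(frozenset(p) in sig for p in combinations(combo, 2)):
                continue  # group contains a significant difference
            if any(set(combo) <= set(g) for g in groups):
                continue  # already covered by a larger group
            groups.append(combo)
    letters = "abcdefghijklmnopqrstuvwxyz"
    return {
        m: "".join(letters[i] for i, g in enumerate(groups) if m in g)
        for m in models
    }
```

Applied to the example comparison above (where only the pairs involving FE Map 3 are significant), this assigns "a" to FE Map 1 and FE Map 2 and "b" to FE Map 3: maps sharing a letter are statistically indistinguishable on the chosen metric.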

Todos

Notable points that this PR has either accomplished or will accomplish.

  • add more tests
  • add a centralise/shift method to the nodewise comparisons?

Questions

  • Question1

Checklist

  • Added a news entry for new features, bug fixes, or other user facing changes.

Status

  • Ready to go

Tips

  • Comment "pre-commit.ci autofix" to have pre-commit.ci automatically format your PR.
    Since this will create a commit, it is best to make this comment when you are finished with your work.

@codecov

codecov bot commented Jan 16, 2026

Codecov Report

❌ Patch coverage is 88.29787% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.19%. Comparing base (26440da) to head (117aa88).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
cinnabar/compare.py 83.70% 22 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #174      +/-   ##
==========================================
+ Coverage   69.39%   72.19%   +2.80%     
==========================================
  Files          19       21       +2     
  Lines        1186     1356     +170     
==========================================
+ Hits          823      979     +156     
- Misses        363      377      +14     

☔ View full report in Codecov by Sentry.


@jthorton
Contributor Author

pre-commit.ci autofix



def compare_and_rank_femaps(
    femaps: list[FEMap],
Contributor

I wonder if it would be easier to have a single FEMap object that contains calc data from multiple experiments, using the "source" argument to distinguish between them?

Contributor Author

Yes, that would be easier, as the current method requires you to add the same experimental info to all of the maps you are comparing. However, we would need to rework the MLE calculation on the legacy graph to group by source first and then split into different edges, so it would kind of be doing this under the hood; for now it might be easier to ask users to do this? We could add something which copies the experimental values to all graphs if that would be easier?

Member

I believe that was the intent of the FEMap API - i.e. to have multiple sources so it's easier to do one big comparison rather than multiple objects.
