Conversation
Codecov Report

Additional details and impacted files:

@@            Coverage Diff             @@
##              main     #174      +/-   ##
==========================================
+ Coverage    69.39%   72.19%   +2.80%
==========================================
  Files           19       21       +2
  Lines         1186     1356     +170
==========================================
+ Hits           823      979     +156
- Misses          363      377      +14

View full report in Codecov by Sentry.
pre-commit.ci autofix
for more information, see https://pre-commit.ci

Conflicts: cinnabar/tests/test_compare.py
def compare_and_rank_femaps(
    femaps: list[FEMap],
Review comment: I wonder if it would be easier to have a single FEMap object that contains the calculated data from multiple experiments, using the "source" argument to distinguish between them?
Reply: Yes, that would be easier, as the current method requires you to add the same experimental info to all of the maps being compared. However, we would need to rework the MLE calculation on the legacy graph to group by source first and then split it into different edges, so we would essentially be doing this under the hood anyway. For now it might be easier to ask users to do this themselves? We could also add something which copies the experimental values to all graphs, if that would be easier?
Reply: I believe that was the intent of the FEMap API, i.e. to have multiple sources so it's easier to do one big comparison rather than handle multiple objects.
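For reference, a rough sketch of the list-of-FEMaps workflow discussed in this thread: one FEMap per calculation source, each carrying a copy of the same experimental values. The Measurement, ReferenceState and add_measurement usage, the import paths, and the unit handling are assumptions based on the published cinnabar API rather than on this PR, and the values are made up.

```python
# Assumed cinnabar API; not the code added in this PR.
from cinnabar import FEMap, Measurement, ReferenceState
from openff.units import unit

kcal = unit.kilocalorie_per_mole
experimental = {"ligA": (-9.1, 0.2), "ligB": (-10.3, 0.2)}   # toy absolute values
calculated = {
    "method_1": [("ligA", "ligB", -1.0, 0.1)],               # (A, B, ddG, error)
    "method_2": [("ligA", "ligB", -1.6, 0.2)],
}

femaps = []
for source, edges in calculated.items():
    femap = FEMap()
    for lig_a, lig_b, ddg, err in edges:
        femap.add_measurement(Measurement(
            labelA=lig_a, labelB=lig_b,
            DG=ddg * kcal, uncertainty=err * kcal,
            computational=True, source=source,
        ))
    # under the current design the same experimental reference values
    # have to be copied into every map being compared
    for lig, (dg, err) in experimental.items():
        femap.add_measurement(Measurement(
            labelA=ReferenceState(), labelB=lig,
            DG=dg * kcal, uncertainty=err * kcal,
            computational=False, source="experiment",
        ))
    femaps.append(femap)

# the maps would then be handed to the new function from this PR, e.g.
# (exact keyword arguments and return values assumed from the description):
# stats_df, comparison_df = compare_and_rank_femaps(femaps=femaps)
```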
Description
This PR adds a general comparison and ranking function for a collection of FEMaps that contain results for the same edges, to test whether any of them differ significantly in performance. The FEMaps are compared on a user-chosen metric (the mean unsigned error by default) via the distribution of differences in that metric under a joint bootstrapping procedure. A two-sided p-value is then taken from the fraction of negative or positive differences in the metric (whichever is smaller), a multiple-test correction scheme is applied to these p-values, and the corrected p-values are used to rank the FEMaps with a compact letter display (CLD) assigned via the insert-absorb method.
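To make the procedure concrete, here is a minimal, self-contained sketch of a joint-bootstrap comparison of the mean unsigned error between result sets, with an empirical two-sided p-value and a multiple-test correction. It uses only numpy and statsmodels and is not the code added by this PR; the resampling details, the doubling of the smaller tail fraction, and the choice of the Benjamini-Hochberg correction are illustrative assumptions.

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(2024)

def mue(calc, exp):
    """Mean unsigned error between calculated and experimental values."""
    return np.mean(np.abs(np.asarray(calc) - np.asarray(exp)))

def bootstrap_metric_difference(calc_a, calc_b, exp, n_boot=5000):
    """Distribution of MUE(A) - MUE(B) under joint (paired) resampling of edges."""
    calc_a, calc_b, exp = map(np.asarray, (calc_a, calc_b, exp))
    n = len(exp)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)   # the same resampled edges are used for both maps
        diffs[i] = mue(calc_a[idx], exp[idx]) - mue(calc_b[idx], exp[idx])
    return diffs

def two_sided_p(diffs):
    """Smaller tail fraction of the difference distribution, doubled for two sides."""
    smaller_tail = min(np.mean(diffs < 0), np.mean(diffs > 0))
    return min(1.0, 2.0 * smaller_tail)

# toy data: three methods scored against the same experimental values
exp = rng.normal(0.0, 1.0, size=30)
methods = {name: exp + rng.normal(0.0, spread, size=30)
           for name, spread in [("A", 0.4), ("B", 0.6), ("C", 1.0)]}

names = list(methods)
pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
raw_p = [two_sided_p(bootstrap_metric_difference(methods[a], methods[b], exp))
         for a, b in pairs]

# correct for multiple comparisons (the correction method here is an assumption)
reject, corrected_p, _, _ = multipletests(raw_p, alpha=0.05, method="fdr_bh")
for (a, b), p, sig in zip(pairs, corrected_p, reject):
    print(f"{a} vs {b}: corrected p = {p:.3f}, significantly different = {sig}")
```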
The result is two pandas DataFrames: the first contains all evaluated metrics for each FEMap with confidence intervals and the CLD rank; the second contains the raw pairwise comparison data for the metric, including the p-values.
Example stats dataframe
Example comparison
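The compact letter display mentioned above can be illustrated with a generic re-implementation of the insert-absorb assignment (Piepho, 2004); this sketch is independent of cinnabar and only assumes that the pairwise comparisons judged significant after correction are available as name pairs.

```python
# Illustrative insert-absorb CLD assignment; not the code added in this PR.
from string import ascii_lowercase

def insert_absorb_cld(treatments, significant_pairs):
    """Assign letters so that two treatments share a letter
    iff they are NOT significantly different.

    treatments: ordered list of names (e.g. FEMap labels)
    significant_pairs: iterable of (name_a, name_b) pairs judged different
    """
    columns = [set(treatments)]            # start with one letter covering everything
    for a, b in significant_pairs:
        for col in [c for c in columns if a in c and b in c]:
            # "insert": split the offending column into two copies,
            # one without a and one without b
            columns.remove(col)
            columns.extend([col - {b}, col - {a}])
            # "absorb": drop columns that are proper subsets of another column
            columns = [c for c in columns
                       if not any(c < other for other in columns)]
    letters = {t: "" for t in treatments}
    for letter, col in zip(ascii_lowercase, columns):
        for t in treatments:
            if t in col:
                letters[t] += letter
    return letters

# toy example: A and C differ significantly, all other pairs are indistinguishable
print(insert_absorb_cld(["A", "B", "C"], [("A", "C")]))
# -> {'A': 'a', 'B': 'ab', 'C': 'b'}
```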
Todos
Notable points that this PR has either accomplished or will accomplish.
Questions
Checklist
news entry for new features, bug fixes, or other user facing changes.

Status
Tips
Since this will create a commit, it is best to make this comment when you are finished with your work.