Hook post-hoc binary metrics and plots into standard evaluation framework #492

Open
smcolby wants to merge 12 commits into main from copilot-test

smcolby (Contributor) commented Feb 26, 2026

Description

This PR integrates post-hoc binary classification into the standard openadmet evaluation and orchestration framework. By mirroring the established regression pattern, PosthocBinaryMetrics and the newly created PosthocBinaryPlots can now be dynamically instantiated and called by the pipeline.

Key Changes

  • API Standardization: Updated PosthocBinaryMetrics.evaluate to accept y_true, y_pred, and cutoff. It now returns the standard nested dictionary format expected by the workflow ({"task_0": {"precision": {"value": ...}}}).
  • New Plotting Class: Implemented PosthocBinaryPlots to generate post-hoc classification scatter plots and confusion matrices. It returns a dictionary of matplotlib.figure.Figure objects.
  • Dynamic Registration: Registered both classes via the @evaluators.register decorator.
  • Rigorous Unit Tests: Added test_posthoc_binary_metrics_evaluate and test_posthoc_binary_plots_evaluate to test_eval.py. These tests strictly verify mathematical outputs and object instantiation, adhering to the project's rule against tautological or lazy (assert True) testing.
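As a rough illustration of the return format described above, here is a minimal sketch in plain Python. It is not the PR's implementation: the function name `posthoc_binary_metrics_sketch`, the `task_name` parameter, the `recall` key, and the convention that values at or above `cutoff` count as positive are all assumptions made for the example.

```python
def posthoc_binary_metrics_sketch(y_true, y_pred, cutoff, task_name="task_0"):
    """Binarize continuous values at `cutoff`, then compute precision and recall.

    Hypothetical stand-in for illustration only; not the openadmet class.
    """
    # Assumption: a value at or above the cutoff is the positive class.
    true_pos = [v >= cutoff for v in y_true]
    pred_pos = [v >= cutoff for v in y_pred]

    # Confusion-matrix counts needed for precision and recall.
    tp = sum(t and p for t, p in zip(true_pos, pred_pos))
    fp = sum((not t) and p for t, p in zip(true_pos, pred_pos))
    fn = sum(t and (not p) for t, p in zip(true_pos, pred_pos))

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    # Nested layout: task -> metric name -> {"value": ...}
    return {
        task_name: {
            "precision": {"value": precision},
            "recall": {"value": recall},
        }
    }


result = posthoc_binary_metrics_sketch(
    y_true=[5.0, 6.5, 7.2, 4.1],
    y_pred=[5.5, 6.1, 6.9, 6.3],
    cutoff=6.0,
)
```

With these inputs there are two true positives, one false positive, and no false negatives, so `result["task_0"]["precision"]["value"]` is 2/3 and the recall value is 1.0.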

Status

  • Ready to go

Developers certificate of origin

smcolby and others added 11 commits February 25, 2026 16:19
Added an index filtering step to FeatureConcatenator. Previously, if different featurizers dropped different molecules, the raw arrays were still concatenated, resulting in shape mismatches or mismatched rows. The features are now strictly masked to the common indices prior to concatenation.
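The masking idea in this commit can be sketched as follows. This is only an illustration under assumptions: the function `concat_on_common_indices` and the `(indices, rows)` input layout are hypothetical and do not reflect the actual FeatureConcatenator signature.

```python
def concat_on_common_indices(feature_blocks):
    """Concatenate per-featurizer rows, keeping only molecules every featurizer kept.

    `feature_blocks` is a list of (indices, rows) pairs, one pair per featurizer.
    Hypothetical layout chosen for this sketch.
    """
    if not feature_blocks:
        return [], []

    # Intersect the surviving molecule indices across all featurizers.
    common = set(feature_blocks[0][0])
    for indices, _ in feature_blocks[1:]:
        common &= set(indices)
    common_sorted = sorted(common)

    # Map each featurizer's indices to its rows so rows can be aligned by index.
    lookups = [dict(zip(indices, rows)) for indices, rows in feature_blocks]

    combined = []
    for idx in common_sorted:
        row = []
        for lookup in lookups:
            row.extend(lookup[idx])
        combined.append(row)
    return common_sorted, combined


# Featurizer A dropped molecule 2; featurizer B dropped molecule 1.
block_a = ([0, 1, 3], [[1.0], [2.0], [4.0]])
block_b = ([0, 2, 3], [[10.0, 11.0], [30.0, 31.0], [40.0, 41.0]])
kept, combined = concat_on_common_indices([block_a, block_b])
```

Only molecules 0 and 3 survive in both blocks, so the combined matrix has two rows of three features each, and every row refers to the same molecule across featurizers.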
This overhaul replaces slow, high-dependency integration tests with true unit tests utilizing pytest-mock and synthetic data fixtures. Key changes include swapping tautological file-writing mocks for internal state assertions, enforcing strict disjoint set validation for chemical splitters, and implementing rigorous mathematical validation for uncertainty quantification and evaluation metrics. These updates significantly improve execution speed and cross-platform stability by replacing fragile floating-point equality with robust approximate comparisons and isolating testing boundaries for featurizers, inference orchestration, and CLI logic.
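Two of the checks named in this commit, approximate floating-point comparison and disjoint split validation, can be illustrated with the standard library alone. The helper `check_split_disjoint` is a hypothetical name for this sketch; the actual tests may instead use `pytest.approx` and the project's own fixtures.

```python
import math


def check_split_disjoint(train_idx, val_idx, test_idx):
    """Return True only if no index appears in more than one split."""
    train, val, test = set(train_idx), set(val_idx), set(test_idx)
    return train.isdisjoint(val) and train.isdisjoint(test) and val.isdisjoint(test)


# Exact equality is fragile: 0.1 + 0.2 is not exactly 0.3 in binary floating point.
exact_equal = (0.1 + 0.2 == 0.3)

# A tolerance-based comparison treats the two values as equal.
approx_equal = math.isclose(0.1 + 0.2, 0.3, rel_tol=1e-9)

# A valid split shares no indices; a leaky one does.
valid_split = check_split_disjoint([0, 1], [2], [3])
leaky_split = check_split_disjoint([0, 1], [1], [3])
```

Here `exact_equal` is `False` while `approx_equal` is `True`, which is why tolerance-based assertions tend to be more stable across platforms.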
Updated PosthocBinaryMetrics and created PosthocBinaryPlots to conform to the standard evaluate API, returning nested metric dictionaries and matplotlib objects. Registered both classes and added strict unit tests to verify mathematical accuracy and figure generation.
@smcolby smcolby self-assigned this Feb 26, 2026
smcolby (Contributor, Author) commented Feb 26, 2026

FYI, we just did this to demo Copilot functionality. It's probably still worth merging for posterity, but we're generally less interested in binary classification workflows.

codecov-commenter commented Feb 27, 2026

Codecov Report

❌ Patch coverage is 77.27273% with 25 lines in your changes missing coverage. Please review.


smcolby (Contributor, Author) commented Feb 28, 2026

Will resolve #143

2 participants