
Remove parent spec to enable code-first workflow execution#493

Open
smcolby wants to merge 44 commits into main from remove-parent-spec

Conversation


@smcolby smcolby commented Feb 27, 2026

Description

This PR decouples anvil runtime execution from YAML-backed spec state by removing parent_spec from the workflow classes. Workflows now receive only grouped domain kwargs (model_kwargs, ensemble_kwargs, feat_kwargs), so they can be initialized and run programmatically from Python without coupling to a raw recipe.

In addition, this pull request represents a comprehensive overhaul of the openadmet testing architecture. The primary goal was to transition from lengthy, high-dependency integration-style tests to true, isolated unit tests.

Key changes

1. Remove entangled parent_spec attribute

  • Replaced the parent_spec attribute with per-spec dictionaries of "class-external" fields (fields that don't explicitly belong to a BaseSpec subclass and are external to its instantiation): model_kwargs, ensemble_kwargs, feat_kwargs, etc. These provide access to global workflow state for component "communication" without maintaining a YAML-dependent specification object.
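The resulting code-first pattern might look like this (a minimal sketch; Workflow, its fields, and run are illustrative stand-ins, not the actual openadmet API):

```python
from dataclasses import dataclass, field

@dataclass
class Workflow:
    # Grouped "class-external" kwargs replace a single parent_spec object.
    model_kwargs: dict = field(default_factory=dict)
    ensemble_kwargs: dict = field(default_factory=dict)
    feat_kwargs: dict = field(default_factory=dict)

    def run(self) -> str:
        # Components read shared state from the grouped dicts rather than
        # reaching back into a YAML-backed specification.
        feat_type = self.feat_kwargs.get("type", "fingerprint")
        return f"featurizing with {feat_type}"

# Code-first: build and run without any YAML recipe on disk.
wf = Workflow(feat_kwargs={"type": "fingerprint", "params": {"radius": 2}})
assert wf.run() == "featurizing with fingerprint"
```

A YAML-backed specification can still construct the same object by translating its sections into these dicts, which is what the to_workflow path in this PR does.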

2. Unit tests in tests/unit/ must be unit tests

  • Replaced slow, end-to-end integration tests (hiding in unit tests folder) with lightweight unit tests across heavy modules (anvil, inference, and CLI).
  • Introduced synthetic data fixtures (e.g., small PyTorch tensors and diverse SMILES sets) to validate data plumbing without requiring heavy disk access or RDKit overhead.
  • On a local MacBook Pro, the unit test suite previously ran in 15 minutes; it now completes in 55 seconds.

3. Standardized mocking and isolation

  • Eliminated "tautological mocks" (mocks that only test themselves) and the custom dummy classes introduced by an initial agentic pass (bad AI).
  • Standardized on the pytest-mock library (mocker fixture) to properly isolate testing boundaries and assert internal state transitions.
  • Used "real" fixtures wherever possible.

4. Mathematical and data rigor

  • Data Leakage Prevention: Implemented strict disjoint set validation (via set intersection of indices) for all chemical splitters to mathematically guarantee no overlap between train, validation, and test sets. (@dwwest)
  • UQ Validation: Updated inference tests to explicitly calculate and verify uncertainty quantification math (e.g., UCB bounds) instead of just checking for column existence.
  • Floating-Point Stability: Replaced fragile strict equality (==) with pytest.approx and numpy.testing.assert_almost_equal to ensure cross-platform consistency.
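The disjoint-split and approximate-comparison checks reduce to a few lines; a stdlib-only sketch (assert_disjoint_splits is a hypothetical helper, and math.isclose stands in here for pytest.approx / numpy.testing.assert_almost_equal):

```python
import math

def assert_disjoint_splits(train_idx, val_idx, test_idx):
    # Pairwise intersections must all be empty: no index may leak across splits.
    train, val, test = set(train_idx), set(val_idx), set(test_idx)
    assert not (train & val or train & test or val & test), "split leakage detected"

assert_disjoint_splits([0, 1], [2, 3], [4, 5])

# Approximate comparison instead of fragile strict equality on floats.
assert 0.1 + 0.2 != 0.3            # strict == fails due to binary rounding
assert math.isclose(0.1 + 0.2, 0.3)
```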

5. Feature concatenation bug fix

  • Exposed and fixed a critical bug in FeatureConcatenator where feature arrays were not being filtered by the intersection of valid indices. This fix prevents shape mismatches and silent data misalignment when featurizers drop different molecules.
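A minimal sketch of the intersection-and-mask logic (concat_features is an illustrative reduction, not the FeatureConcatenator source):

```python
import numpy as np

def concat_features(feats, index_lists):
    # Intersect the surviving molecule indices across all featurizers...
    common = set(index_lists[0])
    for idx in index_lists[1:]:
        common &= set(idx)
    common = sorted(common)
    # ...then mask each feature block to those common rows before concatenating.
    filtered = [feat[np.isin(idx, common)] for feat, idx in zip(feats, index_lists)]
    return np.concatenate(filtered, axis=1)

a = np.arange(8).reshape(4, 2)   # featurizer A kept molecules 0-3
b = np.arange(9).reshape(3, 3)   # featurizer B dropped molecule 2
out = concat_features([a, b], [[0, 1, 2, 3], [0, 1, 3]])
assert out.shape == (3, 5)       # 3 common molecules, 2 + 3 feature columns
```

Without the mask, the raw (4, 2) and (3, 3) arrays would be concatenated directly, producing a shape mismatch or, worse, silently misaligned rows.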

6. Meaningful artifact validation

  • Removed assert True statements in evaluation modules.
  • Added assertions to verify that plotting classes actually generate valid matplotlib and seaborn objects.

Status

  • Ready to go

Developer Certificate of Origin

smcolby and others added 19 commits February 25, 2026 16:19
Added an index filtering step to FeatureConcatenator. Previously, if different featurizers dropped different molecules, the raw arrays were still concatenated, resulting in shape mismatches or mismatched rows. The features are now strictly masked to the common indices prior to concatenation.
This overhaul replaces slow, high-dependency integration tests with true unit tests utilizing pytest-mock and synthetic data fixtures. Key changes include swapping tautological file-writing mocks for internal state assertions, enforcing strict disjoint set validation for chemical splitters, and implementing rigorous mathematical validation for uncertainty quantification and evaluation metrics. These updates significantly improve execution speed and cross-platform stability by replacing fragile floating-point equality with robust approximate comparisons and isolating testing boundaries for featurizers, inference orchestration, and CLI logic.

codecov-commenter commented Feb 27, 2026

Codecov Report

❌ Patch coverage is 91.48936% with 8 lines in your changes missing coverage. Please review.



smcolby commented Feb 27, 2026

I'm not super happy with the tests in openadmet.models.tests.anvil. We want these to be truly unit tests (low execution time), but also actually meaningful/useful.

- Strip out unmaintained concrete stub classes.
- Use mocker.create_autospec(..., instance=True) to satisfy Pydantic validations.
- Centralize mock state injection in the build_workflow helper.
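The autospec approach above can be illustrated with unittest.mock (Featurizer here is a hypothetical stand-in; pytest-mock's mocker fixture exposes the same create_autospec call):

```python
from unittest import mock

class Featurizer:
    def featurize(self, smiles): ...

# instance=True yields an object that passes isinstance checks,
# which is what Pydantic-style validation needs.
feat = mock.create_autospec(Featurizer, instance=True)
feat.featurize.return_value = [0.0, 1.0]

assert isinstance(feat, Featurizer)
assert feat.featurize("CCO") == [0.0, 1.0]
feat.featurize.assert_called_once_with("CCO")   # signature-checked by autospec
```

Because autospec mirrors the real signature, a call like feat.featurize() with missing arguments fails at test time instead of silently passing, which is exactly what plain Mock objects get wrong.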
Remove tautological mocking of the model, featurizer, metadata, and data spec in test_inference.py. Replace with real instantiated FingerprintFeaturizer, DummyRegressorModel, CommitteeRegressor, Metadata, and DataSpec objects so that SMILES physically flow through the featurization and prediction pipeline.

Only the file I/O boundary (load_anvil_model_and_metadata) remains patched. Assertions now verify mathematically derived values: single model PRED=1.0 (training mean), ensemble PRED=2.0/STD=1.0, and UCB=4.0 (mean + beta*std = 2.0 + 2.0*1.0).
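The ensemble arithmetic can be checked by hand (illustrative member predictions; beta = 2.0 as in the assertion above):

```python
import statistics

preds = [1.0, 3.0]                    # hypothetical committee member predictions
mean = statistics.fmean(preds)        # ensemble PRED = 2.0
std = statistics.pstdev(preds)        # ensemble STD (population) = 1.0
beta = 2.0
ucb = mean + beta * std               # UCB = 2.0 + 2.0 * 1.0 = 4.0
assert ucb == 4.0
```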
Comment thread openadmet/models/anvil/specification.py Outdated
feat_kwargs = {
"type": self.procedure.feat.type,
"params": self.procedure.feat.params,
}
Contributor Author

@hmacdope This is just one idea of how to get around parent_spec for what I've been calling "class-external" fields, i.e. fields that aren't defined in the class returned by the spec section (but still required externally, as they modulate behavior of how the class gets used). It's actually my idea, not AI (though I did tell AI to implement this pattern...)

data_yaml=recipe_components / "data.yaml",
report_yaml=recipe_components / "eval.yaml",
)
return result
Contributor Author

This pattern got added, I think because it allows YAMLs to be written here, instead of trying to write them in AnvilWorkflow.run. The latter requires complete passthrough of everything needed to write the YAML, i.e. parent_spec, so YAML writing needed to be removed from AnvilWorkflow.

Certainly can remove if we want to handle YAML output somewhere else (or just write the YAMLs in the to_workflow call, then still run the workflow from AnvilWorkflow.run).

Contributor

this doesn't bother me, but if you asked me if this task should be a part of the global workflow, or a part of specification defining, i would say the global workflow.

Contributor

eh actually, I am thinking about it more. I like it here.

Contributor Author

Leaving comment open for @hmacdope to weigh in

data_yaml=recipe_components / "data.yaml",
report_yaml=recipe_components / "eval.yaml",
)

Contributor Author

@hmacdope Here's the YAML writing within workflow that got removed. Necessary to decouple parent_spec.

Comment thread openadmet/models/anvil/workflow.py Outdated
if not Path(s).exists():
raise ValueError(f"serial_path '{s}' does not exist.")
return self

Contributor Author

I added this to prevent upstream execution (loading data, featurization, etc.) before we hit the error that says "oops, your paths aren't specified correctly, or don't even exist"

Comment thread openadmet/models/anvil/workflow.py Outdated
raise ValueError(
f"The model has {self.model.n_tasks} tasks but the data specification has {len(self.data_spec.target_cols)} target columns."
f"The model has {self.model._n_tasks} tasks but the data specification has {len(self.data_spec.target_cols)} target columns."
)
Contributor Author

Only _n_tasks is guaranteed to be defined; n_tasks only sometimes exists. See #498.

if y_arr.ndim == 2 and y_arr.shape[1] == 1:
y_arr = y_arr.ravel()
self.estimator = self.estimator.fit(X, y_arr)
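The coercion above flattens single-column targets to 1-D, the shape scikit-learn-style estimators generally expect for single-task regression. A standalone sketch:

```python
import numpy as np

y = np.array([[1.0], [2.0], [3.0]])   # single-task targets as a (3, 1) column
if y.ndim == 2 and y.shape[1] == 1:
    y = y.ravel()                     # flatten to (3,) before estimator.fit
assert y.shape == (3,)
```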

Contributor Author

@dwwest Make sure this jibes with our shape coercion elsewhere.

Contributor

This is the only place we do this. If we're going to make this change, can we ask the agent to make the equivalent change in every architecture?

Contributor Author

Give it a try!

Contributor Author

I thought we do check shapes in other places though

Contributor Author

See #509

wf = spec.to_workflow()
click.echo(f"Workflow initialized successfully with recipe: {recipe_path}")
wf.run(tag=tag, debug=debug, output_dir=output_dir)
spec.run(tag=tag, debug=debug, output_dir=output_dir)
Contributor Author

@hmacdope Implements the run method off of Specification instead of Workflow so YAMLs get saved. See other comment for more details / options.

# find where common_indices are in idx
mask = np.isin(idx, common_indices)
filtered_feats.append(feat[mask])

Contributor Author

@dwwest Fixes a bug where the feature index intersection wasn't actually being leveraged. Please double check!

Contributor

I fully missed this, whoops, but lgtm

@smcolby smcolby self-assigned this Mar 1, 2026
@smcolby smcolby requested review from khuddzu March 4, 2026 17:36
Comment thread openadmet/models/anvil/workflow.py
section_name: ClassVar[str] = "feat"
type: str | None = None
params: dict = Field(default_factory=dict)

Contributor

I'm confused what we're talking about here, do you mean the logic that reads the driver type from the trainer? What does that have to do with featurization?


Comment on lines +494 to +499
param_paths = self.ensemble_kwargs.get("param_paths")
serial_paths = self.ensemble_kwargs.get("serial_paths")
if (param_paths is None) != (serial_paths is None):
raise ValueError(
"Both param_paths and serial_paths must be provided together for ensemble finetuning."
)
Contributor

This section is the same whether ensemble is true or false.

Contributor Author

Not quite! param_path and serial_path come from model_kwargs if ensemble is False. If True, param_paths and serial_paths (note plural) come from ensemble_kwargs.
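The branching can be sketched like so (resolve_paths is a hypothetical helper illustrating the described behavior, not code from the PR):

```python
def resolve_paths(ensemble: bool, model_kwargs: dict, ensemble_kwargs: dict):
    """Singular keys from model_kwargs, plural keys from ensemble_kwargs."""
    if ensemble:
        # Plural keys: one path per ensemble member.
        return ensemble_kwargs.get("param_paths"), ensemble_kwargs.get("serial_paths")
    # Singular keys: a single model's paths.
    return model_kwargs.get("param_path"), model_kwargs.get("serial_path")

assert resolve_paths(False, {"param_path": "m.json", "serial_path": "m.pkl"}, {}) == ("m.json", "m.pkl")
assert resolve_paths(True, {}, {"param_paths": ["a.json"], "serial_paths": ["a.pkl"]}) == (["a.json"], ["a.pkl"])
```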

Contributor

We should keep this for now, but in the future I wouldn't place much importance on updating any of the mtenn code or tests. It might be something that is removed or archived.

Contributor Author

Yeah... I at least wanted what's here to run quickly. But agreed this may end up phased out.

Contributor

@khuddzu khuddzu left a comment

Looks good, I left a couple of comments, but nothing wild. I want to ensure that you worked out the issue with mock loading data. Did you resolve the issue of the tests being coded in order to pass, not actually testing the package? I didn't notice this, but could have missed something.


smcolby commented Mar 4, 2026

> Looks good, I left a couple of comments, but nothing wild. I want to ensure that you worked out the issue with mock loading data. Did you resolve the issue of the tests being coded in order to pass, not actually testing the package? I didn't notice this, but could have missed something.

Yep, as far as I was able to convince myself, the tests are not tautological. That was an artifact of an earlier pass.

