Remove parent spec to enable code-first workflow execution#493
Added an index filtering step to FeatureConcatenator. Previously, if different featurizers dropped different molecules, the raw arrays were still concatenated, resulting in shape mismatches or mismatched rows. The features are now strictly masked to the common indices prior to concatenation.
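The masking-before-concatenation flow can be sketched as follows; `FeatureConcatenator` is the real class, but the featurizer outputs and index values here are purely illustrative:

```python
import numpy as np

# Two hypothetical featurizers that each dropped a different molecule.
feats_a = np.arange(8).reshape(4, 2)   # rows for molecules [0, 1, 2, 3]
idx_a = np.array([0, 1, 2, 3])
feats_b = np.arange(9).reshape(3, 3)   # rows for molecules [0, 2, 3]
idx_b = np.array([0, 2, 3])

# Intersect the surviving indices, then mask each block before concatenating.
common = np.intersect1d(idx_a, idx_b)        # [0, 2, 3]
masked_a = feats_a[np.isin(idx_a, common)]   # shape (3, 2)
masked_b = feats_b[np.isin(idx_b, common)]   # shape (3, 3)
combined = np.concatenate([masked_a, masked_b], axis=1)  # shape (3, 5)
```

Without the masking step, concatenating the raw `(4, 2)` and `(3, 3)` arrays would either raise a shape error or silently misalign rows.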
This overhaul replaces slow, high-dependency integration tests with true unit tests utilizing pytest-mock and synthetic data fixtures. Key changes include swapping tautological file-writing mocks for internal state assertions, enforcing strict disjoint set validation for chemical splitters, and implementing rigorous mathematical validation for uncertainty quantification and evaluation metrics. These updates significantly improve execution speed and cross-platform stability by replacing fragile floating-point equality with robust approximate comparisons and isolating testing boundaries for featurizers, inference orchestration, and CLI logic.
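The floating-point change boils down to the following (a generic illustration, not code from this PR):

```python
import numpy as np
import pytest

computed = 0.1 + 0.2

# Exact equality is fragile: 0.1 + 0.2 is 0.30000000000000004 on IEEE-754 doubles,
# and rounding can differ across platforms and BLAS builds.
assert computed != 0.3
assert computed == pytest.approx(0.3)

# The array-valued equivalent used for numeric test data.
np.testing.assert_allclose(np.array([computed]), np.array([0.3]), rtol=1e-12)
```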
for more information, see https://pre-commit.ci
…un execution-only
…utput-dir semantics
…nd output-dir fallback
I'm not super happy with the tests in
- Strip out unmaintained concrete stub classes.
- Use `mocker.create_autospec(..., instance=True)` to satisfy Pydantic validations.
- Centralize mock state injection in the `build_workflow` helper.
Remove tautological mocking of the model, featurizer, metadata, and data spec in test_inference.py. Replace with real instantiated FingerprintFeaturizer, DummyRegressorModel, CommitteeRegressor, Metadata, and DataSpec objects so that SMILES physically flow through the featurization and prediction pipeline. Only the file I/O boundary (load_anvil_model_and_metadata) remains patched. Assertions now verify mathematically derived values: single model PRED=1.0 (training mean), ensemble PRED=2.0/STD=1.0, and UCB=4.0 (mean + beta*std = 2.0 + 2.0*1.0).
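The expected values can be checked by hand; assuming a two-member committee whose members were trained to means 1.0 and 3.0 (illustrative numbers consistent with the figures quoted above):

```python
import numpy as np

member_preds = np.array([1.0, 3.0])   # per-member committee predictions
mean = member_preds.mean()            # ensemble PRED = 2.0
std = member_preds.std(ddof=0)        # ensemble STD  = 1.0
beta = 2.0
ucb = mean + beta * std               # UCB = 2.0 + 2.0 * 1.0 = 4.0
```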
```python
feat_kwargs = {
    "type": self.procedure.feat.type,
    "params": self.procedure.feat.params,
}
```
@hmacdope This is just one idea of how to get around parent_spec for what I've been calling "class-external" fields, i.e. fields that aren't defined in the class returned by the spec section (but still required externally, as they modulate behavior of how the class gets used). It's actually my idea, not AI (though I did tell AI to implement this pattern...)
```python
    data_yaml=recipe_components / "data.yaml",
    report_yaml=recipe_components / "eval.yaml",
)
return result
```
This pattern got added, I think because it allows YAMLs to be written here, instead of trying to write them in AnvilWorkflow.run. The latter requires complete passthrough of everything needed to write the YAML, i.e. parent_spec, so YAML writing needed to be removed from AnvilWorkflow.
Certainly can remove if we want to handle YAML output somewhere else (or just write the YAMLs in the to_workflow call, then still run the workflow from AnvilWorkflow.run).
This doesn't bother me, but if you asked me whether this task should be part of the global workflow or part of specification definition, I would say the global workflow.
Eh, actually, thinking about it more, I like it here.
Leaving comment open for @hmacdope to weigh in
```python
    data_yaml=recipe_components / "data.yaml",
    report_yaml=recipe_components / "eval.yaml",
)
```
@hmacdope Here's the YAML writing within workflow that got removed. Necessary to decouple parent_spec.
```python
if not Path(s).exists():
    raise ValueError(f"serial_path '{s}' does not exist.")
return self
```
I added this to prevent upstream execution (loading data, featurization, etc.) before we hit the error that says "oops, your paths aren't specified correctly, or don't even exist"
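The idea can be sketched with a plain dataclass (the real code uses a Pydantic validator, and the field name here is illustrative): validating paths at construction time fails fast, before any data loading or featurization runs:

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class InferenceConfig:
    """Illustrative stand-in for the real spec class."""
    serial_paths: list = field(default_factory=list)

    def __post_init__(self):
        # Fail fast: surface bad paths before any data loading or featurization.
        for s in self.serial_paths:
            if not Path(s).exists():
                raise ValueError(f"serial_path '{s}' does not exist.")
```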
```diff
 raise ValueError(
-    f"The model has {self.model.n_tasks} tasks but the data specification has {len(self.data_spec.target_cols)} target columns."
+    f"The model has {self.model._n_tasks} tasks but the data specification has {len(self.data_spec.target_cols)} target columns."
 )
```
Only `_n_tasks` is guaranteed to be defined; `n_tasks` only sometimes exists. See #498.
```python
if y_arr.ndim == 2 and y_arr.shape[1] == 1:
    y_arr = y_arr.ravel()
self.estimator = self.estimator.fit(X, y_arr)
```
@dwwest Make sure this jibes with our shape coercion elsewhere.
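The coercion above boils down to the following (generic NumPy illustration):

```python
import numpy as np

y = np.array([[1.0], [2.0], [3.0]])   # column vector, shape (3, 1)

# Many estimators expect a 1-D target; squeeze only the (n, 1) case so
# genuinely multi-task targets of shape (n, k > 1) are left untouched.
if y.ndim == 2 and y.shape[1] == 1:
    y = y.ravel()                     # shape (3,)
```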
This is the only place we do this. If we're going to make this change, can we ask the agent to make the equivalent change in every architecture?
I thought we do check shapes in other places though
```diff
 wf = spec.to_workflow()
 click.echo(f"Workflow initialized successfully with recipe: {recipe_path}")
-wf.run(tag=tag, debug=debug, output_dir=output_dir)
+spec.run(tag=tag, debug=debug, output_dir=output_dir)
```
@hmacdope Implements the run method off of Specification instead of Workflow so YAMLs get saved. See other comment for more details / options.
```python
# find where common_indices are in idx
mask = np.isin(idx, common_indices)
filtered_feats.append(feat[mask])
```
@dwwest Fixes a bug where the feature index intersection wasn't actually being leveraged. Please double check!
I fully missed this, whoops, but lgtm
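Worth noting about the `np.isin` call: the mask is computed in the order of `idx`, not of `common_indices`, so the row order of the original feature block is preserved. A small worked example (values illustrative):

```python
import numpy as np

idx = np.array([10, 11, 13, 14])           # indices this featurizer kept
common_indices = np.array([14, 10, 13])    # intersection; its order is irrelevant

mask = np.isin(idx, common_indices)        # [True, False, True, True]
feat = np.array([[1.0], [2.0], [3.0], [4.0]])
filtered = feat[mask]                      # rows for indices 10, 13, 14, in idx order
```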
```python
section_name: ClassVar[str] = "feat"
type: str | None = None
params: dict = Field(default_factory=dict)
```
I'm confused what we're talking about here, do you mean the logic that reads the driver type from the trainer? What does that have to do with featurization?
```python
if y_arr.ndim == 2 and y_arr.shape[1] == 1:
    y_arr = y_arr.ravel()
self.estimator = self.estimator.fit(X, y_arr)
```
This is the only place we do this. If we're going to make this change, can we ask the agent to make the equivalent change in every architecture?
```python
param_paths = self.ensemble_kwargs.get("param_paths")
serial_paths = self.ensemble_kwargs.get("serial_paths")
if (param_paths is None) != (serial_paths is None):
    raise ValueError(
        "Both param_paths and serial_paths must be provided together for ensemble finetuning."
    )
```
This section is the same whether ensemble is true or false.
Not quite! `param_path` and `serial_path` come from `model_kwargs` if `ensemble` is False. If True, `param_paths` and `serial_paths` (note plural) come from `ensemble_kwargs`.
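The check itself is an exclusive-or on presence: exactly one of the two being `None` is the error case, both-or-neither is fine. A minimal reproduction (a plain dict stands in for `ensemble_kwargs`, and the helper name is hypothetical):

```python
def validate_ensemble_kwargs(ensemble_kwargs: dict) -> None:
    # Providing only one of the two path lists is an error; both or neither is fine.
    param_paths = ensemble_kwargs.get("param_paths")
    serial_paths = ensemble_kwargs.get("serial_paths")
    if (param_paths is None) != (serial_paths is None):
        raise ValueError(
            "Both param_paths and serial_paths must be provided together for ensemble finetuning."
        )


validate_ensemble_kwargs({})                                             # ok: neither given
validate_ensemble_kwargs({"param_paths": ["a"], "serial_paths": ["b"]})  # ok: both given
```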
We should keep this for now, but in the future I would not hold importance to updating any of the mtenn code or tests. It might be something that is removed or archived.
Yeah... I at least wanted what's here to run quickly. But agreed this may end up phased out.
khuddzu left a comment:
Looks good, I left a couple of comments, but nothing wild. I want to ensure that you worked out the issue with mock loading data. Did you resolve the issue of the tests being coded in order to pass, not actually testing the package? I didn't notice this, but could have missed something.
Yep, as far as I was able to convince myself, the tests are not tautological. That was an artifact of an earlier pass.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Description

This PR decouples `anvil` runtime execution from YAML-backed spec state by removing `parent_spec` from workflow classes and passing only grouped domain kwargs (`model_kwargs`, `ensemble_kwargs`, `feat_kwargs`), so workflows can be initialized and run programmatically from Python without raw recipe coupling.

In addition, this pull request represents a comprehensive overhaul of the `openadmet` testing architecture. The primary goal was to transition from lengthy, high-dependency integration-style tests to true, isolated unit tests.

Key changes

1. Remove entangled `parent_spec` attribute: replaces `parent_spec` with class-external (i.e. fields that don't explicitly belong to a `BaseSpec` subclass, "external" to their instantiation) per-spec dictionaries such as `model_kwargs`, `ensemble_kwargs`, and `feat_kwargs`. These enable access to a global workflow state for component "communication" without maintaining a YAML-dependent specification object.
2. Unit tests in `tests/unit/` must be unit tests: isolates the core testing boundaries (`anvil`, `inference`, and CLI).
3. Standardized mocking and isolation: adopts the `pytest-mock` library (`mocker` fixture) to properly isolate testing boundaries and assert internal state transitions.
4. Mathematical and data rigor: replaces exact floating-point equality (`==`) with `pytest.approx` and `numpy.testing.assert_almost_equal` to ensure cross-platform consistency.
5. Feature concatenation bug fix: fixes a bug in `FeatureConcatenator` where feature arrays were not being filtered by the intersection of valid indices. This prevents shape mismatches and silent data misalignment when featurizers drop different molecules.
6. Meaningful artifact validation: replaces trivial `assert True` statements in evaluation modules with validation of the returned `matplotlib` and `seaborn` objects.

Status

Developer's Certificate of Origin