Workflows Base Module #229
Merged
79 commits
ad59354
initialize workflows base PR
cadeduckworth 792f6f1
initial transfer of workflows base module files and testing data from…
cadeduckworth 093f98a
initialize workflows registry module
cadeduckworth 2416a44
minor updates and reminders for when PR217 is merged
cadeduckworth 7f5075c
add base functionality with workflows registry module, core functions…
cadeduckworth bd5607c
remove old testing data
cadeduckworth f9885f3
pre-merge prep review and changes
cadeduckworth 9ba2a5d
Merge branch 'develop' into workflows-base
cadeduckworth 30ba2c0
docs
cadeduckworth 9a58cef
add option to provide a directory for csv file to be saved from outpu…
cadeduckworth 7e0765d
docs
cadeduckworth 28c72ae
fix import names
cadeduckworth e944833
add new testing data, .csv for workflows base module
cadeduckworth f6dbf55
initialize workflows tests
cadeduckworth c8c3a5b
update existing tests, issue with temp dirs and files still exists
cadeduckworth d2100f6
change file paths and naming conventions
cadeduckworth 98e99a7
Update test_workflows_base.py
cadeduckworth 3baa028
change base test assert value
cadeduckworth f5a6be7
assert df value
cadeduckworth 698df96
tests, directory path dataframes
cadeduckworth e7c0b28
directory_paths csv input test, unsure if additional assertion needed
cadeduckworth bac0066
test errors, exceptions, logging, workflows base module
cadeduckworth 554f31d
add workflows to STATES dictioonary
cadeduckworth 41ccfb6
change and update testing resource paths and add fixtures for tests
cadeduckworth bbc3477
fix double and single quotes and string formatting in workflows base …
cadeduckworth 68ebd60
edit docs
cadeduckworth 489f397
cleanup tests
cadeduckworth 03c64ba
Merge branch 'develop' into workflows-base
orbeckst aff964c
add documentation for workflows registry
cadeduckworth 650d934
Merge branch 'workflows-base' of github.com:Becksteinlab/MDPOW into w…
cadeduckworth f6999e4
registry docs
cadeduckworth ac03192
docs, and new entry in CHANGES
cadeduckworth 9c8334b
doc changes for registry
cadeduckworth eb75456
registry docs
cadeduckworth 2fc3241
registry docs
cadeduckworth 32316c3
registry docs
cadeduckworth cba5513
registry docs
cadeduckworth 0e91d02
docs
cadeduckworth acdedc4
docs
cadeduckworth 56f9fd0
docs
cadeduckworth 7712e2c
docs
cadeduckworth f2a73c6
docs
cadeduckworth fa6bf25
docs
cadeduckworth e173888
docs
cadeduckworth d1746fd
docs
cadeduckworth 0f2c50c
docs
cadeduckworth 827083d
docs
cadeduckworth 1317446
docs
cadeduckworth f26dfb4
docs
cadeduckworth 19d4f97
docs
cadeduckworth e5af917
docs
cadeduckworth 9f65ba1
docs and naming conventions
cadeduckworth 63532d7
tests and docs, naming conventions
cadeduckworth 936b2de
remove deprecated test
cadeduckworth 94c0320
Merge branch 'develop' into workflows-base
orbeckst e1a2ea8
docs and formatting
cadeduckworth e232302
reduce and reorganize try/except method to remove ambiguity and incre…
cadeduckworth 63e084e
docs
cadeduckworth 98d3770
Merge branch 'workflows-base' of github.com:Becksteinlab/MDPOW into w…
cadeduckworth fc0e580
docs
cadeduckworth c3f3ab6
reST workflows registry table for docs
cadeduckworth 26e02a0
registry doc table
cadeduckworth 41972e2
registry doc table
cadeduckworth f2d1bc4
registry doc table
cadeduckworth 3f085d1
registry docs
cadeduckworth 7b49ba8
registry docs
cadeduckworth 3e68f44
registry docs
cadeduckworth f978ba9
registry docs
cadeduckworth 78223c5
registry docs
cadeduckworth 4434454
registry docs
cadeduckworth d831df4
registry docs
cadeduckworth a231376
registry docs
cadeduckworth 5ddff82
registry docs
cadeduckworth bad6f85
registry docs
cadeduckworth e1b06aa
registry docs
cadeduckworth 679476b
registry docs
cadeduckworth 5625720
registry docs
cadeduckworth e337d29
registry docs
cadeduckworth 099d6b5
Apply suggestions from code review
orbeckst
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| ============== | ||
| Workflows Base | ||
| ============== | ||
|
|
||
| .. versionadded:: 0.9.0 | ||
|
|
||
| .. automodule:: mdpow.workflows.base |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| ================== | ||
| Workflows Registry | ||
| ================== | ||
|
|
||
| .. versionadded:: 0.9.0 | ||
|
|
||
| .. automodule:: mdpow.workflows.registry |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,101 @@ | ||
| import re | ||
| import os | ||
| import sys | ||
| import yaml | ||
| import pybol | ||
| import pytest | ||
| import pathlib | ||
| import logging | ||
|
|
||
| import pandas as pd | ||
|
|
||
| from . import RESOURCES | ||
| from . import STATES | ||
|
|
||
| import py.path | ||
|
|
||
| from ..workflows import base | ||
|
|
||
| from pkg_resources import resource_filename | ||
|
|
||
| RESOURCES = pathlib.PurePath(resource_filename(__name__, 'testing_resources')) | ||
| MANIFEST = RESOURCES / 'manifest.yml' | ||
|
|
||
| @pytest.fixture(scope='function') | ||
| def molname_workflows_directory(tmp_path): | ||
| m = pybol.Manifest(str(MANIFEST)) | ||
| m.assemble('workflows', tmp_path) | ||
| return tmp_path | ||
|
|
||
| class TestWorkflowsBase(object): | ||
|
|
||
| @pytest.fixture(scope='function') | ||
| def SM_tmp_dir(self, molname_workflows_directory): | ||
| dirname = molname_workflows_directory | ||
| return dirname | ||
|
|
||
| @pytest.fixture(scope='function') | ||
| def csv_input_data(self): | ||
| csv_path = STATES['workflows'] / 'project_paths.csv' | ||
| csv_df = pd.read_csv(csv_path).reset_index(drop=True) | ||
| return csv_path, csv_df | ||
|
|
||
| @pytest.fixture(scope='function') | ||
| def test_df_data(self): | ||
| test_dict = {'molecule' : ['SM25', 'SM26'], | ||
| 'resname' : ['SM25', 'SM26']} | ||
| test_df = pd.DataFrame(test_dict).reset_index(drop=True) | ||
| return test_df | ||
|
|
||
| @pytest.fixture(scope='function') | ||
| def project_paths_data(self, SM_tmp_dir): | ||
| project_paths = base.project_paths(parent_directory=SM_tmp_dir) | ||
| return project_paths | ||
|
|
||
| def test_project_paths(self, test_df_data, project_paths_data): | ||
| test_df = test_df_data | ||
| project_paths = project_paths_data | ||
|
|
||
| assert project_paths['molecule'][0] == test_df['molecule'][0] | ||
| assert project_paths['molecule'][1] == test_df['molecule'][1] | ||
| assert project_paths['resname'][0] == test_df['resname'][0] | ||
| assert project_paths['resname'][1] == test_df['resname'][1] | ||
|
|
||
| def test_project_paths_csv_input(self, csv_input_data): | ||
| csv_path, csv_df = csv_input_data | ||
| project_paths = base.project_paths(csv=csv_path) | ||
|
|
||
| pd.testing.assert_frame_equal(project_paths, csv_df) | ||
|
|
||
| def test_automated_project_analysis(self, project_paths_data, caplog): | ||
| project_paths = project_paths_data | ||
| # change resname to match topology (every SAMPL7 resname is 'UNK') | ||
| # only necessary for this dataset, not necessary for normal use | ||
| project_paths['resname'] = 'UNK' | ||
|
|
||
| base.automated_project_analysis(project_paths, solvents=('water',), | ||
| ensemble_analysis='DihedralAnalysis') | ||
|
|
||
| assert 'all analyses completed' in caplog.text, ('automated_dihedral_analysis ' | ||
| 'did not iteratively run to completion for the provided project') | ||
|
|
||
| def test_automated_project_analysis_KeyError(self, project_paths_data, caplog): | ||
| caplog.clear() | ||
| caplog.set_level(logging.ERROR, logger='mdpow.workflows.base') | ||
|
|
||
| project_paths = project_paths_data | ||
| # change resname to match topology (every SAMPL7 resname is 'UNK') | ||
| # only necessary for this dataset, not necessary for normal use | ||
| project_paths['resname'] = 'UNK' | ||
|
|
||
| # test error output when raised | ||
| with pytest.raises(KeyError, | ||
| match="Invalid ensemble_analysis 'DarthVaderAnalysis'. " | ||
| "An EnsembleAnalysis type that corresponds to an existing " | ||
| "automated workflow module must be input as a kwarg. ex: " | ||
| "ensemble_analysis='DihedralAnalysis'"): | ||
| base.automated_project_analysis(project_paths, ensemble_analysis='DarthVaderAnalysis', solvents=('water',)) | ||
|
|
||
| # test logger error recording | ||
| assert "'DarthVaderAnalysis' is an invalid selection" in caplog.text, ('did not catch incorrect ' | ||
| 'key specification for workflows.registry that results in KeyError') |
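Review note on the `KeyError` test above: `pytest.raises(match=...)` applies `re.search` to the string form of the exception, and `str()` of a `KeyError` is the `repr` of its argument. That is why the expected message contains the quoted key even though the code never adds quotes itself. A minimal sketch with the standard library only; `lookup` is a hypothetical helper, not part of MDPOW:

```python
import re


def lookup(key, table):
    """Hypothetical helper that re-raises a KeyError with a longer message."""
    try:
        return table[key]
    except KeyError as err:
        # str(err) is the repr of the missing key, so the quotes come for free
        raise KeyError(f"Invalid ensemble_analysis {err}.") from None


try:
    lookup('DarthVaderAnalysis', {})
except KeyError as err:
    # the whole message is repr()'d again because KeyError has a single argument
    message = str(err)

# The pattern is a regular expression; '.' matches any character here.
pattern = "Invalid ensemble_analysis 'DarthVaderAnalysis'."
matched = re.search(pattern, message) is not None
```

If a literal match is wanted, wrapping the expected text in `re.escape` before passing it to `match=` avoids treating `.` as a wildcard.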
3 changes: 3 additions & 0 deletions
mdpow/tests/testing_resources/states/workflows/project_paths.csv
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| molecule,resname,path | ||
| SM25,SM25,mdpow/tests/testing_resources/states/workflows/SM25 | ||
| SM26,SM26,mdpow/tests/testing_resources/states/workflows/SM26 |
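Review note on the CSV input: the `csv=` branch of `project_paths` reads this file and sorts it by `molecule`, `resname`, and `path`. The sketch below shows the same idea with the standard library instead of pandas, plus a header check that the diff does not perform. `load_project_rows` and `REQUIRED` are hypothetical names, and the rows are deliberately given out of order to make the sort visible:

```python
import csv
import io

# Same columns as project_paths.csv; rows reordered for illustration only.
CSV_TEXT = """molecule,resname,path
SM26,SM26,mdpow/tests/testing_resources/states/workflows/SM26
SM25,SM25,mdpow/tests/testing_resources/states/workflows/SM25
"""

REQUIRED = ['molecule', 'resname', 'path']


def load_project_rows(handle):
    """Read project rows, validate the header, and sort like the csv branch."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != REQUIRED:
        raise ValueError(f'header must be {",".join(REQUIRED)}')
    rows = list(reader)
    # sort by molecule, then resname, then path
    return sorted(rows, key=lambda row: tuple(row[name] for name in REQUIRED))


rows = load_project_rows(io.StringIO(CSV_TEXT))
molecules = [row['molecule'] for row in rows]
```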
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,180 @@ | ||
| # MDPOW: base.py | ||
| # 2022 Cade Duckworth | ||
|
|
||
| """ | ||
| :mod:`mdpow.workflows.base` --- Automated workflow base functions | ||
| ================================================================= | ||
|
|
||
| To analyze multiple MDPOW projects, provide :func:`project_paths` | ||
| with the top-level directory containing all MDPOW projects' simulation data | ||
| to obtain a :class:`pandas.DataFrame` containing the project information | ||
| and paths. Then, :func:`automated_project_analysis` takes as input the | ||
| aforementioned :class:`pandas.DataFrame` and runs the specified | ||
| :class:`~mdpow.analysis.ensemble.EnsembleAnalysis` for all MDPOW projects | ||
| under the top-level directory provided to :func:`project_paths`. | ||
|
|
||
| .. seealso:: :mod:`~mdpow.workflows.registry` | ||
|
|
||
| .. autofunction:: project_paths | ||
| .. autofunction:: automated_project_analysis | ||
|
|
||
| """ | ||
|
|
||
| import os | ||
| import re | ||
| import pandas as pd | ||
|
|
||
| from mdpow.workflows import registry | ||
|
|
||
| import logging | ||
|
|
||
| logger = logging.getLogger('mdpow.workflows.base') | ||
|
|
||
| def project_paths(parent_directory=None, csv=None, csv_save_dir=None): | ||
| """Takes a top directory containing MDPOW projects and determines | ||
| the molname, resname, and path of each MDPOW project within. | ||
|
|
||
| Optionally takes a .csv file containing `molecule`, `resname`, and | ||
| `path`, in that order. | ||
|
|
||
| :keywords: | ||
|
|
||
| *parent_directory* | ||
| the path for the location of the top directory | ||
| under which the subdirectories of MDPOW simulation | ||
| data exist; additionally creates a 'project_paths.csv' file | ||
| for user manipulation of metadata and for future reference | ||
|
|
||
| *csv* | ||
| .csv file containing the molecule names, resnames, | ||
| and paths, in that order, for the MDPOW simulation | ||
| data to be iterated over; must contain a header of the | ||
| form: `molecule,resname,path` | ||
|
|
||
| *csv_save_dir* | ||
| optionally provided directory to save .csv file, otherwise, | ||
| data will be saved in current working directory | ||
|
|
||
| :returns: | ||
|
|
||
| *project_paths* | ||
| :class:`pandas.DataFrame` containing MDPOW project metadata | ||
|
|
||
| .. rubric:: Example | ||
|
|
||
| Typical Workflow:: | ||
|
|
||
| project_paths = project_paths(parent_directory='/foo/bar/MDPOW_projects') | ||
| automated_project_analysis(project_paths, ensemble_analysis='DihedralAnalysis') | ||
|
|
||
| or:: | ||
|
|
||
| project_paths = project_paths(csv='/foo/bar/MDPOW.csv') | ||
| automated_project_analysis(project_paths, ensemble_analysis='DihedralAnalysis') | ||
|
|
||
| """ | ||
|
|
||
| if parent_directory is not None: | ||
|
|
||
| locations = [] | ||
|
|
||
| reg_compile = re.compile('FEP') | ||
| for dirpath, dirnames, filenames in os.walk(parent_directory): | ||
| result = [dirpath.strip() for dirname in dirnames if reg_compile.match(dirname)] | ||
| if result: | ||
| locations.append(result[0]) | ||
|
|
||
| resnames = [] | ||
|
|
||
| for loc in locations: | ||
| res_temp = loc.strip().split('/') | ||
| resnames.append(res_temp[-1]) | ||
|
|
||
| project_paths = pd.DataFrame( | ||
| { | ||
| 'molecule': resnames, | ||
| 'resname': resnames, | ||
| 'path': locations | ||
| } | ||
| ) | ||
| if csv_save_dir is not None: | ||
| project_paths.to_csv(f'{csv_save_dir}/project_paths.csv', index=False) | ||
| logger.info(f'project_paths saved under {csv_save_dir}') | ||
| else: | ||
| current_directory = os.getcwd() | ||
| project_paths.to_csv('project_paths.csv', index=False) | ||
| logger.info(f'project_paths saved under {current_directory}') | ||
|
|
||
| elif csv is not None: | ||
| locations = pd.read_csv(csv) | ||
| project_paths = locations.sort_values(by=['molecule', 'resname', 'path']).reset_index(drop=True) | ||
|
|
||
| return project_paths | ||
|
|
||
| def automated_project_analysis(project_paths, ensemble_analysis, **kwargs): | ||
| """Takes a :class:`pandas.DataFrame` created by :func:`~mdpow.workflows.base.project_paths` | ||
| and iteratively runs the specified :class:`~mdpow.analysis.ensemble.EnsembleAnalysis` | ||
| for each of the projects by running the associated automated workflow | ||
| in each project directory returned by :func:`~mdpow.workflows.base.project_paths`. | ||
|
|
||
| Compatibility with more automated analyses is in development. | ||
|
|
||
| :keywords: | ||
|
|
||
| *project_paths* | ||
| :class:`pandas.DataFrame` that provides paths to MDPOW projects | ||
|
|
||
| *ensemble_analysis* | ||
| name of the :class:`~mdpow.analysis.ensemble.EnsembleAnalysis` | ||
| that corresponds to the desired automated workflow module | ||
|
|
||
| *kwargs* | ||
| keyword arguments for the supported automated workflows, | ||
| see the :mod:`~mdpow.workflows.registry` for all available | ||
| workflows and their call signatures | ||
|
|
||
| .. rubric:: Example | ||
|
|
||
| A typical workflow is the automated dihedral analysis from | ||
| :mod:`mdpow.workflows.dihedrals`, which applies the *ensemble analysis* | ||
| :class:`~mdpow.analysis.dihedral.DihedralAnalysis` to each project. | ||
| The :data:`~mdpow.workflows.registry.registry` contains this automated | ||
| workflow under the key *"DihedralAnalysis"* and so the automated execution | ||
| for all `project_paths` (obtained via :func:`project_paths`) is performed by | ||
| passing the specific key to :func:`automated_project_analysis`:: | ||
|
|
||
| project_paths = project_paths(parent_directory='/foo/bar/MDPOW_projects') | ||
| automated_project_analysis(project_paths, ensemble_analysis='DihedralAnalysis', **kwargs) | ||
|
|
||
| """ | ||
|
|
||
| for row in project_paths.itertuples(): | ||
| molname = row.molecule | ||
| resname = row.resname | ||
| dirname = row.path | ||
|
|
||
| logger.info(f'starting {molname}') | ||
|
|
||
| try: | ||
| registry.registry[ensemble_analysis](dirname=dirname, resname=resname, molname=molname, **kwargs) | ||
|
||
|
|
||
| logger.info(f'{molname} completed') | ||
|
|
||
| except KeyError as err: | ||
| msg = (f"Invalid ensemble_analysis {err}. An EnsembleAnalysis type that corresponds " | ||
| "to an existing automated workflow module must be input as a kwarg. " | ||
| "ex: ensemble_analysis='DihedralAnalysis'") | ||
| logger.error(f'{err} is an invalid selection') | ||
|
|
||
| raise KeyError(msg) | ||
|
|
||
| except TypeError as err: | ||
| msg = (f"Invalid ensemble_analysis {ensemble_analysis}. An EnsembleAnalysis type that " | ||
| "corresponds to an existing automated workflow module must be input as a kwarg. " | ||
| "ex: ensemble_analysis='DihedralAnalysis'") | ||
| logger.error(f'workflow module for {ensemble_analysis} does not exist yet') | ||
|
|
||
| raise TypeError(msg) | ||
|
|
||
| logger.info('all analyses completed') | ||
| return | ||
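Review note on the directory discovery in `project_paths`: the diff walks the tree, keeps each directory that has a child whose name starts with `FEP`, and derives the name by splitting the path on `'/'`. A minimal sketch of the same search using `os.path.basename`, which does not depend on the path separator, is shown below. `find_project_dirs` is a hypothetical name, the temporary layout is made up for the demonstration, and the result is a plain list rather than the `pandas.DataFrame` the module returns:

```python
import os
import re
import tempfile


def find_project_dirs(parent_directory):
    """Return sorted (name, path) pairs for directories holding an 'FEP*' child."""
    fep = re.compile('FEP')
    found = []
    for dirpath, dirnames, _filenames in os.walk(parent_directory):
        # re.match anchors at the start of the name, as in the diff
        if any(fep.match(dirname) for dirname in dirnames):
            # basename works with any path separator, unlike split('/')
            name = os.path.basename(os.path.normpath(dirpath))
            found.append((name, dirpath))
    return sorted(found)


# Throwaway layout: <tmp>/SM25/FEP/water, <tmp>/SM26/FEP/water, <tmp>/notes
with tempfile.TemporaryDirectory() as top:
    for molecule in ('SM25', 'SM26'):
        os.makedirs(os.path.join(top, molecule, 'FEP', 'water'))
    os.makedirs(os.path.join(top, 'notes'))
    projects = find_project_dirs(top)
    names = [name for name, _path in projects]
```

Sorting the result keeps the order stable across platforms, since `os.walk` yields directories in whatever order the file system reports them.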
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,53 @@ | ||
| # MDPOW: registry.py | ||
| # 2023 Cade Duckworth | ||
|
|
||
| """ | ||
| :mod:`mdpow.workflows.registry` --- Registry of currently supported automated workflows | ||
| ======================================================================================= | ||
|
|
||
| The :mod:`mdpow.workflows.registry` module hosts a dictionary with keys that correspond to an | ||
| :class:`~mdpow.analysis.ensemble.EnsembleAnalysis` for which a corresponding automated workflow exists. | ||
|
|
||
| .. table:: Currently supported automated workflows. | ||
| :widths: auto | ||
| :name: workflows_registry | ||
|
|
||
| +-------------------------------+------------------------------------------------------------------------------------------------------+ | ||
| | key/keyword: EnsembleAnalysis | value: <workflow module>.<top-level automated analysis function> | | ||
| +===============================+======================================================================================================+ | ||
| | DihedralAnalysis | :any:`dihedrals.automated_dihedral_analysis <mdpow.workflows.dihedrals.automated_dihedral_analysis>` | | ||
| +-------------------------------+------------------------------------------------------------------------------------------------------+ | ||
|
|
||
| .. autodata:: registry | ||
|
||
|
|
||
| .. seealso:: :mod:`~mdpow.workflows.base` | ||
|
||
|
|
||
| """ | ||
|
|
||
| # import analysis | ||
| from mdpow.workflows import dihedrals | ||
|
|
||
| registry = { | ||
|
||
|
|
||
| 'DihedralAnalysis' : dihedrals.automated_dihedral_analysis | ||
|
|
||
| } | ||
|
|
||
| """ | ||
| In the `registry`, each entry corresponds to an | ||
| :class:`~mdpow.analysis.ensemble.EnsembleAnalysis` | ||
| for which a corresponding automated workflow exists. | ||
|
|
||
| Intended for use with :mod:`mdpow.workflows.base` to specify which | ||
| :class:`~mdpow.analysis.ensemble.EnsembleAnalysis` should run iteratively over | ||
| the provided project data directory. | ||
|
|
||
| To include a new automated workflow for use with :mod:`mdpow.workflows.base`, | ||
| create a key that is the name of the corresponding | ||
| :class:`~mdpow.analysis.ensemble.EnsembleAnalysis`, with the value defined as | ||
| `<workflow module>.<top-level automated analysis function>`. | ||
|
|
||
| The available automated workflows (key-value pairs) are listed in the | ||
| following table :any:`Currently supported automated workflows. <workflows_registry>` | ||
|
|
||
| """ | ||
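Review note on the registry pattern: the registry is a plain dictionary from an analysis name to a callable, and `automated_project_analysis` looks up and calls the entry inside one `try` block. One possible variation, sketched below under assumptions, separates the lookup from the call so that a `KeyError` raised inside a workflow is not reported as an invalid key. `fake_dihedral_workflow` and `run_workflow` are hypothetical stand-ins, not MDPOW functions:

```python
import logging

logger = logging.getLogger('example.workflows')


def fake_dihedral_workflow(dirname, resname, molname, **kwargs):
    """Hypothetical stand-in for dihedrals.automated_dihedral_analysis."""
    return f'{molname}:{resname}@{dirname}'


# key: EnsembleAnalysis name, value: top-level automated analysis function
registry = {'DihedralAnalysis': fake_dihedral_workflow}


def run_workflow(ensemble_analysis, **kwargs):
    """Look up the workflow first, then call it outside the try block."""
    try:
        workflow = registry[ensemble_analysis]
    except KeyError:
        logger.error('%r is an invalid selection', ensemble_analysis)
        raise KeyError(f'Invalid ensemble_analysis {ensemble_analysis!r}; '
                       f'available: {sorted(registry)}') from None
    # errors raised by the workflow itself propagate unchanged
    return workflow(**kwargs)


result = run_workflow('DihedralAnalysis', dirname='/foo/bar/SM25',
                      resname='UNK', molname='SM25')

try:
    run_workflow('DarthVaderAnalysis')
except KeyError:
    rejected = True
else:
    rejected = False
```

Adding a new workflow then only requires a new key-value pair in `registry`, which matches the extension path described in the module docstring.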