@hdpriest-ui commented Sep 29, 2025

Updated description of PR contents:

This PR now contains a fully functional implementation of workflow 2a, plus example workflows for downloading data, referencing previous workflow runs, copying data in from prior workflow runs, and executing PEcAn function calls in a distributed fashion via Slurm and Apptainer.

Also supports local execution in a containerized environment.

I have created 'adapter' R files that keep @infotroph's command-line argument structure and pass the arguments, via an XML structure, into the centralized versions of the code in workflow_functions.R for stand-alone execution.

The workflow_functions.R versions of the workflow steps are, for the most part, copy-pasted from @infotroph's implementations.
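As a rough illustration of the adapter idea described above, the sketch below parses a `-s <file>` argument and hands it to a centralized function. It uses base R only and does not parse the XML itself. The names `parse_adapter_args` and `run_step` are hypothetical stand-ins, not functions from this PR.

```r
# Hypothetical adapter sketch; function names are illustrative, not from the PR.

# Extract the path that follows the `-s` flag from a character vector of args.
parse_adapter_args <- function(args = commandArgs(trailingOnly = TRUE)) {
  idx <- which(args == "-s")
  if (length(idx) != 1 || idx == length(args)) {
    stop("usage: Rscript <adapter>.R -s <orchestration.xml>", call. = FALSE)
  }
  list(settings = args[idx + 1])
}

# Stand-in for a centralized step that would live in workflow_functions.R.
run_step <- function(settings_path) {
  message("Would run step with settings file: ", settings_path)
  invisible(settings_path)
}

# Simulate `Rscript adapter.R -s workflow_orchestration.xml`.
parsed <- parse_adapter_args(c("-s", "workflow_orchestration.xml"))
result <- run_step(parsed$settings)
```

Keeping the adapter this thin means the command-line interface and the centralized function can change independently.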

Workflow 2a can be run as follows (from the repo root, on the ccmmf test cluster):

conda activate /home/hdpriest/miniconda3/envs/pecan-all-102425
cd ./orchestration/

Note: you will have to set the 'workflow.base.run.directory' XML parameter in the orchestration XML used below to your preferred location.

Rscript 01_get_base_data.R -s workflow_orchestration.xml
Rscript 02_create_clim_files_dist.R -s workflow_orchestration.xml
Rscript 03_build_xml_and_run.R -s workflow_orchestration.xml

…preparation and analysis. Update .gitignore to exclude workflow run directories. Enhance run_pipeline scripts for better directory management and parameterization. Introduce new utility functions for data handling and workflow execution.

slurm workflow not yet functional.
added simple roxygen docs
updated pecan settings qstat to work with zero-length strings
added first draft setup shell script for one-button install
added workflow functions necessary for 1a
Added apptainer build image parent workflow
added apptainer sipnet-carb build workflow
added dockerfile to tools/ subdirectory

unlikely first attempt will build.
added line on obtaining current temp container
NOTE THE BUG: apptainer must be updated both in the runscript and in the XML.
@hdpriest-ui
Contributor Author

The GHA-based workflow successfully builds the sipnet-carb Docker image in the source repo:
https://github.com/hdpriest-ui/workflows/actions/runs/18386239890

@dlebauer left a comment

General comments:

  • I think that integrating targets into the workflow is a good idea. It will be worth reviewing together after Chris has also had a chance to review it.
  • Documentation (e.g. a README) would be helpful. Contents could include:
    • Overview and usage of targets for workflows: the general approach that can be used across the project.
    • Rationale behind the divergences from more standard targets workflows: the choice not to use crew or a _targets.R file, the use of environment variables + gsub, and the storing of functions and args. I know you've explained these in meetings, but they will be helpful to document.
    • For this specific example implementing the ensemble workflow, it would be useful to document the workflow components, including a tar_manifest() and a diagram of the DAG (output of tar_network() or tar_mermaid()).

Comment on lines +5 to +6
# function authors are encouraged to think carefully about the dependencies of their functions.
# if dependencies are not present, it would be ideal for functions to error informatively rather than fail on imports.
Contributor

How are dependencies used? What should the authors consider?
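One way the guidance in the commented lines could look in practice is a small guard that checks for a package before using it and stops with an actionable message. This is only a sketch; `require_package` is a hypothetical helper, not something defined in this PR.

```r
# Hypothetical helper; the name `require_package` is illustrative only.
# It fails early with a readable message instead of failing on import.
require_package <- function(pkg, purpose = "this function") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(
      sprintf(
        "Package '%s' is required for %s. Install it with install.packages('%s').",
        pkg, purpose, pkg
      ),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# 'stats' ships with R, so this check passes silently.
ok <- require_package("stats", purpose = "the demo")
```

A function author would call the guard at the top of any function that relies on an optional package, then use `pkg::fun()` calls in the body.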

return(file.path(local_path, prefix_filename))
}

#' Prepare PEcAn Run Directory
Contributor

Something similar (create the directory if it doesn't exist) is already done by prepare.settings() when it calls check.settings(). prepare.settings() does a lot of other things, but the relevant part is here:

https://github.com/PecanProject/pecan/blob/6d7a913dd9c5f6f3f992cbe3f9e3f263cd56bb6f/base/settings/R/check.all.settings.R#L479-L485

A standard pattern is:

settings <- PEcAn.settings::read.settings("pecan.xml")
settings <- PEcAn.settings::prepare.settings(settings)

But I don't see it used here in the workflows, so I'll defer to @infotroph to comment on whether not calling prepare.settings was a deliberate choice and whether it would be appropriate to use here.
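For reference, the stand-alone "create the directory if it doesn't exist" step under discussion amounts to roughly the following in base R, with no PEcAn settings object involved. This is a sketch only; `ensure_run_dir` is a hypothetical name and is not the function in this PR.

```r
# Hypothetical sketch of an idempotent "create if missing" step (base R only).
ensure_run_dir <- function(path) {
  if (!dir.exists(path)) {
    created <- dir.create(path, recursive = TRUE, showWarnings = FALSE)
    if (!created) {
      stop("Could not create run directory: ", path, call. = FALSE)
    }
  }
  invisible(path)
}

# Demonstrate in a temporary location; the second call is a no-op.
demo_dir <- file.path(tempdir(), "pecan_run_demo", "run")
ensure_run_dir(demo_dir)
ensure_run_dir(demo_dir)
```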

Contributor

It was deliberate: specifically because prepare.settings wants a live DB and queries it too many times to change that readily, and more generally because this workflow puts responsibility for constructing and verifying the settings into the xml_build stage.

Contributor

I'll admit to not fully understanding the db dependency, but prepare.settings is used for this purpose in the db-independent Demo 1: https://github.com/PecanProject/pecan/blob/bff6203e17cf4ff7f6c8e553f0ea16170051018b/documentation/tutorials/Demo_1_Basic_Run/run_pecan.qmd#L137

jobids[task_id] <- PEcAn.remote::qsub_get_jobid(
  out = out[length(out)],
  qsub.jobid = pecan_settings$host$qsub.jobid,
  stop.on.error = stop.on.error)
Contributor

Where is stop.on.error defined? lintr flags it as 'no visible binding for global variable'.
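A common way to resolve this kind of lintr note is to make the variable an explicit argument with a default on the enclosing function. The sketch below shows the pattern with hypothetical stand-ins: `submit_tasks` and `fake_get_jobid` are illustrative names, and the fake parser simply strips a "Submitted batch job" prefix rather than calling PEcAn.remote::qsub_get_jobid.

```r
# Hypothetical sketch: `stop.on.error` is now a visible argument with a default,
# so it no longer resolves to an undefined global.
submit_tasks <- function(outputs, get_jobid, stop.on.error = TRUE) {
  jobids <- character(length(outputs))
  for (task_id in seq_along(outputs)) {
    out <- outputs[[task_id]]
    # Only the last line of the scheduler output is parsed for the job id.
    jobids[task_id] <- get_jobid(
      out = out[length(out)],
      stop.on.error = stop.on.error
    )
  }
  jobids
}

# Stand-in for the real job-id parser, for demonstration only.
fake_get_jobid <- function(out, stop.on.error) {
  sub("^Submitted batch job ", "", out)
}

ids <- submit_tasks(
  list(c("queued", "Submitted batch job 101"), "Submitted batch job 102"),
  get_jobid = fake_get_jobid
)
```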

@@ -0,0 +1,159 @@
name: build-image
Contributor

I see this is a lightly modified version of the docker-build-image.yml in PecanProject/pecan and I have a vague memory of seeing instructions for using workflow files from other repositories. Would it be worth investigating if we can call this from the PEcAn repo rather than maintain duplicate versions?

Contributor Author

I think it's definitely worth investigating. In moving it into this repository, I hesitated to create a new maintenance point for the same method. Investigation is the right word: I have questions, such as which ghcr/docker repo the created sipnet-carb image will end up in (and which we want it to end up in), and whether we can invoke the method in the pecan repo using secrets in the ccmmf repo.

I'll look into it.

hdpriest-ui and others added 4 commits October 20, 2025 09:59
Co-authored-by: David LeBauer <dlebauer@gmail.com>
… a distributed workflow

documents in hopefully useful state.
@dlebauer requested a review from mdietze on November 10, 2025 21:21
updated apptainer build to support develop
…can's version of this yaml.

added image-version input parameter at base apptainer sipnet-carb builder
- refactored configs into latest and devel for ease of stack testing
- refactored parameter passing: majority of workflow parameters are passed via orchestration XML
- minimized gsub replacements for clarity
- added script for XML build step
- added single function for XML build step
- leveraged targets "target_raw" methodology to enable function-call-like invocations of multiple targets in re-usable blocks
- enabled parameter passing and parsing for function-like behavior of target blocks
- combined 03 and 04 steps from workflow 2a
- workflow 2a function execution working, data routing incomplete
…g parsing with centralized functions

- added smart functional resolution for either referencing external data, or copying external data into a run.
- added argument parsing through as.numeric() to correctly parameterize centralized workflow functions
- obtained successful 2a workflow replication via targets, apptainer and slurm
- updated example workflows for new data referencing
- removed obsolete example 3 variant
- removed some obsolete functions within workflow_functions.R
- added a gha for CI of workflows
- added self hosted runner info to github action