
NGWPC PI-7 PR#1

Open
dylanlee wants to merge 279 commits into NOAA-OWP:main from NGWPC:main

Conversation

@dylanlee

This PR represents the initial delivery of the code associated with NGWPC's auto-eval-coordinator repository to OWP. The repository contains code for a data pipeline that works together with HashiCorp Nomad and another repo called auto-eval-jobs to perform FIM evaluations.

dylanlee and others added 30 commits May 3, 2025 14:37
merge main into data-handler branch
This updated pipeline is still only capable of using mock data but
brings the pipeline up to date with the new inundate job
This seems to be working but should undergo a bit more testing
new pipeline.py working to the point where it can take in updated mock
data that points to the results of a stac query for the same huc that we
have mock HAND data for.
refactored pipeline.py that retains functionality of being able to work
with mock data
added the final job to the pipeline and also updated the job definitions
to use the full suite of config variables currently in the
autoeval-jobs .env file
job status now being updated and correct nomad job id being written
code in pipeline.py was overwriting job monitor status updates
Created a new branch for refactoring efforts. The commit in
feature/logging right before this commit was able to execute a
successful pipeline run with logging
Previously just the base class was in that file, but I decided it made
more sense to put the child classes there as well
Added garbage collection settings to the server block. These settings increase the frequency with which old dispatched jobs and evaluations are cleaned up and should substitute for nomad_memory_monitor.sh for the PW deployment of autoeval.
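For reference, Nomad's server-side garbage collection is tuned in the agent config's server block. A sketch of the kind of settings the commit describes (the specific values here are assumptions for illustration, not the repo's actual config):

```hcl
server {
  enabled = true

  # Assumed values: tighter than Nomad's defaults so finished
  # dispatched jobs and evaluations are purged sooner.
  job_gc_interval   = "1m"
  job_gc_threshold  = "15m"
  eval_gc_threshold = "15m"
}
```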
Continued building out docs in preparation for pi7 repo delivery
Added job_sizing_guide.md

Finished a draft of interpreting-reports.md

Finished a draft of batch-run-guide-ParallelWorks.md
Merge the main-for-pr branch with PI-7 deliverables into OWP's version of this repo's main branch with OWP-specific files
@dylanlee dylanlee marked this pull request as draft September 30, 2025 15:13
@dylanlee dylanlee marked this pull request as ready for review September 30, 2025 19:17
@DJackson2313

SWCM witness approval; release concurrence.

dylanlee and others added 17 commits October 15, 2025 12:15
Deleted the doc directory because docs already existed before merging
the repo with OWP.

Edited local-nomad/README.md so that the load job command has a dash
in auto-eval-coordinator in the --network argument. This was necessary
because the OWP repo added a dash to autoeval, and docker compose
appends the repo/directory name to the network that is created by the
repo's docker-compose-local.yml
Delete "doc" directory and edit local-nomad/README.md
The argument name batch_root makes more sense as the argument for the
batch root directory for this script
Rename output_root argument in tools/make_master_metrics.py to batch_root
Get rid of conditionals using fsspec path normalization in two more
places
Simplify fsspec file handling further
Simplify local file / S3 referencing to use fsspec.core.url_to_fs.
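The commits above consolidate path handling onto `fsspec.core.url_to_fs`, which is a real fsspec call that resolves a local path or URL to a (filesystem, normalized path) pair. A minimal sketch of the pattern (the helper name `open_any` is illustrative, not from the repo):

```python
import fsspec.core

def open_any(url: str, mode: str = "rb"):
    """Resolve a local path or a URL (file://, s3://, memory://, ...)
    to a filesystem object plus a normalized path, then open it.
    Remote protocols need the matching fsspec backend installed
    (e.g. s3fs for s3://)."""
    fs, path = fsspec.core.url_to_fs(url)
    return fs.open(path, mode)
```

Because `url_to_fs` normalizes the path for whichever filesystem it resolves, the conditionals that special-cased local versus S3 paths can be dropped.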
* Add tests verifying metrics aggregation deduplication

Added unit tests and test data to verify that the current
MetricsAggregator class successfully deduplicates metrics for a unique
"collection_id", "stac_item_id", "scenario" combination only in the case
where an exact or near duplicate is present. When rows with the same
index columns have different metrics, an error is raised indicating a
violation of idempotency due to code or data changes.

Also added a test docker compose file for the unit tests and updated the
README with instructions on running the tests. Since the tests passed I
went ahead and deleted the clean_agg_metrics.py script from tools since
its primary job was getting rid of near duplicates.
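The deduplication behavior described above can be sketched as follows. This is a hypothetical simplification, not the repo's `MetricsAggregator`; the index column names come from the commit message, and metric columns are assumed numeric:

```python
import pandas as pd

INDEX_COLS = ["collection_id", "stac_item_id", "scenario"]

def deduplicate_metrics(df: pd.DataFrame, atol: float = 1e-8) -> pd.DataFrame:
    """Keep one row per index-column combination. If rows sharing a key
    differ beyond a small tolerance, raise: that signals an idempotency
    violation from code or data changes rather than a harmless re-run."""
    kept = []
    for key, group in df.groupby(INDEX_COLS, sort=False):
        metrics = group.drop(columns=INDEX_COLS)
        first = metrics.iloc[0]
        # Near-duplicate check: every row must match the first within atol.
        if not all(((metrics - first).abs() <= atol).all(axis=1)):
            raise ValueError(f"conflicting metrics for key {key}")
        kept.append(group.iloc[[0]])
    return pd.concat(kept, ignore_index=True)
```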

* Update README.md
* Add aoi_stac_item_id and aoi_geom_path arguments

The goal of this commit is to refactor the pipeline,
submit_stac_batch.py, and the pipeline job definitions to not need an
AOI gpkg when --aoi_stac_item_id is provided along with a valid STAC
item ID string. Previously, --aoi_is_item was a boolean flag that
extracted the STAC item ID from the AOI gpkg name; when the flag was
on, the STAC item was queried by item ID instead of by geometry.
--aoi_is_item was changed to --aoi_stac_item_id, which accepts a
string; that string is used to query STAC items by item ID, and the
AOI is then extracted within the pipeline itself from the benchmark
STAC item when aoi_stac_item_id is used.

Pipeline code was changed to extract the geometry from the STAC item
provided in the --aoi_stac_item_id argument. The geometry will be held
in a GeoDataFrame and won't be persisted to disk. The --aoi argument will
become optional and was changed to --aoi_geom_path. So instead of
requiring --aoi we now require one of --aoi_geom_path or
--aoi_stac_item_id. When a --aoi_stac_item_id string is provided then
that string will be used when writing the pipeline outputs instead of
pulling the aoi name from the 'aoi_name' tag provided by the user.

submit_stac_batch.py was also changed. The big change to this script was
that we won't be extracting geometries from the benchmark STAC in this
script.

Nomad job definitions were changed to remove the aoi meta and add
optional --aoi_stac_item_id and --aoi_geom_path as optional meta
parameters. Conditional logic was added using a templated wrapper script
defined in the job definition around the command that invokes the
pipeline in the pipeline Nomad job depending on which meta parameter the
user provides.

* Fix conditional aoi argument passing in Nomad job definitions

The previous commit's changes to the nomad job definitions broke the
pipeline job. This commit's changes to the nomad job definitions
successfully allow for conditional dispatch of the coordinator task
depending on which aoi argument has been submitted to the parameterized
job.

I also fixed a small indentation bug in data_service.py and reformatted
according to the repo's agreed-upon line-length conventions

* Update README

Updated README.md in repo root to reflect the fact that for the test
pipeline to fully work it still currently needs access to the fimc-data
bucket so that the agreement job can access masks.

* Remove injection of pipeline args into environment

Removed the NOMAD_META variables related to the pipeline job from the
env stanza. They aren't necessary for the task to call the pipeline's
main.py

* Remove conditional argument handling from pipeline job def

Removed the complexity of the inline bash script from pipeline job
definition. Now all optional meta parameters that are related to the
call to main.py are fed into the coordinator task. When a parameter
isn't provided, main.py is fed an empty string for that argument and
resolves the argument to None during pipeline initialization
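The empty-string-to-None resolution described above can be sketched with argparse. The argument names come from the commits; the converter function is an illustrative assumption, not the repo's code:

```python
import argparse

def none_if_empty(value: str):
    """Nomad meta parameters the user leaves unset arrive as empty
    strings; treat those as the argument not having been provided."""
    return value or None

parser = argparse.ArgumentParser()
parser.add_argument("--aoi_stac_item_id", type=none_if_empty, default=None)
parser.add_argument("--aoi_geom_path", type=none_if_empty, default=None)
```

An empty string dispatched by the job definition then parses to None, so downstream code only has to check a single sentinel.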

* Add aoi_name tag derivation from aoi arguments

Modified the way the aoi_name tag is handled. The pipeline first looks
to see if it has been provided by a user and if it has that aoi_name tag
gets precedence. If it hasn't been provided then the aoi_name tag is
derived from aoi_stac_item_id or aoi_geom_path (whichever was provided)
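The precedence described above, sketched as a hypothetical helper (the function name and signature are assumptions for illustration):

```python
from pathlib import Path

def derive_aoi_name(user_tag, aoi_stac_item_id, aoi_geom_path):
    """A user-supplied aoi_name tag takes precedence; otherwise fall
    back to the STAC item ID, then to the geometry file's stem."""
    if user_tag:
        return user_tag
    if aoi_stac_item_id:
        return aoi_stac_item_id
    if aoi_geom_path:
        return Path(aoi_geom_path).stem
    raise ValueError("one of aoi_stac_item_id or aoi_geom_path is required")
```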

* Update comment for tags meta parameter
…ents

drawio diagram now has a tab with a version of the diagram dated Jan
1st, 2026 that shows a schematic for how the existing pipeline will be
updated to be able to perform depth evaluations
Re-introduced edit showing that user can submit either an aoi or a
stac-item-id into the diagram
