merge main into data-handler branch
…with a pipeline-dev service
This updated pipeline is still only capable of using mock data, but it brings the pipeline up to date with the new inundate job.
This seems to be working but should undergo a bit more testing.
The new pipeline.py works to the point where it can take in updated mock data pointing to the results of a STAC query for the same HUC for which we have mock HAND data.
Refactored pipeline.py while retaining the ability to work with mock data.
Added the final job to the pipeline and updated the job definitions to use the full suite of config variables currently in the autoeval-jobs .env file.
Job status is now being updated, and the correct Nomad job ID is being written.
Code in pipeline.py was overwriting job-monitor status updates.
Created a new branch for refactoring efforts. The commit in feature/logging immediately before this one was able to execute a successful pipeline run with logging.
Previously only the base class was in that file, but it made more sense to put the child classes there as well.
Added garbage collection settings to the server block. These settings increase the frequency with which old dispatched jobs and evaluations are cleaned up, and they should substitute for nomad_memory_monitor.sh in the PW deployment of autoeval.
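The exact values committed aren't shown in this thread; a minimal sketch of what more aggressive GC settings in a Nomad server block might look like (the intervals below are illustrative, not the ones committed):

```hcl
server {
  enabled = true

  # Run the job garbage collector more often and reclaim finished
  # dispatched jobs and evaluations sooner than the Nomad defaults
  # (job_gc_threshold defaults to 4h, eval_gc_threshold to 1h).
  job_gc_interval   = "5m"
  job_gc_threshold  = "10m"
  eval_gc_threshold = "10m"
}
```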
Continued building out docs in preparation for the PI-7 repo delivery.
Added job_sizing_guide.md. Finished drafts of interpreting-reports.md and batch-run-guide-ParallelWorks.md.
Merge the main-for-pr branch with PI-7 deliverables into OWP's version of this repo's main branch, which contains OWP-specific files.
SWCM witness approval; release concurrence.
Deleted the doc directory because docs already existed before merging the repo with OWP. Edited local-nomad/README.md so that the load-job command has a dash in auto-eval-coordinator in the --network argument. This was necessary because the OWP repo name added a dash between "auto" and "eval", and Docker Compose appends the repo/directory name to the network created by the repo's docker-compose-local.yml.
Delete "doc" directory and edit local-nomad/README.md
The name batch_root makes more sense for the argument that specifies the batch root directory in this script.
Rename output_root argument in tools/make_master_metrics.py to batch_root
…stead of conditionals for each.
Get rid of conditionals by using fsspec path normalization in two more places.
Simplify fsspec file handling further.
Simplify local file / S3 referencing to use fsspec.core.url_to_fs.
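The refactored code itself isn't shown in this thread; a minimal sketch of the `fsspec.core.url_to_fs` pattern these commits describe (the helper name `open_any` is illustrative, not from the repo):

```python
import fsspec


def open_any(url: str, mode: str = "rb"):
    """Open a local path or a URL (e.g. s3://bucket/key) uniformly.

    fsspec.core.url_to_fs resolves the URL to the matching filesystem
    object plus a normalized path, so callers need no per-backend
    conditionals for local files versus S3.
    """
    fs, path = fsspec.core.url_to_fs(url)
    return fs.open(path, mode)
```

Reading an S3 object and a local file then go through the same call site; only the URL string differs.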
* Add tests verifying metrics aggregation deduplication

Added unit tests and test data to verify that the current MetricsAggregator class deduplicates metrics for a unique ("collection_id", "stac_item_id", "scenario") combination only when an exact or near duplicate is present. When a set of index columns carries different metric values, an error is raised indicating an idempotency violation due to code or data changes. Also added a test docker compose file for the unit tests and updated the README with instructions for running them. Since the tests passed, the clean_agg_metrics.py script was deleted from tools, as its primary job was removing near duplicates.

* Update README.md
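The deduplication rule described above can be sketched in pandas. This is an assumption-laden simplification, not the MetricsAggregator implementation: it handles exact duplicates only (the real class also tolerates near duplicates), and the `csi` metric column is hypothetical.

```python
import pandas as pd

# Key columns that must uniquely identify one set of metrics.
INDEX_COLS = ["collection_id", "stac_item_id", "scenario"]


def deduplicate(df: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows; raise if a key maps to conflicting metrics."""
    deduped = df.drop_duplicates()
    # Any key still repeated after dropping exact duplicates carries
    # different metric values, which violates idempotency.
    if deduped.duplicated(subset=INDEX_COLS).any():
        raise ValueError(
            f"conflicting metrics for the same {INDEX_COLS} combination; "
            "idempotency violated by code or data changes"
        )
    return deduped
```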
* Add aoi_stac_item_id and aoi_geom_path arguments

This commit refactors the pipeline, submit_stac_batch.py, and the pipeline job definitions so that an AOI gpkg is not needed when --aoi_stac_item_id is provided with a valid STAC item ID string. Previously, --aoi_is_item was a boolean flag that extracted the STAC item ID from the AOI gpkg name; when it was on, the STAC item was queried by item ID instead of geometry. --aoi_is_item was renamed to --aoi_stac_item_id and now accepts a string, which is used to query STAC items by item ID; when it is used, the AOI is extracted within the pipeline itself from the benchmark STAC. Pipeline code was changed to extract the geometry from the STAC item named by --aoi_stac_item_id; the geometry is held in a GeoDataFrame and is not persisted to disk. The --aoi argument became optional and was renamed --aoi_geom_path, so instead of requiring --aoi we now require one of --aoi_geom_path or --aoi_stac_item_id. When an --aoi_stac_item_id string is provided, that string is used when writing the pipeline outputs instead of pulling the AOI name from the 'aoi_name' tag provided by the user. submit_stac_batch.py was also changed; the big change is that geometries are no longer extracted from the benchmark STAC in that script. The Nomad job definitions were changed to remove the aoi meta parameter and add --aoi_stac_item_id and --aoi_geom_path as optional meta parameters. Conditional logic was added, via a templated wrapper script defined in the job definition, around the command that invokes the pipeline in the pipeline Nomad job, depending on which meta parameter the user provides.

* Fix conditional aoi argument passing in Nomad job definitions

The previous commit's changes to the Nomad job definitions broke the pipeline job.
This commit's changes to the Nomad job definitions successfully allow conditional dispatch of the coordinator task depending on which aoi argument was submitted to the parameterized job. Also fixed a small indentation bug in data_service.py and reformatted according to the repo's agreed-upon line-length conventions.

* Update README

Updated README.md in the repo root to reflect that, for the test pipeline to fully work, it still needs access to the fimc-data bucket so that the agreement job can access masks.

* Remove initiating pipeline args into environment

Removed the NOMAD_META variables related to the pipeline job from the env stanza; they aren't necessary for the task to call the pipeline's main.py.

* Remove conditional argument handling from pipeline job def

Removed the complexity of the inline bash script from the pipeline job definition. All optional meta parameters related to the call to main.py are now fed into the coordinator task; when a parameter isn't provided, main.py receives an empty string for that argument and resolves it to None during pipeline initialization.

* Add aoi_name tag derivation from aoi arguments

Modified how the aoi_name tag is handled. The pipeline first checks whether the tag has been provided by the user; if so, that aoi_name tag takes precedence. Otherwise, the aoi_name tag is derived from aoi_stac_item_id or aoi_geom_path, whichever was provided.

* Update comment for tags meta parameter
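The argument handling and aoi_name precedence described in these commits can be sketched with argparse. This is a simplified illustration under stated assumptions, not the pipeline's actual main.py; the helper names and the empty-string-to-None normalization mirror the behavior described above:

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="pipeline AOI arguments (sketch)")
    # Both AOI arguments are optional individually; the Nomad job may pass
    # an empty string for whichever one the user did not provide.
    parser.add_argument("--aoi_stac_item_id", default=None)
    parser.add_argument("--aoi_geom_path", default=None)
    parser.add_argument("--aoi_name", default=None, help="user-provided tag")
    return parser


def normalize(value):
    """Treat the empty strings fed in by the job definition as None."""
    return value or None


def resolve_aoi_name(args: argparse.Namespace) -> str:
    """Derive the aoi_name tag: explicit tag wins, then item ID, then file stem."""
    aoi_name = normalize(args.aoi_name)
    item_id = normalize(args.aoi_stac_item_id)
    geom_path = normalize(args.aoi_geom_path)
    if item_id is None and geom_path is None:
        raise SystemExit("one of --aoi_geom_path or --aoi_stac_item_id is required")
    if aoi_name:
        return aoi_name
    if item_id:
        return item_id
    return Path(geom_path).stem
```

With this shape the job definition can always pass both meta parameters, and the precedence order (user tag, then STAC item ID, then geometry file name) stays in one place.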
…ents drawio diagram now has a tab with a version of the diagram, dated Jan 1st, 2026, that shows a schematic of how the existing pipeline will be updated to perform depth evaluations.
Drawio diagram
Re-introduced the edit showing in the diagram that a user can submit either an AOI or a STAC item ID.
This PR represents the initial delivery of the code associated with NGWPC's auto-eval-coordinator repository to OWP. The repository contains code for a data pipeline that works together with HashiCorp Nomad and another repo, auto-eval-jobs, to perform FIM evaluations.