Skip to content

Conversation

@EddieLF
Copy link
Contributor

@EddieLF EddieLF commented Sep 4, 2025

@EddieLF EddieLF requested a review from a team as a code owner September 4, 2025 00:41
@github-actions
Copy link
Contributor

github-actions bot commented Sep 4, 2025

🐳 Docker Image Built

A new Docker image has been built for this PR:

Image: australia-southeast1-docker.pkg.dev/cpg-common/images-dev/cpg_flow:fb5b9c981a5530bd11d6af8b9da0dc5b1e2d6862

Pull command:

docker pull australia-southeast1-docker.pkg.dev/cpg-common/images-dev/cpg_flow:fb5b9c981a5530bd11d6af8b9da0dc5b1e2d6862

🔗 View in Google Cloud Console


This comment was automatically generated by the Docker workflow.

@MattWellie
Copy link
Contributor

MattWellie commented Sep 9, 2025

I haven't reviewed this in full, but I think this can be implemented with a backwards-compatible interface between stages and CPG-Flow. The currently supported values for analysis_keys is list[str | Path] | None Given a random example of a stage with three outputs; two I'd like to combine into a single analysis entry, and a third which could be separate, I pictured this as moving from:

@stage.stage(
    analysis_keys=['output_1', 'output_2', 'output_3'],
    required_stages=[AStage],
    analysis_type='type_1',
)

to

@stage.stage(
    analysis_keys=[('output_1', 'output_2'), 'output_3'],
    required_stages=[AStage],
    analysis_type='type_1',
)

where the expected analysis entry would be

1: 'primary' output output_1, secondary output output_2, analysis_type type_1
- this could be extended with as many secondary files as you like, because each analysis entry has one main output, and an associated number of secondaries
2: output_3 only, analysis_type type_1

We could also extend this to support multiple analysis types at the same time, where analysis_type could either be a string (current), i.e. all outputs are given the same type.

This could be moved to a str | iterable[str], e.g.

@stage.stage(
    analysis_keys=['output_1', 'output_2', 'output_3'],
    required_stages=[AStage],
    analysis_type='type_1',
)

to

@stage.stage(
    analysis_keys=[('output_1', 'output_2'), 'output_3'],
    required_stages=[AStage],
    analysis_type=['type_1', 'type_2'],
)

Which would be read as two analysis entries:

1: 'primary' output: output_1, secondary output: output_2, analysis_type type_1
2: output_3 only, analysis_type type_2

This API would be fully backwards compatible. There's a few fail-worthy scenarios:

  1. analysis_keys contains keys which aren't delivered by expected_outputs
  2. analysis_keys is defined by analysis_type is missing (already fails in cpg-flow)
  3. analysis type is an iterable, but len != len(analysis_keys)

One or both of these behaviours would preserve all the current stage decorators we have, whilst adding new opt-in behaviour.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants