Allow writing multiple files into a single analysis record #126

EddieLF · 2025-09-04T00:41:58Z

Re-implementation of populationgenomics/production-pipelines#1261

github-actions · 2025-09-04T00:46:20Z

🐳 Docker Image Built

A new Docker image has been built for this PR:

Image: australia-southeast1-docker.pkg.dev/cpg-common/images-dev/cpg_flow:fb5b9c981a5530bd11d6af8b9da0dc5b1e2d6862

Pull command:

docker pull australia-southeast1-docker.pkg.dev/cpg-common/images-dev/cpg_flow:fb5b9c981a5530bd11d6af8b9da0dc5b1e2d6862

🔗 View in Google Cloud Console

This comment was automatically generated by the Docker workflow.

MattWellie · 2025-09-09T04:19:05Z

I haven't reviewed this in full, but I think this can be implemented with a backwards-compatible interface between stages and CPG-Flow. The currently supported values for analysis_keys is list[str | Path] | None Given a random example of a stage with three outputs; two I'd like to combine into a single analysis entry, and a third which could be separate, I pictured this as moving from:

@stage.stage(
    analysis_keys=['output_1', 'output_2', 'output_3'],
    required_stages=[AStage],
    analysis_type='type_1',
)

to

@stage.stage(
    analysis_keys=[('output_1', 'output_2'), 'output_3'],
    required_stages=[AStage],
    analysis_type='type_1',
)

where the expected analysis entry would be

1: 'primary' output output_1, secondary output output_2, analysis_type type_1
- this could be extended with as many secondary files as you like, because each analysis entry has one main output, and an associated number of secondaries
2: output_3 only, analysis_type type_1

We could also extend this to support multiple analysis types at the same time, where analysis_type could either be a string (current), i.e. all outputs are given the same type.

This could be moved to a str | iterable[str], e.g.

@stage.stage(
    analysis_keys=['output_1', 'output_2', 'output_3'],
    required_stages=[AStage],
    analysis_type='type_1',
)

to

@stage.stage(
    analysis_keys=[('output_1', 'output_2'), 'output_3'],
    required_stages=[AStage],
    analysis_type=['type_1', 'type_2'],
)

Which would be read as two analysis entries:

1: 'primary' output: output_1, secondary output: output_2, analysis_type type_1
2: output_3 only, analysis_type type_2

This API would be fully backwards compatible. There's a few fail-worthy scenarios:

analysis_keys contains keys which aren't delivered by expected_outputs
analysis_keys is defined by analysis_type is missing (already fails in cpg-flow)
analysis type is an iterable, but len != len(analysis_keys)

One or both of these behaviours would preserve all the current stage decorators we have, whilst adding new opt-in behaviour.

Allow writing multiple files into a single analysis record

fb5b9c9

EddieLF requested a review from a team as a code owner September 4, 2025 00:41

EddieLF temporarily deployed to production September 4, 2025 00:42 — with GitHub Actions Inactive

EddieLF mentioned this pull request Dec 19, 2025

Suggested improvements for Metamist analysis IO #146

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow writing multiple files into a single analysis record #126

Allow writing multiple files into a single analysis record #126

Uh oh!

EddieLF commented Sep 4, 2025

Uh oh!

github-actions bot commented Sep 4, 2025

Uh oh!

MattWellie commented Sep 9, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Allow writing multiple files into a single analysis record #126

Are you sure you want to change the base?

Allow writing multiple files into a single analysis record #126

Uh oh!

Conversation

EddieLF commented Sep 4, 2025

Uh oh!

github-actions bot commented Sep 4, 2025

🐳 Docker Image Built

Uh oh!

MattWellie commented Sep 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

MattWellie commented Sep 9, 2025 •

edited

Loading