-
Notifications
You must be signed in to change notification settings - Fork 0
fix(utils.py): add dependency handling wrapper #150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
📊 SonarQube Summary
|
🐳 Docker Image BuiltA new Docker image has been built for this PR: Image: Pull command: docker pull australia-southeast1-docker.pkg.dev/cpg-common/images-dev/cpg_flow:cd87c5f4971ab1b82c55eb555de5bd9060c070f8🔗 View in Google Cloud Console This comment was automatically generated by the Docker workflow. |
|
The commit immediately prior to this comment splits the logic into two methods
I don't know if this is better as two separate methods, but Sonar was complaining that there were too many conditional switches in the logic. This split satisfies Sonar, and results in extra test cases focused only on that functionality. |
rameshka
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @MattWellie, the new changes look great—thanks for introducing the reusable utility method. I’ve left a few comments for you to review. Also, appreciate the detailed PR description and the additional notes on the logic; they make it easy to follow.
|
Testing the latest state of this branch here https://batch.hail.populationgenomics.org.au/batches/1124653/jobs/1 and here https://batch.hail.populationgenomics.org.au/batches/1124660 |
Useful where inter-job dependencies need to be set, but its possible that any job in the chain may not exist. Fairly lengthy example provided in the docstring. This is a shorthand for the use case:
"my task may have produced a job. If it did, I need to set up a depends_on relationship with a collection of prior jobs... it's possible they also didn't create a job."
There are >100 instances of depends_on relationships being set in code all over our organisation, and many/most of these instances require some floppy logic, as we have to adapt to either side of the relationship being None/empty list: https://github.com/search?q=org%3Apopulationgenomics%20%22depends_on(%22&type=code
When incorporated into a chain of jobs, we're continually given multiple decisions:
This method aims to cleanly cater to all these cases, skipping or dependency setting as required.
A boolean flag determines whether the final action is to extend the 'tail' (previous job list) or not. A tail cannot be extended if it is a single Job, or a None value.
I've added a range of demonstrative test cases, and a chunk of docstring. As a further demonstration/utilisation I've also implemented this in
stage.py. This code block now changes from:to
This provides a good example of how floppy this loop was previously, vs. how tersely it can now be written.
This workflow change has been deployed and confirmed working with https://batch.hail.populationgenomics.org.au/batches/1124257
This PR is loosely paired with the stacked PR chain populationgenomics/test_workflows_shared#19, populationgenomics/test_workflows_shared#20, populationgenomics/test_workflows_shared#21