Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #1649 +/- ##
===========================================
- Coverage 99.48% 99.38% -0.10%
===========================================
Files 40 40
Lines 1932 1967 +35
===========================================
+ Hits 1922 1955 +33
- Misses 10 12 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
fevac
left a comment
There was a problem hiding this comment.
Oh sorry! I accidentally started to review this thinking it was the other PR. So I won't continue for now
beatrizsavinhas
left a comment
There was a problem hiding this comment.
I unfortunately don't fully understand the context around generating these files from reading the PR/user story description, so I can't really comment on logic or parameters used...
But I added some comments on the code and suggestions for better readability!
If we meet to discuss the PR, maybe I can give a more complete and meaningful review! It would also be important to look at the test results.
| def open_output(path: str | Path) -> ContextManager[TextIO]: | ||
| """Open output file; '-' means stdout (not closed).""" | ||
| p = str(path) | ||
| if p == "-": | ||
| return nullcontext(sys.stdout) | ||
| return open(p, "w", encoding="utf-8") |
There was a problem hiding this comment.
Why is this function necessary? 🤔
I believe you can just call open(file_path, "w", encoding="utf-8") even if the file is already open and the the path is of type Path.
There was a problem hiding this comment.
It was to handle the possibility of not providing an output path but just piping output to standard out. But I don't think it's necessary since this is a script to be used in a pipeline with a predictable input and output structure! We don't need flexibility, so I'll remove! 🙏
| vcf_path: str | Path, | ||
| bedgraph_path: str | Path, | ||
| track_name: Optional[str] = None, | ||
| ) -> int: | ||
| """Convert a VCF into a bedGraph of AF computed from AD/DP.""" | ||
| n_written = 0 | ||
| with closing(VCF(str(vcf_path))) as vcf, open_output(bedgraph_path) as fout: |
There was a problem hiding this comment.
The cli function will call this with vcf_path and bedgraph_path as Path, right? Then there is no need to have ambiguous types here.
See also comment above on open_output.
| vcf_path: str | Path, | |
| bedgraph_path: str | Path, | |
| track_name: Optional[str] = None, | |
| ) -> int: | |
| """Convert a VCF into a bedGraph of AF computed from AD/DP.""" | |
| n_written = 0 | |
| with closing(VCF(str(vcf_path))) as vcf, open_output(bedgraph_path) as fout: | |
| vcf_path: Path, | |
| bedgraph_path: Path, | |
| track_name: Optional[str] = None, | |
| ) -> int: | |
| """Convert a VCF into a bedGraph of AF computed from AD/DP.""" | |
| n_written = 0 | |
| with closing(VCF(vcf_path.as_posix()) as vcf, open_output(bedgraph_path) as fout: |
There was a problem hiding this comment.
You're right! It should only be string. I've been confused about this so many times, but I think the type=click.Path in the argument is only used to validate the input, but what you actually get out is a string. I'll change the types to string
| record = variant_to_record(variant) | ||
| if record == None: | ||
| continue | ||
| fout.write(record) | ||
| n_written += 1 |
There was a problem hiding this comment.
See comment above on raising an exception for variant_to_record. It is also possible to print out a warning for the exception if that would be of any use.
| record = variant_to_record(variant) | |
| if record == None: | |
| continue | |
| fout.write(record) | |
| n_written += 1 | |
| try: | |
| record = variant_to_record(variant) | |
| fout.write(record) | |
| n_written += 1 | |
| except: | |
| continue |
There was a problem hiding this comment.
The variants into this script are a bit unconventional in that they are forced calls on common gnomad population variants, and only used for visualising the allele frequencies in IGV (you can look at the linked issue to find a screenshot). Maybe it could be interesting to log however how many total variants were skipped, I'll see if I can add that!
| plot_tumor = vcf_dir + "CNV.somatic." + config["analysis"]["case_id"] + ".ascat.tumor.png", | ||
| plot_germline = vcf_dir + "CNV.somatic." + config["analysis"]["case_id"] + ".ascat.germline.png", | ||
| plot_sunrise = vcf_dir + "CNV.somatic." + config["analysis"]["case_id"] + ".ascat.sunrise.png", | ||
| namemap = vcf_dir + "CNV.somatic." + config["analysis"]["case_id"] + ".ascat.sample_name_map", |
There was a problem hiding this comment.
Is this change necessary for this PR?
There was a problem hiding this comment.
Nope! Don't even know how it happened 😂 either way it doesn't matter. But I'll return the comma.
| ], | ||
| "varcall": [ | ||
| "snakemake_rules/variant_calling/germline_wgs.rule", | ||
| "snakemake_rules/variant_calling/igv_files.rule", |
There was a problem hiding this comment.
The User story mentions only "tumor normal matched WGS analyses". Is this to be applied for TO too?
There was a problem hiding this comment.
Yes! I'll update the userstory, I think the specific case Teresita referred to was tumor+normal, but it should be applicable to tumor only as well
There was a problem hiding this comment.
The name of the test still refers to tga and missing_gens. If it was your intention to change the whole test, could you update the test name too?
Also, what is the reasoning in adding the panel_bed_file parameter now? 🤔
There was a problem hiding this comment.
I don't remember what the issue was with the test. But it failed, and when I looked at it I saw that it wasn't configured correctly. It was supposed to test running tga workflow without GENS input arguments, but it wasn't provided a panel-bed-file which is what configures the workflow as tga. So I just cleaned it up in passing.
Out of context but a small fix
There was a problem hiding this comment.
Same general advice as above: set explanatory names for variables and avoid ambiguous function returns.
| if bins_per_window <= 0: | ||
| raise click.ClickException("--bin-size must be a positive integer") |
There was a problem hiding this comment.
An alternative with automatic error printing for this is using type=click.IntRange(0, <max>)).
There was a problem hiding this comment.
Cool! It would feel sort of arbitrary to put a maximum value here though, and I'm fine with the current solution
|



Description
Add BAF and Log2 files from GATK to be stored in housekeeper and delivered to Caesar. See linked issue.
Added binning of 100-bp windows from GATK collect read-counts to reduce computational load when viewing in IGV (merging 5 bins by default)
Added
Documentation
Tests
Feature Tests
Pipeline Integrity Tests
.hkfile)Clinical Genomics Stockholm
Documentation
Panel of Normal specific criteria
User Changes
Infrastructure Changes
Validation criteria
Validation criteria to be added to validation report PR: [LINK-TO-VALIDATION-REPORT-PR from the validations repository]
Version specific criteria
Important
One of the below checkboxes for validation need to be checked
Checklist
Important
Ensure that all checkboxes below are ticked before merging.
For Developers
For Reviewers
conditions where applicable, with satisfactory results.