feat: add igv cnv files for WGS by mathiasbio · Pull Request #1649 · Clinical-Genomics/BALSAMIC

mathiasbio · 2026-01-28T14:22:32Z

Description

Add BAF and Log2 files from GATK to be stored in housekeeper and delivered to Caesar. See linked issue.

Added binning of the 100-bp windows from GATK CollectReadCounts (merging 5 bins by default) to reduce the computational load when viewing in IGV.
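
The binning described above could look roughly like the sketch below. It is an illustration only, not the script added in this PR: the name `merge_bins`, the tuple layout, and the choice to average the log2 values within each merged window are all assumptions made for the example.

```python
from itertools import groupby
from statistics import mean
from typing import Iterable, Iterator

# (contig, start, end, log2 value); hypothetical layout for this sketch
Bin = tuple[str, int, int, float]


def _collapse(chunk: list[Bin]) -> Bin:
    """Collapse a list of adjacent bins into one window carrying the mean value."""
    contig, start = chunk[0][0], chunk[0][1]
    end = chunk[-1][2]
    return (contig, start, end, mean(b[3] for b in chunk))


def merge_bins(bins: Iterable[Bin], bins_per_window: int = 5) -> Iterator[Bin]:
    """Merge consecutive bins on the same contig into wider windows.

    Assumes the input is sorted by contig and start position.
    """
    if bins_per_window <= 0:
        raise ValueError("bins_per_window must be a positive integer")
    for _, contig_bins in groupby(bins, key=lambda b: b[0]):
        chunk: list[Bin] = []
        for current in contig_bins:
            chunk.append(current)
            if len(chunk) == bins_per_window:
                yield _collapse(chunk)
                chunk = []
        if chunk:  # trailing partial window at the end of a contig
            yield _collapse(chunk)
```

With the default of 5, five 100-bp bins become one 500-bp window, so IGV has roughly a fifth as many intervals to draw. Windows never span two contigs in this sketch, which is why the last window on a contig may be shorter.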

Added

gnomad af 5 baf bedgraph for viewing in IGV
denoised gatk readcounts log2 bedgraph for viewing in IGV

Documentation

N/A
Updated Balsamic documentation to reflect the changes as needed for this PR.
- docs/balsamic_sv_cnv.rst

Tests

Feature Tests

Test: WGS TN case finished successfully and produces the new files
Test: New files from WGS TN case can be stored in housekeeper (with associated hermes update) with the correct delivery tags:
- [Screenshot]

Pipeline Integrity Tests

Report deliver (generation of the .hk file)
- N/A
- Verified
TGA T/O Workflow
- N/A
- Verified
TGA T/N Workflow
- N/A
- Verified
UMI T/O Workflow
- N/A
- Verified
UMI T/N Workflow
- N/A
- Verified
WGS T/O Workflow
- N/A
- Verified
WGS T/N Workflow
- N/A
- Verified
QC Workflow
- N/A
- Verified
PON Workflow
- N/A
- Verified

Clinical Genomics Stockholm

Documentation

Atlas documentation
- N/A
- Updated: [Link]
Web portal for Clinical Genomics
- N/A
- Updated: [Link]

Panel of Normal specific criteria

The PR includes the addition of a new Panel of Normals
The samples have been verified to adhere to the sample selection criteria on Atlas PoN creation instructions for Balsamic

User Changes

N/A
This PR affects the output files or results.
- User feedback is considered unnecessary because [Justification].
- Affected users have been included in the development process and given a chance to provide feedback.

Infrastructure Changes

Stored files in Housekeeper
- N/A
- Updated: [Link]
CG (CLI and delivered/uploaded files)
- N/A
- Updated: [Link]
Servers (configuration files on Hasta)
- N/A
- Updated: [Link]
Scout interface
- N/A
- Updated: [Link]

Validation criteria

Validation criteria to be added to validation report PR: [LINK-TO-VALIDATION-REPORT-PR from the validations repository]

Version specific criteria

Text here or N/A

Important

One of the checkboxes below for validation needs to be checked

Added version specific validation criteria to validation report
Changes validated in standard sections: [validation-section]
Validation criteria not necessary

Checklist

Important

Ensure that all checkboxes below are ticked before merging.

For Developers

PR Description
- Provided a comprehensive description of the PR.
- Linked relevant user stories or issues to the PR.
Documentation
- Verified and updated documentation if necessary.
Validation criteria
- Completed the validation criteria section of the template.
Tests
- Described and tested the functionality addressed in the PR.
- Ensured integration of the new code with existing workflows.
- Confirmed that meaningful unit tests were added for the changes introduced.
- Checked that the PR has successfully passed all relevant code smells and coverage checks.
Review
- Addressed and resolved all the feedback provided during the code review process.
- Obtained final approval from designated reviewers.

For Reviewers

Code
- Code implements the intended features or fixes the reported issue.
- Code follows the project's coding standards and style guide.
Documentation
- Pipeline changes are well-documented in the CHANGELOG and relevant documentation.
Validation criteria
- The author has completed the validation criteria section of the template
Tests
- The author provided a description of their manual testing, including consideration of edge cases and boundary
  conditions where applicable, with satisfactory results.
Review
- Confirmed that the developer has addressed all the comments during the code review.


codecov · 2026-01-28T14:28:40Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.38%. Comparing base (7d529e6) to head (32d9c88).
⚠️ Report is 154 commits behind head on develop.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #1649      +/-   ##
===========================================
- Coverage    99.48%   99.38%   -0.10%     
===========================================
  Files           40       40              
  Lines         1932     1967      +35     
===========================================
+ Hits          1922     1955      +33     
- Misses          10       12       +2

Flag	Coverage Δ
unittests	`99.38% <100.00%> (-0.10%)`	⬇️


fevac

Oh sorry! I accidentally started to review this thinking it was the other PR. So I won't continue for now

BALSAMIC/assets/scripts/igv_baf_bedgraph.py

beatrizsavinhas

I unfortunately don't fully understand the context around generating these files from reading the PR/user story description, so I can't really comment on logic or parameters used...
But I added some comments on the code and suggestions for better readability!

If we meet to discuss the PR, maybe I can give a more complete and meaningful review! It would also be important to look at the test results.

beatrizsavinhas · 2026-02-11T08:23:43Z

BALSAMIC/assets/scripts/igv_baf_bedgraph.py

+def open_output(path: str | Path) -> ContextManager[TextIO]:
+    """Open output file; '-' means stdout (not closed)."""
+    p = str(path)
+    if p == "-":
+        return nullcontext(sys.stdout)
+    return open(p, "w", encoding="utf-8")


Why is this function necessary? 🤔
I believe you can just call open(file_path, "w", encoding="utf-8") even if the file is already open and the path is of type Path.

It was to handle the possibility of not providing an output path but just piping output to standard out. But I don't think it's necessary since this is a script to be used in a pipeline with a predictable input and output structure! We don't need flexibility, so I'll remove! 🙏

beatrizsavinhas · 2026-02-11T08:37:35Z

BALSAMIC/assets/scripts/igv_baf_bedgraph.py

+    vcf_path: str | Path,
+    bedgraph_path: str | Path,
+    track_name: Optional[str] = None,
+) -> int:
+    """Convert a VCF into a bedGraph of AF computed from AD/DP."""
+    n_written = 0
+    with closing(VCF(str(vcf_path))) as vcf, open_output(bedgraph_path) as fout:


The cli function will call this with vcf_path and bedgraph_path as Path, right? Then there is no need to have ambiguous types here.
See also comment above on open_output.

Suggested change

-    vcf_path: str | Path,
-    bedgraph_path: str | Path,
-    track_name: Optional[str] = None,
-) -> int:
-    """Convert a VCF into a bedGraph of AF computed from AD/DP."""
-    n_written = 0
-    with closing(VCF(str(vcf_path))) as vcf, open_output(bedgraph_path) as fout:
+    vcf_path: Path,
+    bedgraph_path: Path,
+    track_name: Optional[str] = None,
+) -> int:
+    """Convert a VCF into a bedGraph of AF computed from AD/DP."""
+    n_written = 0
+    with closing(VCF(vcf_path.as_posix())) as vcf, open_output(bedgraph_path) as fout:

You're right! It should only be string. I've been confused about this so many times, but I think the type=click.Path in the argument is only used to validate the input, but what you actually get out is a string. I'll change the types to string
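
The point about `click.Path` can be checked with a small sketch. It assumes click 8.0 or later, where `click.Path` accepts a `path_type` argument; the command and option names are hypothetical and not taken from the PR.

```python
from pathlib import Path

import click


@click.command()
@click.option("--as-str", type=click.Path())  # validated, but passed on as str
@click.option("--as-path", type=click.Path(path_type=Path))  # converted to pathlib.Path
def show_types(as_str, as_path):
    """Print the runtime type of each option value."""
    click.echo(f"{type(as_str).__name__} {isinstance(as_path, Path)}")
```

Without `path_type`, the value arrives as a plain string, so annotating the downstream function with `str` matches what the CLI actually passes. Passing `path_type=Path` would be the alternative if `Path` annotations were preferred.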

beatrizsavinhas · 2026-02-11T08:41:04Z

BALSAMIC/assets/scripts/igv_baf_bedgraph.py

+            record = variant_to_record(variant)
+            if record == None:
+                continue
+            fout.write(record)
+            n_written += 1


See comment above on raising an exception for variant_to_record. It is also possible to print out a warning for the exception if that would be of any use.

Suggested change

-            record = variant_to_record(variant)
-            if record == None:
-                continue
-            fout.write(record)
-            n_written += 1
+            try:
+                record = variant_to_record(variant)
+                fout.write(record)
+                n_written += 1
+            except:
+                continue

The variants going into this script are a bit unconventional: they are forced calls on common gnomAD population variants and are only used for visualising allele frequencies in IGV (the linked issue has a screenshot). It could be interesting, however, to log how many variants were skipped in total. I'll see if I can add that!
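
One way the skipped-variant count could be logged is sketched below. This is not the code in the PR: `af_record` and `write_records` are hypothetical stand-ins that take plain values instead of cyvcf2 variants, and the AF is assumed to be the alternate-allele depth divided by the total depth.

```python
import logging
from typing import Iterable, Optional, TextIO

LOG = logging.getLogger(__name__)

# (chrom, 1-based position, alt allele depth, total depth); hypothetical layout
Site = tuple[str, int, Optional[int], Optional[int]]


def af_record(chrom: str, pos: int, alt_depth: Optional[int], depth: Optional[int]) -> Optional[str]:
    """Return one bedGraph line with AF = alt depth / total depth, or None if undefined."""
    if alt_depth is None or not depth:  # missing values or zero depth
        return None
    # bedGraph intervals are 0-based and half-open; VCF positions are 1-based.
    return f"{chrom}\t{pos - 1}\t{pos}\t{alt_depth / depth:.4f}\n"


def write_records(sites: Iterable[Site], out: TextIO) -> tuple[int, int]:
    """Write all usable sites to `out` and return (written, skipped) counts."""
    n_written = n_skipped = 0
    for site in sites:
        record = af_record(*site)
        if record is None:  # identity check rather than `== None`
            n_skipped += 1
            continue
        out.write(record)
        n_written += 1
    LOG.info("Wrote %d records, skipped %d variants", n_written, n_skipped)
    return n_written, n_skipped
```

Returning `None` for unusable sites and counting them in the loop keeps the skip logic explicit, and a single summary line avoids flooding the log when many forced calls have no coverage.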

tests/scripts/test_igv_baf_bedgraph.py

beatrizsavinhas · 2026-02-11T09:57:31Z

BALSAMIC/snakemake_rules/variant_calling/somatic_sv_tumor_normal_wgs.rule

        plot_tumor = vcf_dir + "CNV.somatic." + config["analysis"]["case_id"] + ".ascat.tumor.png",
        plot_germline = vcf_dir + "CNV.somatic." + config["analysis"]["case_id"] + ".ascat.germline.png",
        plot_sunrise = vcf_dir + "CNV.somatic." + config["analysis"]["case_id"] + ".ascat.sunrise.png",
-        namemap = vcf_dir + "CNV.somatic." + config["analysis"]["case_id"] + ".ascat.sample_name_map",


Is this change necessary for this PR?

Nope! Don't even know how it happened 😂 either way it doesn't matter. But I'll return the comma.

beatrizsavinhas · 2026-02-11T10:02:57Z

BALSAMIC/constants/rules.py

        ],
        "varcall": [
            "snakemake_rules/variant_calling/germline_wgs.rule",
+            "snakemake_rules/variant_calling/igv_files.rule",


The User story mentions only "tumor normal matched WGS analyses". Is this to be applied for TO too?

Yes! I'll update the user story. I think the specific case Teresita referred to was tumor+normal, but it should be applicable to tumor-only as well.

beatrizsavinhas · 2026-02-11T14:53:24Z

tests/commands/config/test_config_sample.py

The name of the test still refers to tga and missing_gens. If it was your intention to change the whole test, could you update the test name too?

Also, what is the reasoning in adding the panel_bed_file parameter now? 🤔

I don't remember what the issue was with the test. But it failed, and when I looked at it I saw that it wasn't configured correctly. It was supposed to test running tga workflow without GENS input arguments, but it wasn't provided a panel-bed-file which is what configures the workflow as tga. So I just cleaned it up in passing.

Out of context but a small fix

beatrizsavinhas · 2026-02-11T16:01:06Z

BALSAMIC/assets/scripts/igv_cnr_binning_bedgraph.py

Same general advice as above: set explanatory names for variables and avoid ambiguous function returns.

beatrizsavinhas · 2026-02-11T16:05:40Z

BALSAMIC/assets/scripts/igv_cnr_binning_bedgraph.py

+    if bins_per_window <= 0:
+        raise click.ClickException("--bin-size must be a positive integer")


An alternative with automatic error printing for this is using type=click.IntRange(0, <max>).

Cool! It would feel sort of arbitrary to put a maximum value here though, and I'm fine with the current solution
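
For reference, `click.IntRange` does not require an upper bound, so the range check can be expressed without picking an arbitrary maximum. The sketch below is an assumption about how the option might be declared, not the PR's code: the `--bin-size` flag and `bins_per_window` name come from the snippet above, and the default of 5 comes from the PR description. A lower bound of 1 rather than 0 matches the "positive integer" check.

```python
import click


@click.command()
@click.option(
    "--bin-size",
    "bins_per_window",
    type=click.IntRange(min=1),  # lower bound only; the upper bound stays open
    default=5,
    show_default=True,
    help="Number of adjacent bins to merge per window.",
)
def main(bins_per_window: int) -> None:
    """Echo the validated option value."""
    click.echo(f"bins_per_window={bins_per_window}")
```

With this declaration click rejects `--bin-size 0` itself and exits with a usage error, so the manual `ClickException` check would no longer be needed. Keeping the explicit check, as in the current solution, is equally valid.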


sonarqubecloud · 2026-02-16T11:15:17Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code


mathiasbio added 7 commits November 10, 2025 11:16

add tmpdir

a60c542

Merge branch 'develop' of github.com:Clinical-Genomics/BALSAMIC into …

b99b4b5

…develop

Merge branch 'develop' into add_igv_cnv_files

c35fb10

Merge branch 'develop' of github.com:Clinical-Genomics/BALSAMIC into …

8ac0536

…develop

Merge branch 'develop' into add_igv_cnv_files

bd5857e

add scripts and rule

4304fad

black

f6b3082

mathiasbio self-assigned this Jan 28, 2026

mathiasbio linked an issue Jan 28, 2026 that may be closed by this pull request

[User Story] Delivery of additional CNV and BAF files to Caesar #1493

Open

3 tasks

mathiasbio changed the base branch from master to develop January 28, 2026 14:29

mathiasbio added 14 commits January 28, 2026 15:32

correct message

45b16e5

refactor for code complexity

8bfda01

clean up

7d03eca

add to rule all

68ba2f7

black

f3b1f30

changelog

b403b17

refactor

55a717e

fix

6a3c4b1

add pytests

005d287

black

bc57eb2

sonarcloud

0971b09

fix

7fd1a67

bugfix

b9730f8

fix pytests

5ee9689

fevac reviewed Feb 4, 2026


mathiasbio marked this pull request as ready for review February 9, 2026 11:55

mathiasbio requested a review from a team as a code owner February 9, 2026 11:55

mathiasbio added 2 commits February 9, 2026 13:42

change to using cyvcf

d55f46c

fix except pass

6cbd596

mathiasbio added 3 commits February 9, 2026 13:55

remove standard out

cf32a1d

simplify

00e0fee

fix script

2262f13

beatrizsavinhas reviewed Feb 11, 2026


mathiasbio added 8 commits February 16, 2026 11:01

code review

d6926a9

update tests

05b02d1

code review fix tests

c645ce9

comma

4324d29

fix

3c37489

Merge branch 'develop' of github.com:Clinical-Genomics/BALSAMIC into …

59fecf6

…develop

merge conflict changelog

491b339

docs

32d9c88

    if bins_per_window <= 0:
        raise click.ClickException("--bin-size must be a positive integer")
