
Metbat outliers #5993

Draft
dnil wants to merge 45 commits into main from metbat_outliers

@dnil (Member) commented Jan 27, 2026

This PR adds functionality: methbat methylation region calls as outliers.

[Screenshots: 2026-01-29 at 15:49:24 and 2026-01-29 at 17:39:43]

For a start, this PR presumes that we will get sample_id as a column, as requested in https://github.com/Clinical-Genomics/MTP-NALLO/issues/88. It is intended to be agnostic of https://github.com/Clinical-Genomics/MTP-NALLO/issues/89, at least with regard to whether the HGNC ID is present as a separate column or, as is currently the case, as part of the cpg_label.
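The sample_id presumption could be handled along these lines when reading a per-case file. This is a minimal sketch, not the PR's loader: it assumes a tab-separated file with a header row, and the helper name `rows_by_sample` and the example cpg_label values are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO


def rows_by_sample(handle: TextIO) -> Dict[str, List[dict]]:
    """Group rows of a per-case TSV by their sample_id column (hypothetical helper)."""
    reader = csv.DictReader(handle, delimiter="\t")
    # The PR presumes a sample_id column; fail loudly if the header lacks it.
    if reader.fieldnames is None or "sample_id" not in reader.fieldnames:
        raise ValueError("expected a 'sample_id' column in the header")
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for row in reader:
        grouped[row["sample_id"]].append(row)
    return dict(grouped)


# Made-up two-sample example; the cpg_label values are illustrative only.
example = io.StringIO(
    "sample_id\tcpg_label\n"
    "S1\tregion_GENE_HGNC:1\n"
    "S2\tregion_GENE_HGNC:1\n"
    "S1\tregion_OTHER_HGNC:2\n"
)
groups = rows_by_sample(example)
print({sample: len(rows) for sample, rows in groups.items()})  # {'S1': 2, 'S2': 1}
```

Failing early on a missing column keeps a file without sample IDs from being silently attached to the wrong individual.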

Testing on cg-vm1 server (Clinical Genomics Stockholm)

Prepare for testing

  1. Make sure the PR is pushed and available on Docker Hub
  2. Book your testing time using the Pax software available at https://pax.scilifelab.se/. The resource to book is scout-stage and the server is cg-vm1.
  3. ssh <USER.NAME>@cg-vm1.scilifelab.se
  4. sudo -iu hiseq.clinical
  5. ssh localhost
  6. (optional) Find out which scout branch is currently deployed on cg-vm1: podman ps
  7. Stop the service with current deployed branch: systemctl --user stop scout@<name_of_currently_deployed_branch>
  8. Start the scout service with the branch to test: systemctl --user start scout@<this_branch>
  9. Make sure the branch is deployed: systemctl --user status scout.target
  10. After testing is done, repeat the procedure at https://pax.scilifelab.se/ to release the allocated resource (scout-stage) for other users.

Testing on hasta server (Clinical Genomics Stockholm)

Prepare for testing

  1. ssh <USER.NAME>@hasta.scilifelab.se
  2. Book your testing time using the Pax software: us; paxa -u <user> -s hasta -r scout-stage. You can also use the WSGI Pax app available at https://pax.scilifelab.se/.
  3. (optional) Find out which scout branch is currently deployed on hasta: conda activate S_scout; pip freeze | grep scout-browser
  4. Deploy the branch to test: bash /home/proj/production/servers/resources/hasta.scilifelab.se/update-tool-stage.sh -e S_scout -t scout -b <this_branch>
  5. Make sure the branch is deployed: us; scout --version
  6. After testing is done, repeat the paxa procedure, which will release the allocated resource (scout-stage) to be used for testing by other users.

How to test:

  1. How to test it, possibly with real cases/data.

Expected outcome:
The functionality should work.
Take a screenshot and attach or copy/paste the output.

Review:

  • code approved by
  • tests executed by

@codecov (bot) commented Jan 27, 2026

Codecov Report

❌ Patch coverage is 96.22642% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.94%. Comparing base (a49902c) to head (fde5817).

Files with missing lines                                Patch %   Lines
...cout/server/blueprints/alignviewers/controllers.py   75.00%    1 Missing ⚠️
scout/server/blueprints/omics_variants/views.py         80.00%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5993      +/-   ##
==========================================
+ Coverage   83.88%   83.94%   +0.05%     
==========================================
  Files         336      336              
  Lines       21022    21069      +47     
==========================================
+ Hits        17635    17686      +51     
+ Misses       3387     3383       -4     
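The report's numbers can be checked by hand. This is a minimal sketch that assumes Codecov displays percentages rounded down to two decimals. That is an assumption here, chosen because it reproduces 83.88% rather than 83.89% for main. Hits divided by lines then gives both project figures and the +0.05% delta. The 96.22642% patch figure is consistent with 51 of 53 patch lines covered, i.e. the 2 missing lines; the 53 is inferred, not stated in the report.

```python
import math


def truncate(value: float, places: int = 2) -> float:
    """Round down to `places` decimals (assumed to mirror the report's display)."""
    factor = 10 ** places
    return math.floor(value * factor) / factor


base = 17635 / 21022 * 100  # hits / lines on main (a49902c)
head = 17686 / 21069 * 100  # hits / lines on the PR head (fde5817)

print(truncate(base), truncate(head), truncate(head - base))  # 83.88 83.94 0.05
# Inferred: 51 covered out of 53 patch lines, 2 missing.
print(round(51 / 53 * 100, 5))  # 96.22642
```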


Comment on lines 227 to 230
elif "cpg_label" in values:
    cpg_label = values.get("cpg_label").split("_")
    values["hgncSymbol"] = [cpg_label[1]]
    values["hgncId"] = [cpg_label[2].split(":")[1]]
Contributor

It might have been a good idea to do this in the pipeline instead.

Especially if we want to be able to filter on clinical/research HGNC IDs before upload, or run with a GoS background file that might have, e.g., CpG:_104 as cpg_label.
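The concern about labels such as CpG:_104 can be made concrete. "CpG:_104".split("_") gives ["CpG:", "104"], so the cpg_label[2] lookup in the lines under review would raise an IndexError. Below is a minimal sketch of a more tolerant extraction that prefers separate HGNC columns when present. It assumes the label form `<prefix>_<symbol>_HGNC:<id>`, inferred from the indices used in the diff. The column names hgnc_id/hgnc_symbol and the function name hgnc_fields are hypothetical.

```python
from typing import Dict, List


def hgnc_fields(values: Dict[str, str]) -> Dict[str, List[str]]:
    """Return hgncSymbol/hgncId lists from separate columns or from cpg_label.

    Hypothetical helper: the hgnc_id/hgnc_symbol column names are assumptions,
    and cpg_label is assumed to look like "<prefix>_<symbol>_HGNC:<id>".
    """
    # Prefer explicit columns if the pipeline provides them.
    if values.get("hgnc_id"):
        return {
            "hgncSymbol": [values["hgnc_symbol"]] if values.get("hgnc_symbol") else [],
            "hgncId": [values["hgnc_id"]],
        }
    parts = (values.get("cpg_label") or "").split("_")
    # Only trust the label when it has exactly three parts and an HGNC: prefix.
    if len(parts) == 3 and parts[2].startswith("HGNC:"):
        return {"hgncSymbol": [parts[1]], "hgncId": [parts[2].split(":", 1)[1]]}
    # Background-style labels such as "CpG:_104" carry no gene annotation.
    return {"hgncSymbol": [], "hgncId": []}


print(hgnc_fields({"cpg_label": "region_GENE_HGNC:1234"}))
# {'hgncSymbol': ['GENE'], 'hgncId': ['1234']}
print(hgnc_fields({"cpg_label": "CpG:_104"}))
# {'hgncSymbol': [], 'hgncId': []}
```

Returning empty lists rather than raising lets gene-less background regions load without annotations.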

Member Author

Absolutely! I got the impression from Ine that this was as far as you got? It would also be prudent to have the sample ID in the file, not just in the filename.

Contributor

Yes, this is indeed the current state. We can add it to the Nallo TODO.

Contributor

@fellen31 commented Jan 29, 2026

> It would also be prudent to have the sample ID in the file, not just in the filename.

Are you happy to have these files per sample, or would you rather have them per case if we add sample ID as a column?

Member Author

Both are OK. We are very used to joint case VCF files, but many of these "extras" are provided on a per-sample basis anyway. Conceptually, I guess I would prefer more of a per-sample, single-analysis "outcome" column. I have a feeling that if we want to pass more of the values for full cases, we will be straining the TSV format and would perhaps be better off going to a VCF.

Member Author

The DROP files went with a single sample per line. That is one way to avoid having a "FORMAT"-type column.
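The two layouts discussed here can be contrasted in a small sketch. A VCF-like row has one column per sample plus a FORMAT column naming the per-sample fields. The single-sample-per-line layout instead repeats the shared fields once per sample. The function wide_to_long and all field names and values below are hypothetical, not taken from DROP or methbat output.

```python
from typing import Dict, List


def wide_to_long(row: Dict[str, str], samples: List[str]) -> List[Dict[str, str]]:
    """Split a VCF-like row (FORMAT + one column per sample) into one row per sample."""
    # FORMAT names the colon-separated fields found in each sample column.
    keys = row["FORMAT"].split(":")
    # Everything that is neither FORMAT nor a sample column is shared by all samples.
    shared = {k: v for k, v in row.items() if k != "FORMAT" and k not in samples}
    long_rows = []
    for sample in samples:
        per_sample = dict(zip(keys, row[sample].split(":")))
        long_rows.append({**shared, "sample_id": sample, **per_sample})
    return long_rows


wide = {"region": "chr1:100-200", "FORMAT": "call:pvalue", "S1": "hyper:0.01", "S2": "none:0.5"}
for line in wide_to_long(wide, ["S1", "S2"]):
    print(line)
```

The long form needs no FORMAT column because every row already names its sample. The trade-off is that shared fields such as the region are repeated once per sample.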

Member Author

Hm, and yes, they are per-case files, as are the other variant files, so it would be convenient.

somalier_ancestry: Optional[str] = None
somalier_pairs: Optional[str] = None
somalier_samples: Optional[str] = None
exe_ver: Optional[str] = None
Member Author

Just duplicated, likely from an older sort. 🤷‍♂️
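A duplicate like this is easy to miss because it raises no error. In a Python class body, a second annotated assignment to the same name simply rebinds it, so the class ends up with a single field. Below is a minimal sketch of flagging such duplicates with the standard ast module. duplicated_fields is a hypothetical helper, and the example class is made up rather than taken from this PR.

```python
import ast
from collections import Counter
from typing import List


def duplicated_fields(source: str) -> List[str]:
    """Return annotated class-level names that are assigned more than once."""
    duplicates: List[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            # Collect simple annotated assignments like `name: type = value`.
            names = [
                stmt.target.id
                for stmt in node.body
                if isinstance(stmt, ast.AnnAssign) and isinstance(stmt.target, ast.Name)
            ]
            duplicates += [name for name, count in Counter(names).items() if count > 1]
    return duplicates


# Made-up class; parsing only, so Optional does not need to be imported.
example = '''
class Settings:
    example_path: Optional[str] = None
    other_path: Optional[str] = None
    example_path: Optional[str] = None
'''
print(duplicated_fields(example))  # ['example_path']
```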

@sonarqubecloud

Development

Successfully merging this pull request may close these issues.

Add methbat methylation region calls as outliers

3 participants