Dataops 1178 update checkqc for projman by nkongenelly · Pull Request #134 · Molmed/checkQC

nkongenelly · 2025-08-27T14:12:46Z

This version updates the sequencing_metrics returned by the Illumina parser so that it contains all the metrics projman_filler needs to save flowcell_lane and sample metrics.

matrulda

Looks good! Left some comments for you.

checkQC/parsers/illumina.py

matrulda · 2025-09-03T06:53:31Z

tests/test_qc_data.py

        "expected_sequencing_metrics": {
            1: {
                "total_cluster_pf": 532_464_327,
+                "pf_clusters": 3_413_232.5,


Hmm, I think we should take another look at whether these numbers make sense. This value differs so much from total_clusters_pf.

You're absolutely right
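
For scale, the two values in the diff above differ by a factor of roughly 156. A quick check using only the numbers from the test data (the variable names follow the keys in the diff; the thread itself does not say what accounts for the factor):

```python
# Values copied from the expected_sequencing_metrics diff for lane 1.
total_cluster_pf = 532_464_327
pf_clusters = 3_413_232.5

# How many times larger the lane total is than pf_clusters.
ratio = total_cluster_pf / pf_clusters
print(f"total_cluster_pf / pf_clusters = {ratio:.2f}")
```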

setup.py

nkongenelly · 2025-09-03T13:40:26Z

Thanks for the review. I have pushed the updated code. Maybe we can have a meeting to discuss the values again since, as you mentioned, total_clusters_pf is much bigger than pf_clusters.

matrulda · 2025-09-05T03:54:33Z

.github/workflows/unit_tests.yml

      uses: actions/setup-python@v4
      with:
-        python-version: '3.10'
+        python-version: '3.13'


I think it could be nice to use a test matrix so that we make sure to test all supported versions in the GHA workflow: https://stackoverflow.com/a/61428673

Oh wow, this is interesting, thank you! I have now pushed the updated code with this change.

matrulda · 2025-09-05T04:14:42Z

Great work! I left a final comment for you, and we should definitely book a meeting to look at the values.

nkongenelly · 2025-09-05T05:31:24Z

I have booked a meeting and sent you an invite email for this

matrulda

Nice, but I think you forgot some curly brackets

.github/workflows/unit_tests.yml

Co-authored-by: Matilda Åslin <matilda.aslin@medsci.uu.se>

nkongenelly · 2025-09-05T07:15:22Z

Oh yes, sorry about that! I think I was on my branch, so I didn't see the unit tests fail. I have now updated this.

nkongenelly · 2025-09-25T15:37:55Z

Hi @matrulda, I have now pushed the updated code after we finalized the samplesheet v2 structure.

matrulda

Great, looks good! Just a minor comment.

matrulda · 2025-09-29T06:12:09Z

tests/test_qc_data.py

            1: {
-                "total_cluster_pf": 532_464_327,
+                "total_reads_pf": 532_464_327,
+                "total_reads": 638337024,


I think it would be nice to use the "underscore syntax" here as well.

I agree, thanks
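
For reference, the "underscore syntax" is Python's digit grouping in numeric literals (PEP 515, Python 3.6+). It affects readability only, not the value, as this small check shows (variable names are illustrative):

```python
# Underscores in numeric literals are ignored by the parser,
# so both spellings denote the same integer.
total_reads_plain = 638337024
total_reads_grouped = 638_337_024

assert total_reads_plain == total_reads_grouped

# The "_" format option groups digits the same way when printing.
print(f"{total_reads_plain:_}")  # 638_337_024
```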

nkongenelly · 2025-10-06T05:46:14Z

Hi @matrulda, a kind reminder about this PR, so that once it is merged we can install checkqc from the master branch in projman_filler.

nkongenelly · 2025-10-06T11:24:33Z

I have just pushed some changes to enable projman_filler to get the flowcell_id (i.e. from the checkqc interop run_info).

I also moved some test data inside the checkQC module so that it can be used by the projman_filler tests too. I first tried to make checkqc's tests/test_qc_data.py importable from projman_filler by editing the local MANIFEST.in file and setup.py, and by using #egg in the install command, but that did not seem to work, or I might have implemented it wrongly. That is why I ended up moving the test_data inside the module folder. Here is a Stack Overflow link where I got the basic information from

nkongenelly · 2025-10-08T13:38:05Z

I also tried moving the qc_data_utils file outside the checkQC folder and adding it in setup.py (under the package_data list), but I was unable to import it on the projman_filler side.

I have also now added OverrideCycles to the bclconvert samplesheet (I had forgotten about it in my previous commit).
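
A rough sketch of why keeping the test data inside the package helps: files that live under the package directory can be located relative to a module's `__file__`, whereas a top-level `tests/` directory is normally not part of an installed package. The helper name `packaged_resource` and the `mypackage` layout below are hypothetical and not taken from checkQC:

```python
from pathlib import Path
import tempfile


def packaged_resource(module_file, relative_path):
    """Resolve a data file relative to the module that ships with it."""
    resource = Path(module_file).parent / relative_path
    if not resource.exists():
        raise FileNotFoundError(f"Resource not found: {resource}")
    return resource


# Simulate an installed package layout in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    package_dir = Path(tmp) / "mypackage"
    (package_dir / "resources").mkdir(parents=True)
    module_file = package_dir / "utils.py"
    module_file.touch()
    (package_dir / "resources" / "RunInfo.xml").write_text("<RunInfo/>")

    # Inside a real module this would be called with __file__.
    found = packaged_resource(module_file, "resources/RunInfo.xml")
    found_name = found.name
    print(found_name)
```

Note that non-Python files still have to be declared (for example via `package_data`) to end up in a built distribution.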

matrulda

Great, I had one question.

matrulda · 2025-10-20T10:46:54Z

checkQC/qc_data_utils.py

+import numpy as np
+
+
+def bclconvert_test_runfolder(qc_data):


The only issue I have with this function is that it looks like it is a general utility tool, but it expects qc_data to be based on "200624_A00834_0183_BHMTFYTINY" to match the "expected_*" values.
Could we move this part to the function as well?

    qc_data = QCData.from_bclconvert(
        Path(__file__).parent / "resources/bclconvert/200624_A00834_0183_BHMTFYTINY",
        parser_config,
    )

I guess we won't be able to parse the CheckQC test resource when using this function in projman_filler, but in that case maybe the input to this function could be the path to 200624_A00834_0183_BHMTFYTINY? The docstring should state clearly that this runfolder is expected as input.
Let me know what you think.

Yes, you're right, this function is tied to 200624_A00834_0183_BHMTFYTINY only. I thought of using qc_data to make the function dynamic, but then realised that would mean comparing two qc_data objects that are generated afresh each time, and so may always match.

Yes, I think the function can take the runfolder as input.

Or maybe it could check whether the flowcell_id for the TINY runfolder is "HMTFYDRXX", because the bclconvert runfolder in projman_filler has a slightly different name, i.e. 200624_A00834_0184_BHMTFYTINY.

I have now updated the code to pass the runfolder Path and check the flowcell_id. I have also updated projman_filler to pass the runfolder Path.

But this change is also open for discussion 😃
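
The concern about a self-comparing test can be illustrated with a toy example. If the expected values are regenerated by the same code they are meant to verify, the comparison passes even when the parser is wrong, while hard-coded expectations for a known runfolder catch the error. `parse_lane_metrics` is a hypothetical stand-in, not a checkQC function:

```python
def parse_lane_metrics(buggy=False):
    """Hypothetical parser returning metrics for lane 1 of a fixed run."""
    total_reads_pf = 532_464_327
    if buggy:
        total_reads_pf //= 2  # simulate a regression in the parser
    return {1: {"total_reads_pf": total_reads_pf}}


# Fixed expectation, valid only for one specific runfolder.
EXPECTED = {1: {"total_reads_pf": 532_464_327}}

parsed = parse_lane_metrics(buggy=True)

# Tautological check: "expected" is regenerated through the same code path,
# so it matches even though the value is wrong.
regenerated = parse_lane_metrics(buggy=True)
tautology_passes = parsed == regenerated

# Check against hard-coded values: detects the regression.
fixed_check_passes = parsed == EXPECTED

print(tautology_passes, fixed_check_passes)  # True False
```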

matrulda · 2025-10-21T12:02:46Z

checkQC/qc_data_utils.py

+def bclconvert_test_runfolder(qc_data, runfolder_path):
+    _, _, run_info = _read_interop_summary(runfolder_path)
+    flowcell_id = run_info.flowcell_id()
+    if "HMTFYDRXX" in flowcell_id:


Great! Just one minor thing: I think it would be nice to throw an exception if the flowcell ID does not match, explaining that the output of this function is adapted for a specific run.

Oh yes, thanks
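
A minimal sketch of such a guard, assuming the flowcell ID has already been read from the run info. The function name `require_flowcell`, the constant, and the choice of `ValueError` are illustrative and not the code in this PR:

```python
EXPECTED_FLOWCELL_ID = "HMTFYDRXX"


def require_flowcell(flowcell_id, expected=EXPECTED_FLOWCELL_ID):
    """Raise if the runfolder is not the one the expected values were built for."""
    # Substring match, mirroring the `in` check used in the diff above.
    if expected not in flowcell_id:
        raise ValueError(
            f"This helper only supports the run with flowcell_id {expected!r}; "
            f"the supplied runfolder has flowcell_id {flowcell_id!r}"
        )
    return flowcell_id


require_flowcell("HMTFYDRXX")  # matching ID: no exception

try:
    require_flowcell("HXXXXXXXX")  # made-up ID for the failure case
except ValueError as err:
    error_message = str(err)
    print(error_message)
```

Failing loudly here keeps a caller from silently comparing another run against expected values that only describe this one.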

nkongenelly · 2025-10-22T14:16:06Z

Thanks for the reviews, I have now updated the code.

matrulda · 2025-10-23T10:58:01Z

checkQC/qc_data_utils.py

+        raise Exception("Excpected flowcell_id value as 'HMTFYDRXX' only for "
+                        f"this fuction but got {flowcell_id}"


Oh, there is a typo in the exception. Also, I thought the exception could be rephrased to something like this.

Suggested change

-        raise Exception("Excpected flowcell_id value as 'HMTFYDRXX' only for "
-                        f"this fuction but got {flowcell_id}"
+        raise Exception("This function is only compatible with the run with flowcell_id: 'HMTFYDRXX', "
+                        f"the supplied runfolder has flowcell_id: {flowcell_id}"

matrulda · 2025-10-23T12:21:39Z

Just one super minor thing, then we can merge this! :)

nkongenelly · 2025-10-23T15:02:48Z

Thanks for the reviews. I have now updated the code

Updating illumina parser to return values for projman

c9bb466

nkongenelly self-assigned this Aug 27, 2025

nkongenelly force-pushed the DATAOPS_1178_update_checkqc_for_projman branch 3 times, most recently from 12fe762 to 09aae6f Compare August 28, 2025 05:59

Corrected tests after adding more details in qcData sequencing_metrics

09aae6f

nkongenelly marked this pull request as ready for review August 28, 2025 08:40

nkongenelly requested a review from matrulda August 28, 2025 08:58

matrulda reviewed Sep 3, 2025

nkongenelly added 4 commits September 3, 2025 15:26

Refactored code

c95b2ed

Testing GHA with python 3.11

0ad571c

Testing GHA with python 3.12

ca0a8c5

Testing GHA with python 3.13

a9971fa

matrulda reviewed Sep 5, 2025

Using python-versio matrix i GHA workflow

4938558

matrulda reviewed Sep 5, 2025

nkongenelly and others added 2 commits September 5, 2025 09:12

Update .github/workflows/unit_tests.yml

c1c2744

Co-authored-by: Matilda Åslin <matilda.aslin@medsci.uu.se>

Update .github/workflows/unit_tests.yml

1e18686

Co-authored-by: Matilda Åslin <matilda.aslin@medsci.uu.se>

nkongenelly added 2 commits September 15, 2025 16:11

removed pf_clusters from bclconvert sequencing metrics returned

ba0a7ed

Updated samplesheet v2 structure

5d7a453

matrulda reviewed Sep 29, 2025

Updated test data format

394f852

Made run_info available in qc_data and test_runfolders available in module

e3ae231

Added OverrideCycles in bclconvert samplesheet

9d971de

matrulda reviewed Oct 20, 2025

nkongenelly force-pushed the DATAOPS_1178_update_checkqc_for_projman branch 2 times, most recently from bfde2bb to a946b2a Compare October 20, 2025 14:23

Passing runfolder to qc_data_utils

a946b2a

matrulda reviewed Oct 21, 2025

Added exception for bclconvert_test_runfolder

861ea51

matrulda reviewed Oct 23, 2025

Refactoring code

18b6e7e

matrulda approved these changes Oct 24, 2025

nkongenelly merged commit b7abf00 into Molmed:master Oct 28, 2025
4 checks passed
