Dataops 1178 update checkqc for projman#134
Conversation
12fe762 to
09aae6f
Compare
matrulda
left a comment
There was a problem hiding this comment.
Looks good! Left some comments for you.
tests/test_qc_data.py
Outdated
| "expected_sequencing_metrics": { | ||
| 1: { | ||
| "total_cluster_pf": 532_464_327, | ||
| "pf_clusters": 3_413_232.5, |
There was a problem hiding this comment.
Hmm I think we should take another look at if these numbers make sense. It differs so much from the total_clusters_pf.
There was a problem hiding this comment.
You're absolutely right
|
Thanks for the review. I have pushed the updated code and maybe we can have a meeting to discuss the values again since as also mentioned that total_clusters_pf is much bigger than pf_clusters. |
.github/workflows/unit_tests.yml
Outdated
| uses: actions/setup-python@v4 | ||
| with: | ||
| python-version: '3.10' | ||
| python-version: '3.13' |
There was a problem hiding this comment.
I think it could be nice to use a test matrix so that we make sure to test all supported versions in the GHA workflow: https://stackoverflow.com/a/61428673
There was a problem hiding this comment.
Oh wow!, this is interesting, thank you for this and I have now pushed the updated code with this change.
Great work! I left a final comment for you and we definitely should look a meeting to look at the values. |
I have booked a meeting and sent you an invite email for this |
matrulda
left a comment
There was a problem hiding this comment.
Nice, but I think you forgot some curly brackets
Co-authored-by: Matilda Åslin <matilda.aslin@medsci.uu.se>
Co-authored-by: Matilda Åslin <matilda.aslin@medsci.uu.se>
Oh yes! Sorry for that. I think i was in my branch that i didn't see the unit tests fail. I have now updated this. |
|
Hej @matrulda , I have now pushed the updated code after we finalized samplesheet v2 structure. |
matrulda
left a comment
There was a problem hiding this comment.
Great, looks good! Just a minor comment.
tests/test_qc_data.py
Outdated
| 1: { | ||
| "total_cluster_pf": 532_464_327, | ||
| "total_reads_pf": 532_464_327, | ||
| "total_reads": 638337024, |
There was a problem hiding this comment.
I think it would be nice to use the "underscore syntax" here as well.
There was a problem hiding this comment.
I agree, thanks
|
Hej @matrulda , a kind reminder on this PR so that we can use the master branch when this is merged in projman_filler to install checkqc |
|
I have just pushed some changes to enable projman_filler to;
|
|
I also tried to move the qc_data_utils file outside the checkQC folder, and adding it is setup.py (under the package_data list) but i was unsuccessful in importing it in the projman_filler side And i have now added the OverrideCycles in the bclconvert samplesheet (i had forgotten about it in my previous commit) |
matrulda
left a comment
There was a problem hiding this comment.
Great, I had one question.
checkQC/qc_data_utils.py
Outdated
| import numpy as np | ||
|
|
||
|
|
||
| def bclconvert_test_runfolder(qc_data): |
There was a problem hiding this comment.
The only issue I have with this function is that it looks like it is a general utility tool, but it expects qc_data to be based on "200624_A00834_0183_BHMTFYTINY" to match the "expected_*" values.
Could we move this part to the function as well?
qc_data = QCData.from_bclconvert(
Path(__file__).parent / "resources/bclconvert/200624_A00834_0183_BHMTFYTINY",
parser_config,
)
I guess we won't be able to parse the CheckQC test resource when using this function in projman_filler, but in that case maybe the input to this function could be the path to 200624_A00834_0183_BHMTFYTINY? It should state clearly in the doc string that this runfolder is expected as input.
Let me know what you think.
There was a problem hiding this comment.
Yes, you're right, this function is only tied to 200624_A00834_0183_BHMTFYTINY. thought of using qc_data to make the function dynamic but then realised that would be comparing 2 qc_data that are generated a fresh each time which may always match.
Yes, i think the function can take runfolder as input
There was a problem hiding this comment.
Or maybe check if the flowcell_id for the TINY runfolder is "HMTFYDRXX" because the bclconvert runfolder in projman_filler has a slightly different name i.e 200624_A00834_0184_BHMTFYTINY.
There was a problem hiding this comment.
I have now updated the code passing the runfolder Path and checking for the flowcell_id. I have also updated projman_filler to pass the runfolder Path
But this is change is also open for discussion 😃
bfde2bb to
a946b2a
Compare
| def bclconvert_test_runfolder(qc_data, runfolder_path): | ||
| _, _, run_info = _read_interop_summary(runfolder_path) | ||
| flowcell_id = run_info.flowcell_id() | ||
| if "HMTFYDRXX" in flowcell_id: |
There was a problem hiding this comment.
Great! Just one minor thing, I think it would be nice to throw an exception if the flowcell ID does not match. Explaining that the ouytput of this funtion is adapated for a specific run
There was a problem hiding this comment.
Oh yes, thanks
|
Thanks for the reviews, I have now updated the code. |
checkQC/qc_data_utils.py
Outdated
| raise Exception("Excpected flowcell_id value as 'HMTFYDRXX' only for " | ||
| f"this fuction but got {flowcell_id}" |
There was a problem hiding this comment.
Oh, there is a typo in the exception. Also, I thought the exception could be rephrased to something like this.
| raise Exception("Excpected flowcell_id value as 'HMTFYDRXX' only for " | |
| f"this fuction but got {flowcell_id}" | |
| raise Exception("This function is only compatible with the run with flowcell_id: 'HMTFYDRXX', " | |
| f"the supplied runfolder has flowcell_id: {flowcell_id}" |
|
Just one super minor thing, then we can merge this! :) |
|
Thanks for the reviews. I have now updated the code |
Jira ticket
This version updates the sequencing_metrics returned by the illumina parser to contain all metrics needed by projman_filler to save flowcell_lane and sample metrics