Merged
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…pport
- Add nonrandom.nf.test: tests R&D and HT modes with fixed UMIs
- Add mixed_umis.nf.test: tests mixed fixed/random UMI samples
- Update test configs to use nf-core/test-datasets URLs
- Use Paired strategy with edits=0 for nonrandom test
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The previous approach parsed the CSV with new File(), which silently fails for URL inputs. The warning now fires from the workflow itself by checking the first item in the correct branch channel, so it works whether the samplesheet is a local path or a URL.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Verifies the pipeline fails with an informative error when multiple runs of the same sample specify different umi_file values in the samplesheet.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
nh13 approved these changes on Mar 27, 2026
Closes #137
PR Overview
This PR adds an option to use non-random (or fixed) UMIs with fastquorum.
This is supported by an optional new column in the samplesheet which points to a file of known UMI sequences.
This allows the fixed list to be used for some samples but not all of them, or to use different fixed lists for different samples.
Pulled from the updated Usage.md:
The pipeline will also report an error if you supply different UMI lists for the same "sample" (note for #147, this should be based on "library_id").
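The consistency check described above can be sketched as follows. This is a minimal illustration, not the pipeline's implementation: only the `umi_file` column name comes from this PR, while the other column names, the file names, and the choice to treat an empty `umi_file` as a distinct value are assumptions.

```python
import csv
import io
from collections import defaultdict

# Hypothetical samplesheet. Only `umi_file` is named in this PR; the other
# columns and all file names are made up for illustration.
SAMPLESHEET = """\
sample,fastq_1,fastq_2,umi_file
s1,s1_L1_R1.fq.gz,s1_L1_R2.fq.gz,umis_a.txt
s1,s1_L2_R1.fq.gz,s1_L2_R2.fq.gz,umis_a.txt
s2,s2_L1_R1.fq.gz,s2_L1_R2.fq.gz,
s3,s3_L1_R1.fq.gz,s3_L1_R2.fq.gz,umis_a.txt
s3,s3_L2_R1.fq.gz,s3_L2_R2.fq.gz,umis_b.txt
"""


def find_umi_file_conflicts(text):
    """Return {sample: sorted umi_file values} for samples with >1 distinct value.

    Assumption: an empty umi_file (random UMIs) counts as its own value, so a
    sample mixing empty and non-empty entries is also reported.
    """
    seen = defaultdict(set)
    for row in csv.DictReader(io.StringIO(text)):
        seen[row["sample"]].add((row.get("umi_file") or "").strip())
    return {s: sorted(v) for s, v in seen.items() if len(v) > 1}


conflicts = find_umi_file_conflicts(SAMPLESHEET)
for sample, values in conflicts.items():
    print(f"ERROR: sample '{sample}' has conflicting umi_file values: {values}")
```

Here `s1` (same list on every run) and `s2` (no list, i.e. random UMIs) pass, while `s3` is flagged because its two runs point at different lists.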
Test Data Overview
The test data for this was created by taking the existing randomer data for SRR6109255 and replacing the existing 10bp UMI and 1 constant base with an 8bp UMI from the xGen™ cfDNA & FFPE DNA Library Preparation Kit. Then a random 1bp substitution was introduced in ~10% of the UMI sequences.
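A rough sketch of how such synthetic reads could be produced is shown below. It is not the script used for this PR: the function names are hypothetical, the UMI sequences are placeholders rather than the kit's actual list, and a real FASTQ rewrite would also need to trim the quality string by the same amount.

```python
import random

BASES = "ACGT"

# Placeholder 8 bp UMIs for illustration only (NOT the kit's real list).
FIXED_UMIS = ["AACCGGTT", "ACGTACGT", "TTGGCCAA", "GATCGATC"]


def substitute_one_base(umi, rng):
    """Return `umi` with exactly one position changed to a different base."""
    pos = rng.randrange(len(umi))
    new_base = rng.choice([b for b in BASES if b != umi[pos]])
    return umi[:pos] + new_base + umi[pos + 1:]


def replace_umi(read_seq, fixed_umis, rng, old_prefix_len=11, error_rate=0.10):
    """Swap the leading 10 bp UMI + 1 constant base for a fixed 8 bp UMI.

    Returns (new_read, true_umi). With probability `error_rate` the UMI
    written into the read carries a single random substitution.
    """
    true_umi = rng.choice(fixed_umis)
    if rng.random() < error_rate:
        written = substitute_one_base(true_umi, rng)
    else:
        written = true_umi
    return written + read_seq[old_prefix_len:], true_umi


rng = random.Random(42)  # fixed seed so the synthetic data is reproducible
original_read = "N" * 11 + "ACGTACGTAC"  # 11 bp prefix + template bases
new_read, true_umi = replace_umi(original_read, FIXED_UMIS, rng)
print(new_read, true_umi)
```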
In the final output test data, we expect to have fewer consensus reads due to the reduced number of available UMIs. However, the drop shouldn't be too drastic.
Running the Test Data
Evaluating the UMI Correction (and Synthetic data creation)
Non-Random Correct UMI Metrics
We observe all the expected UMIs, and for each of them ~10% of the reads were matched with 1 mismatch.
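For readers unfamiliar with fixed-UMI correction, the idea behind these metrics can be sketched as below. This is a simplified stand-in, not the tool the pipeline runs: the function names, the placeholder UMI list, and the rule of rejecting ambiguous matches are assumptions.

```python
from collections import Counter

# Placeholder UMI list for illustration only.
FIXED_UMIS = ["AACCGGTT", "ACGTACGT", "TTGGCCAA"]


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def correct_umi(observed, fixed_umis, max_mismatches=1):
    """Return (matched_umi, mismatches), or (None, None) if no unique match."""
    hits = [(hamming(observed, u), u) for u in fixed_umis if len(u) == len(observed)]
    hits = [h for h in hits if h[0] <= max_mismatches]
    if not hits:
        return None, None
    best = min(d for d, _ in hits)
    winners = [u for d, u in hits if d == best]
    if len(winners) != 1:  # ambiguous: equally close to more than one UMI
        return None, None
    return winners[0], best


def tally(observed_umis, fixed_umis):
    """Count reads per (matched_umi, mismatches); unmatched reads go under (None, None)."""
    return Counter(correct_umi(o, fixed_umis) for o in observed_umis)


counts = tally(["AACCGGTT", "AACCGGTA", "ACGTACGT", "GGGGGGGG"], FIXED_UMIS)
for (umi, mismatches), n in counts.items():
    print(umi, mismatches, n)
```

Dividing the 1-mismatch count for a UMI by its total gives the per-UMI fraction that the metrics above report as ~10%.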
Comparing the Grouping
Random
Non-Random
As expected, we observe larger family sizes in the non-random data: with fewer UMIs available, more reads share both coordinates and UMI.
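The effect can be reproduced with a toy simulation. All numbers below (molecule count, number of positions, a pool of 32 fixed UMIs, 3 reads per molecule) are invented for illustration and do not reflect the test data or the kit.

```python
import random
from collections import Counter


def mean_family_size(n_molecules, n_positions, umi_pool_size,
                     reads_per_molecule=3, seed=0):
    """Group reads by (position, UMI) and return the mean family size.

    Distinct molecules that draw the same position and UMI are merged into
    one family, which is what a UMI collision looks like to the grouper.
    """
    rng = random.Random(seed)
    families = Counter()
    for _ in range(n_molecules):
        key = (rng.randrange(n_positions), rng.randrange(umi_pool_size))
        families[key] += reads_per_molecule
    return sum(families.values()) / len(families)


random_umis = mean_family_size(2000, 50, 4 ** 10)  # 10 bp random UMIs
fixed_umis = mean_family_size(2000, 50, 32)        # hypothetical small fixed pool
print(f"random UMIs: mean family size {random_umis:.2f}")
print(f"fixed UMIs:  mean family size {fixed_umis:.2f}")
```

With the small pool, at most 50 × 32 = 1600 distinct (position, UMI) keys exist for 2000 molecules, so some molecules must merge and the mean family size rises above 3.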
Comparing the Consensus BAMs
Random
Non-Random
We observe a ~30% reduction in consensus sequences in the non-random data, as expected given the larger family sizes.
Comparing Filtered Aligned Consensus BAM
Random
Non-Random
We observe a similar map rate. A slightly worse rate makes sense: reducing the number of available UMIs causes more UMI collisions, so some consensus reads are built across distinct molecules.
PR checklist
- Code lints (nf-core pipelines lint).
- Test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
- No unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
- docs/usage.md is updated.
- docs/output.md is updated.
- CHANGELOG.md is updated.
- README.md is updated (including new tool citations and authors/contributors).