
v2.0.0: Refactor for sample-wise parameterisation #171

Open
nschan wants to merge 172 commits into nf-core:dev from nschan:refactor-assemblers

Conversation


@nschan nschan commented Jul 9, 2025

Updated Feb 04 2026

As suggested here, this is a full refactor of genomeassembler to support sample-level parameterisation of everything.

Why?
Often when doing genome assembly, we do not know up front what works best. With this change, the pipeline can be used to run different settings on the same set of reads and compare the assembly outcomes. Samples that share the same value in group are combined during preprocessing and reporting, to facilitate comparisons of strategies on the same input(s).

Per-sample parameterisation

Initially, I implemented this poorly, based largely around join()'ing various maps back to other maps. Nextflow does not have a join operator for maps, so this was a big mess: hard to read, annoying to write, and constantly blocking. To summarise: bad idea, cannot recommend. This attempt contributes to the large number of commits in this branch.

While contemplating my failure to implement sample-level parameterisation, I realised that the solution to this problem is "meta-stuffing", also referred to as "meta-smuggling" by @prototaxites, who seems to have arrived at a similar conclusion at around the same time.
In the purest form, this would implement everything in a single meta-map, which is in slot 0 of the channel traveling through the pipeline. I will use meta to refer to [0] of list-channels from here on.
Note: A pure meta-stuffing implementation would have required additional refactoring of some subworkflows, in particular of QC which takes more than one input channel, something I did not want to do.

How this works:
Everything goes into meta: params are turned into key/value pairs for each sample, unless the samplesheet contains a different value for that sample under the same key, resulting in a massive meta map. Values are pulled from meta as required for channel inputs, and meta is recreated / updated from channel outputs. Combined with branch, filter and related operators, this largely enables flow control at the sample level. In some cases, joins cannot be avoided (or would need to be traded for concurrent processes), but I tried to minimise them.
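A minimal sketch of this construction (the param names, samplesheet columns, and channel names here are hypothetical; the real meta map is much larger):

```nextflow
// Hypothetical sketch: build each sample's meta by merging global params
// with the samplesheet row; row values override params under the same key.
ch_samplesheet
    .map { row ->
        def defaults = [
            assembler: params.assembler, // assumed param names
            polisher : params.polisher
        ]
        // Groovy map addition is right-biased, so non-empty
        // samplesheet values win over the global defaults
        def meta = defaults + row.findAll { k, v -> v != null && v != '' }
        [ meta ]
    }
    .set { ch_samples }
```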
This means that the pipeline is essentially free of if { } statements, since flow control is done via channels. This also means that the pipeline DAG is always rendered in full, irrespective of whether the nodes will actually be visited by a sample. It might be possible to optimise DAG rendering by inspecting all created meta-maps and creating some global variables based on their content, to again conditionally include subworkflows, but this is beyond the scope of this refactor.
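As a sketch of what this looks like in practice (subworkflow and meta key names are assumptions), replacing if { } with channel-level routing:

```nextflow
// Hypothetical sketch: route each sample to its assembler via branch,
// so no if statements are needed at the workflow level.
ch_samples
    .branch { meta, reads ->
        hifiasm: meta.assembler == 'hifiasm'
        flye   : meta.assembler == 'flye'
        other  : true
    }
    .set { ch_routed }

HIFIASM ( ch_routed.hifiasm )
FLYE    ( ch_routed.flye )
```

Both subworkflow calls appear in the DAG regardless of whether any sample actually reaches them, which is why the DAG is always rendered in full.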

I made an effort to provide flexibility in combining assemblers, polishers, and scaffolding tools where I thought it reasonable, but this does not offer full-factorial combinations, which become especially tricky when things have to happen in a particular order.

Generally, I am very happy with this approach: it offers great flexibility and is surprisingly nice to write once meta has been constructed. Since global parameters can still be set, the samplesheet can be anywhere from narrow to very wide: for a single sample, everything could be done via params and the only samplesheet column would be sample. In the most ridiculous case of setting every parameter differently for each sample, the samplesheet could grow to around 50 columns.
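For illustration, a narrow samplesheet that overrides a single parameter per sample might look like this (sample and group come from this description; the third column name is hypothetical):

```csv
sample,group,assembler
sampleA,groupX,hifiasm
sampleB,groupX,flye
```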

Grouping

Grouping is implemented as an extension of "meta-smuggling", essentially smuggling multiple meta maps through a channel (inside meta), replacing the value of meta.id with the group id.
Here is the relevant code for grouping and un-grouping while maintaining meta-maps:

    some_channel
        // Keep only samples that carry group information
        .filter { it -> it.meta.group }
        .map { it -> [it.meta, it.meta.group, it.meta.ontreads] }
        // Group by the group id (element 1)
        .groupTuple(by: 1)
        // Collect all sample metas into a group-level meta slot named metas.
        // Reads are deduplicated; the user is responsible for grouping correctly.
        .map { it ->
            [
                [
                    id: it[1],    // the group id
                    metas: it[0]  // the per-sample meta maps
                ],
                it[2].unique()[0] // the (shared) ontreads
            ]
        }

After this input channel has been processed, the samples are recreated from meta.metas:

    process.OUT
        // Take grouped samples, i.e. those with metas in slot [0]
        .filter { it -> it[0].metas }
        .flatMap { it ->
            // it looks like [meta, output_path]:
            // recreate each sample meta from metas and update the read path
            it[0].metas.collect { meta ->
                [
                    meta: meta - meta.subMap("ontreads") + [ontreads: it[1]]
                ]
            }
        }
        // Pass ungrouped samples through unchanged
        .mix(
            process.OUT
                .filter { it -> !it[0].metas }
        )

More

Since switching to meta-stuffing made things much easier, I have also added a Hi-C scaffolding subworkflow, and support for dorado polish in the polishing subworkflow (experimental, as dorado polish does not work reliably).


nf-core-bot commented Jul 9, 2025

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.5.1.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

@nschan nschan changed the title Refactor for sample-wise parameterisation v2.0.0: Refactor for sample-wise parameterisation Sep 4, 2025

nschan commented Sep 4, 2025

This comment is outdated, and this information is now also in the PR text.

Since the start of this refactor, I noticed that when doing multiple assemblies from the same set of reads, it is wasteful to send those reads through preprocessing multiple times. I also figured that assemblies from the same set of reads are likely to be compared to each other. To reduce redundant work and make comparisons easier, there is now a group parameter that can be used to put samples using the same reads into a group. Note: putting samples with different inputs into the same group will give wrong results.
I also noticed over the course of refactoring that there is some information that simply needs to be passed to processes (mostly for config purposes), since fetching those values from params kind of defeats the idea of parameterising per-sample. Those get stuffed into meta as needed. Generally, I am storing all information in the channel-map during "transit" between processes. Overall, this results in a lot of not super-concise channel manipulation.
I have now run some more extensive tests, and overall the logic seems to work as intended.

Contributor

@nvnieuwk nvnieuwk left a comment


Hi, I've done my best, but I found it pretty hard to read the code in this pipeline, so I can't approve it at this point. I've left a few comments. Here are some more tips to help with readability:

  1. Don't use it in closures; set a variable name for each item in the channel entry instead (e.g. instead of .map { it -> ... } do .map { meta, file1, file2 -> ... }). This makes it easier to understand what is in the channel at that point and will make it easier for future you (and others) to work on the pipeline later.
  2. Use clearer variable names.
  3. Put some more comments above big code blocks with a short explanation of what each piece of code is for (especially on harder-to-understand pieces of code).

But anyways, I'm still really impressed with what you've done here and this really will be a massive improvement to the pipeline!

nschan and others added 7 commits February 13, 2026 09:38
* Template update for nf-core/tools version 3.2.1

* Template update for nf-core/tools version 3.3.1

* merge template 3.3.1 - fix linting

* update pre-commit

* merge template 3.3.1 - fix linting

* pre-commit config?

* pre-commit config?

* reinstall links

* try larger runner

* smaller run, disable bloom filter for hifiasm test

* updated test snapshot

* updated test snapshot

* update nftignore

* update nftignore

* update nftignore

* update nftignore

* update nftignore

* update nftignore

* update nftignore

* update nftignore

* update nftignore

* Update .github/actions/nf-test/action.yml

Co-authored-by: Matthias Hörtenhuber <mashehu@users.noreply.github.com>

* Update docs/output.md

Co-authored-by: Matthias Hörtenhuber <mashehu@users.noreply.github.com>

* remove .nf-test.log

---------

Co-authored-by: Niklas Schandry <niklas@bio.lmu.de>
Co-authored-by: Matthias Hörtenhuber <mashehu@users.noreply.github.com>
@nschan nschan force-pushed the refactor-assemblers branch 2 times, most recently from 83c7f7a to 9d160c6 Compare February 18, 2026 15:11
@nschan nschan force-pushed the refactor-assemblers branch from 9d160c6 to 2c08e3a Compare February 18, 2026 15:22
@vagkaratzas vagkaratzas self-requested a review February 20, 2026 08:07

@vagkaratzas vagkaratzas left a comment


The review will be coming in separate comments, ofc :P

  • What is this assets/report folder? Is there not a more nf-core place to put/execute those?
  • Need a workflow dark, and SVGs for both light and dark metro maps

// Read preparation
includeConfig 'modules/ont-prep.config'
includeConfig 'modules/hifi-prep.config'
includeConfig 'modules/trimgalore.config'


I can see the reason for splitting configs, but I still think I would prefer to have all the module configurations in one file. Will let you decide, and/or the second reviewer :P

Collaborator Author


I can see why you would prefer this, but in practice, having a massive module configuration file turned out to be very hard for me to manage: I spent a lot of time looking for specific configurations in there and kept introducing duplications, etc. There are around 70 withName selectors across the config files.


nschan commented Feb 20, 2026

The review will be coming in separate comments, ofc :P

ok ;)

  • What is this assets/report folder? Is there not a more nf-core place to put/execute those?

assets/report contains the scripts for generating the report. If there is a better place to put them, I am happy to move them, but I don't know where.

  • Need a workflow dark, and SVGs for both light and dark metro maps

Good point, I will add those.


@vagkaratzas vagkaratzas left a comment


Another round. Hopefully another last one soon!


//fastplong_jsons.view { it -> "UNQIE JSONS: $it"}

REPORT( report_files,


Nothing for multiqc? :O

Collaborator Author


I am not using MultiQC for reporting.


@vagkaratzas vagkaratzas left a comment


Final one. There is no way that everything will be working as intended with all these changes (and strategies..!)
I would suggest creating nf-tests for all local subworkflows and modules that don't already have them. But I wouldn't stop you from a release over that.
BUT! I think this is a great opportunity to have a track at the upcoming hackathon, for beginners and not only, to write nf-tests for everything. Just a suggestion ;)

def args = task.ext.args ?: ''

"""
dorado aligner \\


Is this hard to push to nf-core?


@@ -14,7 +14,8 @@ process GENOMESCOPE {
tuple val(meta), path("*_plot.log.png") , emit: plot_log


https://nf-co.re/modules/genomescope2/
is on nf-core/modules, update and use that one, or maybe just patch it?

@@ -2,37 +2,27 @@ process GFA_2_FA {
tag "${meta.id}"


gfatools_gfa2fa also on nf-core modules

@@ -10,7 +10,8 @@ process COUNT {


https://nf-co.re/modules/jellyfish_count/ also (I made that one >.<)

Collaborator Author


Ah, great, I wasn't aware that you made an nf-core module, happy to switch!

@@ -10,24 +10,16 @@ process DUMP {

output:



ch_versions = ch_versions.mix(RUN_RAGTAG.out.versions)
}
channel.empty().set { links_busco }


I think the preferred nf-core way is

Suggested change
channel.empty().set { links_busco }
ch_links_busco = channel.empty()

instead of .set. But I guess it's too late for that now, given the number of files in the PR xD

Collaborator Author


If it is preferred but not mandatory, I would like to keep .set. I know some people prefer channel =, but it's ugly :/


emit:
scaffolds
ch_main = ch_main_scaffolded


main is not very self-explanatory as a name

Collaborator Author


Fair point. I am not sure how to better name this, since the main channel does not necessarily contain all steps for all samples. I think I am using main to mean that this has been transformed correctly and that everything is ready to move back into the main workflow / next subworkflow.

only once, and then the original channel is restored.

Brief description how this works:
// Move group information into channel, if it exists


Too many comments and code there. I guess it should be deleted? The explanation should be enough I guess

Collaborator Author


This comment is mainly for reviewers and potential future contributors. If this is useless, I can remove it.


nschan commented Feb 20, 2026

Final one. There is no way that everything will be working as intended with all these changes (and strategies..!) I would suggest creating nf-test for all local subworkflows and modules if they dont already have. But I wouldn't stop you from a release for that. BUT! I think this is a great opportunity to have a track for the upcoming hackathon for beginners and not only, to write nf-tests for everything. Just a suggestion ;)

I would hope that everything works as intended, and I did some larger tests before asking for reviews. Still, it is likely that someone will try something that I did not think of; full_test covers a bunch of strategies. Having a track for this at the hackathon would be nice, but it overlaps with my teaching obligations and I will not be able to participate myself. I am not sure how productive putting up a bunch of tasks without being involved would be.

@vagkaratzas

Final one. There is no way that everything will be working as intended with all these changes (and strategies..!) I would suggest creating nf-test for all local subworkflows and modules if they dont already have. But I wouldn't stop you from a release for that. BUT! I think this is a great opportunity to have a track for the upcoming hackathon for beginners and not only, to write nf-tests for everything. Just a suggestion ;)

I would hope that everything works as intended and I did some larger tests before asking for reviews. Still it is likely that someone will try something that I did not think of; full_test tests a bunch of strategies. Having a track for this at the hackathon would be nice, but for me this overlaps with teaching obligations and I will not be able to participate myself. I am not sure how productive putting up a bunch of stuff without being involved would be?

I would ask the thematically closest persons / facility / nf-core slack channels for potential track leaders that would want to oversee this if they don't already have a project but feel confident enough with nf-tests.
