@sminot - as @crosenth and I discussed this, it seems that all of the pieces are in place to support the following workflow without the need for additional code.
Consider this workflow:
- sequencing core uploads fastqs to s3
- lab user identifies the sequencing run and uploads sample information as an Excel file containing (at minimum) the columns "sampleid" (filename prefix for fastq files) and "batch" (other fields are included in a pipeline output that is used in downstream analyses).
- cirro generates a file containing the full s3 path of all uploaded fastqs (in fastq_list.txt)
- cirro prompts the user for pipeline params (or substitutes defaults)
- cirro generates a json-format params file specifying the s3 path for the uploaded excel sheet, the fastq_list, and pipeline params
- cirro launches the pipeline with the params file as an input
- pipeline associates fastqs with metadata (https://github.com/nhoffman/dada2-nf/blob/master/bin/manifest.py)
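
To make the last few steps concrete, here is a rough Python sketch of how fastq paths could be matched to "sampleid" values by filename prefix, and how a json params file could be assembled. This is not the logic in manifest.py and not anything Cirro actually does; the function names, the longest-prefix matching rule, the params keys ("manifest", "fastq_list"), and the example bucket and filenames are all assumptions for illustration.

```python
import json
from collections import defaultdict


def group_fastqs_by_sample(fastq_paths, sampleids):
    """Map each sampleid to the fastq paths whose filename starts with it.

    Hypothetical helper: longer ids are tried first so that a file for
    "m90n10" is not claimed by the shorter id "m90n1".
    """
    ordered = sorted(sampleids, key=len, reverse=True)
    grouped = defaultdict(list)
    unmatched = []
    for path in fastq_paths:
        # Filename is everything after the last "/" in the s3 path.
        name = path.rsplit("/", 1)[-1]
        for sampleid in ordered:
            if name.startswith(sampleid):
                grouped[sampleid].append(path)
                break
        else:
            # No sampleid in the sheet is a prefix of this filename.
            unmatched.append(path)
    return dict(grouped), unmatched


def build_params(manifest_path, fastq_list_path, **pipeline_params):
    """Return a json string for a params file (key names are assumed)."""
    params = {"manifest": manifest_path, "fastq_list": fastq_list_path}
    params.update(pipeline_params)
    return json.dumps(params, indent=2)
```

Files that match no sampleid are returned separately rather than dropped silently, which seems useful if the sample sheet and the uploaded fastqs can get out of sync.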
There is also an option to provide the pipeline with a unified manifest + fastq list in this format: https://github.com/nhoffman/dada2-nf/blob/master/test/manifest.csv - but this raises the question of where (and when) this file would be generated... I think that we've come full circle to our initial conversations about Cirro wanting to combine the fastq files with associated metadata independently from the pipeline. I think I understand now that this approach assumes the fastq files and manifest will be uploaded at the same time, which doesn't align with the expected lab workflow. So perhaps the original model for providing inputs (fastq_list.txt + manifest.xlsx) is the most convenient given the expected workflow. If so, I think we're ready to go.
If you would like to test the above, see fastqs and sample sheet below (in bvdiversity):
data/miseq-plate-90/run-files/*/Data/Intensities/BaseCalls/m90*.fastq.gz
data/miseq-plate-90/sample-information/sample-information-m90.xlsx