-
Notifications
You must be signed in to change notification settings - Fork 2
Open
Labels
enhancementNew feature or requestNew feature or request
Description
There is a small R script in the current NGS16S pipeline which extracts from the dada2.rda file the sequences which were dropped as chimeras and outputs a csv with columns weight and sequence. This file is not used for any downstream classification pipeline dependencies, however is useful to have for troubleshooting when a larger fraction of reads is lost during the chimera removal phase. The script currently outputs a file for each sample in the run, but it is not necessary to separate per sample if that is less convenient for the nextflow pipeline
#!/usr/bin/env Rscript
suppressPackageStartupMessages(library(argparse, quietly = TRUE))
suppressPackageStartupMessages(library(tidyverse, quietly = TRUE))
do_intermediates <- function(dada.path, id, out.path) {
load(dada.path)
pre <- as.data.frame(as.table(seqtab))
post <- as.data.frame(as.table(seqtab.nochim))
pre %>% anti_join(post, by=c('Var2')) %>%
filter(Var1==id) %>% # ex: '624-27'
arrange(-Freq) %>% rename(sequence=Var2) %>%
rename(weight=Freq) %>% select(weight, sequence) %>%
write_csv(out.path)
}
main <- function(arguments) {
parser <- ArgumentParser()
parser$add_argument('--rdata', default='dada2.rda')
parser$add_argument('--id')
parser$add_argument('--out')
args <- parser$parse_args(arguments)
do_intermediates(args$rdata, args$id, args$out)
}
main(commandArgs(trailingOnly=TRUE))
marykstewart
Metadata
Metadata
Assignees
Labels
enhancementNew feature or requestNew feature or request