Skip to content

Incorporate logic from chim_ints.R script into dada2-nf pipeline  #17

@dhoogest

Description

@dhoogest

There is a small R script in the current NGS16S pipeline which extracts from the dada2.rda file the sequences which were dropped as chimeras and outputs a csv with columns weight and sequence. This file is not used for any downstream classification pipeline dependencies, however is useful to have for troubleshooting when a larger fraction of reads is lost during the chimera removal phase. The script currently outputs a file for each sample in the run, but it is not necessary to separate per sample if that is less convenient for the nextflow pipeline

#!/usr/bin/env Rscript

suppressPackageStartupMessages(library(argparse, quietly = TRUE))
suppressPackageStartupMessages(library(tidyverse, quietly = TRUE))

do_intermediates <- function(dada.path, id, out.path) {
    load(dada.path)
    pre <- as.data.frame(as.table(seqtab))
    post <- as.data.frame(as.table(seqtab.nochim))
    pre %>% anti_join(post, by=c('Var2')) %>%
        filter(Var1==id) %>% # ex: '624-27'
        arrange(-Freq) %>% rename(sequence=Var2) %>%
        rename(weight=Freq) %>% select(weight, sequence) %>%
        write_csv(out.path)
}

main <- function(arguments) {
    parser <- ArgumentParser()
    parser$add_argument('--rdata', default='dada2.rda')
    parser$add_argument('--id')
    parser$add_argument('--out')

    args <- parser$parse_args(arguments)
    
    do_intermediates(args$rdata, args$id, args$out)
}

main(commandArgs(trailingOnly=TRUE))


Metadata

Metadata

Assignees

Labels

enhancementNew feature or request

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions