
A 16S-rRNA interactive analysis workflow written in R. Contains modular sections devoted to QC, filtering, diversity analysis, and taxonomic/phylogenetic visualisations.


RichStack/Stack_SemenMicrobiome_2022


16S rRNA amplicon analysis workflow (QIIME2 → R)

This repository contains an R-based downstream analysis workflow for 16S rRNA amplicon sequencing data, developed during my PhD research.

The workflow is designed primarily for host-associated, low-microbial-biomass samples and makes extensive use of negative controls and mock communities to assess contamination, detection limits, and data plausibility.

It also uses environmental controls, i.e. samples from adjacent biological niches, to compare the taxonomic and phylogenetic profiles of the different sample types.

This is a research workflow rather than a polished software package, and is shared for transparency, reuse, and reproducibility.


Overview of the workflow

The analysis assumes that upstream processing has already been performed using QIIME2. The R script performs the following major steps:

  1. Import of QIIME2 outputs

    • Feature table
    • Taxonomy assignments (SILVA and/or Greengenes)
    • Rooted phylogenetic tree
    • Sample metadata
    • Construction of a phyloseq object
  2. Initial quality control

    • Removal of non-bacterial features
    • Filtering of implausible or low-prevalence taxa
    • Basic inspection of sequencing depth and feature counts
  3. Contamination assessment

    • Identification and removal of contaminant ASVs using decontam
    • Use of negative controls where available
  4. Mock community analysis

    • Evaluation of mock community composition
    • Determination of minimum detection thresholds using:
      • Cell-based dilution series
      • DNA-based logarithmic mock standards
  5. Final filtering

    • Removal of features below empirically determined thresholds
    • Generation of the final analysis-ready dataset
    • Analysis of alpha rarefaction and sampling depth
  6. Diversity analyses

    • Alpha diversity
    • Beta diversity (ordination-based analyses)
  7. Taxonomic summaries

    • Bar plots and other relative abundance summaries
    • Taxonomic composition across sample groups
  8. Comparative niche analysis

    • Comparison of target samples to environmental or adjacent niches
  9. Phylogenetic visualisation

    • Phylograms and tree-based representations of selected taxa
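
As a rough illustration of the final filtering step (step 5), the sketch below removes observations that fall under a per-sample relative-abundance threshold and then drops features with no reads left. It is a minimal base-R example under stated assumptions, not the code used in main_analysis.R: the toy count matrix, the function name filter_by_threshold, and the threshold value min_rel_abund are all hypothetical, and in practice the threshold would come from the mock community analysis.

```r
# Hypothetical toy feature table: rows are ASVs, columns are samples.
counts <- matrix(
  c(500, 480,  0,
      3,   0,  2,
    120, 150, 90,
      0,   1,  0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("ASV", 1:4), c("S1", "S2", "S3"))
)

# Assumed threshold (illustrative value only): minimum relative abundance
# an ASV must reach within a sample for its reads to be retained.
min_rel_abund <- 0.01

filter_by_threshold <- function(counts, min_rel_abund) {
  # Convert counts to per-sample relative abundances.
  depth <- colSums(counts)
  rel <- sweep(counts, 2, depth, "/")
  rel[is.nan(rel)] <- 0  # guard against samples with zero reads

  # Zero out observations that fall below the threshold.
  filtered <- counts
  filtered[rel < min_rel_abund] <- 0

  # Drop features that have no reads left in any sample.
  filtered[rowSums(filtered) > 0, , drop = FALSE]
}

final_counts <- filter_by_threshold(counts, min_rel_abund)
print(final_counts)
```

Applying the threshold per sample rather than on pooled counts keeps a rare ASV in a shallow sample where it is proportionally well supported, while still removing it from deep samples where it sits at noise level. Whether that is the right choice depends on the dataset.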

Intended mode of use

This workflow is designed to be run interactively in R (e.g. in RStudio), executing sections sequentially and inspecting outputs as they are generated.

Several steps (particularly initial QC, contamination assessment, and mock-based thresholding) are intentionally not fully automated, as they require dataset-specific judgement and biological plausibility checks.

While the full script can be sourced end-to-end, users are strongly encouraged to step through the analysis and review intermediate results before proceeding to downstream filtering and diversity analyses.


Input requirements

The workflow expects the following inputs:

  • QIIME2 feature table (.qza)
  • Rooted phylogenetic tree (.qza)
  • Taxonomy assignments (SILVA and/or Greengenes)
  • Sample metadata file (TSV format)

Details on required columns in the metadata file are described in the script comments.
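
Because the script is meant to be stepped through interactively, it can help to confirm that the inputs exist and that the metadata has the expected columns before starting. The following is a minimal base-R sketch of such a check. The helper names check_inputs and read_metadata, and the column names "sample-id" and "sample_type", are hypothetical placeholders; the columns actually required are the ones listed in the script comments.

```r
# Hypothetical helper: stop early if any expected input file is missing.
check_inputs <- function(paths) {
  not_found <- paths[!file.exists(paths)]
  if (length(not_found) > 0) {
    stop("Missing input file(s): ", paste(not_found, collapse = ", "))
  }
  invisible(TRUE)
}

# Hypothetical helper: read a TSV metadata file and confirm that the
# required columns are present. check.names = FALSE keeps headers such as
# "sample-id" unchanged instead of converting them to "sample.id".
read_metadata <- function(path, required_cols) {
  meta <- read.delim(path, check.names = FALSE, stringsAsFactors = FALSE)
  absent <- setdiff(required_cols, colnames(meta))
  if (length(absent) > 0) {
    stop("Metadata is missing column(s): ", paste(absent, collapse = ", "))
  }
  meta
}

# Demonstration on a temporary file with made-up sample names.
tmp <- tempfile(fileext = ".tsv")
writeLines(
  c("sample-id\tsample_type",
    "S1\tsample",
    "NC1\tnegative_control"),
  tmp
)

check_inputs(tmp)
meta <- read_metadata(tmp, required_cols = c("sample-id", "sample_type"))
print(meta)
```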


Repository structure

analysis/
  main_analysis.R      # Main analysis script

config/
  config_example.R     # Example configuration file (paths & parameters)

data/
  README.md            # Place input data here (not tracked)

output/
  README.md            # Analysis outputs are written here

How to run

  • Clone the repository
  • Copy config/config_example.R to config/config.R
  • Edit file paths and dataset-specific parameters
  • Run the analysis script:

source("analysis/main_analysis.R")
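
For orientation, a configuration file of this kind might look something like the sketch below. This is an assumption about its general shape, not a copy of config/config_example.R: every field name, file name, and parameter value here is hypothetical, and the real names are defined in the example file shipped with the repository.

```r
# Hypothetical layout for config/config.R: input paths plus
# dataset-specific parameters, collected in one list.
config <- list(
  paths = list(
    feature_table = file.path("data", "feature_table.qza"),
    rooted_tree   = file.path("data", "rooted_tree.qza"),
    taxonomy      = file.path("data", "taxonomy.qza"),
    metadata      = file.path("data", "metadata.tsv")
  ),
  params = list(
    min_rel_abund = 0.01  # illustrative value, set per dataset
  ),
  output_dir = "output"
)

# Hypothetical check that all expected path entries are defined
# before the analysis script is sourced.
validate_config <- function(config) {
  required <- c("feature_table", "rooted_tree", "taxonomy", "metadata")
  absent <- setdiff(required, names(config$paths))
  if (length(absent) > 0) {
    stop("Config is missing path(s): ", paste(absent, collapse = ", "))
  }
  invisible(config)
}

validate_config(config)
```

Keeping paths and thresholds in one list means the dataset-specific values can be reviewed in a single place before each interactive run.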


Notes and limitations

  • This workflow was developed for low-biomass 16S datasets
  • Some filtering steps and mock-based thresholds are dataset-specific
  • Sections of the script are clearly marked where manual intervention or adaptation may be required
  • Users are encouraged to read the script comments carefully before reuse

Acknowledgements

This workflow draws on methods, ideas, and code patterns from multiple sources, including but not limited to:

  • Callahan et al. (DADA2)
  • Davis et al. (decontam)
  • F1000Research microbiome analysis guidelines
  • QIIME2 documentation and tutorials

Any adaptations, errors, or interpretations are my own.
