16S rRNA amplicon analysis workflow (QIIME2 → R)

This repository contains an R-based downstream analysis workflow for 16S rRNA amplicon sequencing data, developed during my PhD research.

The workflow is designed primarily for host-associated, low-microbial-biomass samples and makes extensive use of negative controls and mock communities to assess contamination, detection limits, and data plausibility.

It also uses environmental controls (i.e. samples from adjacent biological niches) to compare the taxonomic and phylogenetic profiles of the different sample types.

This is a research workflow rather than a polished software package, and is shared for transparency, reuse, and reproducibility.

Overview of the workflow

The analysis assumes that upstream processing has already been performed using QIIME2. The R script performs the following major steps:

1. Import of QIIME2 outputs
   - Feature table
   - Taxonomy assignments (SILVA and/or Greengenes)
   - Rooted phylogenetic tree
   - Sample metadata
   - Construction of a phyloseq object
2. Initial quality control
   - Removal of non-bacterial features
   - Filtering of implausible or low-prevalence taxa
   - Basic inspection of sequencing depth and feature counts
3. Contamination assessment
   - Identification and removal of contaminant ASVs using decontam
   - Use of negative controls where available
4. Mock community analysis
   - Evaluation of mock community composition
   - Determination of minimum detection thresholds using:
     - Cell-based dilution series
     - DNA-based logarithmic mock standards
5. Final filtering
   - Removal of features below empirically determined thresholds
   - Generation of the final analysis-ready dataset
   - Analysis of alpha rarefaction and sampling depth
6. Diversity analyses
   - Alpha diversity
   - Beta diversity (ordination-based analyses)
7. Taxonomic summaries
   - Bar plots and other relative abundance summaries
   - Taxonomic composition across sample groups
8. Comparative niche analysis
   - Comparison of target samples to environmental or adjacent niches
9. Phylogenetic visualisation
   - Phylograms and tree-based representations of selected taxa
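
To make the mock-based thresholding and final filtering steps more concrete, here is a minimal base-R sketch. It is not the code in main_analysis.R: the toy count table, the feature and sample names, and the rule itself (take the highest relative abundance of any unexpected feature in a mock sample as the cut-off) are all assumptions made for illustration.

```r
# Toy ASV count table: rows are features, columns are samples.
# All names and numbers are invented for illustration.
counts <- matrix(
  c(9000, 5000, 12000,
     900, 3000,     4,
      90,    2,  6000,
       5, 1990,     0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("ASV1", "ASV2", "ASV3", "ASV4"),
                  c("mock_log", "sample_A", "sample_B"))
)

# Hypothetical list of features known to belong to the mock community.
expected_in_mock <- c("ASV1", "ASV2", "ASV3")

# Convert counts to per-sample relative abundances.
rel_abund <- sweep(counts, 2, colSums(counts), "/")

# Assumed rule: the threshold is the highest relative abundance reached
# in the mock by any feature that should not be there.
unexpected <- setdiff(rownames(counts), expected_in_mock)
threshold <- max(rel_abund[unexpected, "mock_log"])

# Zero out counts at or below the threshold in the non-mock samples.
samples <- setdiff(colnames(counts), "mock_log")
filtered <- counts[, samples, drop = FALSE]
filtered[rel_abund[, samples, drop = FALSE] <= threshold] <- 0

print(threshold)
print(filtered)
```

In a real dataset the cut-off would be derived from the dilution series and logarithmic standards listed above, and whether a relative or an absolute cut-off is appropriate remains a dataset-specific judgement.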

Intended mode of use

This workflow is designed to be run interactively in R (e.g. in RStudio), executing sections sequentially and inspecting outputs as they are generated.

Several steps (particularly initial QC, contamination assessment, and mock-based thresholding) are intentionally not fully automated, as they require dataset-specific judgement and biological plausibility checks.

While the full script can be sourced end-to-end, users are strongly encouraged to step through the analysis and review intermediate results before proceeding to downstream filtering and diversity analyses.

Input requirements

The workflow expects the following inputs:

- QIIME2 feature table (.qza)
- Rooted phylogenetic tree (.qza)
- Taxonomy assignments (SILVA and/or Greengenes)
- Sample metadata file (TSV format)

The columns required in the metadata file are described in the script comments.
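
Because the workflow depends on specific metadata columns, a quick check before running anything else can save time. The following is a minimal sketch in base R; the function name `check_metadata` and the column names `sample_id` and `sample_type` are hypothetical placeholders, not the columns the script actually requires.

```r
# Hypothetical required columns; the real ones are listed in the script comments.
required_cols <- c("sample_id", "sample_type")

# Read a TSV metadata file and stop with a clear message if columns are missing.
check_metadata <- function(path, required = required_cols) {
  meta <- read.delim(path, stringsAsFactors = FALSE, check.names = FALSE)
  missing_cols <- setdiff(required, colnames(meta))
  if (length(missing_cols) > 0) {
    stop("Metadata file is missing column(s): ",
         paste(missing_cols, collapse = ", "))
  }
  meta
}

# Toy metadata written to a temporary file so the example is self-contained.
tmp <- tempfile(fileext = ".tsv")
writeLines(c("sample_id\tsample_type",
             "S1\tsample",
             "NC1\tnegative_control",
             "MC1\tmock"), tmp)

meta <- check_metadata(tmp)
print(table(meta$sample_type))
```

Calling `check_metadata()` on a file that lacks one of the listed columns raises an error naming the missing column, which is easier to act on than a failure further downstream.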

Repository structure

analysis/
  main_analysis.R      # Main analysis script

config/
  config_example.R     # Example configuration file (paths & parameters)

data/
  README.md            # Place input data here (not tracked)

output/
  README.md            # Analysis outputs are written here

How to run

1. Clone the repository.
2. Copy config/config_example.R to config/config.R.
3. Edit the file paths and dataset-specific parameters in config/config.R.
4. Run the analysis script from R, with the repository root as the working directory:

source("analysis/main_analysis.R")
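
The steps above can be wrapped in a small helper that refuses to run until the configuration file exists. This is only a sketch: `run_workflow` is a hypothetical name, and it assumes the script expects the repository root as the working directory (as the relative path in the `source()` call suggests).

```r
# Hypothetical convenience wrapper; not part of the repository.
run_workflow <- function(repo_dir = ".") {
  config_path <- file.path(repo_dir, "config", "config.R")

  # Fail early with a helpful message if the config has not been created yet.
  if (!file.exists(config_path)) {
    stop("Missing ", config_path,
         ": copy config/config_example.R to config/config.R and edit it first.")
  }

  # Assumption: paths in the script are relative to the repository root,
  # so switch there for the run and restore the old directory on exit.
  old_wd <- setwd(repo_dir)
  on.exit(setwd(old_wd), add = TRUE)

  source(file.path("analysis", "main_analysis.R"))
  invisible(TRUE)
}

# Usage, from any working directory:
# run_workflow("path/to/cloned/repository")
```

Sourcing the whole script this way suits re-runs; for a first pass over a new dataset, stepping through the script interactively is still the intended mode of use.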

Notes and limitations

- This workflow was developed for low-biomass 16S datasets.
- Some filtering steps and mock-based thresholds are dataset-specific.
- Sections of the script that may require manual intervention or adaptation are clearly marked.
- Users are encouraged to read the script comments carefully before reuse.

Acknowledgements

This workflow draws on methods, ideas, and code patterns from multiple sources, including but not limited to:

- Callahan et al. (DADA2)
- Davis et al. (decontam)
- F1000Research microbiome analysis guidelines
- QIIME2 documentation and tutorials

Any adaptations, errors, or interpretations are my own.

RichStack/Stack_SemenMicrobiome_2022
