Field workshop on the taxonomy and natural history of freshwater and limno-terrestrial meiofauna 🪱 🏞️

This is the code used to process COI/28S/18S amplicon data from the 2025 NERC/NHM Lake District Freshwater Meiofauna workshop

Installation and setup

You'll need this software, pelase make separate conda envs for each:
- pychopper v2.7.0 : conda create -n pychopper bioconda::pychopper
- cutadapt v4.9.0: conda create -n cutadapt bioconda::cutadapt
- Amplicon Sorter (no releases, separate conda packages to install) conda create -n amplicon_sorter && conda install bioconda::python-edlib biopython matplotlib
- seqkit v2.9.0: conda create -n seqkit bioconda::seqkit
- barrnap v0.9: conda create -n barrnap bioconda::barrnap
- seqtk v1.4-r122: conda create -n seqtk bioconda::seqtk
- hmmer v3.1b2: conda create -n hmmer bioconda::hmmer
- bedtools v2.31.1: conda create -n bedtools bioconda::bedtools

How to structure

Run the scripts in the parent directory to the cloned github repo (so the filepaths remain relative)
This workflow is written for a SLURM HPC system (I used the CropDiversity cluster)
To run this code, git clone this whole repo, keeping the adapter directory and the scripts directory in the relative places they're in so that the codes run smoothly (or, edit the code yourself only in your cloned remote repo!

The workflow

01_pychopper.sh: sbatch $0 /path/to/dataset/raw_fastq_file
- To reorient and quality-score trim cDNA reads into the same orientation
- This script takes in 2 extra files, which are hardcoded into the script:
  - one which has the orientation of adapters sequences listed (M13_config_for_pychopper.txt)
  - one which has the non-variable sequences for the adapters (M13_seqs_for_pychopper.fa)
02_cutadapt_loop.sh: sbatch $0 /path/to/dataset/pychopped/pychopped_sample1.fastq.gz
- A loop which first demultiplexes the reads based on the 5' SP5 primers, and then, one-by-one, demultiplexes each output from the SP5 demultiplexing by the SP27 3' primers. It also trims adapters on the fly.
- This script takes in 2 extra files, which are hardcoded into the script: the 5' and 3' adapter sequences, M13_amplicon_indices_forward.fa and the reverse complement of the 3' sequences (since we reoriented before with pychopper, M13_amplicon_indices_reverse_rc.fa)
- You may notice that we do bin reads into adapter combinations which do not exist! Though this is inefficient, it's not the end of the world, since we have sequenced deeply enough per individual that enough reads are binned into the correct adapter combinations
- These non-existing adapter combos are removed after demuxing, along with the 'unknown' bins, since we don't want to analyse them later on.
03_amplicon_sorter.sh: sbatch $0 -input /path/to/dataset/demuxed/2nd_round [-min <int> -max <int> -prefix <amplicon>]
- The workflow is more dynamic here, since the user may be analysing different amplicons for us. In our wet-lab protocol:
  - For the rRNAs, we use 2 sets of primers to amplify overlapping segments of the nuclear rRNA cistron, so a sequence can be anywhere over 3Kb in length
  - For the COIs, we use 2 alternate options for a forward primer, and one option for a reverse primer, to amplify 'redundant' sequences of a COI barcode segment (the 2 alternate options for the forward primer allow for matching more taxa than just one), so a sequence can be between 300bp-900bp in length
- The amplicon sorter step splits samples' reads by size and clusters them by sequence
- The variables (min length, max length, prefix name for amplicon type, and input folder) are user-defined :)
04a_cleaning_primers.sh: sbatch $0 <amplicon_sorted_dir> <amplicon_type> --r1-primers <file> --r2-primers <file> [--run-round2 --cluster-round1]
- This script takes each clustered (amplicon-sorted) file and removes the primer sequences from the amplicon sequences, since these are synthetic
- The user can define the primer sequences to remove based on the amplicon they are sequencing, but if more than one amplicon was sequenced in a run, the other primer sets can also be submitted as a back check
- Sequences with lone primers, mismatched primers, or >1 primer of a kind are removed in a failsafe
- Optionally, the user can choose to trim 'untrimmed' sequences with alternate primers
05a_barrnap_rRNA_extract.sh: sbatch $0 /path/to/dataset/primerless and 05b_reorganise_COIs.sh: sbatch $0 /path/to/dataset/primerless
- 5a script uses Barrnap version 0.9. It takes the assembled contigs and uses an HMM to extract sequences matching 28S and 18S rRNA profiles.
- 5b is a straightforward script copying over the cleaned, clustered/non-redundant primerless COIs from the primerless directory to a COI directory for clarity
06_summary.sh: $0 <dataset_path> <amplicon> [gene, optional]
- This wrapper script calls for an R script which creates a tsv of the sample, whether hits of the amplicon were found in the processing of the data, what the highest coverage was for an amplicon, and which hit it is for that amplicon per sample
- If an amplicon is found (yay!) then the header for the best hit is printed. If no amplicon was retrieved for that sample (boo), NA is there.

Name		Name	Last commit message	Last commit date
Latest commit History 115 Commits
adapters_primers		adapters_primers
schematics		schematics
scripts		scripts
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Field workshop on the taxonomy and natural history of freshwater and limno-terrestrial meiofauna 🪱 🏞️

This is the code used to process COI/28S/18S amplicon data from the 2025 NERC/NHM Lake District Freshwater Meiofauna workshop

Installation and setup

How to structure

The workflow

Primer schematics

rRNA amplification

COI amplification

About

Uh oh!

Releases

Packages

Contributors 2

Uh oh!

Languages

srisarya/nanopore-barcoding-ORC

Folders and files

Latest commit

History

Repository files navigation

Field workshop on the taxonomy and natural history of freshwater and limno-terrestrial meiofauna 🪱 🏞️

This is the code used to process COI/28S/18S amplicon data from the 2025 NERC/NHM Lake District Freshwater Meiofauna workshop

Installation and setup

How to structure

The workflow

Primer schematics

rRNA amplification

COI amplification

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages