🧬 Thesis – Reproducible Genomic Analysis Pipeline (Nextflow)

This repository contains a reproducible bioinformatics pipeline developed for undergraduate thesis work. The pipeline performs genome processing and comparative analysis, including quality control, trimming, assembly, annotation, and pangenome analysis.

🚀 Features

Reproducible workflow using Nextflow

Modular pipeline design (DSL2)

Supports:

Quality Control (FastQC)

Trimming

Genome Assembly (SPAdes)

Genome Annotation (Prokka)

Comparative Genomics (MUMmer)

Pangenome Analysis (Roary)

Scalable and portable across systems

📦 Requirements

Make sure the following are installed:

Nextflow (>= 22.x)

Conda (recommended)
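
A quick way to confirm that both tools are on the PATH before running anything is sketched below. `check_tools` is a hypothetical helper written for this example and is not part of the repository; it only tests for presence and does not verify the Nextflow version.

```shell
# Hypothetical helper (not part of this repository):
# report whether each named tool can be found on PATH.
check_tools() {
    local missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    # Non-zero exit status if at least one tool was not found.
    return "$missing"
}
```

Calling `check_tools nextflow conda` prints one line per tool and returns a non-zero status if either is missing.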

⚙️ Setup

Clone the repository:

git clone https://github.com/realcann/Thesis.git

cd Thesis

📂 Input Data

The pipeline expects:

Paired-end FASTQ files

A reference genome in FASTA format

Example structure:

files/
├── sample_R1_001.fastq
├── sample_R2_001.fastq
└── Earth_reference.fasta

⚠️ Note: Input data is not included due to size limitations.

▢️ Running the Pipeline Basic run:

nextflow run main.nf -profile conda \
  --fastq_files "files/*_R{1,2}_001.fastq" \
  --ref_fasta "files/Earth_reference.fasta"

Custom output directory:

nextflow run main.nf -profile conda \
  --fastq_files "your_data/*_R{1,2}.fastq.gz" \
  --ref_fasta "your_data/reference.fasta" \
  --outdir "results"

📊 Output

Results will be generated in the specified output directory (out/ by default), including:

Quality control reports

Trimmed reads

Assembled genomes

Annotation outputs

Comparative analysis results

Pangenome outputs

🧪 Reproducibility

All dependencies are managed via environment.yml.

The pipeline can be executed on any system with Nextflow and Conda.

Intermediate files are stored in the work/ directory.

📁 Project Structure

.
├── main.nf
├── nextflow.config
├── environment.yml
├── workflow/
├── modules/
├── files/ (not included)
└── out/ (generated)

👤 Author

Can Gerçek

Istanbul University

Molecular Biology and Genetics

Bioinformatics & Comparative Genomics

📌 Notes

This pipeline was developed as part of a thesis project focusing on reproducible genomic analysis.
