Phylomatic

Automated phylogenetic inference from Sanger sequencing data.

Drop in raw .ab1 chromatograms. Get back a publication-ready phylogenetic tree.

Why Phylomatic?

If you've done 16S identification from Sanger reads, you know what the process looks like: open your .ab1 files in FinchTV, trim the low-quality ends by hand, copy the sequence into BLAST, wait, grab the accession numbers, fetch references from Entrez one by one, paste everything into Clustal Omega, wait again, download the alignment, load it into MEGA, build a tree, export it, annotate it.

For one sample that's just tedious. For twenty samples it eats an entire afternoon. For a class where every student needs to do it, it's a guaranteed stream of "my BLAST timed out" and "MEGA won't open the file" emails.

Phylomatic runs the whole thing in a single click.

[Screenshot: Phylomatic results view showing the phylogenetic tree, top match, and export options]

[Screenshot: consensus sequence viewer and BLAST hits table]

The pipeline

.ab1 reads ──> Consensus ──> BLASTn ──> References ──> MSA ──> NJ Tree ──> SVG
  (2 files)      FASTA        NCBI       Entrez       Clustal   BioPython   Annotated

Assembly -- reads forward and reverse .ab1 chromatograms, quality-trims both ends at PHRED < 20, reverse-complements the reverse read, and builds a consensus by picking the higher-quality base at each position.
BLAST -- submits the consensus to NCBI BLASTn. Supports multiple databases: 16S ribosomal RNA (filtered for proper species-level hits), the full nucleotide collection, RefSeq RNA, or ITS for fungal work. Returns the top 15 hits with identity, coverage, and E-values.
Reference fetch -- pulls FASTA sequences for the top hits via NCBI Entrez E-utilities. Uncultured and environmental sequences are filtered out automatically so the tree shows real species names.
Alignment -- submits the consensus plus references to the EBI Clustal Omega REST API for multiple sequence alignment.
Tree construction -- builds a Neighbor-Joining tree from the alignment distance matrix using BioPython. Labels use genus + species names extracted from BLAST hit descriptions.
Visualization -- renders the tree as an annotated SVG with the query sequence highlighted. Zoomable and pannable in the browser. Exportable as SVG, PNG (2x), or Newick.
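
The assembly step described above can be sketched in plain Python. This is a minimal illustration, not Phylomatic's actual code: the function names are hypothetical, the reads are passed in as strings with per-base PHRED lists rather than parsed from .ab1 files, and the consensus function assumes the two trimmed reads already cover the same positions (a real assembler first has to find the overlap between them).

```python
from typing import List, Tuple

# Complement table for reverse-complementing a read (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def quality_trim(seq: str, quals: List[int], threshold: int = 20) -> Tuple[str, List[int]]:
    """Drop bases from both ends until a base with PHRED >= threshold is reached.

    Low-quality bases inside the read are kept; only the ends are trimmed.
    """
    start, end = 0, len(seq)
    while start < end and quals[start] < threshold:
        start += 1
    while end > start and quals[end - 1] < threshold:
        end -= 1
    return seq[start:end], quals[start:end]


def reverse_complement(seq: str, quals: List[int]) -> Tuple[str, List[int]]:
    """Reverse-complement a read and reverse its quality scores to match."""
    return seq.translate(COMPLEMENT)[::-1], quals[::-1]


def consensus(fwd_seq: str, fwd_q: List[int], rev_seq: str, rev_q: List[int]) -> str:
    """Pick the higher-quality base at each position.

    ASSUMPTION: both reads are already in the same orientation and
    cover exactly the same positions. Ties go to the forward read.
    """
    return "".join(
        f if fq >= rq else r
        for f, fq, r, rq in zip(fwd_seq, fwd_q, rev_seq, rev_q)
    )


# Toy reads: the forward read has noisy ends and one weak internal call.
fwd_seq, fwd_q = quality_trim("NACTAN", [5, 30, 30, 10, 30, 3])
rev_seq, rev_q = reverse_complement("TCGT", [25, 40, 25, 25])
print(consensus(fwd_seq, fwd_q, rev_seq, rev_q))
```

In the toy data the forward read calls T at the third position with PHRED 10, while the reverse read calls G with PHRED 40, so the consensus takes G.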

Quick start

git clone https://github.com/iliasmahboub/Phylomatic.git
cd Phylomatic

pip install -r backend/requirements.txt
cd frontend && npm install && cd ..

npm run dev

Open http://localhost:5173, drop your .ab1 files, and click Run pipeline. The app asks for your email address at runtime: NCBI requires one for API access, but no signup is needed. A full run takes 2-5 minutes, depending on NCBI and EBI response times.

Architecture

┌─────────────────────────────────────────────────────────┐
│  Frontend (React 18 + TypeScript + Vite + Tailwind)     │
│  :5173                                                  │
│  ┌──────────┐ ┌──────────────┐ ┌───────────────┐       │
│  │ DropZone │ │ PipelineTrack│ │ PhyloTree     │       │
│  │          │ │              │ │ (zoom/pan SVG)│       │
│  └──────────┘ └──────────────┘ └───────────────┘       │
│  ┌──────────────┐ ┌───────────┐ ┌──────────────┐       │
│  │ BlastResults  │ │ SeqViewer │ │ ExportPanel  │       │
│  └──────────────┘ └───────────┘ └──────────────┘       │
└────────────────────────┬────────────────────────────────┘
                         │ REST + WebSocket
┌────────────────────────┴────────────────────────────────┐
│  Backend (FastAPI + BioPython + asyncio)                 │
│  :8000                                                  │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐   │
│  │ assembly │→│  blast   │→│  entrez  │→│alignment │   │
│  └──────────┘ └──────────┘ └──────────┘ └──────────┘   │
│  ┌──────────┐ ┌──────────┐                              │
│  │   tree   │→│visualize │                              │
│  └──────────┘ └──────────┘                              │
└─────────────────────────────────────────────────────────┘
                         │
          ┌──────────────┼──────────────┐
          ▼              ▼              ▼
   NCBI BLASTn    NCBI Entrez    EBI Clustal Omega
   (URL API)      (E-utilities)  (REST API)

Each pipeline stage is an independent module in backend/app/pipeline/. They can be imported and tested without the web layer. The frontend connects over WebSocket for real-time progress.

Running modules standalone

Each step works on its own from the command line:

cd backend

python -m app.pipeline.assembly fwd.ab1 rev.ab1
python -m app.pipeline.blast consensus.fasta
python -m app.pipeline.entrez ACC1 ACC2 ACC3
python -m app.pipeline.alignment refs.fasta
python -m app.pipeline.tree aligned.fasta
python -m app.pipeline.visualize tree.nwk

Stack

Layer	Technology
Backend	Python 3.11, FastAPI, BioPython, httpx, asyncio
Frontend	React 18, TypeScript, Vite, Tailwind CSS
External APIs	NCBI BLAST URL API, NCBI Entrez E-utilities, EBI Clustal Omega REST
Testing	pytest, pytest-asyncio, pytest-httpx

API

Method	Path	Description
`POST`	`/api/run`	Upload `.ab1` files, start pipeline
`GET`	`/api/status/{job_id}`	Current stage and progress
`GET`	`/api/results/{job_id}`	Full results (hits, SVG, Newick)
`WS`	`/ws/{job_id}`	Real-time stage updates
`DELETE`	`/api/job/{job_id}`	Clean up job data
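
A client can poll the status endpoint until the job finishes and then fetch the results. The sketch below uses only the standard library. The endpoint paths come from the table above and the port from the architecture diagram, but the response schema is not documented here, so the `"stage"` field and its `"done"` value are assumptions, as are the helper names.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:8000"  # backend port shown in the architecture diagram


def get_json(url: str) -> dict:
    """Fetch a URL and decode the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def wait_for_results(
    job_id: str,
    fetch: Callable[[str], dict] = get_json,
    interval: float = 5.0,
    max_polls: int = 120,
) -> dict:
    """Poll /api/status/{job_id}, then return /api/results/{job_id}.

    ASSUMPTION: the status payload carries a "stage" field and the value
    "done" marks completion. Adjust both to the real schema.
    """
    for _ in range(max_polls):
        status = fetch(f"{BASE_URL}/api/status/{job_id}")
        if status.get("stage") == "done":
            return fetch(f"{BASE_URL}/api/results/{job_id}")
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Passing `fetch` as a parameter keeps the polling logic testable without a running backend. For live progress the WebSocket endpoint avoids polling altogether.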

Testing

cd backend
pytest tests/ -v

Unit tests cover assembly, BLAST XML parsing, and tree construction. All external API calls are mocked.
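
As an illustration of testing without network access, a BLAST XML parser can be exercised against an inline fixture. The sketch below is not taken from the Phylomatic test suite: `parse_hits` is a hypothetical name, the fixture is trimmed to the few elements the parser reads, and the accessions are invented placeholders. The element names (`Hit_def`, `Hit_accession`, `Hsp_identity`, `Hsp_align-len`, `Hsp_evalue`) follow NCBI's BLAST XML output format.

```python
import xml.etree.ElementTree as ET

# Trimmed-down fixture; accessions are invented placeholders.
SAMPLE_XML = """\
<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit>
          <Hit_def>Escherichia coli strain K-12 16S ribosomal RNA, partial sequence</Hit_def>
          <Hit_accession>FAKE0001</Hit_accession>
          <Hit_hsps><Hsp>
            <Hsp_evalue>0.0</Hsp_evalue>
            <Hsp_identity>980</Hsp_identity>
            <Hsp_align-len>1000</Hsp_align-len>
          </Hsp></Hit_hsps>
        </Hit>
        <Hit>
          <Hit_def>Uncultured bacterium clone X1 16S ribosomal RNA gene, partial sequence</Hit_def>
          <Hit_accession>FAKE0002</Hit_accession>
          <Hit_hsps><Hsp>
            <Hsp_evalue>0.0</Hsp_evalue>
            <Hsp_identity>950</Hsp_identity>
            <Hsp_align-len>1000</Hsp_align-len>
          </Hsp></Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""


def parse_hits(xml_text: str) -> list:
    """Return named hits with a genus + species label and percent identity."""
    hits = []
    for hit in ET.fromstring(xml_text).iter("Hit"):
        description = hit.findtext("Hit_def", default="")
        lowered = description.lower()
        # Skip uncultured and environmental sequences so labels are real species.
        if "uncultured" in lowered or "environmental" in lowered:
            continue
        hsp = hit.find("Hit_hsps/Hsp")  # first HSP only
        if hsp is None:
            continue
        identical = int(hsp.findtext("Hsp_identity"))
        align_len = int(hsp.findtext("Hsp_align-len"))
        hits.append({
            "accession": hit.findtext("Hit_accession"),
            # Label = first two words of the description (genus + species).
            "label": " ".join(description.split()[:2]),
            "identity": 100.0 * identical / align_len,
            "evalue": float(hsp.findtext("Hsp_evalue")),
        })
    return hits


def test_parse_hits_skips_uncultured():
    hits = parse_hits(SAMPLE_XML)
    assert [h["label"] for h in hits] == ["Escherichia coli"]
    assert hits[0]["identity"] == 98.0
```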

Docker

docker compose up

How it works (for non-bioinformaticians)

When scientists find an unknown bacterium, they need to figure out what species it is. One common approach is to sequence a specific gene (the 16S ribosomal RNA gene) that all bacteria have but that varies enough between species to tell them apart. Sanger sequencing reads the gene from both directions, producing two .ab1 chromatogram files that record the raw signal from the sequencer.

Phylomatic takes those two files and:

Cleans up the noisy ends of each read and merges them into one clean sequence
Searches the NCBI database for the most similar known sequences
Downloads those reference sequences
Lines them all up (alignment) to see exactly where they differ
Builds a family tree showing how closely related the unknown bacterium is to each known species
Draws that tree in the browser so it can be examined, zoomed, and exported

The result is a phylogenetic tree: a branching diagram where closely related species sit near each other and the branch lengths reflect how much their DNA differs.
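
To make "how much their DNA differs" concrete, here is a minimal sketch of one simple distance measure, the p-distance: the fraction of aligned positions at which two sequences disagree. It is only an illustration. The sequences are toy data, the function names are hypothetical, and the distance model Phylomatic actually uses through BioPython may differ.

```python
from itertools import combinations
from typing import Dict, Tuple


def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites, skipping columns where either sequence has a gap."""
    compared = differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    return differing / compared if compared else 0.0


def distance_matrix(aligned: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Pairwise p-distances for every pair of aligned sequences."""
    return {
        (n1, n2): p_distance(aligned[n1], aligned[n2])
        for n1, n2 in combinations(aligned, 2)
    }


# Toy alignment: all rows have equal length, "-" marks a gap.
aligned = {
    "query":     "ACGTACGT",
    "species_A": "ACGTACGA",
    "species_B": "TCGAAC-A",
}

for pair, dist in distance_matrix(aligned).items():
    print(pair, round(dist, 3))
```

Here the query differs from species_A at 1 of 8 sites and from species_B at 3 of the 7 ungapped sites, so a tree built from these distances would place the query next to species_A. Neighbor-Joining takes a matrix like this as input and repeatedly joins the closest pair of groups, after correcting each distance for how far each group is from all the others.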

Citation

If you use Phylomatic in your research, please cite:

Mahboub, I. (2026). Phylomatic: Automated phylogenetic inference from Sanger
sequencing data. https://github.com/iliasmahboub/Phylomatic

License

MIT

Ilias Mahboub

Molecular Biosciences, Duke University / Duke Kunshan University

Research Trainee @ Dzirasa Lab (Duke SM) · Yuan Lab (SJTU-SM) · Remy Lab

im132@duke.edu
