PATHOS - Protein variant Analysis Through Human-Optimized Scoring

PATHOS predicts the pathogenicity of protein variants using protein language models (ESM-C 600M and Ankh2 Large). It provides pre-computed scores for 139M+ variants across 17,574 human proteins.

Installation

Set up PATHOS with a single script that installs dependencies and downloads the database.

Prerequisites: ~35 GB disk space

git clone https://github.com/DSIMB/PATHOS.git
cd PATHOS
./setup_pathos.sh
conda activate PATHOS_env

Usage

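Query pathogenicity scores for protein variants using UniProt accessions and single-letter mutation notation (wild-type residue, position, mutant residue; e.g. M1A means the methionine at position 1 is replaced by alanine).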

Single mutation query

python run_pathos.py --protein P16501 --mutation M1A
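A mutation code such as M1A packs three pieces of information: the wild-type residue, a 1-based position and the mutant residue. The sketch below shows one way to parse such codes and check them against a protein sequence, roughly what a "validate mutations" step must do. It assumes the 20 standard amino acids only; `parse_mutation` and `validate_against_sequence` are hypothetical helpers, not functions from the PATHOS code base.

```python
import re
from typing import NamedTuple

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

# Wild-type residue, 1-based position, mutant residue (e.g. "M1A").
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


class Mutation(NamedTuple):
    wild_type: str
    position: int  # 1-based, as in UniProt numbering
    mutant: str


def parse_mutation(code: str) -> Mutation:
    """Split a code like 'M1A' into wild-type, position and mutant parts."""
    match = MUTATION_RE.match(code.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised mutation notation: {code!r}")
    wt, pos, mt = match.group(1), int(match.group(2)), match.group(3)
    if wt not in AMINO_ACIDS or mt not in AMINO_ACIDS:
        raise ValueError(f"non-standard amino acid in {code!r}")
    if wt == mt:
        raise ValueError(f"{code!r} is not a substitution")
    return Mutation(wt, pos, mt)


def validate_against_sequence(mut: Mutation, sequence: str) -> None:
    """Raise if the position is out of range or the wild-type residue differs."""
    if not 1 <= mut.position <= len(sequence):
        raise ValueError(
            f"position {mut.position} is outside a sequence of length {len(sequence)}"
        )
    observed = sequence[mut.position - 1]
    if observed != mut.wild_type:
        raise ValueError(
            f"expected {mut.wild_type} at position {mut.position}, found {observed}"
        )
```

For example, `parse_mutation("M1A")` yields `Mutation(wild_type='M', position=1, mutant='A')`, and validating it against a sequence that does not start with M raises `ValueError`.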

Batch query from file

python run_pathos.py --file variants.txt --output results.csv

Filter results

python run_pathos.py --protein P16501 --min-score 0.9 --output pathogenic.csv
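The `--min-score` filter keeps only high-scoring variants. A minimal sketch of that filtering on already-scored records follows. It assumes the bound is inclusive (score >= min_score), which the option name suggests but this README does not state; the `(UniProt ID, mutation, score)` tuple layout and the descending sort are choices made for the illustration only.

```python
from typing import Iterable, List, Tuple

# Hypothetical record layout: (UniProt ID, mutation, PATHOS score).
Record = Tuple[str, str, float]


def filter_by_min_score(records: Iterable[Record], min_score: float) -> List[Record]:
    """Keep records scoring at least `min_score`, highest scores first.

    The inclusive bound (>=) is an assumption about how --min-score behaves.
    """
    if not 0.0 <= min_score <= 1.0:
        raise ValueError("min_score must lie between 0 and 1")
    kept = [rec for rec in records if rec[2] >= min_score]
    return sorted(kept, key=lambda rec: rec[2], reverse=True)
```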

Input file format

Supports TXT, TSV, and CSV formats. Headers are auto-detected and skipped.

TXT/TSV (space- or tab-separated):

P16501 M1A R56V    # Multiple mutations per line
Q9Y6X3 M1C         # Single mutation
P10635             # Full scan (all 19 substitutions per position)
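A line with only an accession asks for every single-residue substitution: 19 per position, so a protein of length L yields 19 × L variants. The sketch below enumerates them in the same M1A-style notation. It assumes the sequence contains only the 20 standard amino acids; `full_scan` is a hypothetical name, not a PATHOS function.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def full_scan(sequence: str) -> List[str]:
    """List every single-residue substitution of `sequence`, position by position."""
    variants = []
    for pos, wt in enumerate(sequence, start=1):  # UniProt positions are 1-based
        if wt not in AMINO_ACIDS:
            raise ValueError(f"non-standard residue {wt!r} at position {pos}")
        for mt in AMINO_ACIDS:
            if mt != wt:  # skip the synonymous "substitution"
                variants.append(f"{wt}{pos}{mt}")
    return variants
```

For a 3-residue sequence such as "MKV" this returns 57 codes, starting with M1A and M1C.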

CSV (comma-separated):

Protein,Mutation
P16501,M1A
P16501,R56V
Q9Y6X3,M1C
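Both layouts reduce to "accession, then zero or more mutations". The following sketch parses either one and skips a header row when the first non-empty line does not start with a UniProt accession (matched with the accession pattern published by UniProt). Stripping `#` comments and splitting on commas whenever one is present are assumptions made for this illustration; `parse_query_lines` is a hypothetical helper, not the PATHOS parser.

```python
import re
from typing import Iterable, List, Tuple

# UniProt accession format (6- or 10-character accessions).
ACCESSION_RE = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)


def parse_query_lines(lines: Iterable[str]) -> List[Tuple[str, List[str]]]:
    """Turn input lines into (accession, mutations) pairs.

    An empty mutation list stands for a full scan of that protein.
    """
    queries = []
    first = True
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop inline comments and whitespace
        if not line:
            continue
        # CSV rows are split on commas; TXT/TSV rows on any run of spaces or tabs.
        parts = line.split(",") if "," in line else line.split()
        fields = [f.strip() for f in parts if f.strip()]
        is_first, first = first, False
        if not ACCESSION_RE.match(fields[0]):
            if is_first:
                continue  # header row such as "Protein,Mutation"
            raise ValueError(f"not a UniProt accession: {fields[0]!r}")
        queries.append((fields[0], fields[1:]))
    return queries
```

Applied to the TXT example, this gives `("P16501", ["M1A", "R56V"])`, `("Q9Y6X3", ["M1C"])` and `("P10635", [])`, the last one meaning a full scan.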

How it works

If every queried variant is already in the precomputed database (139M+ variants), results are returned immediately without running any model.

For variants not in the database, PATHOS performs de novo prediction:

1. Load UniProt sequences and validate mutations
2. Generate MSAs with mmseqs2 (skipped if they already exist)
3. Compute PASTML conservation scores
4. Extract UniProt annotations and allele frequencies
5. Generate embeddings with ESM-C 600M and Ankh2 Large
6. Run PATHOS inference (ensemble of both models)
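The control flow described above, database first and de novo prediction only for what is missing, can be sketched as follows. Here a plain dict stands in for the precomputed database and `predict_de_novo` is a hypothetical callable standing in for the whole MSA, feature, embedding and inference pipeline; neither reflects how PATHOS actually stores or calls them.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Variant = Tuple[str, str]  # (UniProt accession, mutation such as "M1A")


def score_variants(
    variants: Iterable[Variant],
    precomputed: Dict[Variant, float],
    predict_de_novo: Callable[[List[Variant]], Dict[Variant, float]],
) -> Dict[Variant, float]:
    """Serve cached scores and send only the missing variants to the slow path."""
    variants = list(variants)
    results = {v: precomputed[v] for v in variants if v in precomputed}
    # dict.fromkeys removes duplicate queries while keeping their order.
    missing = list(dict.fromkeys(v for v in variants if v not in results))
    if missing:  # MSAs, features and embeddings are only computed when needed
        results.update(predict_de_novo(missing))
    return results
```

The design point is that the expensive steps run once per batch of unseen variants, so a query fully covered by the database never touches them.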

Output

Results are displayed in the terminal and exported to CSV with the following columns:

UniProt ID
Mutation (e.g., M1A)
PATHOS score (0-1)
Classification (Benign/Pathogenic)
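Because the results are exported as CSV, they can be post-processed with the Python standard library alone. The sketch below loads an output file and reports class counts plus the highest-scoring variants. The header names `PATHOS score` and `Classification` are assumptions derived from the column list above; check them against the first line of a real output file before use.

```python
import csv
from collections import Counter
from typing import Dict, IO, List, Tuple

# Assumed header names; adjust to match an actual PATHOS output file.
SCORE_COL = "PATHOS score"
CLASS_COL = "Classification"


def load_results(handle: IO[str]) -> List[Dict[str, str]]:
    """Read the exported CSV into one dict per variant."""
    return list(csv.DictReader(handle))


def summarize(
    rows: List[Dict[str, str]], top_n: int = 5
) -> Tuple[Counter, List[Dict[str, str]]]:
    """Count variants per class and return the `top_n` highest-scoring rows."""
    counts = Counter(row[CLASS_COL] for row in rows)
    top = sorted(rows, key=lambda row: float(row[SCORE_COL]), reverse=True)[:top_n]
    return counts, top
```

Typical use: `with open("results.csv", newline="") as fh: counts, top = summarize(load_results(fh))`.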

Score interpretation

PATHOS outputs a score between 0 and 1 indicating the probability of pathogenicity.

Score	Classification
< 0.63	Benign
>= 0.63	Pathogenic
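The table amounts to a single cut-off. Written as code, it makes explicit that a score of exactly 0.63 falls on the pathogenic side. This is only a restatement of the table; `classify` is a hypothetical name, and rejecting scores outside [0, 1] is an extra check added here.

```python
PATHOGENIC_THRESHOLD = 0.63  # cut-off from the score interpretation table


def classify(score: float, threshold: float = PATHOGENIC_THRESHOLD) -> str:
    """Map a PATHOS score in [0, 1] to the Benign/Pathogenic labels."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score {score} is outside [0, 1]")
    return "Pathogenic" if score >= threshold else "Benign"
```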

Command-line options

Additional options for run_pathos.py.

Option	Description
`-i, --input`	Input file (TXT, TSV, or CSV)
`-o, --output`	Output CSV file
`--n-jobs`	Number of parallel workers for feature generation (default: 5). Increase for faster processing on multi-core systems.
`--batch-size`	Batch size for embedding generation (default: 100)
`--mmseqs-mem-limit`	Memory limit for mmseqs2 MSA generation (default: 8G)
`--batch-threshold`	Number of variants above which batched mode is enabled (default: 10000)
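Two of these options interact: `--batch-threshold` decides whether batched mode is used at all, and `--batch-size` sets how many items go into each embedding batch. The sketch below shows that arithmetic with the documented defaults. It reads "above which" as strictly greater than and says nothing about what batched mode does internally; both helper names are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def use_batched_mode(n_variants: int, batch_threshold: int = 10_000) -> bool:
    """'Above the threshold' is read here as strictly greater than."""
    return n_variants > batch_threshold


def chunked(items: Iterable[T], batch_size: int = 100) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    it = iter(items)
    while batch := list(islice(it, batch_size)):
        yield batch
```

With the default batch size of 100, 250 sequences would be embedded in three batches of 100, 100 and 50.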

Embeddings download

Coming soon.

Citation

If you use PATHOS in your research, please cite:

Radjasandirane, R., Cretin, G., Diharce, J., de Brevern, A. G., & Gelly, J. C. (2025). PATHOS: Predicting Variant Pathogenicity by Combining Protein Language Models and Biological Features. medRxiv, 2025-12.

Contact

radja.ragou@gmail.com
