StruSel

Structurome-wide Selectivity of pathogen-exclusive targets

Overview

StruSel is a computational pipeline that identifies pathogen-exclusive drug target candidates through proteome-scale structural comparison. Traditional homology exclusion relies on sequence similarity alone; StruSel also uses AI-predicted 3D structures to find pathogen proteins that are structurally divergent from the human proteome, a largely unexplored space for selective antimicrobial drug discovery.

Given a target pathogen, StruSel:

1. Downloads AlphaFold2-predicted structures for the pathogen proteome
2. Applies pLDDT-based quality filtering (≥ 70) to retain high-confidence structures
3. Runs a structural similarity search against the human structurome using FoldSeek (3Di + GPU)
4. Runs a parallel sequence-based search using MMseqs2 for benchmarking
5. Computes ESM2 protein language model embeddings for embedding-space similarity (prototype)
6. Classifies all proteins into a seven-category scheme (A/B/B2/C/C2/D/D*) based on sequence–structure concordance
7. Cross-references with the Database of Essential Genes (DEG) to retain essential proteins
8. Maps candidates to CARD (via RGI) to assess AMR relevance
9. Outputs a tiered, annotated list of pathogen-exclusive target candidates
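
The pLDDT filtering step above can be sketched as follows. AlphaFold PDB files store the per-residue pLDDT in the B-factor column, so this sketch averages that field over CA atoms and keeps a structure when the mean is at least 70. Whether StruSel averages per residue in this way is an assumption, and the function names `mean_plddt` and `passes_filter` are hypothetical, not taken from the notebook.

```python
from typing import Iterable, List

PLDDT_CUTOFF = 70.0  # cutoff stated in this README


def mean_plddt(pdb_lines: Iterable[str]) -> float:
    """Average the B-factor field (pLDDT in AlphaFold models) over CA atoms."""
    scores: List[float] = []
    for line in pdb_lines:
        # PDB fixed-width format: atom name in columns 13-16, B-factor in 61-66
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    if not scores:
        raise ValueError("no CA atoms found")
    return sum(scores) / len(scores)


def passes_filter(pdb_lines: Iterable[str], cutoff: float = PLDDT_CUTOFF) -> bool:
    """Return True when the mean pLDDT meets the cutoff (assumed criterion)."""
    return mean_plddt(pdb_lines) >= cutoff
```

In practice the lines would come from an open AlphaFold `.pdb` file, for example `passes_filter(open(path))` for some local `path`.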

Results: Klebsiella pneumoniae vs Human Structurome

Input Scale

Proteome	Total structures	pLDDT ≥ 70	Source
K. pneumoniae (UP000007841)	5,727	5,358	AlphaFold DB v6
H. sapiens (UP000005640)	23,586	15,716	AlphaFold DB v6
H. sapiens (sequence)	205,155	—	UniProt

Seven-Category Classification

Category	Description	Count
A	Seq-homolog + Struct-homolog	445
B	Seq-similar + Struct-divergent	13
B2	Seq-similar + Struct-partial	5
C	Structural mimic (no seq hit)	1,167
C2	Partial structural mimic (no seq hit)	287
D	StruSel candidate (no seq hit, low TM)	151
D*	High-confidence StruSel candidate (no seq/struct hit)	3,290
	Total	5,358
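
One way to read the seven categories is as a lookup on two inputs: whether the protein has a sequence hit, and its best TM-score against the human structurome. The sketch below uses the TM-score thresholds listed under Pipeline Parameters (0.5 and 0.3). The exact decision rules are not spelled out in this README, so the mapping is an assumption, in particular the treatment of a sequence hit that has no structural hit at all. The function name `classify` is hypothetical.

```python
from typing import Optional

TM_HOMOLOG = 0.5    # "TM-score homolog" threshold from Pipeline Parameters
TM_UNRELATED = 0.3  # "TM-score unrelated" threshold from Pipeline Parameters


def classify(has_seq_hit: bool, best_tm: Optional[float]) -> str:
    """Assign one of A/B/B2/C/C2/D/D* using assumed decision rules.

    best_tm is the best TM-score against the human structurome,
    or None when the structural search returned no hit.
    """
    if best_tm is None:
        # Assumption: a sequence hit with no structural hit counts as divergent (B)
        return "B" if has_seq_hit else "D*"
    if best_tm >= TM_HOMOLOG:
        return "A" if has_seq_hit else "C"
    if best_tm >= TM_UNRELATED:
        # Intermediate TM-score is treated as a partial structural match
        return "B2" if has_seq_hit else "C2"
    return "B" if has_seq_hit else "D"
```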

Target Tiers

Tier	Definition	Count
Tier 1	Essential + D/D* + no AMR (prime targets)	1,412
Tier 2	Essential + D/D* + AMR-associated	16
Tier 3	Essential + C/C2 + no AMR (structural mimics)	895
Tier 4	Essential + C/C2 + AMR-associated	7
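
The tier definitions in the table above translate directly into a small function. This is a minimal sketch: the name `assign_tier` and the boolean inputs are hypothetical, and proteins that are non-essential or fall in categories A, B, or B2 are assumed to receive no tier.

```python
from typing import Optional


def assign_tier(category: str, essential: bool, amr_associated: bool) -> Optional[int]:
    """Map a classified protein to Tier 1-4, or None if it is not tiered."""
    if not essential:
        return None  # every tier requires an essential gene
    if category in {"D", "D*"}:
        return 2 if amr_associated else 1
    if category in {"C", "C2"}:
        return 4 if amr_associated else 3
    return None  # A, B, B2: not covered by the tier table
```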

Tier 1 Functional Profile

Enzymatic targets: 277 / 1,412 (19.6%)
Membrane-associated: 466
Transcription regulators: 219
Transporters: 355
Top molecular functions: DNA-binding transcription factor activity (171), ATP binding (155), metal ion binding (113)

Pipeline Parameters

Parameter	Value
pLDDT cutoff	≥ 70.0
MMseqs2 e-value	1e-5
MMseqs2 sensitivity	7.5
Query/target coverage	≥ 0.7
FoldSeek e-value	0.001
TM-score homolog	≥ 0.5
TM-score unrelated	< 0.3
ESM2 cosine similarity	≤ 0.5
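
As a rough illustration of how these parameters could map onto command lines, the sketch below builds argument lists for `mmseqs easy-search` and `foldseek easy-search`. It only constructs the commands and does not run them. The file names, the choice of `--cov-mode 0` (coverage of both query and target), and the requested FoldSeek output columns are assumptions; GPU options are left out, and the notebook may invoke the tools differently.

```python
from typing import List

# Values copied from the Pipeline Parameters table
PARAMS = {
    "mmseqs_evalue": "1e-5",
    "mmseqs_sensitivity": "7.5",
    "coverage": "0.7",
    "foldseek_evalue": "0.001",
}


def mmseqs_cmd(query: str, target: str, out: str, tmp: str = "tmp") -> List[str]:
    """Argument list for a sequence search (assumed invocation)."""
    return [
        "mmseqs", "easy-search", query, target, out, tmp,
        "-e", PARAMS["mmseqs_evalue"],
        "-s", PARAMS["mmseqs_sensitivity"],
        "-c", PARAMS["coverage"],
        "--cov-mode", "0",  # assumption: coverage applies to query and target
    ]


def foldseek_cmd(query: str, target: str, out: str, tmp: str = "tmp") -> List[str]:
    """Argument list for a structural search (assumed invocation)."""
    return [
        "foldseek", "easy-search", query, target, out, tmp,
        "-e", PARAMS["foldseek_evalue"],
        # assumption: report an alignment TM-score column for later binning
        "--format-output", "query,target,evalue,alntmscore",
    ]


if __name__ == "__main__":
    # Hypothetical input and output paths, for illustration only
    print(" ".join(mmseqs_cmd("kp.fasta", "human.fasta", "seq_hits.m8")))
    print(" ".join(foldseek_cmd("kp_pdb/", "human_pdb/", "struct_hits.m8")))
```

The resulting lists can be passed to `subprocess.run(cmd, check=True)` once the tools are installed.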

Use Case

Current implementation: Klebsiella pneumoniae (WHO critical priority pathogen)

Planned extension: Pan-ESKAPE (E. faecium, S. aureus, K. pneumoniae, A. baumannii, P. aeruginosa, Enterobacter spp.)

Pipeline Overview

Pathogen proteome (AlphaFold DB)
        │
        ▼
pLDDT quality filter (≥ 70)
        │
        ├──────────────────────────────────┐
        ▼                                  ▼
MMseqs2 sequence search          FoldSeek structural search (GPU)
        │                                  │
        ├──────────────────────────────────┤
        ▼                                  ▼
  Sequence hits/non-homologs     Structural hits (TM-score bins)
        │                                  │
        └──────────┬───────────────────────┘
                   ▼
    Seven-category classification (A/B/B2/C/C2/D/D*)
                   │
                   ▼
    Essential gene filtering (DEG)
                   │
                   ▼
    AMR mapping (CARD-RGI)
                   │
                   ▼
    Tiered target candidates (Tier 1–4)
                   │
                   ▼
    UniProt GO/pathway annotation
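
Between the structural search and the classification step, each pathogen protein needs a single summary value. A minimal sketch, assuming tab-separated hit lines with the query ID in the first column and a TM-score in the fourth (the column layout and the function name `best_tm_per_query` are assumptions), keeps the highest TM-score per query:

```python
import csv
from typing import Dict, Iterable


def best_tm_per_query(rows: Iterable[str], tm_column: int = 3) -> Dict[str, float]:
    """Return the highest TM-score seen for each query ID."""
    best: Dict[str, float] = {}
    for fields in csv.reader(rows, delimiter="\t"):
        if not fields:
            continue  # skip blank lines
        query, tm = fields[0], float(fields[tm_column])
        if tm > best.get(query, float("-inf")):
            best[query] = tm
    return best
```

Queries that never appear in the hit file have no structural hit and would be handled separately, for example as D* candidates when they also lack a sequence hit.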

Repository Structure

strusel/
├── README.md
├── LICENSE
├── notebooks/
│   └── strusel_pipeline.ipynb    # Full Colab pipeline (12 cells)
├── scripts/
│   └── .gitkeep
└── results/
    ├── strusel_classification.tsv
    ├── strusel_tier1_prime_targets.tsv
    ├── strusel_tier2_amr_targets.tsv
    ├── strusel_tier3_mimic_risk.tsv
    ├── strusel_tier4_amr_mimic.tsv
    ├── strusel_top_amr_targets.tsv
    ├── tier1_uniprot_annotations.tsv
    └── StruSel_Report_Klebsiella.pdf

Quick Start (Google Colab)

The pipeline runs end-to-end on Google Colab (GPU runtime, ~45 min). All dependencies are installed within the notebook.

Dependencies

Python 3.8+
FoldSeek (GPU-enabled)
MMseqs2
DIAMOND (for RGI)
ESM2 (fair-esm)
RGI (CARD)
Biopython, pandas, numpy, matplotlib, seaborn

Citation

If you use StruSel, please cite:

Pranavathiyani G. StruSel: Structurome-wide Selectivity of pathogen-exclusive targets. GitHub. https://github.com/pranavathiyani/strusel

Author

Pranavathiyani Gnanasekar
Assistant Professor (Research), Division of Bioinformatics
SASTRA Deemed University, Thanjavur
pranavathiyani@scbt.sastra.edu
Google Scholar | ORCID

License

MIT License. See LICENSE for details.
