StruSel is a computational pipeline that identifies pathogen-exclusive drug target candidates through proteome-scale structural comparison. Where traditional homology exclusion relies on sequence alone, StruSel uses AI-predicted 3D structures to find pathogen proteins that are structurally divergent from the human proteome, a largely unexplored space for selective antimicrobial drug discovery.
Given a target pathogen, StruSel:
- Downloads AlphaFold2-predicted structures for the pathogen proteome
- Applies pLDDT-based quality filtering (≥ 70) to retain high-confidence structures
- Performs structurome-wide structural similarity search against the human structurome using FoldSeek (3Di + GPU)
- Performs parallel sequence-based search using MMseqs2 for benchmarking
- Computes ESM2 protein language model embeddings for embedding-space similarity (prototype)
- Classifies all proteins into a seven-category scheme (A/B/B2/C/C2/D/D*) based on sequence–structure concordance
- Cross-references with essential gene databases (DEG) to filter for essential proteins
- Maps candidates to CARD (via RGI) to assess AMR relevance
- Outputs a tiered, annotated list of pathogen-exclusive target candidates
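The embedding-space comparison listed above is described as a prototype, and its exact implementation is not given here. The following is only a minimal sketch of how a cosine-similarity check against human embeddings might look, assuming each protein is already reduced to one fixed-length vector (for example a mean-pooled ESM2 embedding). The function names and the "maximum over all human proteins" rule are illustrative assumptions, not the notebook's code; the 0.5 cutoff is the ESM2 value from the parameter table.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: treat an all-zero embedding as dissimilar
    return dot / (norm_a * norm_b)


def max_similarity_to_human(query, human_embeddings) -> float:
    """Highest cosine similarity between one pathogen embedding and any human embedding."""
    return max(cosine_similarity(query, h) for h in human_embeddings)


def is_embedding_divergent(query, human_embeddings, cutoff: float = 0.5) -> bool:
    """Hypothetical rule: divergent if no human embedding exceeds the cutoff."""
    return max_similarity_to_human(query, human_embeddings) <= cutoff
```

For a full proteome, a vectorized implementation over an embedding matrix would be much faster than this pure-Python loop; the sketch only shows the comparison logic.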
| Proteome | Total structures | pLDDT ≥ 70 | Source |
|---|---|---|---|
| K. pneumoniae (UP000007841) | 5,727 | 5,358 | AlphaFold DB v6 |
| H. sapiens (UP000005640) | 23,586 | 15,716 | AlphaFold DB v6 |
| H. sapiens (sequence) | 205,155 | — | UniProt |
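The pLDDT ≥ 70 column in the table above comes from a per-structure quality filter. AlphaFold model files store the per-residue pLDDT in the B-factor column of each `ATOM` record, so one plausible filter averages that value over Cα atoms. The sketch below assumes PDB-format input and a mean-over-Cα rule; whether StruSel averages per residue, per atom, or uses another summary is not stated here, and the function names are illustrative.

```python
from statistics import mean

PLDDT_CUTOFF = 70.0  # cutoff from the parameter table


def mean_plddt(pdb_text: str) -> float:
    """Mean pLDDT over C-alpha atoms, read from the PDB B-factor field."""
    scores = []
    for line in pdb_text.splitlines():
        # Atom name occupies columns 13-16, B-factor columns 61-66 (1-based).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    if not scores:
        raise ValueError("no C-alpha atoms found")
    return mean(scores)


def passes_quality_filter(pdb_text: str, cutoff: float = PLDDT_CUTOFF) -> bool:
    """True if the structure's mean pLDDT meets the cutoff (assumed rule)."""
    return mean_plddt(pdb_text) >= cutoff
```

A typical use would read each downloaded model with `Path(path).read_text()` and keep only the files for which `passes_quality_filter` returns `True`.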
| Category | Description | Count |
|---|---|---|
| A | Seq-homolog + Struct-homolog | 445 |
| B | Seq-similar + Struct-divergent | 13 |
| B2 | Seq-similar + Struct-partial | 5 |
| C | Structural mimic (no seq hit) | 1,167 |
| C2 | Partial structural mimic (no seq hit) | 287 |
| D | StruSel candidate (no seq hit, low TM) | 151 |
| D* | High-confidence StruSel candidate (no seq/struct hit) | 3,290 |
| Total | | 5,358 |
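The seven categories combine one sequence-level outcome (MMseqs2 hit or not) with one structure-level outcome (best TM-score bin). A minimal sketch of that decision table follows. It uses the TM-score thresholds from the parameter table (≥ 0.5 homolog, < 0.3 unrelated) and makes two assumptions that are not stated in this README: "partial" means 0.3 ≤ TM < 0.5, and a sequence hit with no structural hit at all is placed in B.

```python
from typing import Optional

TM_HOMOLOG = 0.5    # TM-score at or above this: structural homolog
TM_UNRELATED = 0.3  # TM-score below this: structurally unrelated


def classify(has_seq_hit: bool, best_tm: Optional[float]) -> str:
    """Assign a category from a sequence-hit flag and the best TM-score.

    best_tm is None when the structural search returned no hit.
    """
    if best_tm is None:
        structure = "none"
    elif best_tm >= TM_HOMOLOG:
        structure = "homolog"
    elif best_tm >= TM_UNRELATED:
        structure = "partial"   # assumption: 0.3 <= TM < 0.5
    else:
        structure = "divergent"

    if has_seq_hit:
        # assumption: a sequence hit without any structural hit counts as B
        return {"homolog": "A", "partial": "B2",
                "divergent": "B", "none": "B"}[structure]
    return {"homolog": "C", "partial": "C2",
            "divergent": "D", "none": "D*"}[structure]
```

For example, `classify(False, None)` yields `"D*"`, the category with neither a sequence nor a structural hit.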
| Tier | Definition | Count |
|---|---|---|
| Tier 1 | Essential + D/D* + no AMR (prime targets) | 1,412 |
| Tier 2 | Essential + D/D* + AMR-associated | 16 |
| Tier 3 | Essential + C/C2 + no AMR (structural mimics) | 895 |
| Tier 4 | Essential + C/C2 + AMR-associated | 7 |
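The tier definitions above reduce to three inputs per protein: its category, whether it is essential (DEG), and whether it is AMR-associated (CARD via RGI). A small sketch of that mapping, with an illustrative function name and boolean inputs assumed to be precomputed:

```python
from typing import Optional


def assign_tier(category: str, essential: bool, amr_associated: bool) -> Optional[int]:
    """Return the tier (1-4) for a protein, or None if it falls outside all tiers."""
    if not essential:
        return None  # every tier requires an essential protein
    if category in {"D", "D*"}:
        return 2 if amr_associated else 1
    if category in {"C", "C2"}:
        return 4 if amr_associated else 3
    return None  # categories A, B, and B2 are not tiered
```

For instance, `assign_tier("D*", True, False)` returns `1`, a prime target under the table's definition.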
- Enzymatic targets: 277 / 1,412 (19.6%)
- Membrane-associated: 466
- Transcription regulators: 219
- Transporters: 355
- Top molecular functions: DNA-binding transcription factor activity (171), ATP binding (155), metal ion binding (113)
| Parameter | Value |
|---|---|
| pLDDT cutoff | ≥ 70.0 |
| MMseqs2 e-value | 1e-5 |
| MMseqs2 sensitivity | 7.5 |
| Query/target coverage | ≥ 0.7 |
| FoldSeek e-value | 0.001 |
| TM-score homolog | ≥ 0.5 |
| TM-score unrelated | < 0.3 |
| ESM2 cosine similarity | ≤ 0.5 |
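To show how the search parameters above might translate into command lines, the sketch below builds argument lists for `mmseqs easy-search` and `foldseek easy-search`. The file and directory arguments are placeholders, and the choice of `--cov-mode 0` (coverage of both query and target) and of the output columns are assumptions; the notebook may invoke the tools differently. The `--gpu 1` flag requires a GPU-enabled FoldSeek build.

```python
from typing import List

# Thresholds copied from the parameter table.
MMSEQS_EVALUE = "1e-5"
MMSEQS_SENSITIVITY = "7.5"
MIN_COVERAGE = "0.7"
FOLDSEEK_EVALUE = "0.001"


def mmseqs_command(query_fasta: str, target_fasta: str,
                   out_tsv: str, tmp_dir: str) -> List[str]:
    """Argument list for a sequence search (paths are placeholders)."""
    return [
        "mmseqs", "easy-search", query_fasta, target_fasta, out_tsv, tmp_dir,
        "-e", MMSEQS_EVALUE,
        "-s", MMSEQS_SENSITIVITY,
        "-c", MIN_COVERAGE,
        "--cov-mode", "0",  # assumption: require coverage of query and target
    ]


def foldseek_command(query_dir: str, target_db: str, out_tsv: str,
                     tmp_dir: str, use_gpu: bool = True) -> List[str]:
    """Argument list for a structural search (paths are placeholders)."""
    cmd = [
        "foldseek", "easy-search", query_dir, target_db, out_tsv, tmp_dir,
        "-e", FOLDSEEK_EVALUE,
        # assumption: report the alignment TM-score alongside the e-value
        "--format-output", "query,target,evalue,alntmscore",
    ]
    if use_gpu:
        cmd += ["--gpu", "1"]
    return cmd
```

Either list can be passed to `subprocess.run(cmd, check=True)` once the tools are installed.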
Current implementation: Klebsiella pneumoniae (WHO critical priority pathogen)
Planned extension: Pan-ESKAPE (E. faecium, S. aureus, K. pneumoniae, A. baumannii, P. aeruginosa, Enterobacter spp.)
    Pathogen proteome (AlphaFold DB)
            │
            ▼
    pLDDT quality filter (≥ 70)
            │
            ├──────────────────────────────────┐
            ▼                                  ▼
    MMseqs2 sequence search        FoldSeek structural search (GPU)
            │                                  │
            ├──────────────────────────────────┤
            ▼                                  ▼
    Sequence hits/non-homologs     Structural hits (TM-score bins)
            │                                  │
            └──────────┬───────────────────────┘
                       ▼
    Seven-category classification (A/B/B2/C/C2/D/D*)
                       │
                       ▼
    Essential gene filtering (DEG)
                       │
                       ▼
    AMR mapping (CARD-RGI)
                       │
                       ▼
    Tiered target candidates (Tier 1–4)
                       │
                       ▼
    UniProt GO/pathway annotation
    strusel/
    ├── README.md
    ├── LICENSE
    ├── notebooks/
    │   └── strusel_pipeline.ipynb    # Full Colab pipeline (12 cells)
    ├── scripts/
    │   └── .gitkeep
    └── results/
        ├── strusel_classification.tsv
        ├── strusel_tier1_prime_targets.tsv
        ├── strusel_tier2_amr_targets.tsv
        ├── strusel_tier3_mimic_risk.tsv
        ├── strusel_tier4_amr_mimic.tsv
        ├── strusel_top_amr_targets.tsv
        ├── tier1_uniprot_annotations.tsv
        └── StruSel_Report_Klebsiella.pdf
The pipeline runs end-to-end on Google Colab (GPU runtime, ~45 min). All dependencies are installed within the notebook.
- Python 3.8+
- FoldSeek (GPU-enabled)
- MMseqs2
- DIAMOND (for RGI)
- ESM2 (`fair-esm`)
- RGI (CARD)
- Biopython, pandas, numpy, matplotlib, seaborn
If you use StruSel, please cite:
Pranavathiyani G. StruSel: Structurome-wide Selectivity of pathogen-exclusive targets. GitHub. https://github.com/pranavathiyani/strusel
Pranavathiyani Gnanasekar, Assistant Professor (Research), Division of Bioinformatics, SASTRA Deemed University, Thanjavur. Email: pranavathiyani@scbt.sastra.edu. Profiles: Google Scholar | ORCID
MIT License. See LICENSE for details.