SynDRA is a unified drug synonym mapping system designed to harmonize identifiers across major biomedical resources.
It bridges gaps between external drug sources and transcriptomic perturbation datasets such as LINCS/CMap, improving drug repurposing workflows by resolving inconsistent naming.
Try SynDRA interactively online: https://tolgacorbaci.shinyapps.io/syndra/
- Integrates synonyms from KatDB, TTD, PRISM, and LINCS2020
- Normalizes and deduplicates synonyms into a single mapping
- Links identifiers across BRD_IDs, TTD_IDs, and PubChem CIDs
- Increases match rates for drug repurposing pipelines (e.g., +8.5% for FO5A benchmark set)
Result:
193,113 unique synonyms mapped to 33,858 BRD_IDs, 2,775 TTD_IDs, and 950 PubChem CIDs.
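As a rough illustration of what "match rate" means here, the sketch below compares how many query names are recognized with primary names only versus with a synonym-expanded name set. The helper `match_rate` and the query list are hypothetical toy values chosen for illustration; the sketch does not reproduce the +8.5% benchmark figure.

```python
def match_rate(query_names, known_names):
    """Fraction of query names found in `known_names` after basic normalization."""
    queries = [q.strip().lower() for q in query_names]
    if not queries:
        return 0.0
    hits = sum(1 for q in queries if q in known_names)
    return hits / len(queries)


# Hypothetical query list, e.g. drug names coming from an external source.
queries = [
    "Chlorhexidine",
    "chlorhexidine, combinations",
    "unlisted compound",
    "another unlisted compound",
]

# Name sets: primary names only vs. primary names plus mapped synonyms (toy data).
primary_only = {"chlorhexidine"}
with_synonyms = primary_only | {"chlorhexidine, combinations"}

rate_before = match_rate(queries, primary_only)
rate_after = match_rate(queries, with_synonyms)
print(f"before: {rate_before:.2%}, after: {rate_after:.2%}")
```

In a real workflow the expanded name set would come from the synonym column of the final mapping rather than from a hand-written set.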
- KatDB Synonyms
  - Source: Kat Koler
  - File: L1000_BRD_name_translated_drug_list.csv
  - Purpose: Expand recognition of BRD compounds in L1000 assays
  - Example: BRD-K52256627 → “chlorhexidine”, “N-[4-methylpiperazinyl]-…”
- Therapeutic Targets Database (TTD)
  - URL: TTD Download
  - File: P1-04-Drug_synonyms.txt
  - Purpose: Drug–target linked synonyms
  - Example: D00AAN → “d00aan”, chemical descriptors
- PRISM Drug Synonyms
  - URL: PRISM GitHub
  - File: PRISM_drug_synonyms.csv
  - Purpose: Supports MOA enrichment
  - Example: PubChem_CID 11314340 → “a-674563”
- LINCS 2020 Compound Metadata
  - URL: Clue.io Data Dashboard
  - File: compoundinfo_beta.txt
  - Purpose: Standard compound identifiers for L1000 perturbation studies
- Initial entry counts
  - katdb_df: 13,176
  - ttd_df: 299,047
  - prism_df: 112,784
- Initial unique identifiers
  - BROAD_drug_IDs: 5,539
  - TTD_drug_IDs: 30,713
  - PubChem_CIDs: 1,351
- LINCS2020 coverage
  - Unique BROAD_drug_IDs: 33,613
  - Synonyms: 34,234
- LINCS2020 + KatDB combined
  - Unique BROAD_drug_IDs: 33,858
  - Synonyms: 45,617

Normalization:
- Convert all synonyms to lowercase
- Split multi-synonym strings into separate rows
- Strip whitespace and formatting artifacts

Merging:
- Synonym explosion → one synonym per row
- Outer join on synonyms → maximize matches
- ID propagation → forward/backward fill with shared synonyms
- Grouping & aggregation → unique sets of IDs
- Filter → drop rows without BROAD_drug_ID
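The normalization and merging steps listed above can be sketched in plain Python as follows. This is a minimal illustration, not the notebook's code. It makes three assumptions: multi-synonym strings are `;`-delimited, the function names (`normalize`, `explode`, `build_mapping`) are invented for the example, and ID propagation is approximated by sharing the TTD and PubChem IDs found on any synonym with every synonym of the same BROAD_drug_ID. The identifiers in the toy input are the ones shown elsewhere in this README.

```python
from collections import defaultdict

ID_COLUMNS = ("BROAD_drug_ID", "TTD_drug_ID", "PubChem_CID")


def normalize(raw, sep=";"):
    """Lowercase, split on `sep` (assumed delimiter), strip whitespace and quotes."""
    parts = (p.strip().strip('"').strip() for p in raw.lower().split(sep))
    return [p for p in parts if p]


def explode(records, id_column):
    """Yield one (synonym, id_column, identifier) triple per synonym."""
    for identifier, raw in records:
        for synonym in normalize(raw):
            yield synonym, id_column, identifier


def build_mapping(sources):
    # Outer join on synonym: every synonym from every source gets an entry.
    by_synonym = defaultdict(lambda: {c: set() for c in ID_COLUMNS})
    for id_column, records in sources.items():
        for synonym, column, identifier in explode(records, id_column):
            by_synonym[synonym][column].add(identifier)

    # ID propagation (assumed rule): collect the TTD/PubChem IDs seen on any
    # synonym of a BROAD ID so they can be shared with all of its synonyms.
    per_broad = defaultdict(lambda: {c: set() for c in ID_COLUMNS[1:]})
    for ids in by_synonym.values():
        for broad in ids["BROAD_drug_ID"]:
            for column in ID_COLUMNS[1:]:
                per_broad[broad][column] |= ids[column]

    # Aggregation and filter: one row per (BROAD ID, synonym); synonyms that
    # never matched a BROAD ID produce no rows.
    rows = set()
    for synonym, ids in by_synonym.items():
        for broad in ids["BROAD_drug_ID"]:
            ttd = ";".join(sorted(per_broad[broad]["TTD_drug_ID"])) or None
            cid = ";".join(sorted(per_broad[broad]["PubChem_CID"])) or None
            rows.add((broad, synonym, ttd, cid))
    return sorted(rows, key=lambda row: row[:2])


# Toy input: (identifier, raw synonym string) pairs per source.
sources = {
    "BROAD_drug_ID": [("BRD-K52256627", "Chlorhexidine; Chlorhexidine, combinations")],
    "TTD_drug_ID": [("D0V4GY", " chlorhexidine "), ("D00AAN", "d00aan")],
    "PubChem_CID": [("9552079", "CHLORHEXIDINE"), ("11314340", "a-674563")],
}

rows = build_mapping(sources)
for row in rows:
    print(row)
```

In this toy run, both chlorhexidine synonyms end up with the same TTD and PubChem IDs, while the `d00aan` and `a-674563` entries are dropped because they never match a BROAD_drug_ID.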

Final mapping statistics:
- Rows: 193,221
- Unique synonyms: 193,113
- Identifiers:
  - BROAD_drug_IDs: 33,858
  - TTD_drug_IDs: 2,775
  - PubChem_CIDs: 950
- Missing values (NaNs):
  - BROAD_drug_ID: 0
  - synonyms: 0
  - TTD_drug_ID: 45,737
  - PubChem_CID: 88,237
- Duplicate rows: 0
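Statistics of this kind (row count, unique values, missing values, duplicate rows) can be recomputed from any export of the mapping. The sketch below is one way to do it in plain Python. The `summarize` helper, the row-dict layout, and the third toy row (`BRD-EXAMPLE`) are hypothetical. Note that a row count above the unique-synonym count, with zero duplicate rows, simply means some synonyms appear in more than one row.

```python
from collections import Counter

COLUMNS = ("BROAD_drug_ID", "synonyms", "TTD_drug_ID", "PubChem_CID")


def summarize(rows):
    """Compute basic quality-control statistics for a list of row dicts."""
    as_tuples = [tuple(row.get(c) for c in COLUMNS) for row in rows]
    counts = Counter(as_tuples)
    return {
        "rows": len(rows),
        # Distinct non-missing values per column.
        "unique": {
            c: len({row.get(c) for row in rows if row.get(c) is not None})
            for c in COLUMNS
        },
        # Missing values (None stands in for NaN in this sketch).
        "missing": {
            c: sum(1 for row in rows if row.get(c) is None) for c in COLUMNS
        },
        # Extra copies of fully identical rows.
        "duplicate_rows": sum(n - 1 for n in counts.values()),
    }


# Toy rows; the last one is a made-up placeholder with no TTD or PubChem match.
rows = [
    {"BROAD_drug_ID": "BRD-K52256627", "synonyms": "chlorhexidine",
     "TTD_drug_ID": "D0V4GY", "PubChem_CID": "9552079"},
    {"BROAD_drug_ID": "BRD-K52256627", "synonyms": "chlorhexidine, combinations",
     "TTD_drug_ID": "D0V4GY", "PubChem_CID": "9552079"},
    {"BROAD_drug_ID": "BRD-EXAMPLE", "synonyms": "example synonym",
     "TTD_drug_ID": None, "PubChem_CID": None},
]

summary = summarize(rows)
for key, value in summary.items():
    print(key, value)
```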

Example rows from the final mapping:

| BROAD_drug_ID | Synonym | TTD_drug_ID | PubChem_CID |
|---|---|---|---|
| BRD-K52256627 | chlorhexidine | D0V4GY | 9552079 |
| BRD-K52256627 | chlorhexidine, combinations | D0V4GY | 9552079 |
| BRD-K52256627 | 1,1'-hexamethylenebis[...] | D0V4GY | 9552079 |
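Rows like these lend themselves to a simple synonym lookup. The following sketch indexes mapping rows by normalized synonym and resolves a free-text drug name to its identifiers. The helpers `build_index` and `resolve` are hypothetical names, and the tuple layout mirrors the table columns as an assumption about how the mapping might be loaded.

```python
from collections import defaultdict


def build_index(rows):
    """Index (BROAD_drug_ID, synonym, TTD_drug_ID, PubChem_CID) rows by synonym."""
    index = defaultdict(list)
    for broad_id, synonym, ttd_id, cid in rows:
        index[synonym.strip().lower()].append(
            {"BROAD_drug_ID": broad_id, "TTD_drug_ID": ttd_id, "PubChem_CID": cid}
        )
    return index


def resolve(name, index):
    """Return all identifier records for a drug name, or an empty list."""
    return index.get(name.strip().lower(), [])


# Rows taken from the example table.
rows = [
    ("BRD-K52256627", "chlorhexidine", "D0V4GY", "9552079"),
    ("BRD-K52256627", "chlorhexidine, combinations", "D0V4GY", "9552079"),
]

index = build_index(rows)
print(resolve("  Chlorhexidine ", index))
print(resolve("unknown compound", index))
```

Returning a list rather than a single record leaves room for synonyms that map to more than one identifier set.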

Entry counts before and after merging:

| Category | Initial Entries | Final Entries |
|---|---|---|
| BROAD_drug_ID | 45,617 | 33,858 |
| TTD_drug_ID | 299,047 | 2,775 |
| PubChem_CID | 112,784 | 950 |
| Total merged rows | — | 435,530 |
| Final unique synonym groups | — | 193,221 |
| Matched identifier instances | — | 1,145 |

Clone and run the pipeline:

    git clone https://github.com/hidelab/SynDRA.git
    cd SynDRA
    jupyter notebook SynDRA_pipeline.ipynb

This project is licensed under the MIT License.

If you use SynDRA in your work, please cite:
T. Corbaci et al., SynDRA: Synonym Mapping for Alignment of Repurposing Therapeutics (Brazilian Symposium on Bioinformatics, BSB, 2025).
