This repository provides a reproduction of RXNGraphormer, a unified pre-trained framework for reaction performance prediction and synthesis planning. The original work is by Xu et al.
This reproduction focuses on validating core functionalities, reaction-type analysis, and extending evaluation to external literature datasets.
RXNGraphormer/reproduction/
├── 1_basic_usage.ipynb              # Core model functionality verification
├── 2_Reaction_Type_Visual.ipynb     # Reaction type discrimination & clustering (partly new work)
├── 2_ReactionType_Res/              # Saved visualization results for reaction type analysis
├── 3_regression.sh                  # Regression task training script
├── 3_png/                           # Output figures from regression experiments
├── 4_USPTO.sh                       # USPTO-style dataset training for sequence generation
├── 4_uspto/                         # Logs and outputs for USPTO experiments
├── 5_SPR.ipynb                      # Structure-performance relationship analysis
├── 6_experiment_results.ipynb       # External validation on literature datasets (new work)
├── 7_finetune_guide.ipynb           # Fine-tuning user guide and example
├── finetune_grid_search.py          # Grid search script for fine-tuning
├── 7_finetune_results/              # Logs, configs, and results from fine-tuning experiments
└── README.md                        # Documentation
For better reproducibility, the internal directory structures of config, dataset, and model_path have been reorganized compared to the original repository.
This reproduction uses the original pre-trained model weights; we only perform fine-tuning on downstream tasks (e.g., yield, selectivity prediction).
For sequence generation tasks, models are fine-tuned on USPTO-50k and USPTO-480k, while the USPTO-full model is evaluated without retraining.
All training logs and checkpoints are saved under corresponding subdirectories in model_path/.
# Install the additional dependency for reaction-type clustering
pip install hdbscan
`hdbscan` is used in `2_Reaction_Type_Visual.ipynb` for unsupervised clustering of reaction embeddings.
- For all datasets (`USPTO_STEREO`, `USPTO_full`, `USPTO_480k`, `USPTO_50k`, `OOS`, `external_validation_dataset`, `50k_with_rxn_type`, and `bechmark`): download from the original model's Figshare repository. These preprocessed datasets are part of the original RXNGraphormer release.
- For `Test.zip`: download from our Figshare repository. This test set contains newly curated real-world reaction data from literature and high-throughput experimentation (HTE) for external validation.
Note:
- All model checkpoints, training logs, and evaluation results are available in our Figshare repository and correspond to our independent reproduction runs.
- Please follow the dataset directory structure outlined below after extraction.
- This ensures full reproducibility of all experiments presented in the `reproduction/` notebooks and scripts.
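The extraction step for a downloaded archive such as `Test.zip` can be sketched with the Python standard library. This is a minimal sketch only: the helper name `extract_archive` and the `dataset` target directory are assumptions for illustration, and the final location of the files should follow the dataset directory structure this README prescribes.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, target_dir):
    """Extract a zip archive into target_dir and return the top-level entries."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(target)
    return sorted(p.name for p in target.iterdir())


if __name__ == "__main__":
    # Both paths are assumptions for illustration; adjust them to your layout.
    archive = Path("Test.zip")
    if archive.exists():
        print(extract_archive(archive, "dataset"))
```

Listing the top-level entries after extraction makes it easy to check the result against the expected directory layout before running any notebook.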
- Basic inference and embedding generation
- Reaction type classification and unsupervised clustering
- Regression tasks (yield, regioselectivity, enantioselectivity)
- Sequence generation (forward/retro-synthesis) on USPTO datasets
- Structure-performance relationship (SPR) analysis
- External validation on real-world HTE or literature datasets
The following table compares the performance reported for the original RXNGraphormer with this reproduction across benchmark, out-of-sample (OOS), and external datasets.
| Category | Dataset | Subset | RXNGraphormer R2 | RXNGraphormer MAE | RXNGraphormer Precision | RXNGraphormer ACC | Reproduction R2 | Reproduction MAE | Reproduction Precision | Reproduction ACC |
|---|---|---|---|---|---|---|---|---|---|---|
| Benchmark datasets | Buchwald–Hartwig | | 0.971 | 2.980 | / | / | 0.970±0.003 | 3.079±0.144 | / | / |
| Benchmark datasets | Suzuki–Miyaura | | 0.876 | 6.300 | / | / | 0.871±0.009 | 6.431±0.187 | / | / |
| Benchmark datasets | C–H functionalization | | 0.992 | 0.266 | / | / | 0.992±0.001 | 0.273±0.007 | / | / |
| Benchmark datasets | Asymmetric thiol | | 0.915 | 0.134 | / | / | 0.916±0.010 | 0.135±0.007 | / | / |
| OOS | Buchwald Hartwig | Additive 1 | 0.883 | 6.430 | / | / | 0.815 | 8.310 | / | / |
| OOS | Buchwald Hartwig | Additive 2 | 0.906 | 6.000 | / | / | 0.897 | 6.280 | / | / |
| OOS | Buchwald Hartwig | Additive 3 | 0.792 | 8.500 | / | / | 0.651 | 10.399 | / | / |
| OOS | Buchwald Hartwig | Additive 4 | 0.736 | 9.940 | / | / | 0.643 | 10.966 | / | / |
| OOS | Buchwald Hartwig | Bromide | 0.890 | 5.810 | / | / | 0.869 | 5.934 | / | / |
| OOS | Buchwald Hartwig | Chloride | -0.053 | 15.120 | / | / | -0.377 | 18.879 | / | / |
| OOS | Buchwald Hartwig | Iodide | 0.823 | 7.540 | / | / | 0.844 | 7.186 | / | / |
| OOS | Buchwald Hartwig | Component-combination | 0.725 | 10.120 | / | / | 0.732 | 9.457 | / | / |
| OOS | Thiol addition | Cat | 0.781 | 0.236 | / | / | 0.804 | 0.230 | / | / |
| OOS | Thiol addition | Sub | 0.923 | 0.138 | / | / | 0.915 | 0.138 | / | / |
| OOS | Thiol addition | Sub and Cat | 0.804 | 0.248 | / | / | 0.732 | 0.257 | / | / |
| External | Nicolit Avg | | 0.308 | 21.760 | 0.793 | 0.732 | 0.209 | 37.199 | 0.796 | 0.730 |
| External | Asymmetric hydrogenation of olefins | | 0.832 | 0.371 | / | / | 0.739 | 0.477 | / | / |
| External | Pallada-electrocatalyzed C–H activation | | 0.924 | 0.211 | / | / | 0.900 | 0.196 | / | / |
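
For readers who want to recompute the regression metrics from their own predictions, the sketch below shows the standard definitions of R2 and MAE using only the standard library. The function names and the toy values are hypothetical, and the `summarize` helper assumes that the ± entries denote mean and sample standard deviation over repeated runs, which the table itself does not state.

```python
from statistics import mean, stdev


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted values."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def summarize(values):
    """Format repeated-run scores as 'mean±std' (assumed convention)."""
    return f"{mean(values):.3f}±{stdev(values):.3f}"


# Hypothetical yields (%) for illustration only; not taken from any dataset.
y_true = [10.0, 35.0, 60.0, 85.0]
y_pred = [12.0, 30.0, 65.0, 80.0]
print(f"R2 = {r2(y_true, y_pred):.3f}, MAE = {mae(y_true, y_pred):.3f}")
```

Under this definition R2 becomes negative whenever the predictions fit worse than a constant prediction of the mean, which is how entries such as the Chloride split should be read.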
This section evaluates synthesis planning performance via Top-n accuracy metrics on both retrosynthetic and forward synthesis tasks.
| Task | Dataset | RXNGraphormer top-1 (%) | RXNGraphormer top-3 (%) | RXNGraphormer top-5 (%) | RXNGraphormer top-10 (%) | Reproduction top-1 (%) | Reproduction top-3 (%) | Reproduction top-5 (%) | Reproduction top-10 (%) | Note |
|---|---|---|---|---|---|---|---|---|---|---|
| Retrosynthetic | USPTO-50k | 51.0 | 69.0 | 74.2 | 79.2 | 50.3 | 69.3 | 73.7 | 78.0 | fine-tuned |
| Retrosynthetic | USPTO-full | 47.4 | 63.0 | 67.4 | 71.6 | 47.3 | 62.9 | 67.5 | 71.6 | inference-only |
| Forward | USPTO-480k | 90.6 | 94.3 | 94.9 | 95.5 | 90.5 | 94.4 | 95.1 | 95.7 | fine-tuned |
| Forward | USPTO-STEREO | 78.2 | 85.1 | 86.5 | 87.8 | 78.1 | 84.9 | 86.4 | 87.7 | fine-tuned |
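
Top-n accuracy counts an example as correct when the reference appears among the first n ranked candidates. The sketch below illustrates that definition with plain string comparison. The function name and the toy SMILES lists are hypothetical, and the comparison assumes that predictions and references have already been canonicalized in the same way; it is not necessarily the evaluation code used for the table.

```python
def top_n_accuracy(ranked_predictions, references, n):
    """Percentage of examples whose reference is among the first n candidates."""
    hits = sum(
        ref in candidates[:n]
        for candidates, ref in zip(ranked_predictions, references)
    )
    return 100.0 * hits / len(references)


# Hypothetical ranked candidates, best first; illustration only.
ranked = [
    ["CCO", "CCN", "CCC"],
    ["c1ccccc1", "CCO", "CC=O"],
    ["CC(=O)O", "CCO", "CCN"],
]
# The third reference is ethanol written as "OCC": without canonicalization
# it does not match the candidate "CCO" by string comparison.
refs = ["CCO", "CC=O", "OCC"]

for n in (1, 3):
    print(f"top-{n}: {top_n_accuracy(ranked, refs, n):.1f}%")
```

The third example shows why canonicalization matters: an equivalent but differently written SMILES string is scored as a miss when compared as raw text.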
Model generalization is validated on newly introduced real-world datasets (e.g., HTE or literature-derived reactions), with results compared to baseline methods.
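| Dataset | Subset | Split | Origin Model R2 | Origin Model MAE (%) | RXNGraphormer R2 | RXNGraphormer MAE (%) |
|---|---|---|---|---|---|---|
| Sulfoxonium | | Train Set | 0.89 | 6.60 | 0.91 | 5.70 |
| Sulfoxonium | | Validation Set | 0.77 | 8.00 | 0.60 | 9.46 |
| Meta_C_H | | Train Set | 0.75 | 9.30 | 0.82 | 5.76 |
| Meta_C_H | | Independent Test Set | 0.74 | 9.10 | 0.78 | 6.49 |
| Meta_C_H | | Strict Independent Test Set | 0.71 | 11.60 | -1.19 | 23.22 |
| Amide Coupling HTE | Full HTE (with NATURE intermediate) | Random split | 0.66 | 10.00 | 0.58 | 14.14 |
| Amide Coupling HTE | Full HTE (with NATURE intermediate) | Partial Novelty | 0.68 | 14.00 | 0.59 | 14.31 |
| Amide Coupling HTE | Full HTE (with NATURE intermediate) | Full Novelty | 0.63 | 15.00 | 0.58 | 12.56 |
| Amide Coupling HTE | Full HTE (without intermediate) | Random split | 0.66 | 10.00 | 0.58 | 14.31 |
| Amide Coupling HTE | Full HTE (without intermediate) | Partial Novelty | 0.68 | 14.00 | 0.66 | 13.12 |
| Amide Coupling HTE | Full HTE (without intermediate) | Full Novelty | 0.63 | 15.00 | 0.44 | 14.93 |
| Amide Coupling HTE | DCC (with intermediate) | Random split | 0.86 | 8.00 | 0.37 | 16.63 |
| Amide Coupling HTE | DCC (with intermediate) | Partial Novelty | 0.81 | 11.00 | 0.27 | 15.99 |
| Amide Coupling HTE | DCC (with intermediate) | Full Novelty | 0.67 | 7.00 | -0.41 | 13.05 |
| Amide Coupling HTE | EDC (with intermediate) | Random split | 0.89 | 6.10 | 0.23 | 18.46 |
| Amide Coupling HTE | EDC (with intermediate) | Partial Novelty | 0.88 | 9.00 | 0.20 | 18.32 |
| Amide Coupling HTE | EDC (with intermediate) | Full Novelty | 0.75 | 14.00 | -0.10 | 22.23 |
| Amide Coupling HTE | HATU (with intermediate) | Random split | 0.86 | 6.00 | 0.08 | 19.08 |
| Amide Coupling HTE | HATU (with intermediate) | Partial Novelty | 0.78 | 12.00 | -0.09 | 20.57 |
| Amide Coupling HTE | HATU (with intermediate) | Full Novelty | 0.84 | 7.00 | -0.37 | 16.34 |
| Amide Coupling HTE | PyBOP (with intermediate) | Random split | 0.90 | 5.00 | 0.35 | 15.54 |
| Amide Coupling HTE | PyBOP (with intermediate) | Partial Novelty | 0.82 | 10.00 | 0.38 | 14.09 |
| Amide Coupling HTE | PyBOP (with intermediate) | Full Novelty | 0.89 | 8.00 | 0.22 | 15.56 |
| Amide Coupling HTE | TBTU (without intermediate) | Random split | 0.71 | 10.00 | 0.49 | 13.40 |
| Amide Coupling HTE | TBTU (without intermediate) | Partial Novelty | 0.57 | 16.00 | 0.31 | 11.70 |
| Amide Coupling HTE | TBTU (without intermediate) | Full Novelty | 0.66 | 13.00 | 0.64 | 5.96 |
| Amide Coupling HTE | HBTU (without intermediate) | Random split | 0.83 | 8.00 | 0.23 | 18.02 |
| Amide Coupling HTE | HBTU (without intermediate) | Partial Novelty | 0.72 | 13.00 | 0.04 | 18.75 |
| Amide Coupling HTE | HBTU (without intermediate) | Full Novelty | 0.68 | 14.00 | -0.11 | 17.88 |
| Amide Coupling Literature | | | 0.39 | 13.30 | 0.35 | 12.48 |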
To further evaluate generalization in synthesis planning, we compare our reproduced results with those reported in the original paper on non-USPTO datasets.
| Setting | Model | Invalid SMILES (%) | Top-1 Acc (with SC) (%) | Top-1 Acc (w/o SC) (%) |
|---|---|---|---|---|
| Separated | Origin Model | 0.40 | 66.10 | 66.92 |
| Separated (USPTO_480k) | RXNGraphormer | 0.75 | 83.34 | 83.40 |
| Separated (USPTO_STEREO) | RXNGraphormer | 0.52 | 83.31 | 85.64 |
| Mixed | Origin Model | 0.27 | 84.12 | 85.20 |
| Mixed (USPTO_480k) | RXNGraphormer | 0.77 | 83.29 | 83.35 |
| Mixed (USPTO_STEREO) | RXNGraphormer | 0.48 | 83.30 | 85.64 |
| Model | Invalid SMILES (%) | Top-1 Acc (with SC) (%) | Top-1 Acc (w/o SC) (%) |
|---|---|---|---|
| Origin Model | 0.27 | 37.22 | 37.42 |
| RXNGraphormer (USPTO_full) | 4.01 | 27.59 | 28.03 |
| RXNGraphormer (USPTO_50k) | 3.05 | 16.52 | 16.64 |
Thanks to the original authors for open-sourcing RXNGraphormer. This reproduction builds directly upon their codebase and methodology.
Note: For full installation instructions and model details, please refer to the original README.