To create the conda environment, run the following commands:
conda create --name rfm python=3.11.8 -y
conda activate rfm
# If using CUDA:
pip install torch==2.3.0 --index-url https://download.pytorch.org/whl/cu118
pip install dgl==2.2.1+cu118 -f https://data.dgl.ai/wheels/torch-2.3/cu118/repo.html
# If using CPU:
pip install torch==2.3.0 --index-url https://download.pytorch.org/whl/cpu
pip install dgl==1.1.2 -f https://data.dgl.ai/wheels/torch-2.3/cpu/repo.html
pip install -e .
pip install pre-commit
pre-commit install
To prepare the training datasets, run the following notebooks under notebooks/created_dataset:
1. create_positive.ipynb. It removes the atom mapping from the raw USPTO dataset. We call this dataset "positive".
2. extract_forward_templates.ipynb. It extracts the forward templates from the USPTO dataset.
3. create_negative_forward.ipynb. It creates negative reactions by applying the forward templates to reactants from the positive dataset.
4. create_negative_shuffle.ipynb. It creates negative reactions by shuffling the reactants from the positive dataset. Each product from the positive dataset is assigned the reactants of a similar reaction (in terms of Tanimoto distance).
5. merge_files.ipynb. It merges the positive and negative datasets into a single one.
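To make the shuffling step more concrete, here is a minimal sketch of how a product could be paired with the reactants of its most similar reaction. This is not the notebook's code. As a stand-in for a molecular fingerprint it uses the set of character bigrams of the reaction string, and the names `tanimoto`, `bigrams`, and `shuffle_negatives` are hypothetical.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity of two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def bigrams(s: str) -> set:
    """Toy fingerprint (an assumption): the set of character bigrams in a string."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def shuffle_negatives(reactions):
    """Pair each product with the reactants of the most similar other reaction.

    `reactions` is a list of (reactants, product) strings. Reactions whose
    reactants are identical to the original ones are skipped, so the result
    never reproduces a positive example from the same row.
    """
    fps = [bigrams(r + ">>" + p) for r, p in reactions]
    negatives = []
    for i, (reactants, product) in enumerate(reactions):
        best_j, best_sim = None, -1.0
        for j, (other_reactants, _) in enumerate(reactions):
            if j == i or other_reactants == reactants:
                continue
            sim = tanimoto(fps[i], fps[j])
            if sim > best_sim:
                best_j, best_sim = j, sim
        if best_j is not None:
            negatives.append((reactions[best_j][0], product))
    return negatives
```

Borrowing reactants from a similar reaction, rather than a random one, yields negatives that look plausible and are therefore harder to tell apart from the positives.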
The default configs log to the experiments directory and cache the molecule encodings in processed_graphs. To store this data on another partition, create a symlink to the desired location.
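One way to set up such a symlink is sketched below. The helper name `link_to_storage` and the storage path in the usage comment are hypothetical; only the directory names experiments and processed_graphs come from the text above.

```python
from pathlib import Path


def link_to_storage(repo_root: str, name: str, storage_root: str) -> Path:
    """Create <repo_root>/<name> as a symlink to <storage_root>/<name>."""
    target = Path(storage_root) / name
    target.mkdir(parents=True, exist_ok=True)  # make sure the real directory exists
    link = Path(repo_root) / name
    if link.is_symlink() or link.exists():
        # Refuse to overwrite existing data or an existing link.
        raise FileExistsError(f"{link} already exists; move it before linking")
    link.symlink_to(target.resolve(), target_is_directory=True)
    return link


# Example usage (the storage path is a placeholder):
# for name in ("experiments", "processed_graphs"):
#     link_to_storage(".", name, "/path/to/other/partition")
```

The helper refuses to replace an existing directory, so data already written to experiments or processed_graphs has to be moved by hand first.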
To train the model, run the following command:
python -m scripts.train --cfg configs/rfm_train.gin
To use another dataset, create a configs/datasets/<your_dataset>.gin file pointing to a *.csv file with reactants, product, and feasible columns. Then, in configs/rfm_train.gin, replace "include 'configs/datasets/forward_with_shuffle.gin'" with "include 'configs/datasets/<your_dataset>.gin'".
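As an illustration of the expected CSV layout, the sketch below writes and checks a file with the three required columns. The column names come from the text above. Everything else is an assumption: the helper names, the use of SMILES strings with reactants joined by ".", and the encoding of feasible as 1 or 0. Check an existing dataset for the exact conventions before relying on it.

```python
import csv

COLUMNS = ["reactants", "product", "feasible"]


def write_reaction_csv(path, rows):
    """Write (reactants, product, feasible) tuples to a CSV with a header row."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        for reactants, product, feasible in rows:
            writer.writerow(
                {"reactants": reactants, "product": product, "feasible": feasible}
            )


def check_reaction_csv(path):
    """Return the number of data rows; raise if a required column is missing."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        missing = [c for c in COLUMNS if c not in (reader.fieldnames or [])]
        if missing:
            raise ValueError(f"missing columns: {missing}")
        return sum(1 for _ in reader)


# Example usage (file name and the 1/0 encoding are assumptions):
# write_reaction_csv("my_dataset.csv", [("CCO.CC(=O)O", "CC(=O)OCC", 1)])
# check_reaction_csv("my_dataset.csv")
```

Running the check before training catches a misnamed column early, instead of partway through dataset loading.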