SigmanGroup/HT_TSs_Opt
High-Throughput Optimization and Featurization of TSs and Catalytic Cycle Intermediates

This repository contains scripts for high-throughput generation, optimization, and featurization of TSs and catalytic cycle intermediates, along with scripts for MLR and active learning modeling and the Excel spreadsheets with input data. This workflow accompanies the paper published in Nature (DOI: 10.1038/s41586-026-10239-7).

Dependencies

This workflow relies on published computational tools, including AaronTools [1], Molassembler [2], and mARC [3]. See the links for detailed installation guides.

Step 1: Templates Generation

Ligand and Int/TSRE Template Structures

The "NN_Ligands" folder contains a .cdxml file with 2D representations of all the ligands investigated in this paper. Their corresponding SMILES strings are listed in NN_Ligands.csv, while the .xyz geometries used as input for AaronTools are in XYZ_Structures.zip.

Template geometries for IntC_R.xyz/IntC_S.xyz and TSRE_R.xyz/TSRE_S.xyz were taken from Doyle et al. [4] and Reisman et al. [5] and are located in the "Step_1_Templates_Generation" folder. See Step 3 for the generation of TSRC structures.

AaronTools

The following must be specified in a .bashrc file (or similar):

export PYTHONPATH=/home/$USER/AARON_TOOLS/:$PYTHONPATH
export PATH=$PATH:/home/$USER/AARON_TOOLS/AaronTools/bin

A Bash script like this may then be used to execute the mapLigand.py AaronTools script and iterate through a .txt file containing the names of the ligands to map onto the templates (here, IntC_R.xyz and IntC_S.xyz):

#!/bin/bash
# Check if a filename argument is provided
if [ -z "$1" ]; then
  echo "Usage: $0 filename.txt"
  exit 1
fi

# Loop through each line in the provided text file
while IFS= read -r file_id
do
  mapLigand.py IntC_R.xyz -l 45,46="$file_id" -o "${file_id}_IntC_R.xyz"
  mapLigand.py IntC_S.xyz -l 45,46="$file_id" -o "${file_id}_IntC_S.xyz"
done < "$1"

.xyz files of the ligands are located in /AARON_TOOLS/AaronTools/Ligands, while IntC_R.xyz and IntC_S.xyz are pre-optimized templates (at a DFT or GFN2-xTB level of theory) of the two diastereomeric L*Ni(III)(C-sp2)(C-sp3)X intermediates. The script can be adapted to generate templates for TSRE_R.xyz/TSRE_S.xyz, or to modify the substrates using the substitute.py script of AaronTools. Note that, since PyOx and PyIm ligands are asymmetric, generating structures with those ligand classes requires two .xyz files for each ligand, included in /AARON_TOOLS/AaronTools/Ligands, with the indices of the chelating N atoms (N1 and N2) swapped (e.g., L0079_CfA.xyz and L0079_CfB.xyz with N1 = 1, N2 = 2 and vice versa).

Structures generated by AaronTools should be manually checked (with Jmol, Molden, or any other visualization software) to ensure that no clashes between atoms exist, and then optimized at the GFN2-xTB level with key bonds and angles constrained (details are provided elsewhere). To ensure that the atom indices of the common chemical motif are conserved throughout all the structures generated by AaronTools, the following Bash script is used to re-order the atom indices prior to optimization. list.txt is a text file containing the atom indices of the structures generated by AaronTools, ordered according to the desired output (e.g., if the atom labelled 30 should become 1, it must be listed first in list.txt):

#!/bin/bash

# Validate input arguments
if [ "$#" -ne 1 ]; then
    echo "Usage: $0 path/to/list.txt"
    exit 1
fi

list_file="$1"

# Process each .xyz file in the directory
for file in *.xyz; do
    echo "Processing $file..."

    # Create a temporary file for the reordered content
    temp_file="${file%.xyz}_reordered.xyz"

    # Read the first two lines and write them to the temporary file
    head -n 2 "$file" > "$temp_file"

    # Read the reordering indices from list.txt and adjust by adding 2
    mapfile -t order < <(awk '{print $1 + 2}' "$list_file")

    # Create an associative array to mark lines that need reordering
    # (reset on each iteration so entries do not persist across files)
    unset is_reordered
    declare -A is_reordered
    for index in "${order[@]}"; do
        is_reordered[$index]=1
    done

    # Append reordered lines as per the list
    for line_number in "${order[@]}"; do
        sed -n "${line_number}p" "$file"
    done >> "$temp_file"

    # Calculate total lines in the file
    total_lines=$(wc -l < "$file")

    # Append remaining lines that are not reordered
    for ((i=3; i<=total_lines; i++)); do
        if [[ -z ${is_reordered[$i]} ]]; then
            sed -n "${i}p" "$file"
        fi
    done >> "$temp_file"

    # Replace the original file with the reordered file
    mv "$temp_file" "$file"
done

echo "All files have been processed."

Step 2: Conformational Sampling

The following Bash script is used to iterate through a text file containing the names of the ligands in the library. It calls a Python script (conformer_generator.py, provided in the folder "Step_2_Conformational_Sampling") to run Molassembler. This script was adapted from code written by Dr Rubén Laplaza (LCMD, EPFL) and has been published [6].

conformer_generator.py may be adapted to change the maximum number of conformers generated (max_n_confs; 250 was used as the default) and the indices of the Ni and Br atoms (in the line return list(set(shells + list(range(20,21))))). The radius_adjacency function may also be adapted depending on the bond lengths in the first coordination sphere of the Ni atom (in the initial .xyz templates).

#!/bin/bash
# Check if a filename argument is provided
if [ -z "$1" ]; then
    echo "Usage: $0 filename.txt"
    exit 1
fi

# Loop through each line in the provided text file
while IFS= read -r file_id
do
    python conformer_generator.py "${file_id}_TSRE_R.xyz"
    python conformer_generator.py "${file_id}_TSRE_S.xyz"
done < "$1"

Step 3: Constrained Optimization

All the .xyz files generated by Molassembler are optimized at the GFN2-xTB level with key bonds and angles being constrained (specifically, the Ni–C(sp2), Ni–C(sp3), Ni–Br, C(sp2)–C(sp3), and C(sp3)–H bonds). Since thousands of structures are generated in Step 2, to optimize computational resources geometry optimizations are submitted as a GNU parallel job. An example Bash script is shown below (52 CPUs, 1 CPU per task).

#!/bin/bash
#SBATCH --partition=sigman-shared-np
#SBATCH --account=sigman-np
#SBATCH --time=24:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=52
hostname
env | grep SLURM
cat TSRE.txt | awk '{print $1}' | parallel -j $SLURM_NTASKS ./xtb_batch

TSRE.txt is a text file listing all the .xyz structures to optimize. This script calls the GFN2-xTB submission script, shown below:

#!/bin/bash
module load xtb/6.6.1
file=$1

ulimit -s unlimited
export OMP_STACKSIZE=30GB
export OMP_NUM_THREADS=16,1
export OMP_MAX_ACTIVE_LEVELS=1
export MKL_NUM_THREADS=16

WORKDIR=$PWD
mkdir $WORKDIR/${file%".xyz"}_output
cp $WORKDIR/${file} $WORKDIR/${file%".xyz"}_output/
cp $WORKDIR/constraints.inp $WORKDIR/${file%".xyz"}_output/
cd $WORKDIR/${file%".xyz"}_output/

xtb --input constraints.inp ${file} --opt --charge 0 --uhf 1

cp xtbopt.xyz $WORKDIR/${file%".xyz"}_opt.xyz

rm -rf $WORKDIR/${file%".xyz"}_output
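constraints.inp is not reproduced in this README; a minimal sketch in xtb's detailed-input format is shown below. The atom indices here are illustrative placeholders (chosen to mirror the bonds frozen in the Step 4 Gaussian input) and must be replaced with the indices of the constrained bonds and angles in your own templates, with the force constant tuned as needed:

```
$constrain
  force constant=0.5
  distance: 1, 7, auto
  distance: 1, 21, auto
  distance: 7, 21, auto
$end
```

With auto, each distance is held at its current value in the input geometry.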

Once optimization is complete, file_opt.xyz structures are renamed to file.xyz.
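The renaming can be done with a short loop such as the following (assuming the _opt.xyz suffix produced by the script above):

```shell
# Strip the "_opt" suffix from the optimized .xyz files
for f in *_opt.xyz; do
  mv "$f" "${f%_opt.xyz}.xyz"
done
```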

Note on reactions with vinyl bromide substrates

Molassembler may occasionally fail to conserve the (E)-stereochemistry of the substrate when a graph is projected back into 3D coordinates, incorrectly generating structures with (Z)-stereochemistry. To address this issue, for reactions C, E, and G, max_n_confs inside conformer_generator.py was increased to 500 (rather than 250). Following constrained optimization of all the structures, the Python script classify_dihedral.py was used to filter out geometries where the alkene was in the incorrect (Z)-configuration, resulting in the desired ca. 250 structures (per diastereomeric species) with the (E)-configuration.
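The E/Z check underlying such a filter can be sketched as follows (a hypothetical helper, not the repository's classify_dihedral.py; the four points would be the Cartesian coordinates of the C–C=C–C atoms defining the alkene):

```python
import math

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four Cartesian points."""
    sub = lambda a, b: tuple(x - y for x, y in zip(a, b))
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    cross = lambda a, b: (a[1] * b[2] - a[2] * b[1],
                          a[2] * b[0] - a[0] * b[2],
                          a[0] * b[1] - a[1] * b[0])
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)  # normals of the two planes
    m = cross(n1, n2)
    y = dot(m, b2) / math.sqrt(dot(b2, b2))
    return math.degrees(math.atan2(y, dot(n1, n2)))

def is_E(p0, p1, p2, p3, tol=30.0):
    """(E) if the C-C=C-C dihedral is within tol degrees of +/-180."""
    return abs(abs(dihedral(p0, p1, p2, p3)) - 180.0) < tol
```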

Step 4: Full Optimization

The following Bash script is used to generate .com files for all the optimized file.xyz structures. This script takes as input the name of the directory where the .xyz files are located, and requires the template file gauss_input.com to be located in the working directory.

#!/bin/bash
mkdir "$1"_gauss_comps

for mol in $(ls "$1")
do
    echo "$1/$mol"
    tail -n +3 "$1/$mol" > file
    sed -e "/0 2 here/r file" gauss_input.com > "./$1_gauss_comps/${mol%.xyz}.com"
    sed -i "s/file_name/${mol%.xyz}/g" "./$1_gauss_comps/${mol%.xyz}.com"
    sed -i "s/here//g" "./$1_gauss_comps/${mol%.xyz}.com"
    rm file
done

An example of gauss_input.com for the optimization of the reductive elimination TSs with Gaussian 16 [7] at the spGFN2-xTB level [8] is shown below. This input uses the external keyword implemented in Gaussian, which requests a calculation using an external program. A Perl wrapper script (xtb-gaussian) that makes the xtb binary usable with the external command was developed by the group of Alán Aspuru-Guzik and is hosted on GitHub (it is also available in the folder "Step_4_Full_Optimization").

%chk=file_name.chk
# external="xtb-gaussian -P 10 --charge 0 --uhf 1 --spinpol --tblite"
  opt=(calcfc,AddRedun,ts,noeigentest,nolinear,maxstep=5,MaxCycles=200,nomicro,recalcfc=25) freq=noraman

file_name

0 2 here

1 7 B
1 21 B
7 21 B

Similarly to Step 3, computations are submitted as a GNU parallel job:

#!/bin/bash
#SBATCH --partition=sigman-shared-np
#SBATCH --account=sigman-np
#SBATCH --time=72:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=52
hostname
env | grep SLURM
cat TSRE.txt | awk '{print $1}' | parallel -j $SLURM_NTASKS ./xtb_g16_batch

This script calls a Gaussian submission script (where xtb is used as external program), shown below:

#!/bin/bash
export PATH=$PATH:/uufs/chpc.utah.edu/common/home/u6055669/software/xtb-gaussian:/uufs/chpc.utah.edu/sys/installdir/r8/xtb/6.6.1/bin/

function is_bin_in_path {
  builtin type -P "$1" &> /dev/null
}

is_bin_in_path xtb  && echo "Found xtb." || echo "No xtb found. Exit!" 
is_bin_in_path xtb-gaussian  && echo "Found xtb-gaussian." || echo "No xtb-gaussian found. Exit!" 
ulimit -s unlimited
export OMP_STACKSIZE=700M
export OMP_NUM_THREADS=8,1
export MKL_NUM_THREADS=8
module purge
module load gaussian16
module load intel intel-mkl
module load xtb/6.6.1

file=$1
WORKDIR=$PWD
mkdir $WORKDIR/${file%".com"}_output/
export TMPDIR=$WORKDIR/${file%".com"}_output/
export GAUSS_SCRDIR=${TMPDIR}
cp $WORKDIR/${file} $TMPDIR/
cd $TMPDIR
echo ${file}

g16 < ${file} > ${file%".com"}.log

cp $TMPDIR/${file%".com"}.log $WORKDIR
rm -rf $TMPDIR

Note that the script requires an existing xtb installation; the xtb binary, as well as the xtb-gaussian script, have to be available in the PATH.

Generation and Optimization of TSRC Conformers

Molassembler may fail when projecting structures that are not interpreted as a single graph back into 3D coordinates, such as the radical capture transition state (due to the Ni–C(sp3) bond being formed). To address this limitation, a relaxed PES scan elongating the Ni–C(sp3) bond is performed for every conformer of the L*Ni(III)(C-sp2)(C-sp3)X intermediate (Int_R and Int_S) generated by Molassembler (and optimized according to the procedure described in Step 3). The GFN2-xTB submission script is shown below:

#!/bin/bash
module load xtb/6.6.1
file=$1

ulimit -s unlimited
export OMP_STACKSIZE=30GB
export OMP_NUM_THREADS=16,1
export OMP_MAX_ACTIVE_LEVELS=1
export MKL_NUM_THREADS=16

WORKDIR=$PWD
mkdir $WORKDIR/${file%".xyz"}_output
cp $WORKDIR/${file} $WORKDIR/${file%".xyz"}_output/
cp $WORKDIR/scan.inp $WORKDIR/${file%".xyz"}_output/
cd $WORKDIR/${file%".xyz"}_output/

xtb --input scan.inp ${file} --opt --charge 0 --uhf 1

cp xtbopt.xyz $WORKDIR/${file%".xyz"}_opt.xyz
cp xtbscan.log $WORKDIR/${file%".xyz"}_scan.log

rm -rf $WORKDIR/${file%".xyz"}_output
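scan.inp pairs a $constrain block with a $scan block in xtb's detailed-input format. A minimal sketch is shown below; the atom indices (16 and 49, mirroring the QST2 input further down) and the scan range are illustrative placeholders that must be adapted to your system:

```
$constrain
  force constant=0.5
  distance: 16, 49, auto
$scan
  1: 2.0, 3.5, 30
$end
```

The $scan entry drives constraint 1 (here, the Ni–C(sp3) distance) from 2.0 to 3.5 Å in 30 steps.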

The input and output (i.e., the end point of the scan) geometries are then used to locate the radical capture transition state with the Synchronous Transit-Guided Quasi-Newton (STQN) Method (QST2). An example of the corresponding gauss_input.com file is shown below:

%chk=QST2.chk
# external="xtb-gaussian -P 10 --charge 0 --uhf 1 --spinpol --tblite"
  opt=(qst2,calcfc,NoEigenTest,Loose,nolinear,maxstep=5,MaxCycles=150,nomicro,AddRedun,recalcfc=5) freq=noraman

Reactant

0 2 here1

16 49 B
1 49 48 A

Product

0 2 here2

16 49 B
1 49 48 A

Step 5: Filtering and Conformers Selection

Following the full geometry optimization, failed .log files (and their corresponding .com files) are filtered out using the following Bash script:

for file in *.log
do
  grep -v '^$' < "${file}" | tail -1 | grep "Normal termination" > /dev/null; result=${?}
  if [ "${result}" -ne 0 ]
  then
    job="${file%.log}"
    echo "${job}.log"
    mv "${job}.log" FAILED
    echo "${job}.com"
    mv "${job}.com" FAILED
  fi
done

Coordinates are then extracted from the converged .log files using the printXYZ.py script of AaronTools. The Python script filter_IRC.py (available in the "Step_5_Filtering" folder) is then used to separate structures based on a specified bond-length threshold (e.g., for the C(sp2)–C(sp3) bond, or the Ni–C(sp3) bond). This is important to separate, e.g., the L*Ni(III)(C-sp2)(C-sp3)X intermediates from structures that erroneously optimized to the post-reductive elimination complex.
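The distance check such a filter relies on can be sketched as follows (a hypothetical standalone helper, not the repository's filter_IRC.py; atom indices are 1-based, as in the .xyz file):

```python
import math

def bond_length(xyz_path, i, j):
    """Distance (typically in Angstrom) between atoms i and j
    (1-based indices) of a standard .xyz file."""
    with open(xyz_path) as fh:
        lines = fh.readlines()[2:]  # skip the atom count and comment lines
    coords = [tuple(map(float, ln.split()[1:4]))
              for ln in lines if len(ln.split()) >= 4]
    return math.dist(coords[i - 1], coords[j - 1])
```

Structures can then be routed into separate folders by comparing the returned value against the chosen threshold.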

To filter out structures that have not optimized to a first-order saddle point, the g16-ifreq.py Python script is executed. The script filters out the .log files with multiple imaginary frequencies or an imaginary frequency outside a user-defined range. Based on the manual inspection of multiple TS conformers for Case Study 1, the range specified for TSRC was -110 cm-1 < freq < -25 cm-1, while for TSRE it was -230 cm-1 < freq < -35 cm-1.
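The check performed at this stage can be sketched as follows (a hypothetical helper that parses the "Frequencies --" lines of a Gaussian .log file; it is not the repository's g16-ifreq.py):

```python
def imaginary_frequencies(log_text):
    """Return all negative frequencies (cm^-1) found on Gaussian
    'Frequencies --' lines."""
    freqs = []
    for line in log_text.splitlines():
        if "Frequencies --" in line:
            freqs.extend(float(x) for x in line.split("--", 1)[1].split())
    return [f for f in freqs if f < 0.0]

def is_valid_ts(log_text, lo=-230.0, hi=-35.0):
    """True if exactly one imaginary frequency lies inside (lo, hi)."""
    imag = imaginary_frequencies(log_text)
    return len(imag) == 1 and lo < imag[0] < hi
```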

The following Bash script is then used to run mARC [3] (version 0.1.10) and select unique conformers within a 10 kcal/mol energy window (based on the spGFN2-xTB electronic energies). Note that, to get all the .xyz files into the right format for mARC, the Python scripts get_xTB_E.py and add_energies_xlsx.py (see the "Step_5_Filtering" folder) are run prior to mARC.

#!/bin/bash
# Check if a filename argument is provided
if [ -z "$1" ]; then
    echo "Usage: $0 filename.txt"
    exit 1
fi

# Loop through each line in the provided text file
while IFS= read -r file_id
do
    python -m navicat_marc -i "${file_id}_TSRE_R"*.xyz -m rmsd -ewin 10 -mine
    python -m navicat_marc -i "${file_id}_TSRE_S"*.xyz -m rmsd -ewin 10 -mine
done < "$1"

Finally, the thermal correction to the Gibbs free energy is extracted from the .log files selected by mARC with the get_Gibbs_corr.py Python script.
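The relevant quantity appears in each Gaussian .log file on the "Thermal correction to Gibbs Free Energy=" line, so it can also be pulled out with a one-liner such as (file.log is a placeholder name):

```shell
# Print the last thermal correction to the Gibbs free energy in a log file
grep "Thermal correction to Gibbs Free Energy" file.log | tail -1 | awk '{print $NF}'
```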

Step 6: Single-Point Energy Computations

.inp files for ORCA 5.0 [9] are generated with the following Bash script, which calls the o4wb3c Fortran executable:

#!/bin/bash
file=$1
module load gcc/8.5.0
module load openmpi/4.1.1
module load orca/5.0.3

WORKDIR=$PWD
mkdir $WORKDIR/${file%".xyz"}_input
cp $WORKDIR/${file} $WORKDIR/${file%".xyz"}_input/
cd $WORKDIR/${file%".xyz"}_input/

o4wb3c --struc ${file} --charge 0 --uhf 1

cp wb97x3c.inp $WORKDIR/${file%".xyz"}.inp
cd $WORKDIR
rm -rf $WORKDIR/${file%".xyz"}_input

sed -i 's/wB97X-D4/wB97X-D4 miniprint nopop/g' ${file%".xyz"}.inp
sed -i 's/nprocs   4/nprocs   16/g' ${file%".xyz"}.inp
sed -i '/%pal/i %cpcm\n  epsilon 2.25\n  refrac 1.4224\nend\n' ${file%".xyz"}.inp

The value of the dielectric constant and of the refractive index should be adjusted depending on the reaction's solvent.

The PCM/ωB97X-3c [10] energies are extracted from the .out files with the get_SPC.py Python script (available in the "Step_6_SPC" folder). Gibbs free energies (G(T)_spc(Hartree)) are then calculated as the sum of the PCM/ωB97X-3c electronic energies and the spGFN2-xTB-level thermal corrections.

Step 7: Properties Computation and Extraction

The following gauss_input.com template is used to generate input files for the computation of molecular descriptors; for computational efficiency, these computations use the same xtb-Gaussian approach as Step 4. The solvent should be adapted based on the reaction.

%chk=file_name.chk
# external="xtb-gaussian -P 10 --charge 0 --uhf 1 --spinpol --tblite"
  freq=noraman

file_name

0 2 here

--Link1--
%chk=file_name.chk
# external="xtb-gaussian -P 10 --charge 0 --uhf 1 --spinpol --tblite"
  geom=check pm6 scf=yqc scrf=(pcm,solvent=n,n-DiMethylAcetamide)

file_name

0 2

--Link1--
%chk=file_name.chk
# external="xtb-gaussian -P 10 --charge 0 --uhf 1 --spinpol --tblite"
  geom=check guess=read volume polar pop=hirshfeld prop=efg scrf=(pcm,solvent=n,n-DiMethylAcetamide)

file_name

0 2

The get_properties_HT_Worflow_spGFN2-xTB.ipynb Jupyter notebook (located in the "Step_7_Get_Properties" folder) is used to collect most of the descriptors. It was adapted from work published by Coley, Paton, Sigman et al. [11] hosted on GitHub. Please see the link for detailed instructions.

The GetParameters.py Python script is also used with MORFEUS to extract molecular features (specifically, dispersion descriptors [12], spin densities, and atomic solvent-accessible surface areas) that are not implemented in the Get Properties notebook. All the relevant Bash and Python scripts are located in the "Step_7_Get_Properties" folder. The workflow is executed as follows:

  1. Run prepare_Morfeus.sh to separate all the .log (and their corresponding .xyz files) into distinct directories based on the template (i.e., the name of the ligand);
  2. Create an Atom_indices.csv file, containing the indices of atoms called N1, N2, C1, C2, C4, C5, R1, R2, X1, X2, Ni, Br, C1s, C2s, H1s (see Figure S2 of the ESI) for each template. Atom_indices.csv can also be used together with atom_map.py to create the input .xlsx file for get_properties_HT_Worflow_spGFN2-xTB.ipynb;
  3. Execute run_morfeus_script.sh: it calls Change_labels.py to update the atom indices (based on what is listed in Atom_indices.csv) inside GetParameter.py.
  4. GetParameter_{template}.py scripts are generated and executed inside the directory corresponding to each template. combine_morfeus_results.py can then be used to combine the separate output .csv files into one .xlsx file (e.g., TSRE_Morfeus_Results.xlsx);
  5. Parameters collected with GetParameter.py can then be combined with those collected via get_properties_HT_Worflow_spGFN2-xTB.ipynb and post-processed inside the notebook (remember to add G(T)_spc(Hartree) calculated in Step 6);
  6. remove_columns.py can then be used to remove undesired descriptors (e.g., '_stdev' or '_range' values) from the final output .xlsx file. The TSRC, Int, and TSRE features can also be combined into one spreadsheet with combine_features.py.

As an illustrative example, output .xlsx and .csv files for the featurization of the TSRE structures of reaction H, and the corresponding .xyz files, are provided in the folder "Liu_JACS_2024_TSRE_Get_Properties".

Step 8: Multivariate Linear Regression Modeling

The folder "Step_8_Modeling" contains the Python script nested_CV.py used for feature selection, which was performed via repeated, stratified, nested k-fold cross-validation. This script was written based on the code by Doyle et al. [4] available elsewhere and requires a .xlsx input file with the following columns: "Structure", "Class", "ee", and "DeltaDeltaG", followed by all the parameters (see the "Benchmarking" folder for examples).

Details of the cross-validation scheme are provided in the Supporting Information. Notably, the exhaustive combinatorial search of features performed in the inner loop is computationally very expensive. To reduce the number of input features (and hence the cost), the Boruta algorithm [13] implemented in a Jupyter notebook (feature_curation.ipynb) written by our group and hosted on GitHub was used. Alternatively, a modified version of nested_CV.py (called RF_nested_CV.py, available in the folder "Step_8_Modeling") was used: inside the exhaustive_search function, a random forest is trained to estimate feature importance; features with importance ≤ 0.01 are filtered out, and the combinatorial search is performed on the remaining features. Models were also searched via bidirectional stepwise selection using our group's modeling workflow (the Mattlab_modeling_v6.0.0.ipynb Jupyter notebook hosted on GitHub).

As illustrative examples, the Case_Study_2_CH_Functionalization.xlsx and Case_Study_2_Ni_XECs.xlsx files are provided in the "Case_Study_2" folder, each containing an "Input" sheet with the data needed to run the nested_CV.py script; the "Metadata" sheet contains additional metadata. The script may be submitted as follows:

python nested_CV.py input.xlsx [oos.xlsx]

where oos.xlsx is an optional spreadsheet containing the data for out-of-sample predictions (e.g., see the "OOS_Rxn_Metadata" sheet in the Case_Study_2_Ni_XECs.xlsx file; the oos.xlsx spreadsheet must contain the following columns: "Structure", "Class", "ee", and "DeltaDeltaG", followed by all the parameters).

Combinations of features identified with these approaches were further evaluated via a stratified 5×2 cross-validation scheme performed with the 5_2_CV.py Python script, which was also written based on the code by Doyle et al. [4] This script requires as input a .xlsx file with the data ("Class", ∆∆G values, and descriptors) and a .txt file with the combinations of features to evaluate (e.g., TSRE_xTB: η_Boltz, %Vbur_C1s_2.0Å_Boltz, %Vbur_C2s_4.0Å_Boltz if the sheet containing parameters extracted from TSRE is called TSRE_xTB).
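The 5×2 scheme itself (five repetitions of a randomized 50/50 split, training and testing in both directions) can be sketched with a plain least-squares MLR model as follows. This is a generic, unstratified illustration with synthetic data, not the repository script (which additionally stratifies the splits by "Class"):

```python
import numpy as np

def five_by_two_cv(X, y, seed=0):
    """5x2 CV: five shuffled 50/50 splits, fitting and testing each MLR
    model in both directions; returns the ten test R^2 values."""
    rng = np.random.default_rng(seed)
    n = len(y)
    scores = []
    for _ in range(5):
        idx = rng.permutation(n)
        halves = (idx[: n // 2], idx[n // 2:])
        for train, test in (halves, halves[::-1]):
            # Ordinary least squares with an intercept column
            A = np.column_stack([np.ones(len(train)), X[train]])
            coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
            pred = np.column_stack([np.ones(len(test)), X[test]]) @ coef
            ss_res = np.sum((y[test] - pred) ** 2)
            ss_tot = np.sum((y[test] - y[test].mean()) ** 2)
            scores.append(1.0 - ss_res / ss_tot)
    return scores
```

The mean and spread of the ten scores gauge how robust a given feature combination is to the train/test split.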

The script 5_2_CV_ensemble.py was used to post-process the output of the repeated, stratified, nested k-fold CV script and calculate ensemble predictions. This script may be executed as:

python 5_2_CV_ensemble.py file.out/xlsx Features.xlsx OOS.xlsx Output.xlsx

where file.out or file.xlsx contains the different MLR models (i.e., combinations of features) to evaluate in the 5×2 CV test, Features.xlsx contains the measured ∆∆G values and the descriptors (same format as in "Benchmarking"), and OOS.xlsx contains the data for out-of-sample predictions. As illustrative examples, CH_Activation.xlsx and Ni_XECs.out are provided in the "Case_Study_2" folder, together with the .xlsx files with all the features.

SISSO-augmented features generation

The Python code SISSO_feature_generation.py was used to calculate and filter the augmented features (see the Supporting Information for further details). It was written based on code previously written by our group [14]. A .xlsx file with the same format as the input for nested_CV.py or 5_2_CV.py is required.

calculate_SISSO_features.py may be used to calculate the same SISSO-augmented features as those provided in a first input .xlsx file for a new set of parameters provided in a second .xlsx input (e.g., for virtual screening).

Step 9: Active Learning

The Case_Study_3.ipynb Jupyter notebook (located in the "Step_9_Active_Learning" folder) was used for the active learning campaign within the EDBO+ platform [15] (installation instructions are provided elsewhere). CCs_Acetals.xlsx contains the required input data (in the EDBO_Input sheet), along with metadata on the ligand and substrate SMILES (Metadata sheet).

References

  1. Ingman, V. M.; Schaefer, A. J.; Andreola, L. R.; Wheeler, S. E. QChASM: Quantum chemistry automation and structure manipulation. WIREs Comput. Mol. Sci. 11, e1510 (2021).
  2. Sobez, J.-G.; Reiher, M. Molassembler: Molecular Graph Construction, Modification, and Conformer Generation for Inorganic and Organic Molecules. J. Chem. Inf. Model. 60, 3884–3900 (2020).
  3. Laplaza, R.; Wodrich, M. D.; Corminboeuf, C. Overcoming the Pitfalls of Computing Reaction Selectivity from Ensembles of Transition States. J. Phys. Chem. Lett. 15, 7363–7370 (2024).
  4. Lau, S. H.; Borden, M. A.; Steiman, T. J.; Wang, L. S.; Parasram, M.; Doyle, A. G. Ni/Photoredox-Catalyzed Enantioselective Cross-Electrophile Coupling of Styrene Oxides with Aryl Iodides. J. Am. Chem. Soc. 143, 15873–15881 (2021).
  5. Turro, R. F.; Wahlman, J. L. H.; Tong, Z. J.; Chen, X.; Yang, M.; Chen, E. P.; Hong, X.; Hadt, R. G.; Houk, K. N.; Yang, Y.-F.; Reisman, S. E. Mechanistic Investigation of Ni-Catalyzed Reductive Cross-Coupling of Alkenyl and Benzyl Electrophiles. J. Am. Chem. Soc. 145, 14705–14715 (2023).
  6. Laplaza, R.; Sobez, J.-G.; Wodrich, M. D.; Reiher, M.; Corminboeuf, C. The (not so) simple prediction of enantioselectivity – a pipeline for high-fidelity computations. Chem. Sci. 13, 6858–6864 (2022).
  7. Frisch, M.; Trucks, G.; Schlegel, H.; Scuseria, G.; Robb, M.; Cheeseman, J.; Montgomery, J.; Vreven, T.; Kudin, K.; Burant, J.; Millam, J.; Iyengar, S.; Tomasi, J.; Barone, V.; Mennucci, B.; Cossi, M.; Scalmani, G.; Rega, N.; Petersson, G.; Nakatsuji, H.; Hada, M.; Ehara, M.; Toyota, K.; Fukuda, R.; Hasegawa, J.; Ishida, M.; Nakajima, T.; Honda, Y.; Kitao, O.; Nakai, H.; Klene, M.; Li, X.; Knox, J.; Hratchian, H.; Cross, J.; Bakken, V.; Adamo, C.; Jaramillo, J.; Gomperts, R.; Stratmann, R.; Yazyev, O.; Austin, A.; Cammi, R.; Pomelli, C.; Ochterski, J.; Ayala, P.; Morokuma, K.; Voth, G.; Salvador, P.; Dannenberg, J.; Zakrzewski, V.; Dapprich, S.; Daniels, A.; Strain, M.; Farkas, O.; Malick, D.; Rabuck, A.; Raghavachari, K.; Foresman, J.; Ortiz, J.; Cui, Q.; Baboul, A.; Clifford, S.; Cioslowski, J.; Stefanov, B.; Liu, G.; Liashenko, A.; Piskorz, P.; Komaromi, I.; Martin, R.; Fox, D.; Keith, T.; Laham, A.; Peng, C.; Nanayakkara, A.; Challacombe, M.; Gill, P.; Johnson, B.; Chen, W.; Wong, M.; Gonzalez, C.; Pople, J. Gaussian 16, Revision C.01 (2016).
  8. Neugebauer, H.; Badorf, B.; Ehlert, S.; Hansen, A.; Grimme, S. High-Throughput Screening of Spin States for Transition Metal Complexes with Spin-Polarized Extended Tight-Binding Methods. J. Comput. Chem. 44, 2120–2129 (2023).
  9. Neese, F. Software Update: The ORCA Program System-Version 5.0. WIREs Comput. Mol. Sci. 12 (2022).
  10. Müller, M.; Hansen, A.; Grimme, S. ωB97X-3c: A Composite Range-Separated Hybrid DFT Method with a Molecule-Optimized Polarized Valence Double-ζ Basis Set. J. Chem. Phys. 158, 014103 (2023).
  11. Haas, B. C.; Hardy, M. A.; Sowndarya S. V., S.; Adams, K.; Coley, C. W.; Paton, R. S.; Sigman, M. S. Rapid prediction of conformationally-dependent DFT-level descriptors using graph neural networks for carboxylic acids and alkyl amines. Digit. Discov. 4, 222–233 (2025).
  12. Pollice, R.; Chen, P. A universal quantitative descriptor of the dispersion interaction potential. Angew. Chem. Int. Ed., 58, 9758–9769 (2019).
  13. Kursa, M.; Rudnicki, W. Feature Selection with the Boruta Package. J. Stat. Softw. 36, 1–13 (2010).
  14. Souza, L. W.; Miller, B. R.; Cammarota, R. C.; Lo, A.; Lopez, I.; Shiue, Y.-S.; Bergstrom, B. D.; Dishman, S. N.; Fettinger, J. C.; Sigman, M. S.; Shaw, J. T. Deconvoluting Nonlinear Catalyst–Substrate Effects in the Intramolecular Dirhodium-Catalyzed C–H Insertion of Donor/Donor Carbenes Using Data Science Tools. ACS Catal. 14, 104–115 (2024).
  15. Torres, J. A. G.; Lau, S. H.; Anchuri, P.; Stevens, J. M.; Tabora, J. E.; Li, J.; Borovika, A.; Adams, R. P.; Doyle, A. G. A Multi-Objective Active Learning Platform and Web App for Reaction Optimization. J. Am. Chem. Soc. 144, 19999–20007 (2022).
