This repository contains scripts for the high-throughput generation, optimization, and featurization of TSs and catalytic cycle intermediates, along with scripts for MLR and active learning modeling and the Excel spreadsheets with input data. This workflow accompanies the paper published in Nature (DOI 10.1038/s41586-026-10239-7).
This workflow relies on published computational tools, including AaronTools (ref. 1), Molassembler (ref. 2), and mARC (ref. 3). See the links for detailed installation guides.
The "NN_Ligands" folder contains a .cdxml file with 2D representations of all the ligands investigated in this paper. Their corresponding SMILES strings are listed in NN_Ligands.csv, while the .xyz geometries used as input for AaronTools are in XYZ_Structures.zip.
Template geometries for IntC_R.xyz/IntC_S.xyz and TSRE_R.xyz/TSRE_S.xyz were taken from Doyle et al. (ref. 4) and Reisman et al. (ref. 5) and are located in the "Step_1_Templates_Generation" folder. See Step 3 for the generation of the TSRC structures.
The following must be specified in a .bashrc file (or similar):
export PYTHONPATH=/home/$USER/AARON_TOOLS/:$PYTHONPATH
export PATH=$PATH:/home/$USER/AARON_TOOLS/AaronTools/bin
A Bash script like this may then be used to execute the mapLigand.py AaronTools script and iterate through a .txt file containing the names of the ligands to map onto the templates (here, IntC_R.xyz and IntC_S.xyz):
#!/bin/bash
# Check if a filename argument is provided
if [ -z "$1" ]; then
echo "Usage: $0 filename.txt"
exit 1
fi
# Loop through each line in the provided text file
while IFS= read -r file_id
do
mapLigand.py IntC_R.xyz -l 45,46="$file_id" -o "${file_id}_IntC_R.xyz"
mapLigand.py IntC_S.xyz -l 45,46="$file_id" -o "${file_id}_IntC_S.xyz"
done < "$1"
.xyz files of the ligands are located in /AARON_TOOLS/AaronTools/Ligands, while IntC_R.xyz and IntC_S.xyz are pre-optimized templates (at a DFT or GFN2-xTB level of theory) of the two diastereomeric L*Ni(III)(C-sp2)(C-sp3)X intermediates. The script can be adapted to generate templates for TSRE_R.xyz/TSRE_S.xyz, or to modify the substrates using the substitute.py script of AaronTools. Note that, since PyOx and PyIm ligands are asymmetric, generating structures for those ligand classes requires two .xyz files per ligand, included in /AARON_TOOLS/AaronTools/Ligands, with the indices of the chelating N atoms (N1 and N2) swapped (e.g., L0079_CfA.xyz and L0079_CfB.xyz with N1 = 1, N2 = 2 and vice versa).
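For the asymmetric ligand classes, both N1/N2 orderings must therefore be mapped. A dry-run sketch that only prints the required mapLigand.py calls (the ligand name L0079 and the indices 45,46 are taken from the examples in this document; remove the echo to actually execute the commands):

```shell
# Print the mapLigand.py calls for both N1/N2 orderings of an asymmetric ligand.
for tag in CfA CfB; do
  for diastereomer in R S; do
    echo mapLigand.py "IntC_${diastereomer}.xyz" \
      -l "45,46=L0079_${tag}" -o "L0079_${tag}_IntC_${diastereomer}.xyz"
  done
done > map_cmds.txt
cat map_cmds.txt
```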
Structures generated by AaronTools should be checked manually (with Jmol, Molden, or any other visualization software) to ensure that no clashes between atoms exist, and then optimized at the GFN2-xTB level with key bonds and angles constrained (details are provided elsewhere). To ensure that the atom indices of the common chemical motif are conserved throughout all the structures generated by AaronTools, the following Bash script is used to re-order the atom indices prior to optimization (list.txt is a text file listing the atom indices of the AaronTools-generated structures in the desired output order; e.g., if the atom labelled 30 should become atom 1, it must be listed first in list.txt):
#!/bin/bash
# Validate input arguments
if [ "$#" -ne 1 ]; then
echo "Usage: $0 path/to/list.txt"
exit 1
fi
list_file="$1"
# Process each .xyz file in the directory
for file in *.xyz; do
echo "Processing $file..."
# Create a temporary file for the reordered content
temp_file="${file%.xyz}_reordered.xyz"
# Read the first two lines and write them to the temporary file
head -n 2 "$file" > "$temp_file"
# Read the reordering indices from list.txt and adjust by adding 2
mapfile -t order < <(awk '{print $1 + 2}' "$list_file")
# Create an associative array to mark lines that need reordering (reset for each file)
unset is_reordered
declare -A is_reordered
for index in "${order[@]}"; do
is_reordered[$index]=1
done
# Append reordered lines as per the list
for line_number in "${order[@]}"; do
sed -n "${line_number}p" "$file"
done >> "$temp_file"
# Calculate total lines in the file
total_lines=$(wc -l < "$file")
# Append remaining lines that are not reordered
for ((i=3; i<=total_lines; i++)); do
if [[ -z ${is_reordered[$i]} ]]; then
sed -n "${i}p" "$file"
fi
done >> "$temp_file"
# Replace the original file with the reordered file
mv "$temp_file" "$file"
done
echo "All files have been processed."
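To make the +2 offset concrete (an .xyz file carries a two-line header, so atom i sits on file line i + 2), here is a self-contained toy example of the reordering logic used by the script above; it swaps the two hydrogens of a three-atom structure (file names are illustrative):

```shell
# Three-atom XYZ: 2 header lines + 3 atom lines.
cat > toy.xyz <<'EOF'
3
water
O 0.0 0.0 0.0
H 0.8 0.6 0.0
H -0.8 0.6 0.0
EOF
# Desired new order: atom 1 stays first, atoms 3 and 2 swap.
printf '1\n3\n2\n' > list.txt
# Reorder exactly as the script above does: header first, then the listed lines.
head -n 2 toy.xyz > toy_reordered.xyz
while read -r idx; do
  sed -n "$((idx + 2))p" toy.xyz
done < list.txt >> toy_reordered.xyz
```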
The following Bash script is used to iterate through a text file containing the names of the ligands in the library. It calls a Python script (conformer_generator.py, provided in the "Step_2_Conformational_Sampling" folder) to run Molassembler. This script was adapted from published code (ref. 6) written by Dr Rubén Laplaza (LCMD, EPFL).
conformer_generator.py may be adapted to change the maximum number of conformers generated (max_n_confs; 250 was used as the default) and the indices of the Ni and Br atoms (return list(set(shells + list(range(20,21))))). The radius_adjacency function may also be adapted depending on the bond lengths in the first coordination sphere of the Ni atom (in the initial .xyz templates).
#!/bin/bash
# Check if a filename argument is provided
if [ -z "$1" ]; then
echo "Usage: $0 filename.txt"
exit 1
fi
# Loop through each line in the provided text file
while IFS= read -r file_id
do
python conformer_generator.py "${file_id}_TSRE_R.xyz"
python conformer_generator.py "${file_id}_TSRE_S.xyz"
done < "$1"
All the .xyz files generated by Molassembler are optimized at the GFN2-xTB level with key bonds and angles constrained (specifically, the Ni–C(sp2), Ni–C(sp3), Ni–Br, C(sp2)–C(sp3), and C(sp3)–H bonds). Since thousands of structures are generated in Step 2, geometry optimizations are submitted as a GNU parallel job to make efficient use of computational resources. An example Bash script is shown below (52 CPUs, 1 CPU per task).
#!/bin/bash
#SBATCH --partition=sigman-shared-np
#SBATCH --account=sigman-np
#SBATCH --time=24:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=52
hostname
env | grep SLURM
cat TSRE.txt | awk '{print $1}' | parallel -j $SLURM_NTASKS ./xtb_batch
TSRE.txt is a text file listing all the .xyz structures to optimize. This script calls the GFN2-xTB submission script, shown below:
#!/bin/bash
module load xtb/6.6.1
file=$1
ulimit -s unlimited
export OMP_STACKSIZE=30GB
export OMP_NUM_THREADS=16,1
export OMP_MAX_ACTIVE_LEVELS=1
export MKL_NUM_THREADS=16
WORKDIR=$PWD
mkdir $WORKDIR/${file%".xyz"}_output
cp $WORKDIR/${file} $WORKDIR/${file%".xyz"}_output/
cp $WORKDIR/constraints.inp $WORKDIR/${file%".xyz"}_output/
cd $WORKDIR/${file%".xyz"}_output/
xtb --input constraints.inp ${file} --opt --charge 0 --uhf 1
cp xtbopt.xyz $WORKDIR/${file%".xyz"}_opt.xyz
rm -rf $WORKDIR/${file%".xyz"}_output
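The constraints.inp file referenced above follows the xtb detailed-input syntax ("auto" constrains a coordinate at its current value). A minimal sketch, with placeholder atom indices — replace 1, 21, and 45 with the actual indices of the Ni, C(sp2), and C(sp3) atoms in your templates, and add one entry per constrained bond or angle listed above:

```
$constrain
  force constant=1.0
  distance: 1, 21, auto
  distance: 1, 45, auto
  angle: 21, 1, 45, auto
$end
```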
Once optimization is complete, file_opt.xyz structures are renamed to file.xyz.
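A one-liner sketch of that renaming step (an illustrative file is created first so the example is self-contained; the guard skips the loop cleanly if no files match):

```shell
# Create an illustrative optimized structure, then strip the "_opt" suffix
# (e.g., L0001_TSRE_R_opt.xyz -> L0001_TSRE_R.xyz).
touch L0001_TSRE_R_opt.xyz
for f in *_opt.xyz; do
  [ -e "$f" ] || continue   # no matches: the glob stays literal, skip it
  mv "$f" "${f%_opt.xyz}.xyz"
done
```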
Molassembler may occasionally fail to conserve the (E)-stereochemistry of the substrate when a graph is projected back into 3D coordinates, incorrectly generating structures with (Z)-stereochemistry. To address this issue, for reactions C, E, and G, max_n_confs inside conformer_generator.py was increased from 250 to 500. Following constrained optimization of all the structures, the Python script classify_dihedral.py was used to filter out geometries with the alkene in the incorrect (Z)-configuration, leaving the desired ca. 250 structures (per diastereomeric species) with the (E)-configuration.
The following Bash script is used to generate .com files for all the optimized file.xyz structures. This script takes as input the name of the directory where the .xyz files are located, and requires the template file gauss_input.com to be located in the working directory.
#!/bin/bash
# Generate a .com file for every optimized .xyz structure in the directory given as $1
mkdir "$1_gauss_comps"
for mol in "$1"/*.xyz
do
echo "$mol"
base=$(basename "$mol" .xyz)
tail -n +3 "$mol" > file
sed -e "/0 2 here/r file" gauss_input.com > "./$1_gauss_comps/${base}.com"
sed -i "s/file_name/${base}/g" "./$1_gauss_comps/${base}.com"
sed -i "s/here//g" "./$1_gauss_comps/${base}.com"
rm file
done
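The non-obvious piece above is sed's r command, which reads a file in after every line matching the pattern; a self-contained illustration with a toy template and a one-atom coordinate block (file names are illustrative):

```shell
# Minimal template containing the "0 2 here" placeholder line
cat > demo_input.com <<'EOF'
%chk=file_name.chk
# route section
0 2 here
EOF
printf 'Ni 0.0 0.0 0.0\n' > file
sed -e "/0 2 here/r file" demo_input.com > demo_out.com   # append coords after placeholder
sed -i "s/file_name/L0001/g; s/here//g" demo_out.com      # fill in name, drop marker
```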
An example of gauss_input.com for the optimization of the reductive elimination TSs with Gaussian 16 (ref. 7) at the spGFN2-xTB level (ref. 8) is shown below. This input uses the external keyword implemented in Gaussian, which requests a calculation with an external program. A Perl wrapper script (xtb-gaussian) that makes the xtb binary usable with the external command was developed by the group of Alán Aspuru-Guzik and is hosted on GitHub (it is also available in the folder "Step_4_Full_Optimization").
%chk=file_name.chk
# external="xtb-gaussian -P 10 --charge 0 --uhf 1 --spinpol --tblite"
opt=(calcfc,AddRedun,ts,noeigentest,nolinear,maxstep=5,MaxCycles=200,nomicro,recalcfc=25) freq=noraman
file_name
0 2 here
1 7 B
1 21 B
7 21 B
Similarly to Step 3, computations are submitted as a GNU parallel job:
#!/bin/bash
#SBATCH --partition=sigman-shared-np
#SBATCH --account=sigman-np
#SBATCH --time=72:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=52
hostname
env | grep SLURM
cat TSRE.txt | awk '{print $1}' | parallel -j $SLURM_NTASKS ./xtb_g16_batch
This script calls a Gaussian submission script (where xtb is used as external program), shown below:
#!/bin/bash
export PATH=$PATH:/uufs/chpc.utah.edu/common/home/u6055669/software/xtb-gaussian:/uufs/chpc.utah.edu/sys/installdir/r8/xtb/6.6.1/bin/
function is_bin_in_path {
builtin type -P "$1" &> /dev/null
}
is_bin_in_path xtb && echo "Found xtb." || echo "No xtb found. Exit!"
is_bin_in_path xtb-gaussian && echo "Found xtb-gaussian." || echo "No xtb-gaussian found. Exit!"
ulimit -s unlimited
export OMP_STACKSIZE=700M
export OMP_NUM_THREADS=8,1
export MKL_NUM_THREADS=8
module purge
module load gaussian16
module load intel intel-mkl
module load xtb/6.6.1
file=$1
WORKDIR=$PWD
mkdir $WORKDIR/${file%".com"}_output/
export TMPDIR=$WORKDIR/${file%".com"}_output/
export GAUSS_SCRDIR=${TMPDIR}
cp $WORKDIR/${file} $TMPDIR/
cd $TMPDIR
echo ${file}
g16 < ${file} > ${file%".com"}.log
cp $TMPDIR/${file%".com"}.log $WORKDIR
rm -rf $TMPDIR
Note that the script requires an existing xtb installation; the xtb binary, as well as the xtb-gaussian script, have to be available in the PATH.
Molassembler may fail to project back into 3D coordinates structures that are not interpreted as a single graph, such as the radical capture transition state (where the Ni–C(sp3) bond is being formed). To address this limitation, a relaxed PES scan elongating the Ni–C(sp3) bond is performed for every conformer of the L*Ni(III)(C-sp2)(C-sp3)X intermediate (Int_R and Int_S) generated by Molassembler (and optimized according to the procedure described in Step 3). The GFN2-xTB submission script is shown below:
#!/bin/bash
module load xtb/6.6.1
file=$1
ulimit -s unlimited
export OMP_STACKSIZE=30GB
export OMP_NUM_THREADS=16,1
export OMP_MAX_ACTIVE_LEVELS=1
export MKL_NUM_THREADS=16
WORKDIR=$PWD
mkdir $WORKDIR/${file%".xyz"}_output
cp $WORKDIR/${file} $WORKDIR/${file%".xyz"}_output/
cp $WORKDIR/scan.inp $WORKDIR/${file%".xyz"}_output/
cd $WORKDIR/${file%".xyz"}_output/
xtb --input scan.inp ${file} --opt --charge 0 --uhf 1
cp xtbopt.xyz $WORKDIR/${file%".xyz"}_opt.xyz
cp xtbscan.log $WORKDIR/${file%".xyz"}_scan.log
rm -rf $WORKDIR/${file%".xyz"}_output
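The scan.inp file follows the same xtb detailed-input syntax; under $scan, an entry "1: start, end, steps" scans the first constraint over the given range. A minimal sketch of a relaxed scan elongating the Ni–C(sp3) bond from 2.0 to 3.5 Å in 20 steps (the atom indices 16 and 49 are illustrative, chosen to match the bond pair in the QST2 example; adjust to your structures, as well as the scan range and step count):

```
$constrain
  force constant=1.0
  distance: 16, 49, auto
$scan
  1: 2.0, 3.5, 20
$end
```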
The input and output (i.e., the end point of the scan) geometries are then used to locate the radical capture transition state with the Synchronous Transit-Guided Quasi-Newton (STQN) Method (QST2). An example of the corresponding gauss_input.com file is shown below:
%chk=QST2.chk
# external="xtb-gaussian -P 10 --charge 0 --uhf 1 --spinpol --tblite"
opt=(qst2,calcfc,NoEigenTest,Loose,nolinear,maxstep=5,MaxCycles=150,nomicro,AddRedun,recalcfc=5) freq=noraman
Reactant
0 2 here1
16 49 B
1 49 48 A
Product
0 2 here2
16 49 B
1 49 48 A
Following the full geometry optimization, failed .log files (and their corresponding .com files) are filtered out using the following Bash script:
for file in *.log
do
grep -v '^$' < "${file}" | tail -1 | grep "Normal termination" > /dev/null; result=${?}
if [ "${result}" -ne 0 ]
then
declare job="${file%.log}"
echo "${job}.log"
mv "${job}.log" FAILED
echo "${job}.com"
mv "${job}.com" FAILED
fi
done
Coordinates are then extracted from the converged .log files using the printXYZ.py script of AaronTools. The Python script filter_IRC.py (available in the "Step_5_Filtering" folder) is then used to separate structures based on a specified bond-length threshold (e.g., for the C(sp2)–C(sp3) or the Ni–C(sp3) bond). This is important to separate, e.g., the L*Ni(III)(C-sp2)(C-sp3)X intermediates from structures that erroneously optimized to the post-reductive elimination complex.
To filter out structures that have not optimized to a first-order saddle point, the g16-ifreq.py Python script is executed. The script filters out the .log files with multiple imaginary frequencies or with an imaginary frequency outside a user-defined range. Based on the manual inspection of multiple TS conformers for Case Study 1, the range specified for TSRC was -110 cm⁻¹ < freq < -25 cm⁻¹, while for TSRE it was -230 cm⁻¹ < freq < -35 cm⁻¹.
The following Bash script is then used to run mARC (ref. 3; version 0.1.10) and select unique conformers within a 10 kcal/mol energy window (based on the spGFN2-xTB electronic energies). Note that, to put all the .xyz files in the right format for mARC, the Python scripts get_xTB_E.py and add_energies_xlsx.py (see the "Step_5_Filtering" folder) are run prior to mARC.
#!/bin/bash
# Check if a filename argument is provided
if [ -z "$1" ]; then
echo "Usage: $0 filename.txt"
exit 1
fi
# Loop through each line in the provided text file
while IFS= read -r file_id
do
python -m navicat_marc -i "${file_id}_TSRE_R"*.xyz -m rmsd -ewin 10 -mine
python -m navicat_marc -i "${file_id}_TSRE_S"*.xyz -m rmsd -ewin 10 -mine
done < "$1"
Finally, the thermal correction to the Gibbs free energy is extracted from the .log files selected by mARC with the get_Gibbs_corr.py Python script.
.inp files for ORCA 5.0 (ref. 9) are generated with the following Bash script, which calls the o4wb3c Fortran executable:
#!/bin/bash
file=$1
module load gcc/8.5.0
module load openmpi/4.1.1
module load orca/5.0.3
WORKDIR=$PWD
mkdir $WORKDIR/${file%".xyz"}_input
cp $WORKDIR/${file} $WORKDIR/${file%".xyz"}_input/
cd $WORKDIR/${file%".xyz"}_input/
o4wb3c --struc ${file} --charge 0 --uhf 1
cp wb97x3c.inp $WORKDIR/${file%".xyz"}.inp
cd $WORKDIR
rm -rf $WORKDIR/${file%".xyz"}_input
sed -i 's/wB97X-D4/wB97X-D4 miniprint nopop/g' ${file%".xyz"}.inp
sed -i 's/nprocs 4/nprocs 16/g' ${file%".xyz"}.inp
sed -i '/%pal/i %cpcm\n epsilon 2.25\n refrac 1.4224\nend\n' ${file%".xyz"}.inp
The values of the dielectric constant and refractive index should be adjusted to match the reaction solvent.
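The '/%pal/i' sed command above inserts the CPCM block immediately before the %pal line of the generated input; a self-contained check with a toy .inp (solvent values as in the script above; GNU sed is assumed, where \n inside the inserted text produces real newlines):

```shell
# Toy ORCA-style input with a %pal block
cat > demo.inp <<'EOF'
! wB97X-D4
%pal
 nprocs 16
end
EOF
# Insert a CPCM block before the %pal line, as in the script above
sed -i '/%pal/i %cpcm\n epsilon 2.25\n refrac 1.4224\nend\n' demo.inp
```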
The PCM/ωB97X-3c (ref. 10) energies are extracted from the .out files with the get_SPC.py Python script (available in the "Step_6_SPC" folder). Gibbs free energies (G(T)_spc(Hartree)) are then calculated as the sum of the PCM/ωB97X-3c electronic energies and the spGFN2-xTB-level thermal corrections.
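The assembly of G(T)_spc is simply the sum of two quantities in Hartree; a trivial awk sketch of that step (the values and the helper name g_spc are illustrative):

```shell
# G(T)_spc = E(PCM/wB97X-3c) + Gibbs thermal correction, both in Hartree
g_spc() { awk -v e="$1" -v g="$2" 'BEGIN { printf "%.6f\n", e + g }'; }
g_spc -2300.123456 0.345678
```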
The following gauss_input.com template is used to generate input files for the computation of molecular descriptors which, for computational efficiency, are performed with the same xtb-Gaussian approach as in Step 4. The solvent should be adapted to the reaction.
%chk=file_name.chk
# external="xtb-gaussian -P 10 --charge 0 --uhf 1 --spinpol --tblite"
freq=noraman
file_name
0 2 here
--Link1--
%chk=file_name.chk
# external="xtb-gaussian -P 10 --charge 0 --uhf 1 --spinpol --tblite"
geom=check pm6 scf=yqc scrf=(pcm,solvent=n,n-DiMethylAcetamide)
file_name
0 2
--Link1--
%chk=file_name.chk
# external="xtb-gaussian -P 10 --charge 0 --uhf 1 --spinpol --tblite"
geom=check guess=read volume polar pop=hirshfeld prop=efg scrf=(pcm,solvent=n,n-DiMethylAcetamide)
file_name
0 2
The get_properties_HT_Worflow_spGFN2-xTB.ipynb Jupyter notebook (located in the "Step_7_Get_Properties" folder) is used to collect most of the descriptors. It was adapted from work published by Coley, Paton, Sigman et al. (ref. 11) hosted on GitHub. Please see the link for detailed instructions.
The GetParameters.py Python script is also used to extract, via Morfeus, molecular features that are not implemented in the Get Properties notebook (specifically, dispersion descriptors (ref. 12), spin densities, and atomic solvent-accessible surface areas). All the relevant Bash and Python scripts are located in the "Step_7_Get_Properties" folder. The workflow is executed as follows:
- Run prepare_Morfeus.sh to separate all the .log files (and their corresponding .xyz files) into distinct directories based on the template (i.e., the name of the ligand);
- Create an Atom_indices.csv file, containing the indices of the atoms called N1, N2, C1, C2, C4, C5, R1, R2, X1, X2, Ni, Br, C1s, C2s, H1s (see Figure S2 of the ESI) for each template. Atom_indices.csv can also be used together with atom_map.py to create the input .xlsx file for get_properties_HT_Worflow_spGFN2-xTB.ipynb;
- Execute run_morfeus_script.sh: it calls Change_labels.py to update the atom indices (based on what is listed in Atom_indices.csv) inside GetParameter.py. GetParameter_{template}.py scripts are generated and executed inside the directory corresponding to each template. combine_morfeus_results.py can then be used to combine the separate output .csv files into one .xlsx file (e.g., TSRE_Morfeus_Results.xlsx);
- Parameters collected with GetParameter.py can then be combined with those collected via get_properties_HT_Worflow_spGFN2-xTB.ipynb and post-processed inside the notebook (remember to add the G(T)_spc(Hartree) values calculated in Step 6); remove_columns.py can then be used to remove undesired descriptors (e.g., '_stdev' or '_range' values) from the final output .xlsx file. The TSRC, Int, and TSRE features can also be combined into one spreadsheet with combine_features.py.
As an illustrative example, output .xlsx and .csv files for the featurization of the TSRE structures of reaction H, together with the corresponding .xyz files, are provided in the folder "Liu_JACS_2024_TSRE_Get_Properties".
The folder "Step_8_Modeling" contains the Python script nested_CV.py used for feature selection, which was performed via repeated, stratified, nested k-fold cross-validation. This script was written based on the code by Doyle et al. (ref. 4) available elsewhere and requires a .xlsx input file with the following columns: "Structure", "Class", "ee", and "DeltaDeltaG", followed by all the parameters (see the "Benchmarking" folder for examples).
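The "ee" and "DeltaDeltaG" columns are related through the standard selectivity relation ΔΔG‡ = RT·ln[(1 + ee)/(1 − ee)] (ee as a fraction); a hedged awk sketch of the conversion, assuming T = 298.15 K and R = 0.0019872 kcal/(mol·K) — the helper name ee_to_ddg is illustrative:

```shell
# ee (fraction, e.g. 0.90) -> ddG‡ in kcal/mol at temperature T (default 298.15 K)
ee_to_ddg() {
  awk -v ee="$1" -v T="${2:-298.15}" \
    'BEGIN { R = 0.0019872; printf "%.2f\n", R * T * log((1 + ee) / (1 - ee)) }'
}
ee_to_ddg 0.90   # 90% ee at room temperature
```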
Details of the cross-validation scheme are provided in the Supporting Information. Notably, the exhaustive combinatorial search of features performed in the inner loop is computationally very expensive. To reduce the number of input features (and hence the cost), the Boruta algorithm (ref. 13) implemented in a Jupyter notebook (feature_curation.ipynb) written by our group and hosted on GitHub was used. Alternatively, a modified version of nested_CV.py (called RF_nested_CV.py, available in the folder "Step_8_Modeling") was used. Here, inside the exhaustive_search function, a random forest is trained to estimate feature importance; features with importance ≤ 0.01 are filtered out, and the combinatorial search is performed on the remaining features. Models were also searched via bidirectional stepwise selection using our group's modeling workflow (the Mattlab_modeling_v6.0.0.ipynb Jupyter notebook hosted on GitHub).
As illustrative examples, the Case_Study_2_CH_Functionalization.xlsx and Case_Study_2_Ni_XECs.xlsx files are provided in the "Case_Study_2" folder, each containing an "Input" sheet with the data needed to run the nested_CV.py script; the "Metadata" sheet contains additional metadata. The script may be submitted as follows:
python nested_CV.py input.xlsx [oos.xlsx]
where oos.xlsx is an optional spreadsheet containing the data for out-of-sample predictions (e.g., see the "OOS_Rxn_Metadata" sheet in the Case_Study_2_Ni_XECs.xlsx file; the oos.xlsx spreadsheet must contain the following columns: "Structure", "Class", "ee", and "DeltaDeltaG", followed by all the parameters).
Combinations of features identified with these approaches were further evaluated via a stratified 5×2 cross-validation scheme performed with the 5_2_CV.py Python script, which was also written based on the code by Doyle et al. (ref. 4). This script requires as input a .xlsx file with the data ("Class", ∆∆G‡ values, and descriptors) and a .txt file with the combinations of features to evaluate (e.g., TSRE_xTB: η_Boltz, %Vbur_C1s_2.0Å_Boltz, %Vbur_C2s_4.0Å_Boltz if the sheet containing parameters extracted from TSRE is called TSRE_xTB).
The script 5_2_CV_ensemble.py was used to post-process the output of the repeated, stratified, nested k-fold CV script and calculate ensemble predictions. This script may be executed as:
python 5_2_CV_ensemble.py file.out/xlsx Features.xlsx OOS.xlsx Output.xlsx
where file.out or file.xlsx contains the different MLR models (i.e., combinations of features) to evaluate in the 5×2 CV test, Features.xlsx contains the measured ∆∆G‡ values and the descriptors (same format as in "Benchmarking"), and OOS.xlsx contains the data for out-of-sample predictions. As illustrative examples, CH_Activation.xlsx and Ni_XECs.out are provided in the "Case_Study_2" folder, together with the .xlsx files with all the features.
The Python script SISSO_feature_generation.py was used to calculate and filter the augmented features (see the Supporting Information for further details). It was written based on code previously written by our group (ref. 14). A .xlsx file with the same format as the input for nested_CV.py or 5_2_CV.py is required.
calculate_SISSO_features.py may be used to calculate the same SISSO-augmented features as those provided in a first input .xlsx file for a new set of parameters provided in a second .xlsx input (e.g., for virtual screening).
The Case_Study_3.ipynb Jupyter notebook (located in the "Step_9_Active_Learning" folder) was used for the active learning campaign within the EDBO+ platform (ref. 15; installation instructions are provided elsewhere). CCs_Acetals.xlsx contains the required input data (in the EDBO_Input sheet), along with metadata for the ligand and substrate SMILES (Metadata sheet).
- Ingman, V. M.; Schaefer, A. J.; Andreola, L. R.; Wheeler, S. E. QChASM: Quantum chemistry automation and structure manipulation. WIREs Comput. Mol. Sci. 11, e1510 (2021).
- Sobez, J.-G.; Reiher, M. Molassembler: Molecular Graph Construction, Modification, and Conformer Generation for Inorganic and Organic Molecules. J. Chem. Inf. Model. 60, 3884–3900 (2020).
- Laplaza, R.; Wodrich, M. D.; Corminboeuf, C. Overcoming the Pitfalls of Computing Reaction Selectivity from Ensembles of Transition States. J. Phys. Chem. Lett. 15, 7363–7370 (2024).
- Lau, S. H.; Borden, M. A.; Steiman, T. J.; Wang, L. S.; Parasram, M.; Doyle, A. G. Ni/Photoredox-Catalyzed Enantioselective Cross-Electrophile Coupling of Styrene Oxides with Aryl Iodides. J. Am. Chem. Soc. 143, 15873–15881 (2021).
- Turro, R. F.; Wahlman, J. L. H.; Tong, Z. J.; Chen, X.; Yang, M.; Chen, E. P.; Hong, X.; Hadt, R. G.; Houk, K. N.; Yang, Y.-F.; Reisman, S. E. Mechanistic Investigation of Ni-Catalyzed Reductive Cross-Coupling of Alkenyl and Benzyl Electrophiles. J. Am. Chem. Soc. 145, 14705–14715 (2023).
- Laplaza, R.; Sobez, J.-G.; Wodrich, M. D.; Reiher, M.; Corminboeuf, C. The (not so) simple prediction of enantioselectivity – a pipeline for high-fidelity computations. Chem. Sci. 13, 6858–6864 (2022).
- Frisch, M.; Trucks, G.; Schlegel, H.; Scuseria, G.; Robb, M.; Cheeseman, J.; Montgomery, J.; Vreven, T.; Kudin, K.; Burant, J.; Millam, J.; Iyengar, S.; Tomasi, J.; Barone, V.; Mennucci, B.; Cossi, M.; Scalmani, G.; Rega, N.; Petersson, G.; Nakatsuji, H.; Hada, M.; Ehara, M.; Toyota, K.; Fukuda, R.; Hasegawa, J.; Ishida, M.; Nakajima, T.; Honda, Y.; Kitao, O.; Nakai, H.; Klene, M.; Li, X.; Knox, J.; Hratchian, H.; Cross, J.; Bakken, V.; Adamo, C.; Jaramillo, J.; Gomperts, R.; Stratmann, R.; Yazyev, O.; Austin, A.; Cammi, R.; Pomelli, C.; Ochterski, J.; Ayala, P.; Morokuma, K.; Voth, G.; Salvador, P.; Dannenberg, J.; Zakrzewski, V.; Dapprich, S.; Daniels, A.; Strain, M.; Farkas, O.; Malick, D.; Rabuck, A.; Raghavachari, K.; Foresman, J.; Ortiz, J.; Cui, Q.; Baboul, A.; Clifford, S.; Cioslowski, J.; Stefanov, B.; Liu, G.; Liashenko, A.; Piskorz, P.; Komaromi, I.; Martin, R.; Fox, D.; Keith, T.; Laham, A.; Peng, C.; Nanayakkara, A.; Challacombe, M.; Gill, P.; Johnson, B.; Chen, W.; Wong, M.; Gonzalez, C.; Pople, J. Gaussian 16, Revision C.01 (2016).
- Neugebauer, H.; Badorf, B.; Ehlert, S.; Hansen, A.; Grimme, S. High-Throughput Screening of Spin States for Transition Metal Complexes with Spin-Polarized Extended Tight-Binding Methods. J. Comput. Chem. 44, 2120–2129 (2023).
- Neese, F. Software Update: The ORCA Program System, Version 5.0. WIREs Comput. Mol. Sci. 12 (2022).
- Müller, M.; Hansen, A.; Grimme, S. ωB97X-3c: A Composite Range-Separated Hybrid DFT Method with a Molecule-Optimized Polarized Valence Double-ζ Basis Set. J. Chem. Phys. 158, 014103 (2023).
- Haas, B. C.; Hardy, M. A.; Sowndarya S. V., S.; Adams, K.; Coley, C. W.; Paton, R. S.; Sigman, M. S. Rapid prediction of conformationally-dependent DFT-level descriptors using graph neural networks for carboxylic acids and alkyl amines. Digit. Discov. 4, 222–233 (2025).
- Pollice, R.; Chen, P. A universal quantitative descriptor of the dispersion interaction potential. Angew. Chem. Int. Ed., 58, 9758–9769 (2019).
- Kursa, M. & Rudnicki, W. Feature Selection with the Boruta Package. J. Stat. Softw. 36, 1–13 (2010).
- Souza, L. W.; Miller, B. R.; Cammarota, R. C.; Lo, A.; Lopez, I.; Shiue, Y.-S.; Bergstrom, B. D.; Dishman, S. N.; Fettinger, J. C.; Sigman, M. S.; Shaw, J. T. Deconvoluting Nonlinear Catalyst–Substrate Effects in the Intramolecular Dirhodium-Catalyzed C–H Insertion of Donor/Donor Carbenes Using Data Science Tools. ACS Catal. 14, 104–115 (2024).
- Torres, J. A. G.; Lau, S. H.; Anchuri, P.; Stevens, J. M.; Tabora, J. E.; Li, J.; Borovika, A.; Adams, R. P.; Doyle, A. G. A Multi-Objective Active Learning Platform and Web App for Reaction Optimization. J. Am. Chem. Soc. 144, 19999–20007 (2022).