In 2024, the code for Kraken was refactored to streamline the addition of new monophosphines and improve portability. This update uses modern software versions including xTB 6.4.0 and CREST 2.12. In our testing, the new workflow performs nearly identically to the original with minor differences.
Different versions of xTB lead to small differences in GFN2-level properties. We compared DFT properties for more than 60 monophosphines generated using both the original and updated versions of Kraken. For a small number of properties (typically octant volumes), the difference between the old and new values for a given monophosphine exceeds 75% of the original value. These differences likely arise from randomness in the conformer search.
A detailed report comparing 28 monophosphines from the original Kraken workflow is provided in the `validation/` folder.
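For illustration, the old-vs-new comparison described above can be scripted. A minimal sketch in Python; the property names and values here are made up for the example and are not Kraken's actual numbers:

```python
# Sketch: flag properties whose new value differs from the old value by more
# than 75% of the original value (the threshold mentioned above).

def relative_change(old: float, new: float) -> float:
    """Absolute change relative to the original value."""
    if old == 0.0:
        return float("inf") if new != 0.0 else 0.0
    return abs(new - old) / abs(old)

def flag_large_changes(old_props: dict, new_props: dict, threshold: float = 0.75) -> list:
    """Return names of properties whose relative change exceeds the threshold."""
    return [
        name
        for name, old in old_props.items()
        if name in new_props and relative_change(old, new_props[name]) > threshold
    ]

# Hypothetical example values:
old = {"vbur_min": 45.1, "octant_volume_1": 0.8}
new = {"vbur_min": 45.3, "octant_volume_1": 2.1}
print(flag_large_changes(old, new))  # → ['octant_volume_1']
```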
1. Create a conda environment with the included environment YAML file:

   ```bash
   conda env create --name kraken --file=kraken.yml
   ```

2. Activate the new environment:

   ```bash
   conda activate kraken
   ```

3. (Optional) Download an appropriate version of xTB and CREST. The precompiled versions of xTB v6.6.0 and CREST v2.12 worked in our testing. Alternatively, CHPC users can run

   ```bash
   module load crest/2.12
   ```

   to load xTB v6.4.0 and CREST v2.12.

4. Place the precompiled binaries for xTB and CREST somewhere on your PATH. Kraken will call these through subprocesses.

5. Install the `kraken` package by navigating to the parent directory containing `setup.py` and running:

   ```bash
   pip install .
   ```

These instructions are for Sigman group members to submit batches of calculations to the Sigman owner nodes on Notchpeak. For users outside of the Sigman group, see the instructions in the next section.
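If you prefer to build the input file programmatically, a short sketch using only the Python standard library; the two ligand rows mirror the examples in the table below:

```python
# Sketch: write the input .csv expected by the submission scripts.
import csv

rows = [
    {"KRAKEN_ID": 5039, "SMILES": "CP(C)C", "CHARGE": 0, "CONVERSION_FLAG": 4},
    {"KRAKEN_ID": 10596, "SMILES": "CP(C1=CC=CC=C1)C", "CHARGE": 0, "CONVERSION_FLAG": 4},
]

with open("input.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["KRAKEN_ID", "SMILES", "CHARGE", "CONVERSION_FLAG"])
    writer.writeheader()
    writer.writerows(rows)
```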
1. Format a `.csv` file that contains the columns SMILES, KRAKEN_ID, CHARGE, CONVERSION_FLAG:

   | KRAKEN_ID | SMILES           | CHARGE | CONVERSION_FLAG |
   | --------- | ---------------- | ------ | --------------- |
   | 5039      | CP(C)C           | 0      | 4               |
   | 10596     | CP(C1=CC=CC=C1)C | 0      | 4               |
   | ...       | ...              | ...    | ...             |
2. Run the example submission script with your requested inputs and configurations:

   ```bash
   submit_conf_search --csv input.csv --nprocs 8 --mem 16 --time 6 --calculation-dir ./data/ --debug
   ```
3. After completion, inspect the SLURM log files (`*.log`, `*.error`) for errors and warnings (`ERROR`, `WARNING`).
4. Use the CLI utilities provided by Kraken if you wish to run all of your calculations from one directory (recommended):

   a. Move DFT files to a common directory:

      ```bash
      extract_dft_files --input ./data/ --destination ./dft_calculation_folder_for_convenience/
      ```

   b. Submit DFT calculations:

      ```bash
      for i in *.com; do subg16 $i -c sigman -m 32 -p 16 -t 12; done
      ```

   c. Return results to their directories:

      ```bash
      return_dft_files --input ./dft_calculation_folder_for_convenience/ --destination ./data/
      ```
5. Evaluate the DFT jobs for errors. For help, use `GaussianLogfileAssessor`.
6. Submit the DFT portion of the Kraken workflow:

   ```bash
   submit_dft_calcs --csv input.csv --nprocs 8 --mem 16 --time 6 --calculation-dir ./data/ --debug
   ```
7. Check the SLURM `.log` and `.error` files and raise an issue on this repo if necessary.
8. Convert the resulting YAML files into a convenient spreadsheet:

   ```bash
   yaml_to_csv -d ./data/ --debug
   ```

To submit batches of calculations to different HPC systems, you must create a submission script template similar to those in the `kraken/slurm_templates` directory to accommodate your job scheduler. Please note that the SLURM templates contain special symbols that are substituted with the actual values required by the scheduler, including `$KID`, `$NPROCS`, `$MEM`, and several others. Specify the template file using the `--slurm-template` argument for the submission scripts.
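The log-inspection steps above can be scripted. A sketch that surfaces lines containing `ERROR` or `WARNING` from the SLURM output files; the directory layout is up to you:

```python
# Sketch: scan SLURM output files (*.log, *.error) for flagged lines.
from pathlib import Path

def scan_slurm_logs(directory: str, keywords=("ERROR", "WARNING")) -> dict:
    """Map each log file name to the flagged lines it contains."""
    hits = {}
    d = Path(directory)
    if not d.is_dir():
        return hits
    for pattern in ("*.log", "*.error"):
        for path in sorted(d.glob(pattern)):
            flagged = [
                line.rstrip()
                for line in path.read_text(errors="ignore").splitlines()
                if any(k in line for k in keywords)
            ]
            if flagged:
                hits[path.name] = flagged
    return hits
```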
1. Format a `.csv` file that contains the columns SMILES, KRAKEN_ID, CHARGE, CONVERSION_FLAG (see above).

2. Run the example submission script with your requested inputs and configurations:

   ```bash
   submit_conf_search --csv input.csv --nprocs 8 --mem 16 --time 6 --calculation-dir ./data/ --debug --slurm-template ~/path/to/template.slurm
   ```

3. Continue with steps 4-7 from the previous procedure.
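For the Gaussian jobs themselves, a quick triage is to check each `.log` file for Gaussian 16's "Normal termination" line; the `GaussianLogfileAssessor` helper mentioned above is the more thorough option. A heuristic sketch:

```python
# Heuristic sketch: completed Gaussian 16 jobs end their .log files with a
# "Normal termination" line; jobs missing it need attention.
from pathlib import Path

def gaussian_finished(logfile) -> bool:
    """True if the log file contains Gaussian's normal-termination line."""
    return "Normal termination" in Path(logfile).read_text(errors="ignore")

def unfinished_jobs(directory: str) -> list:
    """Names of .log files in the directory that did not terminate normally."""
    d = Path(directory)
    if not d.is_dir():
        return []
    return sorted(p.name for p in d.glob("*.log") if not gaussian_finished(p))
```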
Kraken can also be executed directly from the command line. This can be useful if you wish to create your own wrapper scripts for submission to other HPC systems. Please note that these commands call computationally intensive programs and should not be run on head/login nodes.
1. Format a `.csv` file that contains the columns SMILES, KRAKEN_ID, CHARGE, CONVERSION_FLAG (see above).

2. Run the first Kraken script on the `.csv` file:

   ```bash
   run_kraken_conf_search -i ./data/input.csv --nprocs 4 --calculation-dir ./data/ --debug > kraken_conf_search.log
   ```
3. After the script terminates, navigate to `./data/` to find the conformer search directories. Each `<KRAKEN_ID>/dft/` folder contains the `.com` files for Gaussian16. Run these calculations and place all result files back in the `<KRAKEN_ID>/dft/` folder for your monophosphine.
4. After confirming that the `.log`, `.chk`, and `.wfn` files are present in `<KRAKEN_ID>/dft/`, run the final Kraken DFT processing step. This step operates on individual Kraken IDs (CSV input is not supported):

   ```bash
   run_kraken_dft.py --kid 90000001 --dir ./data/ --nprocs 4 --force > kraken_dft_processing_90000001.log
   ```

The final `.yml` output files from both the CREST and DFT steps will be found in `./data/<KRAKEN_ID>/`:
```
./90000001/
├── 90000001_confdata.yml
├── 90000001_data.yml
├── 90000001_Ni_combined.yml
├── 90000001_Ni_confs.yml
├── 90000001_Ni.yml
├── 90000001_noNi_combined.yml
├── 90000001_noNi_confs.yml
├── 90000001_noNi.yml
├── 90000001_relative_energies.csv
├── crest_calculations
│   ├── 90000001_Ni
│   └── 90000001_noNi
├── dft
│   ├── 90000001_errors.txt
│   ├── 90000001_noNi_00000
│   ├── 90000001_noNi_00001
│   ├── ...
│   ├── confselection_minmax_Ni.txt
│   ├── confselection_minmax_noNi.txt
│   ├── rmsdmatrix.csv
│   └── selected_conformers
└── xtb_scr_dir
```
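A small completeness check against the file listing above can be sketched as follows; the suffix list is taken directly from the tree, and only the top-level output files are checked:

```python
# Sketch: verify that the expected top-level output files exist for a Kraken ID.
from pathlib import Path

EXPECTED_SUFFIXES = [
    "_confdata.yml", "_data.yml", "_Ni_combined.yml", "_Ni_confs.yml",
    "_Ni.yml", "_noNi_combined.yml", "_noNi_confs.yml", "_noNi.yml",
    "_relative_energies.csv",
]

def missing_outputs(data_dir: str, kid: str) -> list:
    """Return the expected output file names that are absent for this Kraken ID."""
    base = Path(data_dir) / kid
    return [kid + s for s in EXPECTED_SUFFIXES if not (base / (kid + s)).is_file()]
```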
Please cite the original Kraken publication if you use this software. The executables for Multiwfn, dftd3, and dftd4 are included in this repository and are used in the Kraken workflow. Please also cite the Multiwfn, dftd3, and dftd4 publications.
A Comprehensive Discovery Platform for Organophosphorus Ligands for Catalysis
Tobias Gensch, Gabriel dos Passos Gomes, Pascal Friederich, Ellyn Peters, Théophile Gaudin, Robert Pollice, Kjell Jorner, Akshat Kumar Nigam, Michael Lindner-D’Addario, Matthew S. Sigman, Alán Aspuru-Guzik.
J. Am. Chem. Soc. 2022 144 (3), 1205-1217. DOI: 10.1021/jacs.1c09718
Multiwfn: A Multifunctional Wavefunction Analyzer
Tian Lu, Feiwu Chen.
J. Comput. Chem., 2012 33, 580-592. DOI: 10.1002/jcc.22885
A consistent and accurate ab initio parametrization of density functional dispersion correction (DFT-D) for the 94 elements H-Pu
Stefan Grimme, Jens Antony, Stephan Ehrlich, Helge Krieg.
J. Chem. Phys. 2010, 132, 154104. DOI: 10.1063/1.3382344
Effect of the damping function in dispersion corrected density functional theory
Stefan Grimme, Stephan Ehrlich, Lars Goerigk.
J. Comput. Chem., 2011, 32, 1456-1465. DOI: 10.1002/jcc.21759
Extension of the D3 dispersion coefficient model
Eike Caldeweyher, Christoph Bannwarth, Stefan Grimme.
J. Chem. Phys., 2017, 147, 034112. DOI: 10.1063/1.4993215
A generally applicable atomic-charge dependent London dispersion correction
Eike Caldeweyher, Sebastian Ehlert, Andreas Hansen, Hagen Neugebauer, Sebastian Spicher, Christoph Bannwarth, Stefan Grimme.
J. Chem. Phys., 2019, 150, 154122. DOI: 10.1063/1.5090222
Extension and evaluation of the D4 London-dispersion model for periodic systems
Eike Caldeweyher, Jan-Michael Mewes, Sebastian Ehlert, Stefan Grimme.
Phys. Chem. Chem. Phys., 2020, 22, 8499-8512. DOI: 10.1039/D0CP00502A
- Kraken rejects SMILES strings with undefined stereochemistry that would lead to inconsistent results (diastereomers)
- Included CLI scripts for automatically submitting Kraken calculations
- Included CLI scripts for converting Kraken result directories into CSV files or SDF files for viewing
- Automatic fchk generation
- The conformer search produces slightly different conformers each run, so results vary slightly (around 1%) between runs
- The original code is designed to ignore descriptors that are assigned `None` as a result of xTB failure. This behavior is retained.
- Despite refactoring, the codebase still contains unused code.
- The conda versions of xTB and CREST are incompatible with Kraken; they frequently crashed during the `--vipea` calculations. Use the precompiled binaries of each release, or compile the programs directly from source. This workflow was developed with CREST 2.12 and xTB 6.4.0.
- CREST 2.12 produces many more conformers than CREST 2.8. Because conformers are selected based on properties, the number of conformers for DFT calculations should remain unchanged.
- Several descriptors vary substantially with xTB 6.7.0 or greater (EA/IP descriptors, nucleophilicity) because IPEA-xTB is not used for vertical IP/EA calculations in those versions. This will likely not affect the DFT-level descriptors.
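The `None`-handling noted above can be mirrored when post-processing results yourself; a minimal sketch with hypothetical descriptor names:

```python
# Sketch: drop descriptors whose value is None (e.g. from an xTB failure)
# before aggregating, mirroring the behavior described above.
def drop_failed(descriptors: dict) -> dict:
    return {k: v for k, v in descriptors.items() if v is not None}

props = {"dip": 1.2, "ip": None, "ea": 0.3}  # made-up descriptor names/values
print(drop_failed(props))  # → {'dip': 1.2, 'ea': 0.3}
```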
The code for this updated workflow was adapted from the original Kraken code. Updates to the code should be made carefully so as not to affect the final descriptors. We have included a comparison between the descriptors from the original Kraken publications and the new workflow for approximately 30 monophosphines in the `validation/` folder.
If you wish to submit batches of Kraken calculations (either the conformer search or the DFT portion of the workflow) to other systems, you must create additional `.slurm` templates that are compatible with `kraken/cli/submit_conf_search.py` and `kraken/cli/submit_dft_calcs.py`. The SLURM scripts should contain the call to `run_kraken_conf_search` or `run_kraken_dft`, along with placeholders for the following variables:
- `$TIME` - Time in hours for the job
- `$NPROCS` - Number of processors to request for the job
- `$MEM` - Amount of memory in gigabytes to request for the job
- `$KID` - 8-digit Kraken ID
- `$CALCDIR` - Calculation directory for the job
- `$SMILES` - SMILES string of the monophosphine (only required for the conf search portion)
- `$CONVERSION_FLAG` - Flag selecting the method for generating coordinates from SMILES (default should be 4; only for the conf search portion)
Once you have created the new `.slurm` template, you can use the submission scripts (`submit_conf_search.py`) and specify the `--slurm-template` argument.
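Because the placeholders use `$NAME` syntax, Python's `string.Template` can fill such a template directly. The template body below is a made-up illustration, not the contents of the real files in `kraken/slurm_templates`:

```python
# Sketch: substitute the placeholder variables described above into a
# (hypothetical) SLURM template using string.Template's $NAME syntax.
from string import Template

template_text = """#SBATCH --job-name=$KID
#SBATCH --time=$TIME:00:00
#SBATCH --ntasks=$NPROCS
#SBATCH --mem=${MEM}G
run_kraken_conf_search -i $CALCDIR/input.csv --nprocs $NPROCS --calculation-dir $CALCDIR
"""

filled = Template(template_text).substitute(
    KID="90000001", TIME=6, NPROCS=8, MEM=16, CALCDIR="./data",
)
print(filled)
```

`Template.substitute` raises `KeyError` if a placeholder is missing a value, which is a useful safety check before submitting a job.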