https://doi.org/10.26434/chemrxiv-2025-2xknx
-
Mostafa Javaheri Moghadam PhD Candidate, Department of Chemistry University of New Brunswick, Fredericton, Canada ORCID: https://orcid.org/0009-0002-1415-9867
-
Rebecca Mulder Department of Chemistry University of New Brunswick, Fredericton, Canada
-
Stijn De Baerdemacker (Corresponding Author) Associate Professor, Department of Chemistry University of New Brunswick, Fredericton, Canada Email: stijn.debaerdemacker@unb.ca ORCID: https://orcid.org/0000-0001-7933-3227
Dataset and Workflow Files for Insulin AMI/FMI Analysis
This repository contains all fragment-level and full-system data used in the divide-and-correlate reconstruction of Atomic Mutual Information (AMI) and Fragment Mutual Information (FMI) for the insulin protein. The dataset includes spherical fragment structures, quantum-chemical inputs and outputs, mutual information (MI) matrices, and stitched reconstructions used in the manuscript.
Specifically, the repository provides:
- Spherical fragment structures with capping (XYZ files) and without capping (PDB files)
- Lists of amino acids included in each spherical fragment together with the number of atoms per residue, stored as {'RESNAMEresID': number_of_atoms}
- RMSD values of each spherical fragment after position-restrained energy minimization relative to the full protein
- Fragment AMI matrices
- Stitched AMI/FMI matrices for each radius
- Full-protein structures and reference AMI/FMI matrices
- ORCA DFT input and output files, hosted separately at https://doi.org/10.25545/OZX7DP
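The `{'RESNAMEresID': number_of_atoms}` entries in the amino-acid lists can be read with the Python standard library. The following is a minimal sketch, assuming each entry is a valid Python dict literal; the function name `parse_fragment_residues` and the example values are hypothetical and are not taken from the dataset.

```python
import ast
import re

def parse_fragment_residues(text):
    """Parse a dict literal such as "{'GLY1': 7}" into (resname, resid, n_atoms) tuples."""
    raw = ast.literal_eval(text)  # evaluates literals only, never arbitrary code
    residues = []
    for key, n_atoms in raw.items():
        # Split the key into residue name (letters) and residue ID (digits).
        match = re.fullmatch(r"([A-Za-z]+)(\d+)", key)
        if match is None:
            raise ValueError(f"unexpected residue key: {key!r}")
        residues.append((match.group(1), int(match.group(2)), int(n_atoms)))
    return residues

# Illustrative entry only; the atom counts here are made up for the example.
example = "{'GLY1': 7, 'ILE2': 19, 'VAL3': 16}"
for resname, resid, n_atoms in parse_fragment_residues(example):
    print(resname, resid, n_atoms)
```

If the files in `aa_list_r*.txt` hold one dict per line, the function can be applied line by line; that layout is an assumption and should be checked against the actual files.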
All spherical fragments were capped with NH₂/COOH groups and then subjected to position-restrained energy minimization in GROMACS, using the OPLS-AA force field and the steepest-descent algorithm. This step removes steric clashes at the cut boundaries while preserving the native protein geometry. DFT single-point calculations were performed with ORCA, and reduced density matrices (RDMs) were subsequently computed with in-house software.
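As an illustration of how an RMSD between a minimized fragment and the corresponding atoms of the full protein could be evaluated, the sketch below computes a plain coordinate RMSD. It assumes that both coordinate lists contain the same atoms in the same order, that capping atoms have been excluded, and that no superposition is applied; these are assumptions made for the example, and the values in the `rmsd_r*.csv` files may have been obtained differently.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical coordinates in Å, not taken from the dataset.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
minimized = [(0.0, 0.0, 0.1), (1.0, 0.0, -0.1)]
print(rmsd(reference, minimized))
```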
📦 insulin
├─ full_protein
│  ├─ full_protein.pdb
│  ├─ full_protein.xyz
│  ├─ full_protein_AMI.csv
│  └─ full_protein_FMI.csv
└─ stitched
   ├─ radius4
   │  ├─ aa_list_r4.txt
   │  ├─ rmsd_r4.csv
   │  ├─ stitched_AMI_matrix.csv
   │  ├─ stitched_FMI_matrix.csv
   │  ├─ sphere01
   │  │  ├─ sphere01_AMI_cap.csv
   │  │  ├─ sphere01_AMI_nocap.csv
   │  │  ├─ sphere01_cap.xyz
   │  │  └─ sphere01_nocap.pdb
   │  ├─ .
   │  ├─ .
   │  ├─ .
   │  └─ sphere51
   ├─ radius5
   │  ├─ aa_list_r5.txt
   │  ├─ rmsd_r5.csv
   │  ├─ stitched_AMI_matrix.csv
   │  ├─ stitched_FMI_matrix.csv
   │  ├─ sphere01
   │  │  ├─ sphere01_AMI_cap.csv
   │  │  ├─ sphere01_AMI_nocap.csv
   │  │  ├─ sphere01_cap.xyz
   │  │  └─ sphere01_nocap.pdb
   │  ├─ .
   │  ├─ .
   │  ├─ .
   │  └─ sphere51
   └─ radius6
      ├─ aa_list_r6.txt
      ├─ rmsd_r6.csv
      ├─ stitched_AMI_matrix.csv
      ├─ stitched_FMI_matrix.csv
      ├─ sphere01
      │  ├─ sphere01_AMI_cap.csv
      │  ├─ sphere01_AMI_nocap.csv
      │  ├─ sphere01_cap.xyz
      │  └─ sphere01_nocap.pdb
      ├─ .
      ├─ .
      ├─ .
      └─ sphere51
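The naming scheme in the directory tree above is regular enough that file paths can be built programmatically. The sketch below assumes the layout is exactly as shown (radius folders under `stitched`, spheres numbered with two digits); the helper name `sphere_files` and the dictionary keys are hypothetical.

```python
from pathlib import Path

def sphere_files(root, radius, index):
    """Return the expected paths of the four files belonging to one spherical fragment."""
    sphere = f"sphere{index:02d}"  # e.g. 1 -> "sphere01"
    base = Path(root) / "stitched" / f"radius{radius}" / sphere
    return {
        "ami_cap": base / f"{sphere}_AMI_cap.csv",
        "ami_nocap": base / f"{sphere}_AMI_nocap.csv",
        "cap_xyz": base / f"{sphere}_cap.xyz",
        "nocap_pdb": base / f"{sphere}_nocap.pdb",
    }

# List the expected files of the first fragment at r = 4 Å.
for label, path in sphere_files("insulin", 4, 1).items():
    print(label, path)
```

The function only constructs paths and does not check that the files exist, so it can be used before the archive is unpacked.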
- Generate overlapping spherical fragments centered on Cα atoms (r = 4, 5, 6 Å).
- Add capping groups (NH₂/COOH) at cut boundaries.
- Perform position-restrained energy minimization using GROMACS.
- Run ORCA DFT single-point calculations (ωB97M-V/6-31G(d)) for each fragment and for the full protein.
- Extract Kohn–Sham wavefunctions.
- Compute RDMs and mutual information.
- Construct AMI matrices for each fragment.
- Remove capping atoms and stitch overlapping fragments to reconstruct protein-scale AMI.
- Derive FMI matrices from the stitched AMI data.
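The last two steps of the list above (stitching and FMI derivation) can be illustrated with a small sketch. Two choices here are assumptions made purely for illustration and are not confirmed by this README: overlapping AMI entries are combined by averaging, and an FMI entry is taken as the sum of AMI values over all atom pairs of two residues. The manuscript defines the actual procedure. The function names, atom indices, and numerical values are hypothetical.

```python
from collections import defaultdict

def stitch_ami(fragments):
    """Combine fragment AMI matrices into one sparse global map.

    fragments: list of (global_atom_ids, matrix) with capping atoms already removed.
    Assumption: entries covered by several fragments are averaged.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for atom_ids, matrix in fragments:
        for a, i in enumerate(atom_ids):
            for b, j in enumerate(atom_ids):
                sums[(i, j)] += matrix[a][b]
                counts[(i, j)] += 1
    return {pair: sums[pair] / counts[pair] for pair in sums}

def derive_fmi(ami, residue_of):
    """Assumption: FMI between two residues is the sum of AMI over their atom pairs."""
    fmi = defaultdict(float)
    for (i, j), value in ami.items():
        fmi[(residue_of[i], residue_of[j])] += value
    return dict(fmi)

# Two overlapping toy fragments sharing atoms 1 and 2 (made-up numbers).
frag_a = ([0, 1, 2], [[0.0, 0.2, 0.1], [0.2, 0.0, 0.3], [0.1, 0.3, 0.0]])
frag_b = ([1, 2, 3], [[0.0, 0.5, 0.1], [0.5, 0.0, 0.2], [0.1, 0.2, 0.0]])
residue_of = {0: "GLY1", 1: "GLY1", 2: "ILE2", 3: "ILE2"}

stitched = stitch_ami([frag_a, frag_b])
fmi = derive_fmi(stitched, residue_of)
print(stitched[(1, 2)])
print(fmi[("GLY1", "ILE2")])
```

Atom pairs that never appear together in any fragment (here atoms 0 and 3) are simply absent from the stitched map, which mirrors the fact that a fragment-based reconstruction only carries information for pairs within a common sphere.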
- M. J. Moghadam, K. Boguslawski, R. Doucet, Ö. Legeza, P. Tecmer and S. De Baerdemacker, ChemRxiv, 2024.
- M. J. Moghadam, K. Boguslawski, P. Tecmer and S. De Baerdemacker, ChemRxiv, 2025.
- M. J. Abraham, T. Murtola, R. Schulz, S. Páll, J. C. Smith, B. Hess and E. Lindahl, Softw. X, 2015, 1, 19–25.
- F. Neese, WIREs Comput. Mol. Sci., 2012, 2, 73–78.