AMASE (Automated Mixture Analysis via Structural Evaluation) is a Python package for automatically assigning mixtures studied by rotational spectroscopy. It leverages graph analysis and machine-learning molecular embedding methods to consider how structurally/chemically similar molecular candidates are to previously observed mixture components or known chemical priors.
A paper describing the technique can be found here: https://pubs.acs.org/doi/10.1021/acs.jpca.4c03580
Note: This repository is for laboratory mixtures. If you're interested in assigning astronomical mixtures, see: https://github.com/zfried/astro_amase/tree/main
Recommended: Use Python 3.11 for best compatibility.
Note: AMASE has not yet been uploaded to PyPI. For now, you must install from source.
# Clone the repository
git clone https://github.com/zfried/AMASE/
cd AMASE
# Install
pip install .
Creating a conda environment is optional but recommended to avoid dependency conflicts:
# Create and activate a new conda environment with Python 3.11
conda create -n amase_env python=3.11
conda activate amase_env
# Clone the repository
git clone https://github.com/zfried/AMASE/
cd AMASE
# Install AMASE
pip install .
Note: RDKit can be difficult to install via pip. If you encounter issues, install it separately with conda first:
conda install -c conda-forge rdkit
pip install .
To install in development/editable mode:
# Optional: Create and activate conda environment
conda create -n amase_env python=3.11
conda activate amase_env
# Clone and install
git clone https://github.com/zfried/AMASE/
cd AMASE
pip install -e .
- Python 3.11 (recommended)
- Dependencies are automatically installed with pip
Before running AMASE, download all required data files from the Dropbox directory and place them in your directory_path. These are the files required for the graph calculations and the catalogs/metadata from CDMS/JPL.
For a comprehensive guide with all parameters and multiple examples, see example_run_assignment.ipynb
import amase
amase.run_assignment(
spectrum_path="/path/to/spectrum.txt",
directory_path="/path/to/output/directory",
sigma_threshold=5.0,
temperature=300.0
)
import amase
import pandas as pd
# Example with all optional parameters
amase.run_assignment(
spectrum_path="/path/to/spectrum.txt",
directory_path="/path/to/output/directory",
sigma_threshold=5.0,
temperature=300.0,
local_catalogs_enabled=True,
local_directory="/path/to/local/catalogs",
local_df="/path/to/local_metadata.csv",
valid_atoms=['C', 'H', 'N', 'O', 'S'],
consider_structure=True,
starting_molecules=['CCO', 'CC(=O)O'], # SMILES strings
manual_add_smiles=False,
force_ignore_molecules=[],
force_include_molecules=[],
stricter=True
)
- spectrum_path (str): Path to the spectrum .txt file with two columns and no header (frequency in MHz, intensity)
- directory_path (str): Directory path for output files and required data files
- sigma_threshold (float): Sigma threshold for peak detection (5 and above recommended)
- temperature (float): Temperature in Kelvin
- local_catalogs_enabled (bool): Whether to use local catalogs. Default: False
- local_directory (str): Directory containing local .cat files. Default: None
- local_df (str): Path to the .csv file with local catalog metadata (columns: name, smiles, iso); see below for more detailed instructions. Default: None
- valid_atoms (list): List of valid atoms for molecules. Default: None, which corresponds to the default atoms H, C, N, O, S
- consider_structure (bool): Whether to consider molecular structure in the analysis. Useful if you suspect mixture components should be chemically related (e.g., discharge experiments). Default: False
- starting_molecules (list): List of starting molecules as SMILES strings to initialize the graph calculation. Default: None
- manual_add_smiles (bool): Enable interactive prompts to manually input SMILES strings for molecules lacking stored SMILES. Default: False
- force_ignore_molecules (list): List of molecule names to force the algorithm to ignore. Useful if there is a false positive assignment. Names must match the downloaded CDMS and JPL .csv files or the local directory of catalogs. Default: []
- force_include_molecules (list): List of molecule names to force the algorithm to include in the fit. Useful to test for the presence of a molecule. Names must match the downloaded CDMS and JPL .csv files or the local directory of catalogs. Default: []
- stricter (bool): If True, applies extra-strict molecule filtering during the fitting stage to minimize false positive assignments. If too few molecules are being assigned, set this to False. Default: True
For local catalogs, place all .cat files in a single directory. Each .cat file name should match the name listed in the local_df. For example, if molecule_1.cat is in the local_directory, the local_df .csv file must have an entry in the name column that is molecule_1. These catalogs should also be generated at 300 K to properly interface with molsim.
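The three local-catalog parameters are meant to be used together, so it can help to assemble them in one place before calling run_assignment. The following is a minimal sketch, not part of AMASE: the helper name local_catalog_kwargs is hypothetical, and it only builds a dictionary using the parameter names documented above.

```python
def local_catalog_kwargs(local_directory=None, local_df=None):
    """Build the local-catalog keyword arguments for amase.run_assignment.

    Hypothetical helper (not part of AMASE). It refuses half-configured
    setups where only one of the two paths is supplied.
    """
    if local_directory is None and local_df is None:
        # No local catalogs requested: leave the feature switched off.
        return {"local_catalogs_enabled": False}
    if local_directory is None or local_df is None:
        raise ValueError("local_directory and local_df must be given together")
    return {
        "local_catalogs_enabled": True,
        "local_directory": str(local_directory),
        "local_df": str(local_df),
    }


kwargs_off = local_catalog_kwargs()
kwargs_on = local_catalog_kwargs(
    "/path/to/local/catalogs", "/path/to/local_metadata.csv"
)
print(kwargs_on["local_catalogs_enabled"])
```

The resulting dictionary could then be unpacked into the call, for example amase.run_assignment(spectrum_path=..., directory_path=..., sigma_threshold=5.0, temperature=300.0, **kwargs_on).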
Your spectrum file must be a .txt file with two columns and no header:
- Column 1: Frequency values (MHz)
- Column 2: Intensity values
Example format:
10000.0 0.025
10000.1 0.031
10000.2 0.028
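Before starting a run, it may be worth confirming that a spectrum file really has this layout. The sketch below is not part of AMASE; check_spectrum_file is a hypothetical helper, and it assumes the two columns are separated by whitespace, as in the example above. A header line would fail the float conversion and raise a ValueError.

```python
import tempfile
from pathlib import Path


def check_spectrum_file(path):
    """Parse a two-column, header-free spectrum file.

    Hypothetical helper (not part of AMASE). Returns two lists:
    frequencies (MHz) and intensities. Raises ValueError on a bad line.
    """
    freqs, intens = [], []
    lines = Path(path).read_text().splitlines()
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(
                f"line {line_no}: expected 2 columns, found {len(parts)}"
            )
        freqs.append(float(parts[0]))   # a text header would fail here
        intens.append(float(parts[1]))
    if not freqs:
        raise ValueError("spectrum file contains no data")
    return freqs, intens


# Demonstration on a temporary file holding the example rows shown above.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "spectrum.txt"
    demo_path.write_text("10000.0 0.025\n10000.1 0.031\n10000.2 0.028\n")
    freqs, intens = check_spectrum_file(demo_path)

print(len(freqs), freqs[0], intens[-1])  # prints: 3 10000.0 0.028
```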
If providing local (offline) .cat files not in CDMS or JPL:
- Place all .cat files in a single directory
- Catalogs should be generated at T = 300 K to interface properly with molsim
- Create a .csv file with three columns:
  - name: names of the .cat files (without the .cat extension)
  - smiles: SMILES strings for each molecule
  - iso: number of isotopically substituted atoms (e.g., HDCO → 1, D₂CO → 2)
For examples of the required file structure, see example_local_catalogs. Here, the path to the local_catalogs folder is passed as the local_directory parameter and the path to the local_metadata.csv file is passed as the local_df parameter. For these catalogs to be considered, local_catalogs_enabled must be set to True.
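As an illustration of the name-matching rule, the metadata file can be generated from the contents of the catalog directory so that every .cat file gets a row. This is a minimal sketch using only the standard library; write_local_metadata is a hypothetical helper, not an AMASE function, and the catalog names and SMILES strings are placeholders.

```python
import csv
import tempfile
from pathlib import Path


def write_local_metadata(local_directory, csv_path, entries):
    """Write a name/smiles/iso CSV covering every .cat file in a directory.

    Hypothetical helper (not part of AMASE). `entries` maps a catalog name
    (the file name without .cat) to a (smiles, iso) tuple.
    """
    cat_names = sorted(p.stem for p in Path(local_directory).glob("*.cat"))
    missing = [name for name in cat_names if name not in entries]
    if missing:
        raise ValueError(f"no metadata supplied for: {missing}")
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["name", "smiles", "iso"])
        for name in cat_names:
            smiles, iso = entries[name]
            writer.writerow([name, smiles, iso])
    return cat_names


# Demonstration with two empty placeholder catalogs in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    catalog_dir = Path(tmp) / "local_catalogs"
    catalog_dir.mkdir()
    for stem in ("molecule_1", "molecule_2"):
        (catalog_dir / f"{stem}.cat").touch()
    metadata_path = Path(tmp) / "local_metadata.csv"
    written = write_local_metadata(
        catalog_dir,
        metadata_path,
        {"molecule_1": ("CCO", 0), "molecule_2": ("CC(=O)O", 0)},
    )
    with open(metadata_path, newline="") as handle:
        rows = list(csv.reader(handle))

print(rows[0])  # prints: ['name', 'smiles', 'iso']
```

Raising an error when a catalog has no metadata entry surfaces a name mismatch before the assignment run starts.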
Precursor molecules are optional; when provided, they are used to initialize the graph calculation. They can be given as:
- A list of SMILES strings in the starting_molecules parameter
AMASE generates several output files in the specified directory_path:
- dataset_final.csv - Full dataset of all peak frequencies and intensities with molecular candidates
- fit_spectrum.html - Interactive plot of all assigned molecules overlaid on observational data
- output_report.txt - Detailed description of each line assignment
- final_peak_results.csv - Summary table of all line assignments
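After a run, a quick way to confirm that it completed is to check that these four files exist. The snippet below is a sketch rather than an AMASE feature: missing_outputs is a hypothetical helper, and the file names are taken from the list above.

```python
import tempfile
from pathlib import Path

# Output file names as listed in this README.
EXPECTED_OUTPUTS = (
    "dataset_final.csv",
    "fit_spectrum.html",
    "output_report.txt",
    "final_peak_results.csv",
)


def missing_outputs(directory_path):
    """Return the expected output files that are absent from directory_path."""
    directory = Path(directory_path)
    return [name for name in EXPECTED_OUTPUTS if not (directory / name).is_file()]


# Demonstration: a temporary directory holding only two of the four files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "dataset_final.csv").touch()
    (Path(tmp) / "output_report.txt").touch()
    still_missing = missing_outputs(tmp)

print(still_missing)  # prints: ['fit_spectrum.html', 'final_peak_results.csv']
```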
If you run into any issues or have questions or suggestions, please contact: zfried@mit.edu
Or open an issue at: https://github.com/zfried/AMASE/issues
TBD
If you use AMASE in your research, please cite: https://pubs.acs.org/doi/10.1021/acs.jpca.4c03580