Automated Molecular Assignment and Source Parameter Estimation in Radio Astronomical Observations
- Parameter Guide
- Example Notebook
- Required Database Files (Dropbox)
Astro AMASE is a comprehensive Python package for automated molecular line identification...
Automated Analysis Pipeline
- Spectral line peak detection with adaptive sigma thresholding
- Automatic VLSR and temperature determination
- Linewidth calculation via Gaussian fitting
Molecular Line Assignment
- Query CDMS and JPL molecular databases
- Iterative assignment with structural relevance scoring (VICGAE embeddings)
- Context-aware rescoring as detected molecules accumulate
- Handles blended lines and multiple carriers
Best-Fit Modeling
- Column density optimization via least-squares fitting
- Interactive Bokeh visualizations
- Quality control and molecule filtering
- Comprehensive output reports
Python 3.11 or 3.12 is required. Dependencies such as pandas 3.x require Python ≥ 3.11.
If you're using conda, create an environment with Python 3.11:
conda create -n astro_amase_env python=3.11
conda activate astro_amase_env
Install from PyPI:
pip install astro_amase
Or install from source:
git clone https://github.com/zfried/astro_amase.git
cd astro_amase
pip install -e .
Parameter Guide (PARAMETERS.md) - Comprehensive guide to all available parameters, including:
- Required and optional parameters
- Parameter recommendations for different use cases
- Common use case examples
- Tips and best practices
Example Notebook - Complete workflows and usage examples
The package requires several files that must be downloaded from the following Dropbox folder. The folder was last updated on March 11, 2026; if you downloaded the files before that date, I recommend re-downloading them. The files are relatively large and include local copies of the CDMS and JPL molecular databases, as well as molsim Molecule objects for the catalogs. Save all files in the same local directory where your output files will be written, and pass the path to that directory as the directory_path argument of the relevant functions.
For comprehensive usage examples and workflows, see notebooks/example_notebook.ipynb.
The example notebook demonstrates:
- Complete end-to-end analysis workflows
- Parameter selection
- Visualization techniques
- Post-processing and interpretation of results
For detailed parameter documentation, see PARAMETERS.md.
import astro_amase

results = astro_amase.assign_observations(
    spectrum_path='spectrum.txt',    # two-column text file: frequency (MHz), intensity (K)
    directory_path='./directory/',   # folder with the downloaded database files; outputs are written here
    temperature=150.0,
    sigma_threshold=5.0,
    observation_type='interferometric',
    beam_major_axis=0.5,
    beam_minor_axis=0.5,
    source_size=1E20,
    continuum_temperature=2.7,
    valid_atoms=['C', 'O', 'H', 'N', 'S']
)
The input spectrum is a plain text file with two columns (space- or tab-separated) and no header. Frequencies must be in increasing order:
- Column 1: Frequency (MHz)
- Column 2: Intensity (Kelvin)
The code was designed for data with intensities in K. Accurate column densities and temperatures require intensities in K; if the data are in Jy/beam, the line assignments should still be reasonably reliable.
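If your data are in Jy/beam, one common way to bring them onto a K scale before running the analysis is the Rayleigh-Jeans brightness temperature for a Gaussian beam. The sketch below is not part of astro_amase; the function name `jy_per_beam_to_kelvin` is hypothetical, and it assumes a Gaussian synthesized beam with axes given in arcseconds and frequencies in MHz.

```python
import math

C_M_S = 299_792_458.0                        # speed of light, m/s
K_B = 1.380649e-23                           # Boltzmann constant, J/K
ARCSEC_TO_RAD = math.pi / (180.0 * 3600.0)   # arcseconds to radians


def jy_per_beam_to_kelvin(intensity_jy_beam, freq_mhz, bmaj_arcsec, bmin_arcsec):
    """Rayleigh-Jeans brightness temperature (K) for a Gaussian beam.

    Hypothetical helper, not an astro_amase function. Assumes the beam
    axes are FWHM values in arcseconds and the frequency is in MHz.
    """
    # Solid angle of an elliptical Gaussian beam: pi * a * b / (4 ln 2)
    beam_sr = (math.pi / (4.0 * math.log(2.0))) * bmaj_arcsec * bmin_arcsec * ARCSEC_TO_RAD**2
    freq_hz = freq_mhz * 1e6
    flux_si = intensity_jy_beam * 1e-26      # 1 Jy = 1e-26 W m^-2 Hz^-1
    # T_B = c^2 * S / (2 k nu^2 Omega)
    return C_M_S**2 * flux_si / (2.0 * K_B * freq_hz**2 * beam_sr)


# Example: convert a 0.1 Jy/beam channel at 345 GHz observed with a 0.5" beam
t_k = jy_per_beam_to_kelvin(0.1, 345000.0, 0.5, 0.5)
print(f"{t_k:.2f} K")
```

Because only arithmetic is applied to the intensity and frequency arguments, the same function also accepts NumPy arrays for a whole spectrum.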
Example spectrum file:
345000.0 0.05
345000.1 0.06
345000.2 0.08
...
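Before running the analysis, it can help to confirm that a file matches this layout. The following is a minimal sketch using NumPy; `load_spectrum` is a hypothetical helper rather than an astro_amase function, and it assumes "increasing order" means strictly increasing frequencies.

```python
import io

import numpy as np


def load_spectrum(source):
    """Load a two-column spectrum (frequency in MHz, intensity in K) and check its layout.

    Hypothetical helper for illustration; `source` may be a path or a file-like object.
    """
    data = np.loadtxt(source)  # splits on any whitespace, so spaces and tabs both work
    if data.ndim != 2 or data.shape[1] != 2:
        raise ValueError("expected exactly two columns: frequency and intensity")
    freq, intensity = data[:, 0], data[:, 1]
    if np.any(np.diff(freq) <= 0):
        raise ValueError("frequencies must be strictly increasing")
    return freq, intensity


# Use the example rows above as an in-memory file
example = io.StringIO("345000.0 0.05\n345000.1 0.06\n345000.2 0.08\n")
freq, intensity = load_spectrum(example)
print(freq.size, "channels from", freq[0], "to", freq[-1], "MHz")
```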
Running the analysis produces several output files:
- fit_spectrum.html: Interactive Bokeh plot showing:
  - Observed spectrum (black)
  - Total fitted spectrum (red)
  - Individual molecular contributions (colored, toggleable)
- final_peak_results.csv: Peak-by-peak assignments
  peak_freq,experimental_intensity_max,total_simulated_intensity,difference,carrier_molecules
  345123.456,10.5,9.8,0.7,"['CH3OH', 'H2CO']"
- output_report.txt: Detailed text report with:
  - Assignment status for each line
  - Candidate molecules and scores
  - Quality issues and penalties
  - Summary statistics
- column_density_results.csv: Best-fit column densities
  molecule,column_density,smiles
  CH3OH,1.5e15,CO
  H2CO,8.2e14,C=O
- analysis_parameters.json: Record of the parameters used in the run:
  - Stores the determined VLSR, temperature, linewidth, etc.
  - Stores some assignment summary statistics
  - Required for some subsequent plotting functionality
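In final_peak_results.csv the carrier_molecules column holds a Python-style list written as a string, so it needs parsing before you can work with individual carriers. The sketch below shows one way to do that with pandas and `ast.literal_eval`, using the example row above as in-memory data. It assumes every entry in the column is a valid list literal; the `carrier_list` column name is introduced here for illustration.

```python
import ast
import io

import pandas as pd

# The example row from above, used as stand-in data
csv_text = (
    "peak_freq,experimental_intensity_max,total_simulated_intensity,difference,carrier_molecules\n"
    "345123.456,10.5,9.8,0.7,\"['CH3OH', 'H2CO']\"\n"
)
peaks = pd.read_csv(io.StringIO(csv_text))

# carrier_molecules is read as text; literal_eval turns it back into a list
peaks["carrier_list"] = peaks["carrier_molecules"].apply(ast.literal_eval)

# One row per (peak, carrier) pair, which makes blended lines easy to count
per_carrier = peaks.explode("carrier_list")
lines_per_molecule = per_carrier["carrier_list"].value_counts()
print(lines_per_molecule)
```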
- Data Loading & Peak Detection
  - Load spectrum and detect peaks above σ threshold
  - Calculate RMS noise
- Linewidth Determination
  - Gaussian fitting to strongest peaks
  - Median FWHM calculation
  - Conversion to velocity width
- VLSR & Temperature Estimation (if unknown)
  - Database query for candidate transitions
  - VLSR clustering analysis
  - Least-squares optimization
- Dataset Creation
  - Query CDMS/JPL/LSD for candidates within Δν
  - Simulate spectra at observational parameters
  - Filter duplicates and apply quality control
- Iterative Line Assignment
  - Static checks (invalid atoms, vibrational states, intensity checks)
  - Dynamic scoring (structural relevance via VICGAE)
  - Softmax and combined score calculation
  - Reassignment when new molecules detected
- Best-Fit Modeling
  - Build lookup tables for rapid simulation
  - Optimize column densities via least-squares
  - Quality filtering (remove weak contributors)
  - Generate visualizations
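To make the first two stages concrete, here is a rough sketch of sigma-threshold peak detection followed by a Gaussian fit and conversion of the FWHM to a velocity width. It is not the package's implementation: the MAD-based noise estimate, the use of `scipy.signal.find_peaks`, the fixed fitting window, and all function names are assumptions chosen for illustration, and the adaptive thresholding is not reproduced.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.signal import find_peaks

C_KM_S = 299_792.458  # speed of light, km/s


def gaussian(x, amp, center, sigma):
    return amp * np.exp(-0.5 * ((x - center) / sigma) ** 2)


def estimate_rms(intensity):
    """Noise estimate from the median absolute deviation (an assumed choice)."""
    mad = np.median(np.abs(intensity - np.median(intensity)))
    return 1.4826 * mad  # scales MAD to a Gaussian standard deviation


def detect_peaks(freq, intensity, sigma_threshold=5.0):
    """Return indices of local maxima above sigma_threshold * rms, plus the rms."""
    rms = estimate_rms(intensity)
    idx, _ = find_peaks(intensity, height=sigma_threshold * rms)
    return idx, rms


def fwhm_velocity(freq, intensity, peak_idx, half_window=20):
    """Fit a Gaussian around one peak and return its FWHM in km/s."""
    lo = max(peak_idx - half_window, 0)
    hi = min(peak_idx + half_window + 1, len(freq))
    channel = np.median(np.diff(freq))
    p0 = [intensity[peak_idx], freq[peak_idx], 3 * channel]  # initial guess
    (amp, center, sigma), _ = curve_fit(gaussian, freq[lo:hi], intensity[lo:hi], p0=p0)
    fwhm_mhz = 2.0 * np.sqrt(2.0 * np.log(2.0)) * abs(sigma)
    # Doppler relation: dv = c * dnu / nu
    return C_KM_S * fwhm_mhz / center


# Synthetic spectrum: one 2 MHz FWHM line at 345050 MHz plus Gaussian noise
rng = np.random.default_rng(0)
freq = np.arange(345000.0, 345100.0, 0.1)
true_sigma = 2.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))
intensity = gaussian(freq, 1.0, 345050.0, true_sigma) + rng.normal(0.0, 0.01, freq.size)

idx, rms = detect_peaks(freq, intensity, sigma_threshold=5.0)
strongest = idx[np.argmax(intensity[idx])]
width_kms = fwhm_velocity(freq, intensity, strongest)
print(f"rms = {rms:.4f} K, linewidth = {width_kms:.2f} km/s")
```

For the median FWHM step, the same fit would be repeated over several of the strongest peaks and the median of the resulting widths taken.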
Astro AMASE provides several plotting utilities for visualizing and analyzing results:
Show the interactive Bokeh plot directly in a Jupyter notebook:
import astro_amase

# Run analysis
results = astro_amase.assign_observations(...)

# Display interactive plot in notebook
astro_amase.show_fit_in_notebook(results)

# Or display only specific molecules
astro_amase.show_fit_in_notebook(results, mols_to_display=['CH3OH, vt = 0 - 2', 'H2CO'])
Generate interactive plots from previously saved analysis results:
astro_amase.plot_from_saved(
    spectrum_path='spectrum.txt',
    directory_path='./directory/',
    column_density_csv='./directory/column_density_results.csv',
    stored_json='./directory/output_parameters.json'
)

# Filter to specific molecules
astro_amase.plot_from_saved(
    spectrum_path='spectrum.txt',
    directory_path='./directory/',
    column_density_csv='./directory/column_density_results.csv',
    stored_json='./directory/output_parameters.json',
    mols_to_display=['CH3OH, vt = 0 - 2', 'HC3N, (0,0,0,0)', 'H2CO']
)
Create detailed PDF files showing individual spectral peaks with quantum number assignments:
astro_amase.get_individual_plots(
    spectrum_path='spectrum.txt',
    directory_path='./directory/',
    column_density_csv='./directory/column_density_results.csv',
    stored_json='./directory/output_parameters.json',
    minimum_intensity='default'  # or specify a custom threshold
)
This generates {molecule_name}_peaks.pdf files containing:
- 3-column grid of individual peak subplots
- Observed spectrum (black) and simulated spectrum (red) for each peak
- Quantum number assignments from catalog
- Peaks sorted by intensity
results = astro_amase.assign_observations(...)

# Get the assigner object
assigner = results['assigner']

# Individual line details
for line in assigner.lines[:10]:  # First 10 lines
    if line.assignment_status:
        print(f"{line.frequency:.4f} MHz: {line.assignment_status.value}")
        if line.assigned_molecule:
            print(f"  → {line.assigned_molecule}")
            print(f"    Score: {line.best_candidate.global_score:.2f}")

import pandas as pd
# Load peak results
peaks = pd.read_csv('directory/final_peak_results.csv')
# Find strongest assigned lines
assigned = peaks[peaks['carrier_molecules'] != "['Unidentified']"]
strongest = assigned.nlargest(10, 'experimental_intensity_max')
# Load column densities
columns = pd.read_csv('directory/column_density_results.csv')
print(columns.sort_values('column_density', ascending=False))
Paper is in prep!
- Python ≥ 3.11
- pandas == 3.0.1
- numpy == 2.2.6
- torch == 2.10.0
- rdkit == 2025.9.6
- scipy == 1.13.1
- bokeh == 3.8.2
- numba == 0.64.0
- astropy == 7.2.0
- matplotlib == 3.10.8
- pyyaml == 6.0.3
- astrochem_embedding == 0.2.0
- group-selfies @ git+https://github.com/aspuru-guzik-group/group-selfies.git
TBD
For questions, issues, or feedback:
- Email: zfried@mit.edu
- Issues: GitHub Issues
- CDMS (Cologne Database for Molecular Spectroscopy)
- JPL Molecular Spectroscopy Database
- LSD (Lille Spectroscopic Database)
- astrochem_embedding for VICGAE structural relevance scoring
- molsim for spectral simulation tools
Astro AMASE - Making molecular line identification in radio astronomy automated.