
Astro AMASE

Automated Molecular Assignment and Source Parameter Estimation in Radio Astronomical Observations

⚠️ Update (Feb 13, 2026): A notable efficiency fix was released today. Please reinstall if you have an existing installation.

Astro AMASE is a comprehensive Python package for automated molecular line identification...

Features

✨ Automated Analysis Pipeline

  • Spectral line peak detection with adaptive sigma thresholding
  • Automatic VLSR and temperature determination
  • Linewidth calculation via Gaussian fitting

🔬 Molecular Line Assignment

  • Query CDMS and JPL molecular databases
  • Iterative assignment with structural relevance scoring (VICGAE embeddings)
  • Context-aware rescoring as detected molecules accumulate
  • Handles blended lines and multiple carriers

📊 Best-Fit Modeling

  • Column density optimization via least-squares fitting
  • Interactive Bokeh visualizations
  • Quality control and molecule filtering
  • Comprehensive output reports
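
As a rough illustration of the column density optimization listed above, the sketch below scales per-molecule template spectra to an observed spectrum with bounded linear least squares. It assumes optically thin emission, so that line intensity is proportional to column density. The function name fit_column_densities, the template arrays, and the reference density are hypothetical and are not part of the astro_amase API; the package's own optimizer may work differently.

```python
import numpy as np
from scipy.optimize import lsq_linear


def fit_column_densities(observed, templates, reference_density):
    """Scale template spectra to the observed spectrum (illustrative only).

    templates has shape (n_molecules, n_channels); each row is a spectrum
    simulated at reference_density (cm^-2). Assumes optically thin lines,
    i.e. intensity proportional to column density.
    """
    design = np.asarray(templates, dtype=float).T  # (n_channels, n_molecules)
    # Non-negative scale factors: a column density cannot be negative.
    result = lsq_linear(design, np.asarray(observed, dtype=float),
                        bounds=(0.0, np.inf))
    return result.x * reference_density


# Toy example with two made-up Gaussian "molecules" on a shared frequency grid.
freq = np.linspace(345000.0, 345010.0, 501)
template_a = np.exp(-0.5 * ((freq - 345003.0) / 0.4) ** 2)
template_b = np.exp(-0.5 * ((freq - 345007.0) / 0.4) ** 2)
observed = 1.5 * template_a + 0.82 * template_b

densities = fit_column_densities(observed, [template_a, template_b], 1e15)
print(densities)  # approximately [1.5e15, 8.2e14]
```

For optically thick lines the linear scaling breaks down and a nonlinear fit over re-simulated spectra would be needed instead.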

Installation

Python Version Requirement

Python 3.11 or 3.12 is required. Dependencies such as pandas 3.x require Python ≥ 3.11.

If you're using conda, create an environment with Python 3.11:

conda create -n astro_amase_env python=3.11
conda activate astro_amase_env

From PyPI (not yet published)

pip install astro_amase

From Source (required for now)

git clone https://github.com/zfried/astro_amase.git
cd astro_amase
pip install -e .

Documentation

📚 Parameter Guide (PARAMETERS.md) - Comprehensive guide to all available parameters, including:

  • Required and optional parameters
  • Parameter recommendations for different use cases
  • Common use case examples
  • Tips and best practices

📓 Example Notebook - Complete workflows and usage examples

Quick Start

Required Database Files

The package requires several files that must be downloaded from the following Dropbox folder. The folder was last updated on March 11, 2026; if you downloaded the files before that date, re-downloading is recommended. The files are relatively large: they include local copies of the CDMS and JPL molecular databases, as well as molsim Molecule objects for the catalogs.

Save all files in the same local directory where your output files will be written, and pass the path to that directory as the directory_path argument of the relevant functions.

📓 Extensive Usage Examples

For comprehensive usage examples and workflows, see notebooks/example_notebook.ipynb.

The example notebook demonstrates:

  • Complete end-to-end analysis workflows
  • Parameter selection
  • Visualization techniques
  • Post-processing and interpretation of results

For detailed parameter documentation, see PARAMETERS.md.

Basic Usage

import astro_amase

results = astro_amase.assign_observations(
    spectrum_path='spectrum.txt',
    directory_path='./directory/',
    temperature=150.0,
    sigma_threshold=5.0,
    observation_type='interferometric',
    beam_major_axis=0.5,
    beam_minor_axis=0.5,
    source_size=1E20,
    continuum_temperature=2.7,
    valid_atoms=['C', 'O', 'H', 'N', 'S']
)

Input Requirements

Spectrum File Format

Plain text file with two columns (space or tab separated) and no header. Frequency must be in increasing order:

  • Column 1: Frequency (MHz)
  • Column 2: Intensity (Kelvin)

The code was designed for intensities in kelvin (K). Accurate column densities and temperatures require intensities in K. If the data are in Jy/beam, the line assignments should still be reasonably reliable, but the fitted column densities and temperatures will not be.

Example:

345000.0        0.05
345000.1        0.06
345000.2        0.08
...
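
Before running the analysis it can help to check a spectrum against the format above. The following is a minimal sketch, not part of astro_amase: load_spectrum and jy_per_beam_to_kelvin are hypothetical helper names, the loader treats "increasing" as strictly increasing (an assumption), and the conversion uses the standard Rayleigh-Jeans relation for a Gaussian beam with axes in arcseconds.

```python
import io

import numpy as np


def load_spectrum(source):
    """Read a two-column spectrum (frequency in MHz, intensity) and check it."""
    # With no delimiter given, loadtxt splits on any whitespace (spaces or tabs).
    data = np.loadtxt(source)
    if data.ndim != 2 or data.shape[1] != 2:
        raise ValueError("expected exactly two columns: frequency and intensity")
    freq_mhz, intensity = data[:, 0], data[:, 1]
    # Assumption: "increasing order" is enforced as strictly increasing.
    if not np.all(np.diff(freq_mhz) > 0):
        raise ValueError("frequencies must be in increasing order")
    return freq_mhz, intensity


def jy_per_beam_to_kelvin(intensity_jy, freq_mhz, bmaj_arcsec, bmin_arcsec):
    """Rayleigh-Jeans brightness temperature (K) from Jy/beam, Gaussian beam."""
    freq_ghz = np.asarray(freq_mhz, dtype=float) / 1e3
    return (1.222e6 * np.asarray(intensity_jy, dtype=float)
            / (freq_ghz ** 2 * bmaj_arcsec * bmin_arcsec))


# The buffer stands in for a file; a path such as 'spectrum.txt' works the same way.
example = io.StringIO("345000.0 0.05\n345000.1\t0.06\n345000.2 0.08\n")
freq, inten = load_spectrum(example)
print(freq.size, inten.max())
```

The conversion is T_B [K] = 1.222e6 × S [Jy/beam] / (ν² [GHz²] × θ_maj × θ_min [arcsec²]), so 1 Jy/beam at 345 GHz in a 0.5″ × 0.5″ beam corresponds to roughly 41 K.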

Output Files

Running the analysis produces several output files:

  • fit_spectrum.html: Interactive Bokeh plot showing:

    • Observed spectrum (black)
    • Total fitted spectrum (red)
    • Individual molecular contributions (colored, toggleable)
  • final_peak_results.csv: Peak-by-peak assignments

    peak_freq,experimental_intensity_max,total_simulated_intensity,difference,carrier_molecules
    345123.456,10.5,9.8,0.7,"['CH3OH', 'H2CO']"
  • output_report.txt: Detailed text report with:

    • Assignment status for each line
    • Candidate molecules and scores
    • Quality issues and penalties
    • Summary statistics
  • column_density_results.csv: Best-fit column densities

    molecule,column_density,smiles
    CH3OH,1.5e15,CO
    H2CO,8.2e14,C=O
  • analysis_parameters.json: Report of the parameters used by the code:

    • Stores the determined VLSR, temperature, linewidth, etc.
    • Stores some assignment summary statistics
    • Required for some subsequent plotting functionality

Algorithm Overview

  1. Data Loading & Peak Detection

    • Load spectrum and detect peaks above σ threshold
    • Calculate RMS noise
  2. Linewidth Determination

    • Gaussian fitting to strongest peaks
    • Median FWHM calculation
    • Conversion to velocity width
  3. VLSR & Temperature Estimation (if unknown)

    • Database query for candidate transitions
    • VLSR clustering analysis
    • Least-squares optimization
  4. Dataset Creation

    • Query CDMS/JPL/LSD for candidates within Δν
    • Simulate spectra at observational parameters
    • Filter duplicates and apply quality control
  5. Iterative Line Assignment

    • Static checks (invalid atoms, vibrational states, intensity checks)
    • Dynamic scoring (structural relevance via VICGAE)
    • Softmax and combined score calculation
    • Reassignment when new molecules are detected
  6. Best-Fit Modeling

    • Build lookup tables for rapid simulation
    • Optimize column densities via least-squares
    • Quality filtering (remove weak contributors)
    • Generate visualizations
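
Steps 1 and 2 above can be pictured with a small sketch. It is not the package's implementation: it uses scipy.signal.find_peaks and scipy.optimize.curve_fit, estimates the noise from the median absolute deviation, and applies a fixed rather than adaptive threshold. The names detect_peaks, velocity_fwhm, and half_window are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.signal import find_peaks

C_KMS = 299792.458  # speed of light in km/s


def gaussian(x, amp, center, sigma):
    return amp * np.exp(-0.5 * ((x - center) / sigma) ** 2)


def detect_peaks(intensity, sigma_threshold=5.0):
    """Return indices of peaks above sigma_threshold times the noise level."""
    # The median absolute deviation is only weakly biased by the lines themselves.
    mad = np.median(np.abs(intensity - np.median(intensity)))
    rms = 1.4826 * mad
    cut = sigma_threshold * rms
    # Requiring prominence as well suppresses noise wiggles on line flanks.
    idx, _ = find_peaks(intensity, height=cut, prominence=cut)
    return idx, rms


def velocity_fwhm(freq, intensity, peak_idx, half_window=25):
    """Fit a Gaussian around one peak and return its FWHM in km/s."""
    lo, hi = max(peak_idx - half_window, 0), peak_idx + half_window + 1
    x, y = freq[lo:hi], intensity[lo:hi]
    # Initial guess; assumes roughly uniform channel spacing.
    p0 = [y.max(), freq[peak_idx], 3.0 * (freq[1] - freq[0])]
    (amp, center, sigma), _ = curve_fit(gaussian, x, y, p0=p0)
    fwhm_mhz = 2.0 * np.sqrt(2.0 * np.log(2.0)) * abs(sigma)
    return C_KMS * fwhm_mhz / center  # Doppler relation: dv = c * dnu / nu


# Synthetic spectrum: two lines plus Gaussian noise (made-up values).
rng = np.random.default_rng(0)
freq = 345000.0 + 0.1 * np.arange(1000)
spec = gaussian(freq, 1.0, 345030.0, 0.5) + gaussian(freq, 0.6, 345070.0, 0.5)
spec = spec + rng.normal(0.0, 0.01, freq.size)

peaks, rms = detect_peaks(spec, sigma_threshold=5.0)
widths = [velocity_fwhm(freq, spec, i) for i in peaks]
print(freq[peaks], np.median(widths))
```

Taking the median of the fitted widths over the strongest peaks, as in step 2, keeps a single blended or poorly fitted line from skewing the linewidth.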

Advanced Usage

Interactive Plotting Functions

Astro AMASE provides several plotting utilities for visualizing and analyzing results:

Display Results in Notebook

Show the interactive Bokeh plot directly in a Jupyter notebook:

import astro_amase

# Run analysis
results = astro_amase.assign_observations(...)

# Display interactive plot in notebook
astro_amase.show_fit_in_notebook(results)

# Or display only specific molecules
astro_amase.show_fit_in_notebook(results, mols_to_display=['CH3OH, vt = 0 - 2', 'H2CO'])

Recreate Plots from Saved Data

Generate interactive plots from previously saved analysis results:

astro_amase.plot_from_saved(
    spectrum_path='spectrum.txt',
    directory_path='./directory/',
    column_density_csv='./directory/column_density_results.csv',
    stored_json='./directory/output_parameters.json'
)

# Filter to specific molecules
astro_amase.plot_from_saved(
    spectrum_path='spectrum.txt',
    directory_path='./directory/',
    column_density_csv='./directory/column_density_results.csv',
    stored_json='./directory/output_parameters.json',
    mols_to_display=['CH3OH, vt = 0 - 2', 'HC3N, (0,0,0,0)', 'H2CO']
)

Generate Individual Peak Plots

Create detailed PDF files showing individual spectral peaks with quantum number assignments:

astro_amase.get_individual_plots(
    spectrum_path='spectrum.txt',
    directory_path='./directory/',
    column_density_csv='./directory/column_density_results.csv',
    stored_json='./directory/output_parameters.json',
    minimum_intensity='default'  # or specify a custom threshold
)

This generates {molecule_name}_peaks.pdf files containing:

  • 3-column grid of individual peak subplots
  • Observed spectrum (black) and simulated spectrum (red) for each peak
  • Quantum number assignments from catalog
  • Peaks sorted by intensity

Accessing Detailed Results

results = astro_amase.assign_observations(...)

# Get the assigner object
assigner = results['assigner']

# Individual line details
for line in assigner.lines[:10]:  # First 10 lines
    if line.assignment_status:
        print(f"{line.frequency:.4f} MHz: {line.assignment_status.value}")
        if line.assigned_molecule:
            print(f"  → {line.assigned_molecule}")
            print(f"  Score: {line.best_candidate.global_score:.2f}")

Programmatic Analysis

import pandas as pd

# Load peak results
peaks = pd.read_csv('directory/final_peak_results.csv')

# Find strongest assigned lines
assigned = peaks[peaks['carrier_molecules'] != "['Unidentified']"]
strongest = assigned.nlargest(10, 'experimental_intensity_max')

# Load column densities
columns = pd.read_csv('directory/column_density_results.csv')
print(columns.sort_values('column_density', ascending=False))
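
The carrier_molecules column stores each assignment as a Python-style list written as a string, so counting lines per molecule requires parsing it first. The sketch below does this with ast.literal_eval on an inline sample in the documented CSV layout. The second and third rows are made-up values for illustration, and the derived column name carriers is an assumption.

```python
import ast
import io
from collections import Counter

import pandas as pd

# Inline sample; in practice pass the path to final_peak_results.csv instead.
sample = io.StringIO(
    """peak_freq,experimental_intensity_max,total_simulated_intensity,difference,carrier_molecules
345123.456,10.5,9.8,0.7,"['CH3OH', 'H2CO']"
345200.100,2.0,1.9,0.1,"['CH3OH']"
345300.000,1.2,0.0,1.2,"['Unidentified']"
"""
)
peaks = pd.read_csv(sample)

# Turn "['CH3OH', 'H2CO']" into a real list without using eval().
peaks["carriers"] = peaks["carrier_molecules"].apply(ast.literal_eval)

# Number of assigned lines per molecule, ignoring unidentified peaks.
line_counts = Counter(
    mol for carriers in peaks["carriers"] for mol in carriers
    if mol != "Unidentified"
)

# Blended peaks are those with more than one carrier.
blended = peaks[peaks["carriers"].apply(len) > 1]
print(line_counts.most_common())
print(blended[["peak_freq", "carriers"]])
```

Parsing the list rather than splitting on commas matters because molecule labels such as 'CH3OH, vt = 0 - 2' contain commas themselves.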

Citation

Paper is in prep!

Requirements

  • Python ≥ 3.11
  • pandas == 3.0.1
  • numpy == 2.2.6
  • torch == 2.10.0
  • rdkit == 2025.9.6
  • scipy == 1.13.1
  • bokeh == 3.8.2
  • numba == 0.64.0
  • astropy == 7.2.0
  • matplotlib == 3.10.8
  • pyyaml == 6.0.3
  • astrochem_embedding == 0.2.0
  • group-selfies @ git+https://github.com/aspuru-guzik-group/group-selfies.git

License

TBD

Support

For questions, issues, or feedback:

Acknowledgments

  • CDMS (Cologne Database for Molecular Spectroscopy)
  • JPL Molecular Spectroscopy Database
  • LSD (Lille Spectroscopic Database)
  • astrochem_embedding for VICGAE structural relevance scoring
  • molsim for spectral simulation tools

Astro AMASE - Making molecular line identification in radio astronomy automated.
