systemsomicslab/msemblator

MS-Emblator 2025.12.5: A reliable annotation tool for metabolomics data

Overview

Msemblator is a metabolomics annotation tool that integrates results from multiple in-silico annotation tools and applies ensemble learning-based scoring to provide highly reliable annotations.

Users can input MSP files generated by MS-DIAL and perform formula prediction, structure prediction, or both.

Note: This tool is currently designed for use on Windows OS.

Usage examples

1. Formula elucidation only

If you only want to perform formula prediction (no SIRIUS account required):

python msemblator.py --input data\example.msp --output results\formula_only --mode 1

2. Formula + structure elucidation (Recommended)

Run both formula and structure prediction, including ensemble scoring. Requires SIRIUS credentials:

python msemblator.py --input data\example.msp --output results\formula_and_structure --mode 2 --sirius_user your_email@example.com --sirius_pass your_password

3. Structure elucidation only

Use this mode if your MSP file already contains predicted formulas and you want to perform only structure annotation. Requires SIRIUS credentials:

python msemblator.py --input data\formula_predicted.msp --output results\structure_only --mode 3 --sirius_user your_email@example.com --sirius_pass your_password

Notes

・ --input: Path to the input MSP file (recommended: exported from MS-DIAL)
・ --output: Output folder to save results
・ --mode:
  ・ 1 = Formula elucidation only
  ・ 2 = Both formula and structure elucidation
  ・ 3 = Structure elucidation only
・ --sirius_user and --sirius_pass: Required only for modes 2 and 3
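
The flag rules above can be captured in a small wrapper. This is a minimal sketch and not part of Msemblator: the helper name `build_command` is hypothetical, and only the flags themselves come from this README. It builds the argument list and rejects modes 2 and 3 when credentials are missing.

```python
def build_command(input_msp, output_dir, mode, user=None, password=None):
    """Build the msemblator.py argument list (hypothetical helper).

    Only the flags documented in this README are used.
    """
    if mode not in (1, 2, 3):
        raise ValueError("mode must be 1, 2, or 3")

    cmd = [
        "python", "msemblator.py",
        "--input", input_msp,
        "--output", output_dir,
        "--mode", str(mode),
    ]

    # SIRIUS credentials are required only for modes 2 and 3.
    if mode in (2, 3):
        if not user or not password:
            raise ValueError("modes 2 and 3 require SIRIUS credentials")
        cmd += ["--sirius_user", user, "--sirius_pass", password]

    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Supplying the password from an environment variable rather than typing it on the command line keeps it out of the shell history.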

Input file preparation

Msemblator does not support raw data as input. Instead, MSP files processed with MS-DIAL 5 are strongly recommended. The application utilizes MS-DIAL's MSP output to perform formula and structure predictions.

・ Formula prediction requires at least m/z, MS2 and adduct type.
・ Structure prediction requires the above information plus a predicted molecular formula.
・ If formula information is missing, structure prediction will be skipped. However, this limitation can be resolved by performing both formula and structure predictions simultaneously.
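
The requirements above can be checked before a run. The sketch below is illustrative and not Msemblator's own parser. It assumes MSP records are separated by blank lines and use the field names `PRECURSORMZ`, `PRECURSORTYPE`, `FORMULA`, and `Num Peaks`; verify these against your own MS-DIAL export. The function names are hypothetical.

```python
def split_msp_records(text):
    """Split MSP text into records (lists of lines) separated by blank lines."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:
            records.append(current)
            current = []
    if current:
        records.append(current)
    return records


def parse_fields(record):
    """Return {UPPERCASE_KEY: value} for the 'KEY: value' lines of a record."""
    fields = {}
    for line in record:
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip().upper()] = value.strip()
    return fields


def check_record(record):
    """Report whether a record has the fields each prediction step needs."""
    fields = parse_fields(record)
    has_mz = bool(fields.get("PRECURSORMZ"))
    has_adduct = bool(fields.get("PRECURSORTYPE"))
    try:
        has_ms2 = int(fields.get("NUM PEAKS", "0")) > 0
    except ValueError:
        has_ms2 = False
    has_formula = bool(fields.get("FORMULA"))

    formula_ready = has_mz and has_adduct and has_ms2
    return {
        "formula_ready": formula_ready,
        # Structure prediction additionally needs a molecular formula.
        "structure_ready": formula_ready and has_formula,
    }
```

A record that is `formula_ready` but not `structure_ready` is one where running formula and structure prediction together would be needed.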

Output files

Msemblator generates two output files:

  1. A file containing the top 3 predictions from each annotation tool for both formula and structure, along with the highest-ranked annotation based on the ensemble scoring model.
  2. A file summarizing the top 3 ranked predictions from the scoring model.

Together, the two files show each tool's top candidates alongside the ensemble-ranked result, so users can judge how well supported an annotation is.

Parameter tuning

Msemblator allows users to flexibly adjust parameters for each integrated annotation tool (SIRIUS, MS-FINDER, MetFrag, etc.). Most settings should be configured by editing the provided parameter files, ensuring that the workflow can be customized without modifying the code. For advanced features that cannot be configured via parameter files, the corresponding command scripts may be edited directly.

formula_prediction:
  msfinder:
    MS1_ppm: 10
    MS2_ppm: 20
    halogen: True #True or False

  sirius:
  #  possible options: orbitrap, qtof
    MS1: qtof
    MS2_ppm: 20
    halogen: True #True or False

  msbuddy:
    MS1_ppm: 10
    MS2_ppm: 20
    halogen: True #True or False

  msemblator_output_records: 100

structure_prediction:
  msfinder:
    MS1_ppm: 10
    MS2_ppm: 20
    halogen: True #True or False

  sirius:
  #  possible options: orbitrap, qtof
    MS1: qtof
    MS2_ppm: 20

  metfrag:
  # MetFrag uses whichever tolerance is larger: absolute (Da) or relative (ppm)
    MS2_Da: 0.01
    MS2_ppm: 20

  msemblator_output_records: 100
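
The comment on the `metfrag` block says the larger of the absolute and relative tolerances applies. The sketch below works through that arithmetic with the default values shown above. It only illustrates the rule as stated in the comment and is not MetFrag's code; the function name is hypothetical.

```python
def metfrag_ms2_tolerance(mz, abs_da=0.01, ppm=20.0):
    """Return the effective MS2 tolerance in Da for a fragment at `mz`.

    The relative tolerance is converted from ppm to Da, then the larger
    of the absolute and relative values is used.
    """
    relative_da = mz * ppm / 1e6
    return max(abs_da, relative_da)


for mz in (200.0, 500.0, 1000.0):
    print(mz, metfrag_ms2_tolerance(mz))
```

With `MS2_Da: 0.01` and `MS2_ppm: 20`, the two tolerances are equal at m/z 500. Below that the absolute value (0.01 Da) is the larger one; above it the ppm value is.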

Below is a guideline for where parameters should be set.

MetFrag

MetFrag settings can be modified via the following file:

metfrag\example_parameter.txt

This file controls various options such as scoring weights, number of candidate structures, and the compound database used. Refer to the MetFrag documentation for a detailed explanation of each parameter.

SIRIUS

Advanced users can manually modify the SIRIUS command-line parameters used for each task by editing the corresponding scripts:
Formula prediction parameters → sirius_cmd.py
Structure prediction parameters → sirius_struc_cmd.py

MS-FINDER

MS-FINDER settings can be modified via the following files:
Formula prediction parameters

msfinder\MsfinderConsoleApp_Param_formula.txt

Structure prediction parameters

msfinder\MsfinderConsoleApp-Param_all_processing.txt

msbuddy

msbuddy parameters can be set by editing the relevant code in buddy_cmd.

Environment setup

1. Python version:

This tool has been tested with:

Python 3.12+

Check your version:

python --version

2. Install required Python modules

Install required Python packages using pip:

cd msemblator/script 
pip install -r requirements.txt

3. Java version

MetFrag requires Java 21 or higher in order to run.

Check your installed Java version:

java --version
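
Both version requirements (Python 3.12+ and Java 21 or higher) can also be checked from a script. This is a minimal sketch with hypothetical function names. It assumes the first number printed by `java --version` is the major version, which holds for output such as `openjdk 21.0.2 2024-01-16` but should be verified for your Java distribution.

```python
import re
import subprocess
import sys


def python_ok(version_info=sys.version_info, minimum=(3, 12)):
    """True if the running Python meets the minimum (major, minor) version."""
    return tuple(version_info[:2]) >= minimum


def parse_java_major(version_output):
    """Extract the first version number from `java --version` output."""
    match = re.search(r"(\d+)(?:\.\d+)*", version_output)
    return int(match.group(1)) if match else None


def installed_java_major():
    """Run `java --version`; return the major version, or None if unavailable."""
    try:
        result = subprocess.run(
            ["java", "--version"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return None
    return parse_java_major(result.stdout or result.stderr)
```

For example, `python_ok() and (installed_java_major() or 0) >= 21` evaluates to `True` only when both requirements are met.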

4. External Tool Placement

Please download the required external tools and place them in the following recommended structure:

Project structure

msemblator/
├─ structure_scoring_model/    # Structure scoring models
├─ metfrag/                    # MetFrag integration
│   ├─ example_parameter.txt     # Configuration file for MetFrag
│   ├─ library_psv_v2.txt        # Custom database for MetFrag
│   └─ MetFragCommandLine-2.5.0.jar  # MetFrag CLI JAR
├─ formula_scoring_model/      # Formula scoring models
├─ msfinder/                   # MS-FINDER integration
│   ├─ MSFINDER ver 3.61/        # MS-FINDER executable and resources
│   ├─ MsfinderConsoleApp_Param_formula.txt      # Parameter file for MS-FINDER
│   ├─ MsfinderConsoleApp-Param_all_processing.txt     # Parameter file for MS-FINDER
│   └─ coconutandBLEXP.txt     # MS-FINDER structure database
└─ sirius/                    
    ├─ app/
    ├─ database/               # SIRIUS structure database 
    ├─ ExplorerLicTester/
    ├─ ... (other resources)
    ├─ sirius.exe               # Main SIRIUS executable (CLI)
    └─ sirius-gui.exe           
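
Once the tools are in place, the layout above can be verified with a short script. This sketch is not part of Msemblator: the `EXPECTED` list and the function name are illustrative, and the list covers only the entries named explicitly in the tree.

```python
from pathlib import Path

# Paths relative to the msemblator/ root, taken from the tree above.
EXPECTED = [
    "structure_scoring_model",
    "formula_scoring_model",
    "metfrag/example_parameter.txt",
    "metfrag/library_psv_v2.txt",
    "metfrag/MetFragCommandLine-2.5.0.jar",
    "msfinder/MSFINDER ver 3.61",
    "msfinder/MsfinderConsoleApp_Param_formula.txt",
    "msfinder/MsfinderConsoleApp-Param_all_processing.txt",
    "msfinder/coconutandBLEXP.txt",
    "sirius/sirius.exe",
]


def missing_paths(root):
    """Return the expected relative paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

Calling `missing_paths("msemblator")` returns an empty list when every listed file and folder is present.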

・ MS-DIAL (for MSP generation)
Download: MS-DIAL 5
Export your data as .msp using MS-DIAL 5.
These MSP files will be used as input for this tool.

・ SIRIUS (required for formula and structure elucidation)
Download: SIRIUS 5.8.6
You will need:
・ The SIRIUS executable
・ A SIRIUS web service account

・ MS-FINDER (required for formula and structure elucidation)
Download: MS-FINDER 3.61

・ MetFrag (required for structure elucidation)
Download: MetFragCommandLine-2.5.0.jar

・ Required compound library and scoring model
Download: Required compound library and scoring model
