Msemblator is a metabolomics annotation tool that integrates results from multiple in-silico annotation tools and applies ensemble learning-based scoring to provide highly reliable annotations.
Users can input MSP files generated by MS-DIAL and perform formula prediction, structure prediction, or both.
Note: This tool is currently designed for use on Windows OS.
If you only want to perform formula prediction (no SIRIUS account required):
python msemblator.py --input data\example.msp --output results\formula_only --mode 1
To run both formula and structure prediction, including ensemble scoring (requires SIRIUS credentials):
python msemblator.py --input data\example.msp --output results\formula_and_structure --mode 2 --sirius_user your_email@example.com --sirius_pass your_password
If your MSP file already contains predicted formulas and you only want to perform structure annotation (requires SIRIUS credentials):
python msemblator.py --input data\formula_predicted.msp --output results\structure_only --mode 3 --sirius_user your_email@example.com --sirius_pass your_password
・ --input: Path to the input MSP file (recommended: exported from MS-DIAL)
・ --output: Output folder to save results
・ --mode:
・ 1 = Formula elucidation only
・ 2 = Both formula and structure elucidation
・ 3 = Structure elucidation only
・ --sirius_user and --sirius_pass : Required only for modes 2 and 3
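When many MSP files need to be processed, the command line can be assembled from a script. The following is a minimal sketch using only the flags listed above; the helper name build_msemblator_command is hypothetical and not part of Msemblator.

```python
def build_msemblator_command(input_msp, output_dir, mode,
                             sirius_user=None, sirius_pass=None):
    """Return the argument list for one msemblator.py run.

    Hypothetical helper: it only mirrors the documented flags
    (--input, --output, --mode, --sirius_user, --sirius_pass).
    """
    if mode not in (1, 2, 3):
        raise ValueError("mode must be 1, 2, or 3")

    cmd = [
        "python", "msemblator.py",
        "--input", str(input_msp),
        "--output", str(output_dir),
        "--mode", str(mode),
    ]

    # Modes 2 and 3 run structure elucidation, which needs SIRIUS credentials.
    if mode in (2, 3):
        if not sirius_user or not sirius_pass:
            raise ValueError("modes 2 and 3 require --sirius_user and --sirius_pass")
        cmd += ["--sirius_user", sirius_user, "--sirius_pass", sirius_pass]

    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True) from the directory that contains msemblator.py. Passing a list rather than a single string avoids quoting problems with Windows paths that contain spaces.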
Msemblator does not support raw data as input. Instead, MSP files processed with MS-DIAL 5 are strongly recommended. The application utilizes MS-DIAL's MSP output to perform formula and structure predictions.
・ Formula prediction requires at least m/z, MS2 and adduct type.
・ Structure prediction requires the above information plus a predicted molecular formula.
・ If formula information is missing, structure prediction will be skipped. However, this limitation can be resolved by performing both formula and structure predictions simultaneously.
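The input requirements above can be checked before a run. The sketch below is not Msemblator's own parser. It assumes the MSP header fields are named PRECURSORMZ, PRECURSORTYPE, FORMULA and Num Peaks, that records are separated by blank lines, and that each peak line holds an m/z and an intensity; adjust the names if your export differs. The function names are hypothetical.

```python
import re


def parse_msp(text):
    """Split MSP text into records with upper-cased header fields and a peak list."""
    records = []
    # Records are assumed to be separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.replace("\r\n", "\n").strip()):
        if not block.strip():
            continue
        fields, peaks, in_peaks = {}, [], False
        for line in block.splitlines():
            line = line.strip()
            if not line:
                continue
            if in_peaks:
                parts = line.split()
                if len(parts) >= 2:
                    peaks.append((float(parts[0]), float(parts[1])))
            elif ":" in line:
                key, value = line.split(":", 1)
                key = key.strip().upper()
                fields[key] = value.strip()
                # Every line after the "Num Peaks" header is read as a peak.
                in_peaks = key == "NUM PEAKS"
        records.append({"fields": fields, "peaks": peaks})
    return records


def runnable_steps(record):
    """Report which predictions a record supports on its own."""
    fields = record["fields"]
    has_basics = (
        bool(fields.get("PRECURSORMZ"))
        and bool(fields.get("PRECURSORTYPE"))
        and bool(record["peaks"])
    )
    has_formula = bool(fields.get("FORMULA"))
    return {"formula": has_basics, "structure": has_basics and has_formula}
```

A record that reports structure as False because it lacks a formula can still be annotated by running formula and structure prediction together (mode 2).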
Msemblator generates two output files:
- A file containing the top 3 predictions from each annotation tool for both formula and structure, along with the highest-ranked annotation based on the ensemble scoring model.
- A file summarizing the top 3 ranked predictions from the scoring model.
This structured approach ensures that users obtain high-confidence annotations from their metabolomics data.
Msemblator allows users to flexibly adjust parameters for each integrated annotation tool (SIRIUS, MS-FINDER, MetFrag, etc.). Most settings should be configured by editing the provided parameter files, ensuring that the workflow can be customized without modifying the code. For advanced features that cannot be configured via parameter files, the corresponding command scripts may be edited directly.
formula_prediction:
  msfinder:
    MS1_ppm: 10
    MS2_ppm: 20
    halogen: True  # True or False
  sirius:
    # possible options: orbitrap, qtof
    MS1: qtof
    MS2_ppm: 20
    halogen: True  # True or False
  msbuddy:
    MS1_ppm: 10
    MS2_ppm: 20
    halogen: True  # True or False
  msemblator_output_records: 100
structure_prediction:
  msfinder:
    MS1_ppm: 10
    MS2_ppm: 20
    halogen: True  # True or False
  sirius:
    # possible options: orbitrap, qtof
    MS1: qtof
    MS2_ppm: 20
  metfrag:
    # MetFrag uses whichever tolerance is larger: absolute (Da) or relative (ppm)
    MS2_Da: 0.01
    MS2_ppm: 20
  msemblator_output_records: 100
Below is a guideline for where parameters should be set.
MetFrag settings can be modified via the following file:
metfrag\example_parameter.txt
This file controls various options such as scoring weights, number of candidate structures, and the compound database used. Refer to the MetFrag documentation for a detailed explanation of each parameter.
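The comment on the metfrag block of the structure_prediction settings says that the larger of the absolute (Da) and relative (ppm) MS2 tolerances is used. Taking that comment at face value, the effective tolerance at a given fragment m/z can be sketched as follows. The function name is hypothetical, and the defaults are the MS2_Da and MS2_ppm values shown in the configuration.

```python
def effective_ms2_tolerance_da(mz, ms2_da=0.01, ms2_ppm=20.0):
    """Return the MS2 tolerance in Da at a given m/z.

    Illustrates the rule described in the configuration comment:
    the larger of the absolute and the relative tolerance applies.
    """
    ppm_as_da = mz * ms2_ppm * 1e-6  # convert the relative tolerance to Da
    return max(ms2_da, ppm_as_da)
```

With the default values the two tolerances are equal at m/z 500. Below that, the absolute 0.01 Da setting applies; above it, the 20 ppm setting is the larger one.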
Advanced users can manually modify the SIRIUS command-line parameters used for each task by editing the corresponding scripts:
・Formula prediction parameters → sirius_cmd.py
・Structure prediction parameters → sirius_struc_cmd.py
MS-FINDER settings can be modified via the following file:
・Formula prediction parameters
msfinder\MsfinderConsoleApp_Param_formula.txt
・Structure prediction parameters
msfinder\MsfinderConsoleApp-Param_all_processing.txt
msbuddy parameters can be changed by editing the relevant code in buddy_cmd.
This tool has been tested with:
Python 3.12+
Check your version:
python --version
Install required Python packages using pip:
cd msemblator/script
pip install -r requirements.txt
MetFrag requires Java 21 or higher in order to run.
Check your installed Java version:
java --version
Please download the required external tools and place them in the following recommended structure:
msemblator/
├─ structure_scoring_model/                        # Structure scoring models
├─ metfrag/                                        # MetFrag integration
│  ├─ example_parameter.txt                        # Configuration file for MetFrag
│  ├─ library_psv_v2.txt                           # Custom database for MetFrag
│  └─ MetFragCommandLine-2.5.0.jar                 # MetFrag CLI JAR
├─ formula_scoring_model/                          # Formula scoring models
├─ msfinder/                                       # MS-FINDER integration
│  ├─ MSFINDER ver 3.61/                           # MS-FINDER executable and resources
│  ├─ MsfinderConsoleApp_Param_formula.txt         # MS-FINDER parameters (formula prediction)
│  ├─ MsfinderConsoleApp-Param_all_processing.txt  # MS-FINDER parameters (structure prediction)
│  └─ coconutandBLEXP.txt                          # MS-FINDER structure database
└─ sirius/
   ├─ app/
   ├─ database/                                    # SIRIUS structure database
   ├─ ExplorerLicTester/
   ├─ ... (other resources)
   ├─ sirius.exe                                   # Main SIRIUS executable (CLI)
   └─ sirius-gui.exe
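Before the first run it can help to confirm that the external tools are where the layout above expects them. The following sketch is a hypothetical helper, not part of Msemblator. Its path list is copied from the recommended structure and covers only the entries named there.

```python
from pathlib import Path

# Relative paths taken from the recommended folder structure.
EXPECTED_PATHS = [
    "structure_scoring_model",
    "formula_scoring_model",
    "metfrag/example_parameter.txt",
    "metfrag/library_psv_v2.txt",
    "metfrag/MetFragCommandLine-2.5.0.jar",
    "msfinder/MSFINDER ver 3.61",
    "msfinder/MsfinderConsoleApp_Param_formula.txt",
    "msfinder/MsfinderConsoleApp-Param_all_processing.txt",
    "msfinder/coconutandBLEXP.txt",
    "sirius/sirius.exe",
]


def missing_components(root):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]
```

Calling missing_components("msemblator") returns an empty list when every listed file and folder is present; otherwise it names the entries that still need to be downloaded or moved.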
・ MS-DIAL (for MSP generation)
Download: MS-DIAL 5
Export your data as .msp using MS-DIAL 5.
These MSP files will be used as input for this tool.
・ SIRIUS (required for formula and structure elucidation)
Download: SIRIUS 5.8.6
You will need:
・ The SIRIUS executable
・ A SIRIUS web service account
・ MS-FINDER (required for formula and structure elucidation)
Download: MS-FINDER 3.61
・ MetFrag (required for structure elucidation)
Download: MetFragCommandLine-2.5.0.jar
・ Required compound library and scoring model
Download: Required compound library and scoring model