mite-standard/mite_data

Overview

MITE (Minimum Information about a Tailoring Enzyme) is a community-driven database for the characterization of tailoring enzymes. These enzymes play crucial roles in the biosynthesis of secondary or specialized metabolites, naturally occurring molecules with strong biological activities, such as antibiotic activity.

This repository contains the single source of truth of the Minimum Information about a Tailoring Enzyme (MITE) database.

For more information, visit the MITE Data Standard Organization page or read our publication.

MITE Accession Reservation

You can reserve MITE Accession IDs for your to-be-published manuscript. Please read more about it in this discussion.

Documentation

The MITE data are stored as JSON files that conform to mite_schema and are validated with mite_extras. These files originate from user submissions via the MITE web portal, are expert-reviewed via pull requests, and are then deposited in the Zenodo repository. From there, the MITE web portal and other tools such as antiSMASH retrieve the data for their own use.

This repository also provides some CLI functionality to generate auxiliary files:

  • Metadata files summarizing information of MITE entries (mite_data/metadata)
  • Protein FASTA files for all active (i.e. non-retired) MITE entries (mite_data/fasta)
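The FASTA step can be pictured as follows. This is a minimal sketch, not the mite_data implementation: it assumes (hypothetically) that each entry JSON carries top-level "accession", "status" and "sequence" keys, whereas the actual tool may obtain sequences from GenPept or UniProt instead. The helper names entry_to_fasta and write_fasta_files are illustrative only.

```python
import json
from pathlib import Path
from typing import Optional


def entry_to_fasta(entry: dict, width: int = 80) -> Optional[str]:
    """Render one MITE entry as a FASTA record, or None if it is retired.

    Assumes hypothetical keys "accession", "status" and "sequence".
    """
    if entry.get("status") == "retired":
        return None  # retired entries get no FASTA file
    seq = entry["sequence"]
    # Wrap the sequence at `width` residues per line
    wrapped = "\n".join(seq[i:i + width] for i in range(0, len(seq), width))
    return f">{entry['accession']}\n{wrapped}\n"


def write_fasta_files(data_dir: Path, fasta_dir: Path) -> list[Path]:
    """Write one <accession>.fasta per non-retired JSON entry in data_dir."""
    fasta_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(data_dir.glob("*.json")):
        entry = json.loads(path.read_text())
        record = entry_to_fasta(entry)
        if record is None:
            continue
        out = fasta_dir / f"{entry['accession']}.fasta"
        out.write_text(record)
        written.append(out)
    return written
```

Keeping the record formatting in a pure function makes the retired/active rule easy to test without touching the file system.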

For feature requests and suggestions, please refer to the MITE Discussion forum.

For simple data submissions, please use the MITE web portal. For more complex or large-scale submissions, please get in touch with us, for example by opening an Issue.

System Requirements

OS Requirements

Local installation was tested on:

  • Ubuntu Linux 20.04 and 22.04 (command line)

Python dependencies

Dependencies including exact versions are specified in the pyproject.toml file.

Installation Guide

With uv from GitHub

Note: this assumes that uv is installed locally; see the installation methods described here.

git clone https://github.com/mite-standard/mite_data
cd mite_data
uv sync
uv run pre-commit install

Quick Start

This CLI serves two purposes:

  • update missing auxiliary files
  • validate files

Normally, the CLI runs automatically in single-file mode, triggered by pre-commit: whenever a file is committed, pre-commit downloads missing files, updates the metadata, and performs the validation checks.

This is equivalent to:

uv run python mite_data/main.py <your-mite-file>.json
uv run python mite_data/validation/mite_validation.py <your-mite-file>.json

In some exceptional cases, you may want to trigger a full regeneration of all files.

Nota bene: This will overwrite all manual taxonomy annotations in the metadata_general.json file.

uv run python mite_data/main.py
uv run python mite_data/validation/mite_validation.py
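The single-file and full invocations above differ only in whether file paths are passed. A minimal sketch of such a dispatch, assuming nothing about the real main.py: pre-commit passes the staged filenames as arguments (single-file mode), and an empty argument list falls back to every JSON file in the data directory (full mode). The function select_files is a hypothetical name; the mite_data/data path is taken from the update procedure below.

```python
import argparse
from pathlib import Path

# Directory holding the MITE JSON entries
DATA_DIR = Path("mite_data/data")


def select_files(argv: list[str], data_dir: Path = DATA_DIR) -> list[Path]:
    """Return the JSON files to process.

    Arguments given (e.g. staged files passed by pre-commit): single-file mode.
    No arguments: full mode over every JSON file in data_dir.
    """
    parser = argparse.ArgumentParser(description="Sketch of a single-file/full-mode dispatcher")
    parser.add_argument("files", nargs="*", type=Path, help="MITE JSON files to process")
    args = parser.parse_args(argv)
    if args.files:
        # Ignore anything that is not a JSON entry
        return [f for f in args.files if f.suffix == ".json"]
    return sorted(data_dir.glob("*.json"))
```

A script entry point would call select_files(sys.argv[1:]) and then run the update and validation steps on each returned path.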

Attribution

License

All code and data in mite_data is released to the public domain under the CC0 license (see LICENSE).

Publications

See CITATION.cff or MITE online for information on citing MITE.

Acknowledgements

This work was supported by the Netherlands Organization for Scientific Research (NWO) KIC grant KICH1.LWV04.21.013.

For Developers

Nota bene: for details on how to contribute to the MITE project, please refer to CONTRIBUTING.

For installation instructions, see the Installation Guide above.

All tests should pass:

uv run pytest

Updating and CI/CD

Nota bene: All described procedures require pre-commit to be installed and initialized.

CI/CD via GitHub Actions runs on every PR and push to the main branch.

A new release created on the mite_data GitHub page will automatically relay changes to Zenodo.

Update procedure

  1. Merge reviewed pending pull requests (PRs) into main.
    • Fetch changes with git fetch.
    • Check out the remote branch with git checkout -b local-<branch-uuid> origin/<branch-uuid>.
    • Replace the content of the file mite_data/data/<uuid>.json with the reviewed content from the PR on GitHub.
    • Replace status:pending with status:active and assign a new MITE accession number. Check for any reserved accessions first.
    • Commit the changes with git add . && git commit -m "reviewed entry"
    • Push to remote with git push origin HEAD:<branch-uuid>
    • On GitHub, merge the respective PR into main and delete the feature branch.
    • Locally, check out the main branch, pull in the changes, and remove the local feature branch with git checkout main && git pull && git branch -d local-<branch-uuid>
    • Repeat for all open PRs on GitHub
  2. Create a release branch and update auxiliary files
    • Fetch changes with git fetch.
    • Create a local release branch with git checkout -b <release>
    • Update version in pyproject.toml and CHANGELOG.md
    • Sync the package version with uv sync
    • Push to remote using git push --set-upstream origin <release>
  3. Create PR on GitHub
    • Request a review (if applicable)
    • Merge into main
    • When all tests pass: create a new release (syncs data to Zenodo)
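The "pending to active" edit in step 1 can be sketched as below. This is an illustrative helper, not part of mite_data: it assumes accessions of the form MITE followed by seven zero-padded digits (e.g. MITE0000001) and that the entry JSON has top-level "status" and "accession" keys. New numbers continue after the highest existing accession and skip any reserved ones.

```python
import re

# Assumed accession format: "MITE" + 7 digits, e.g. MITE0000001
ACCESSION_RE = re.compile(r"MITE(\d{7})")


def next_accession(existing: set[str], reserved: set[str]) -> str:
    """Return the next free accession after the highest existing one, skipping reserved IDs."""
    numbers = [int(m.group(1)) for a in existing if (m := ACCESSION_RE.fullmatch(a))]
    n = max(numbers, default=0) + 1
    while f"MITE{n:07d}" in reserved:
        n += 1  # reserved accessions are held for to-be-published manuscripts
    return f"MITE{n:07d}"


def activate_entry(entry: dict, existing: set[str], reserved: set[str]) -> dict:
    """Return a copy of a pending entry with status "active" and a fresh accession."""
    if entry.get("status") != "pending":
        raise ValueError(f"expected a pending entry, got status {entry.get('status')!r}")
    activated = dict(entry)
    activated["status"] = "active"
    activated["accession"] = next_accession(existing, reserved)
    return activated
```

Returning a copy instead of mutating the input keeps the original PR content available for comparison during review.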

CI/CD

mite_data runs automated checks both locally with pre-commit and remotely with CI/CD via GitHub Actions.

pre-commit

Nota bene: pre-commit applies checks only to new/modified files.

GitHub CI/CD

On PR to main

Nota bene: Applies checks only to new/modified files.

Summary of checks

Runs .github/mite_validation.py/run_file():

  • File exists
  • Filename matches convention
  • File is release-ready (correct status, accession is not one of the reserved accessions)
  • No duplicates (based on shared GenPept and UniProt IDs)
  • Validation checks of mite_extras pass
  • Check if all database IDs are correct (can be accessed/downloaded)
  • Check if UniProt and GenPept match each other (using mite_extras)
  • If MIBiG ID was specified, check if GenPept ID matches with MIBiG's protein list
  • Check if MITE entry can be annotated with Rhea ID (based on UniProt ID)
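Two of these checks, the filename convention and duplicate detection, reduce to simple pattern and set logic. The sketch below is not the code in mite_validation.py: the naming pattern (MITE plus seven digits plus .json) and the "genpept"/"uniprot" keys are assumptions for illustration.

```python
import re
from collections import defaultdict

# Assumed filename convention: <accession>.json with accession = "MITE" + 7 digits
FILENAME_RE = re.compile(r"MITE\d{7}\.json")


def filename_ok(name: str) -> bool:
    """True if the filename follows the assumed convention."""
    return FILENAME_RE.fullmatch(name) is not None


def find_duplicates(entries: dict[str, dict]) -> dict[tuple[str, str], set[str]]:
    """Map (database, protein ID) to the accessions sharing it, keeping only IDs used more than once.

    `entries` maps accession -> entry dict with hypothetical "genpept"/"uniprot" keys.
    """
    owners: dict[tuple[str, str], set[str]] = defaultdict(set)
    for accession, entry in entries.items():
        for db in ("genpept", "uniprot"):
            protein_id = entry.get(db)
            if protein_id:
                owners[(db, protein_id)].add(accession)
    return {key: accs for key, accs in owners.items() if len(accs) > 1}
```

Keying by (database, ID) avoids false matches if a GenPept and a UniProt identifier ever happened to share the same string.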

On push to main

Nota bene: Applies checks to all files (i.e. when a branch is merged into main).

Summary of checks

Runs .github/mite_validation.py/run_data_dir():

  • File exists
  • Filename matches convention
  • File has an accompanying FASTA file
  • Retired files have no accompanying FASTA files
  • File is release-ready (correct status, accession is not one of the reserved accessions)
  • Accessions in headers of FASTA files match their corresponding IDs in MITE files
  • No duplicates (based on shared GenPept and UniProt IDs)
  • Validation checks of mite_extras pass
  • Check if all database IDs are correct (can be accessed/downloaded)
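The FASTA-related checks in this list amount to comparing entry statuses against the available FASTA files. A minimal sketch, assuming FASTA files are keyed by the MITE accession and that the first token of each header is that accession; the real check may compare other IDs, and check_fasta_consistency is a hypothetical name.

```python
def check_fasta_consistency(statuses: dict[str, str], fasta_headers: dict[str, str]) -> list[str]:
    """Return error messages for mismatches between MITE entries and FASTA files.

    statuses: accession -> entry status ("active", "retired", ...)
    fasta_headers: accession (FASTA file stem) -> first header line of that file
    """
    errors = []
    for acc, status in sorted(statuses.items()):
        has_fasta = acc in fasta_headers
        if status == "retired" and has_fasta:
            errors.append(f"{acc}: retired entry has a FASTA file")
        elif status != "retired" and not has_fasta:
            errors.append(f"{acc}: missing FASTA file")
        elif has_fasta:
            # Assumed: the first whitespace-separated token after ">" is the accession
            parts = fasta_headers[acc].lstrip(">").split()
            header_id = parts[0] if parts else ""
            if header_id != acc:
                errors.append(f"{acc}: FASTA header {header_id!r} does not match")
    return errors
```

Collecting all messages instead of failing on the first one gives a complete report for the whole data directory in a single CI run.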

Additional checks:

  • Package can be installed
  • All tests passing
