- Overview
- Documentation
- System Requirements
- Installation Guide
- Quick Start
- Attribution
- For Developers
MITE (Minimum Information about a Tailoring Enzyme) is a community-driven database for the characterization of tailoring enzymes. These enzymes play crucial roles in the biosynthesis of secondary or specialized metabolites, naturally occurring molecules with strong biological activities, such as antibiotic properties.
For more information, visit the MITE Data Standard Organization page or read our publication.
You can reserve MITE Accession IDs for your to-be-published manuscript. Please read more about it in this discussion.
This repository contains the single source of truth of the Minimum Information about a Tailoring Enzyme (MITE) database.
This data is in the form of JSON files controlled by mite_schema and validated by mite_extras. These files are created by user submissions via the MITE web portal, expert-reviewed via pull requests, and then deposited in the Zenodo repository. From there, the MITE web portal and other tools such as antiSMASH pull the data for their own use.
This repository also provides some CLI functionality to generate auxiliary files:
- Metadata files summarizing information of MITE entries (`mite_data/metadata`)
- Protein FASTA files for all active (i.e. non-retired) MITE entries (`mite_data/fasta`)
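As a rough illustration of the "active (i.e. non-retired)" rule for FASTA files, the sketch below writes one FASTA file per non-retired entry. It is not the repository's implementation: the function name `write_fasta_for_active`, the JSON field names `status` and `accession`, the `<accession>.fasta` naming, and the pre-supplied `sequences` mapping are all assumptions made for this example (the actual CLI retrieves sequences itself).

```python
import json
from pathlib import Path


def write_fasta_for_active(
    data_dir: Path, fasta_dir: Path, sequences: dict[str, str]
) -> list[str]:
    """Write one FASTA file per non-retired entry and return the accessions written.

    Hypothetical helper: assumes each entry JSON has "status" and "accession"
    keys and that protein sequences are already available in `sequences`.
    """
    fasta_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(data_dir.glob("*.json")):
        entry = json.loads(path.read_text())
        if entry.get("status") == "retired":
            # Retired entries get no FASTA file.
            continue
        accession = entry["accession"]
        fasta_path = fasta_dir / f"{accession}.fasta"
        fasta_path.write_text(f">{accession}\n{sequences[accession]}\n")
        written.append(accession)
    return written
```

Skipping retired entries at write time keeps the FASTA directory in step with the set of active entries, which is what the consistency checks described later rely on.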
For feature requests and suggestions, please refer to the MITE Discussion forum.
For simple data submissions, please refer to the MITE web portal. For more complex or large-scale submissions, please get in touch with us, e.g. by opening an Issue.
Local installation was tested on:
- Ubuntu Linux 20.04 and 22.04 (command line)
Dependencies including exact versions are specified in the pyproject.toml file.
Note: assumes that uv is installed locally - see the methods described here
git clone https://github.com/mite-standard/mite_data
cd mite_data
uv sync
uv run pre-commit install
This CLI serves two purposes:
- update missing auxiliary files
- validate files
Normally, the CLI automatically starts in single-file mode, triggered by pre-commit.
Therefore, whenever a file is committed, pre-commit will download missing files, update the metadata, and perform checks.
This is equivalent to:
uv run python mite_data/main.py <your-mite-file>.json
uv run python mite_data/validation/mite_validation.py <your-mite-file>.json
In some exceptional cases, you may want to trigger a full regeneration of all files.
Nota bene: This will overwrite all manual taxonomy annotations in the metadata_general.json file
uv run python ./mite_data/main.py
uv run python mite_data/validation/mite_validation.py
All code and data in mite_data is released to the public domain under the CC0 license (see LICENSE).
See CITATION.cff or MITE online for information on citing MITE.
This work was supported by the Netherlands Organization for Scientific Research (NWO) KIC grant KICH1.LWV04.21.013.
Nota bene: for details on how to contribute to the MITE project, please refer to CONTRIBUTING.
For installation instructions, see above.
Note: assumes that uv is installed locally - see the methods described here
All tests should be passing
uv run pytest
Nota bene: All described procedures require pre-commit to be installed and initiated.
CI/CD via GitHub Actions runs on every PR and push to the main branch.
A new release created on the mite_data GitHub page will automatically relay changes to Zenodo.
- Merge reviewed pending pull requests (PRs) into main.
  - Fetch changes with `git fetch`.
  - Check out the remote branch with `git checkout -b local-<branch-uuid> origin/<branch-uuid>`.
  - Replace the content of file `mite_data/data/<uuid>.json` with the reviewed content from the PR on GitHub.
  - Replace `status:pending` with `status:active` and coin a new MITE accession number. Check for any reserved accessions.
  - Prepare a commit by running `git add . && git commit -m "reviewed entry"`.
  - Push to remote with `git push origin HEAD:<branch-uuid>`.
  - On GitHub, merge the respective PR into main and delete the feature branch.
  - Locally, check out the main branch, pull in changes, and remove the local feature branch with `git checkout main && git pull && git branch -d local-<branch-uuid>`.
  - Repeat for all open PRs on GitHub.
- Create a release branch and update auxiliary files.
  - Fetch changes with `git fetch`.
  - Create a local release branch with `git checkout -b <release>`.
  - Update the version in `pyproject.toml` and `CHANGELOG.md`.
  - Sync the package version with `uv sync`.
  - Push to remote using `git push --set-upstream origin <release>`.
- Create a PR on GitHub.
  - Request a review (if applicable).
  - Merge into main.
- When all tests pass: create a new release (syncs data to Zenodo).
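The step of switching an entry from pending to active and coining a new accession while respecting reserved ones could be sketched as follows. This is only an illustration: the helper names `next_accession` and `activate`, the `MITE` plus seven digits accession format, and the `status`/`accession` field names are assumptions for this example, not part of the repository's tooling.

```python
import re

# Assumed accession format: "MITE" followed by seven digits.
ACCESSION_PATTERN = re.compile(r"^MITE(\d{7})$")


def next_accession(existing: set[str], reserved: set[str]) -> str:
    """Return the next free accession, skipping used and reserved ones."""
    taken = existing | reserved
    numbers = [
        int(match.group(1))
        for accession in taken
        if (match := ACCESSION_PATTERN.match(accession))
    ]
    return f"MITE{max(numbers, default=0) + 1:07d}"


def activate(entry: dict, accession: str) -> dict:
    """Return a copy of a pending entry marked active under the new accession."""
    if entry.get("status") != "pending":
        raise ValueError("only pending entries can be activated")
    return {**entry, "status": "active", "accession": accession}
```

Taking the maximum over both used and reserved accessions means a reserved number is never handed out by accident, at the cost of leaving gaps in the numbering until the reservation is redeemed.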
mite_data employs automated checks using both pre-commit and CI/CD using GitHub Actions.
Nota bene: pre-commit applies checks only to new/modified files.
On PR to main
Nota bene: Applies checks only to new/modified files.
Summary of checks
Runs .github/mite_validation.py/run_file():
- File exists
- Filename matches convention
- File is release-ready (correct status, accession not one of reserved)
- No duplicates (based on shared GenPept and UniProt IDs)
- Validation checks of `mite_extras` pass
- Check if all database IDs are correct (can be accessed/downloaded)
- Check if UniProt and GenPept match each other (using `mite_extras`)
- If MIBiG ID was specified, check if GenPept ID matches with MIBiG's protein list
- Check if MITE entry can be annotated with Rhea ID (based on UniProt ID)
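Two of the checks listed above, release-readiness and duplicate detection, can be illustrated with a minimal sketch. It does not reproduce the actual validation code: the function names, the reading of "correct status" as "not pending", and the assumed location of the protein IDs under `enzyme.databaseIds` (keys `genpept` and `uniprot`) are assumptions for illustration only.

```python
def protein_ids(entry: dict) -> set[str]:
    """Collect the non-empty GenPept and UniProt IDs of an entry (assumed layout)."""
    ids = entry.get("enzyme", {}).get("databaseIds", {})
    return {ids[key] for key in ("genpept", "uniprot") if ids.get(key)}


def check_release_ready(entry: dict, reserved: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the entry looks release-ready."""
    problems = []
    if entry.get("status") == "pending":
        problems.append("status is still 'pending'")
    if entry.get("accession") in reserved:
        problems.append("accession is reserved")
    return problems


def find_duplicates(entry: dict, others: list[dict]) -> list[str]:
    """Return accessions of other entries sharing a GenPept or UniProt ID."""
    own = protein_ids(entry)
    return [
        other["accession"]
        for other in others
        if other["accession"] != entry.get("accession") and own & protein_ids(other)
    ]
```

Returning a list of problems rather than raising on the first failure lets a single run report everything that needs fixing in a file.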
On push to main
Nota bene: Applies checks to all files (i.e. when a branch is merged into main).
Summary of checks
Runs .github/mite_validation.py/run_data_dir():
- File exists
- Filename matches convention
- File has an accompanying fasta file
- Retired files have no accompanying fasta files
- File is release-ready (correct status, accession not one of reserved)
- Accessions in headers of fasta files match their corresponding IDs in MITE files
- No duplicates (based on shared GenPept and UniProt IDs)
- Validation checks of `mite_extras` pass
- Check if all database IDs are correct (can be accessed/downloaded)
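The FASTA-related checks in this list (every non-retired entry has a FASTA file, retired entries have none, and FASTA headers match the entry's accession) might look roughly like the sketch below. The function name `check_fasta_consistency`, the `<accession>.fasta` file naming, the `status`/`accession` field names, and the assumption that the accession is the first whitespace-delimited token of the header are all hypothetical and chosen only for this example.

```python
import json
from pathlib import Path


def check_fasta_consistency(data_dir: Path, fasta_dir: Path) -> list[str]:
    """Return a list of inconsistencies between entry JSON files and FASTA files."""
    problems = []
    for path in sorted(data_dir.glob("*.json")):
        entry = json.loads(path.read_text())
        accession = entry["accession"]
        fasta_path = fasta_dir / f"{accession}.fasta"
        if entry.get("status") == "retired":
            if fasta_path.exists():
                problems.append(f"{accession}: retired entry has a FASTA file")
            continue
        if not fasta_path.exists():
            problems.append(f"{accession}: missing FASTA file")
            continue
        lines = fasta_path.read_text().splitlines()
        header = lines[0] if lines else ""
        tokens = header[1:].split()
        # Assumed convention: header is ">" followed by the accession as first token.
        if not header.startswith(">") or not tokens or tokens[0] != accession:
            problems.append(f"{accession}: FASTA header does not match accession")
    return problems
```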
Additional checks:
- Package can be installed
- All tests passing