About
intronIC was created by Graham Larue during his PhD at UC Merced to provide a customizable, open-source method for identifying minor (U12-type) spliceosomal introns from annotated intron sequences.
Minor introns typically make up at most ~0.5% of a given genome's introns and contain distinct splicing motifs that make them amenable to bioinformatic identification. Despite their rarity, U12-type introns are:
- Functionally important: Genes containing U12-type introns are often involved in essential cellular processes
- Evolutionarily conserved: Present in most eukaryotic lineages, with documented losses in nematodes, some fungi, and certain protists
Earlier minor intron resources (U12DB, SpliceRack, ERISdb, etc.), while important contributions to the field, are static by design. These databases:
- Are not updated with new genome releases
- Are based on older genome annotations
- Are limited to pre-selected species
- Use heuristic classification criteria
intronIC addresses these limitations by:
- Working with any genome + annotation
- Using well-established SVM classification with a pretrained model
- Producing interpretable probability scores
- Supporting custom model training for specialized use cases
- Providing extensive metadata for downstream analysis
- Being regularly updated with algorithm improvements
intronIC delegates the "how U12-like" question to a linear support vector machine (SVM), which:
- Learns the decision boundary from curated reference data
- Produces probability scores (0-100%) via Platt scaling
- Uses balanced class weights to handle extreme class imbalance (~0.5% U12)
- Provides interpretable z-scores for the three key motifs
This avoids arbitrary score thresholds and provides probabilistic classifications that researchers can interpret based on their specific needs (e.g., high-confidence predictions for experimental validation vs. comprehensive catalogs).
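The three numeric ideas in the list above (z-scored motif scores, balanced class weights, and Platt scaling) can be sketched in a few lines of plain Python. This is only an illustration of the general techniques, not intronIC's actual code: the function names and the Platt coefficients `a` and `b` are hypothetical, and in practice `a` and `b` are fitted to held-out SVM decision values rather than chosen by hand. The class-weight formula is the common "balanced" heuristic (the one scikit-learn uses for `class_weight="balanced"`).

```python
import math
from collections import Counter
from statistics import mean, pstdev


def z_scores(raw_scores):
    """Standardize raw motif scores to zero mean and unit variance."""
    mu = mean(raw_scores)
    sigma = pstdev(raw_scores)  # assumes the scores are not all identical
    return [(s - mu) / sigma for s in raw_scores]


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def platt_probability(decision_value, a, b):
    """Map an SVM decision value f to a probability: 1 / (1 + exp(a*f + b))."""
    return 1.0 / (1.0 + math.exp(a * decision_value + b))


# Toy data with roughly the class imbalance described above (0.5% U12).
labels = ["U12"] * 5 + ["U2"] * 995
weights = balanced_class_weights(labels)  # the rare class gets the larger weight

# Hypothetical Platt coefficients, for illustration only.
a, b = -2.0, 0.0
p_u12 = platt_probability(2.0, a, b)  # decision value far on the U12 side
```

With these toy numbers the U12 class is weighted 200 times more heavily than the U2 class, which is what keeps the rare class from being ignored during training. The probability, multiplied by 100, corresponds to the 0-100% score scale mentioned above.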
intronIC performs extensive bookkeeping during intron collection, resulting in useful metadata including:
- Parent gene/transcript identifiers
- Ordinal intron index within transcript
- Intron phase (position relative to codon boundary)
- Fractional position within transcript
- Terminal dinucleotides and motif sequences
This information, which is otherwise non-trivial to acquire, enables downstream analyses of U12-type intron distribution, positional biases, and evolutionary patterns.
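To show why this bookkeeping is non-trivial, here is a minimal sketch of how such fields could be derived from a transcript's exon structure. It is a simplified illustration under stated assumptions, not intronIC's implementation: the function name, the record keys, and the definition of fractional position (coding bases upstream of the intron divided by total coding length) are all hypothetical, and the sketch assumes every exon is entirely coding and listed in transcript order.

```python
def intron_metadata(exon_lengths, intron_seqs):
    """Derive per-intron metadata for one transcript.

    exon_lengths: coding exon lengths in transcript order (n exons)
    intron_seqs:  sequences of the introns between them (n - 1 introns)
    """
    total_coding = sum(exon_lengths)
    upstream = 0  # coding bases seen before the current intron
    records = []
    # zip stops after the last intron, so the final exon is never
    # followed by a record
    for index, (exon_len, seq) in enumerate(zip(exon_lengths, intron_seqs), start=1):
        upstream += exon_len
        records.append({
            "index": index,                                # ordinal position, 1-based
            "phase": upstream % 3,                         # 0, 1, or 2 relative to codon boundary
            "fractional_position": upstream / total_coding,
            "dinucleotides": f"{seq[:2]}-{seq[-2:]}",      # terminal dinucleotides
        })
    return records


# Toy transcript: three exons separated by two introns (sequences invented).
records = intron_metadata(
    exon_lengths=[100, 50, 30],
    intron_seqs=["GTAAGTCCTTAACAG", "ATATCCTTTTAACAC"],
)
```

In this toy example the first intron follows 100 coding bases, so it is phase 1 with `GT-AG` termini; the second follows 150 coding bases, so it is phase 0 with `AT-AC` termini. Real annotations add complications such as UTRs, strand orientation, and multiple transcripts per gene, which this sketch ignores.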
If you use intronIC in your research, please cite:
Moyer DC, Larue GE, Hershberger CE, Roy SW, Padgett RA. (2020) Comprehensive database and evolutionary dynamics of U12-type introns. Nucleic Acids Research 48(13):7066–7078. doi:10.1093/nar/gkaa464
intronIC is released under the GNU General Public License v3.0.
- Where the Minor Things Are: More recent intron database based on intronIC (published in Larue & Roy 2023)
- MIDB: Minor Intron Database (published in original intronIC paper)
- U12DB: One of the first U12-type intron databases (Alioto 2007)
- SpliceRack: Another early splice site/minor intron database (Sheth et al. 2006); appears to no longer be maintained/available.