Graham Larue edited this page Dec 9, 2025 · 9 revisions

About

intronIC was created by Graham Larue during his PhD at UC Merced to provide a customizable, open-source method for identifying minor (U12-type) spliceosomal introns from annotated intron sequences.

Background

Minor introns typically make up no more than ~0.5% of a given genome's introns, and they contain distinct splicing motifs that make them amenable to bioinformatic identification. Despite their rarity, U12-type introns are:

  • Functionally important: Genes containing U12-type introns are often involved in essential cellular processes
  • Evolutionarily conserved: Present in most eukaryotic lineages, with documented losses in nematodes, some fungi, and certain protists

Why intronIC?

Earlier minor intron resources (U12DB, SpliceRack, ERISdb, etc.), while important contributions to the field, are static by design. These databases:

  • Are not updated with new genome releases
  • Are based on older genome annotations
  • Are limited to pre-selected species
  • Use heuristic classification criteria

intronIC addresses these limitations by:

  • Working with any genome + annotation
  • Using well-established SVM classification with a pretrained model
  • Producing interpretable probability scores
  • Supporting custom model training for specialized use cases
  • Providing extensive metadata for downstream analysis
  • Being regularly updated with algorithm improvements

Classification approach

intronIC delegates the question of how U12-like an intron is to a linear support vector machine (SVM), which:

  • Learns the decision boundary from curated reference data
  • Produces probability scores (0-100%) via Platt scaling
  • Uses balanced class weights to handle extreme class imbalance (~0.5% U12)
  • Provides interpretable z-scores for the three key motifs

This avoids arbitrary score thresholds and provides probabilistic classifications that researchers can interpret based on their specific needs (e.g., high-confidence predictions for experimental validation vs. comprehensive catalogs).
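
The pieces listed above can be illustrated with a minimal, standard-library sketch. It is not intronIC's actual code: the function names, the Platt parameters `a` and `b`, and the 0.9 cutoff are all hypothetical values chosen for illustration. The class-weight formula is the common "balanced" heuristic (total samples divided by the number of classes times the class count), assumed here to match what the text describes.

```python
import math
from collections import Counter


def z_score(raw_score, mean, stdev):
    """Standardize a raw motif score against a reference distribution."""
    return (raw_score - mean) / stdev


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency:
    n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


def platt_probability(decision_value, a, b):
    """Map an SVM decision value to a probability with a fitted sigmoid,
    p = 1 / (1 + exp(a * f + b)). With this sign convention, a is
    typically negative so that larger decision values give larger p."""
    return 1.0 / (1.0 + math.exp(a * decision_value + b))


def call_type(probability, threshold=0.9):
    """Apply a user-chosen probability cutoff (hypothetical default)."""
    return "U12" if probability >= threshold else "U2"


# Toy training labels with ~0.5% U12-type introns.
labels = ["U2"] * 995 + ["U12"] * 5
weights = balanced_class_weights(labels)

# Hypothetical Platt parameters, for illustration only.
A, B = -2.0, 0.0
for decision_value in (-1.0, 0.0, 2.0):
    p = platt_probability(decision_value, A, B)
    print(f"f={decision_value:+.1f}  P(U12)={p:.3f}  call={call_type(p)}")
```

With these toy labels, the rare U12 class receives a weight of 100 while the U2 class receives roughly 0.5, so misclassifying a U12-type intron costs far more during training. Raising or lowering the `threshold` argument is how the trade-off between high-confidence calls and comprehensive catalogs would be expressed in this sketch.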

Metadata and bookkeeping

intronIC performs extensive bookkeeping during intron collection, resulting in useful metadata including:

  • Parent gene/transcript identifiers
  • Ordinal intron index within transcript
  • Intron phase (position relative to codon boundary)
  • Fractional position within transcript
  • Terminal dinucleotides and motif sequences

This information, which is otherwise non-trivial to acquire, enables downstream analyses of U12-type intron distribution, positional biases, and evolutionary patterns.
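
To make the fields above concrete, here is a small sketch of how they could be derived from exon coordinates. It is an illustration under simplifying assumptions, not intronIC's implementation: the transcript is on the plus strand, every exon base is treated as coding, coordinates are 1-based and inclusive, and the `IntronRecord` class, `describe_introns` function, and `tx1` identifier are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IntronRecord:
    transcript_id: str
    index: int                  # ordinal position within the transcript
    phase: int                  # upstream coding length modulo 3
    fractional_position: float  # upstream exon length / total exon length
    dinucleotides: str          # terminal dinucleotides, e.g. "GT-AG"


def describe_introns(transcript_id, exons, sequence):
    """Derive per-intron metadata from sorted (start, end) exon pairs.

    Assumes a plus-strand, fully coding transcript with 1-based,
    inclusive coordinates into `sequence`.
    """
    total_exon_length = sum(end - start + 1 for start, end in exons)
    records = []
    upstream_length = 0
    for i in range(len(exons) - 1):
        start, end = exons[i]
        upstream_length += end - start + 1
        intron_start = end + 1
        intron_end = exons[i + 1][0] - 1
        intron_seq = sequence[intron_start - 1:intron_end]
        records.append(IntronRecord(
            transcript_id=transcript_id,
            index=i + 1,
            phase=upstream_length % 3,
            fractional_position=upstream_length / total_exon_length,
            dinucleotides=f"{intron_seq[:2]}-{intron_seq[-2:]}",
        ))
    return records


# Toy sequence: three 5-bp exons separated by two short introns.
sequence = "ATGGC" + "GTAAGTTTAG" + "CATTG" + "ATATCCTTTTAC" + "GATAA"
exons = [(1, 5), (16, 20), (33, 37)]
records = describe_introns("tx1", exons, sequence)
for record in records:
    print(record)
```

Real annotations add complications this sketch ignores, such as minus-strand features, untranslated regions, and multiple transcripts per gene, which is part of why collecting this metadata by hand is non-trivial.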

Citation

If you use intronIC in your research, please cite:

Moyer DC, Larue GE, Hershberger CE, Roy SW, Padgett RA. (2020) Comprehensive database and evolutionary dynamics of U12-type introns. Nucleic Acids Research 48(13):7066–7078. doi:10.1093/nar/gkaa464

License

intronIC is released under the GNU General Public License v3.0.
