Kirigami: RNA secondary structure prediction via deep learning

marc-harary/kirigami

Kirigami

arXiv

Kirigami: large convolutional kernels improve deep learning-based RNA secondary structure prediction

Kirigami is a state-of-the-art (SOTA) AI model for RNA secondary structure prediction. On a standardized test set from bpRNA, Kirigami outperforms other programs such as SPOT-RNA, MXfold2, and UFold.

Installation

The easiest way to download and interact with Kirigami is via PyTorch Hub. Run:

import torch
model = torch.hub.load('marc-harary/kirigami', 'kirigami', pretrained=True)

Usage

For a given RNA sequence, passed as a plain string of nucleotides as in the example below, run

model('GGGGCGAGCUGCAGCCCCAGUGAAUCAAGUGCAGC')
# '.((((........))))..................'

to invoke a convenience __call__ method that embeds the FASTA string and returns a prediction in dot-bracket notation (DBN).
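The dot-bracket string returned above can be converted into explicit base-pair indices for downstream analysis. The following is a minimal sketch, not part of the Kirigami API: the helper name `dbn_to_pairs` is hypothetical, and it assumes the output uses only `(`, `)`, and `.` (no pseudoknot brackets).

```python
def dbn_to_pairs(dbn):
    """Convert a dot-bracket string into a sorted list of 0-based (i, j) pairs.

    Hypothetical helper for illustration; handles only '(', ')' and '.'.
    """
    stack = []   # indices of '(' that have not been closed yet
    pairs = []
    for idx, symbol in enumerate(dbn):
        if symbol == "(":
            stack.append(idx)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {idx}")
            pairs.append((stack.pop(), idx))
        elif symbol != ".":
            raise ValueError(f"unsupported symbol {symbol!r} at position {idx}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


# The sequence and prediction from the usage example above.
sequence = "GGGGCGAGCUGCAGCCCCAGUGAAUCAAGUGCAGC"
prediction = ".((((........)))).................."
pairs = dbn_to_pairs(prediction)
print(pairs)  # [(1, 16), (2, 15), (3, 14), (4, 13)]
print([(sequence[i], sequence[j]) for i, j in pairs])
```

Each opening bracket is matched with the nearest unmatched closing bracket to its right, so a stack is enough to recover the nested pairs.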

(Re)training

All experiments were performed via PyTorch Lightning. Although the weights of the production model are located at weights/main.ckpt, Kirigami can be retrained with varying hyperparameters. Run

python run.py --help

for an exhaustive list of configurations, displayed via Lightning's CLI. The appropriate configuration files are located in configs.

Data

Data used for training, validation, and testing are taken from the bpRNA database in the form of the standard TR0, VL0, and TS0 datasets used by SPOT-RNA, MXfold2, and UFold. Respectively, these contain 10,814, 1,300, and 1,305 non-redundant structures. The .dbn files located in this repo were generated by scraping the data originally uploaded by the authors of SPOT-RNA. The RNAStrAlign, archiveII, bpRNAnew, and bpRNAnew_mutate datasets, scraped from UFold, are likewise in the data directory.

Name

From Wikipedia:

Kirigami (切り紙) is a variation of origami, the Japanese art of folding paper. In kirigami, the paper is cut as well as being folded, resulting in a three-dimensional design that stands away from the page.

The Kirigami pipeline both folds RNA molecules via a fully convolutional neural network (FCN) and uses Nussinov-style dynamic programming to recursively cut them into subsequences for post-processing.
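To illustrate what "Nussinov-style dynamic programming" refers to, here is a minimal sketch of the textbook Nussinov algorithm, which maximizes the number of nested canonical base pairs and recursively splits the sequence into subsequences during traceback. It is not the repository's post-processing code: the function name `nussinov`, the `min_loop` parameter, and the use of a simple pair count (rather than scores from the network) are all assumptions made for illustration.

```python
def nussinov(seq, min_loop=3):
    """Textbook Nussinov folding: maximize the count of nested canonical pairs.

    Illustrative sketch only; `min_loop` is the minimum number of unpaired
    bases required between two paired positions.
    """
    pairs_ok = {("A", "U"), ("U", "A"), ("G", "C"),
                ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    # best[i][j] = maximum number of pairs within seq[i..j] (inclusive)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]              # case 1: j is unpaired
            for k in range(i, j - min_loop):    # case 2: j pairs with k
                if (seq[k], seq[j]) in pairs_ok:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score

    # Traceback: each pair (k, j) cuts the interval into two sub-intervals.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue                            # too short to hold a pair
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in pairs_ok:
                left = best[i][k - 1] if k > i else 0
                if best[i][j] == left + 1 + best[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return "".join(structure)


print(nussinov("GGGAAAUCC"))  # (((...)))
```

The fill step runs in cubic time in the sequence length. In a pipeline that combines a neural network with this kind of recursion, the `+ 1` pair reward would typically be replaced by a learned pairwise score, but the README does not specify how Kirigami does this.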
